How do I unescape HTML entities in a string in Python 3.1?

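I looked all around and found solutions only for Python 2.6 and earlier, nothing on how to do this in Python 3.X. (I only have access to a Win7 box.)

I have to be able to do this in 3.1, preferably without external libraries. Currently I have httplib2 installed and access to curl from the command prompt (that is how I get the source code of the pages). Unfortunately, curl does not decode HTML entities, and as far as I can tell there is no command for decoding them in its documentation.

Yes, I have tried to get Beautiful Soup to work, many times, without success on 3.X. If you could provide EXPLICIT instructions on how to get it working with Python 3 in an MS Windows environment, I would be very grateful.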

So, to be clear, I need to turn a string like this: "Suzy &amp; John" into a string like this: "Suzy & John".

You can use the function html.unescape:

In Python 3.4+ (thanks to J.F. Sebastian for the update):

    import html
    html.unescape('Suzy &amp; John')  # 'Suzy & John'
    html.unescape('&quot;')  # '"'

In Python 3.3 or older:

    import html.parser
    html.parser.HTMLParser().unescape('Suzy &amp; John')

In Python 2:

    import HTMLParser
    HTMLParser.HTMLParser().unescape('Suzy &amp; John')
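If the same script has to run on several interpreter versions, the three variants above can be folded into one import chain. This is only a sketch: the name `decode_entities` is made up for the example, and the fallbacks are needed because `HTMLParser.unescape` exists only on older interpreters (it was removed in Python 3.9).

```python
# Pick whichever unescape implementation the running interpreter provides.
try:
    from html import unescape  # Python 3.4+
except ImportError:
    try:
        from html.parser import HTMLParser  # Python 3.0 to 3.3
    except ImportError:
        from HTMLParser import HTMLParser  # Python 2
    unescape = HTMLParser().unescape


def decode_entities(text):
    """Hypothetical wrapper: decode HTML entities in text."""
    return unescape(text)


print(decode_entities('Suzy &amp; John'))
```

On 3.1 the second branch is the one that gets used, so no external library is required.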

You can use xml.sax.saxutils.unescape for this purpose. The module is part of the Python standard library and is portable between Python 2.x and Python 3.x.

    >>> import xml.sax.saxutils as saxutils
    >>> saxutils.unescape("Suzy &amp; John")
    'Suzy & John'
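One caveat: by default xml.sax.saxutils.unescape only handles &amp;amp;, &amp;lt; and &amp;gt;. Other entities can be supplied through its optional `entities` dictionary. The sketch below assumes you only care about the two quote entities; `EXTRA_ENTITIES` and `unescape_with_quotes` are names invented for the example.

```python
from xml.sax.saxutils import unescape

# Extra replacements beyond the built-in &amp;, &lt; and &gt;.
# Keys are the full entity strings, values are their replacements.
EXTRA_ENTITIES = {"&quot;": '"', "&apos;": "'"}


def unescape_with_quotes(text):
    """Hypothetical helper: unescape the XML basics plus quote entities."""
    return unescape(text, EXTRA_ENTITIES)


# Without the extra dictionary the quote entities are left untouched.
default_result = unescape("&quot;Suzy &amp; John&quot;")
full_result = unescape_with_quotes("&quot;Suzy &amp; John&quot;")
print(default_result)
print(full_result)
```

This still covers only the entities you list yourself, so it is a fit for XML-style input rather than arbitrary HTML.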

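Apparently I don't have enough reputation to do anything but post this. unutbu's answer does not unescape quotation marks. The only thing I found that did was this function: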

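    # Python 2 only (uses htmlentitydefs and unichr)
    import re
    from htmlentitydefs import name2codepoint as n2cp

    def decodeHtmlentities(string):
        def substitute_entity(match):
            ent = match.group(2)
            if match.group(1) == "#":
                # numeric entity such as &#38;
                return unichr(int(ent))
            else:
                # named entity such as &amp;
                cp = n2cp.get(ent)
                if cp:
                    return unichr(cp)
                else:
                    # unknown name: leave the text unchanged
                    return match.group()
        entity_re = re.compile(r"&(#?)(\d{1,5}|\w{1,8});")
        return entity_re.subn(substitute_entity, string)[0]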

Which I got from this page.

Python 3.x has html.entities too.
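For Python 3, the same regex-based idea can be written against html.entities. This is a sketch rather than a drop-in port: the function name `decode_html_entities` is made up, and the handling of hexadecimal references such as &amp;#x27; and of malformed numbers is an addition that the function above does not have.

```python
import re
from html.entities import name2codepoint

# "&", an optional "#", then digits or a short name, then ";".
ENTITY_RE = re.compile(r"&(#?)(\d{1,5}|\w{1,8});")


def decode_html_entities(text):
    """Replace numeric and named HTML entities in text (Python 3)."""
    def substitute_entity(match):
        ent = match.group(2)
        if match.group(1) == "#":
            try:
                if ent[:1] in ("x", "X"):
                    # hexadecimal reference, e.g. &#x27;
                    return chr(int(ent[1:], 16))
                # decimal reference, e.g. &#39;
                return chr(int(ent))
            except ValueError:
                # not a valid number or code point: leave unchanged
                return match.group()
        cp = name2codepoint.get(ent)
        # unknown names are left unchanged
        return chr(cp) if cp else match.group()

    return ENTITY_RE.sub(substitute_entity, text)


print(decode_html_entities("&quot;Suzy &amp; John&quot;"))
```

Only html.entities and re are used, both of which ship with Python 3.1.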

In my case I have an HTML string that was escaped with the AS3 escape function. After an hour of googling I had not found anything useful, so I wrote this recursive function to serve my needs. Here it is:

    # Python 2 only: relies on str.decode('unicode_escape') and str.decode('hex')
    def unescape(string):
        index = string.find("%")
        if index == -1:
            return string
        else:
            # if it is an escaped unicode character (%uXXXX), decode it differently
            if string[index+1:index+2] == 'u':
                replace_with = ("\\" + string[index+1:index+6]).decode('unicode_escape')
                string = string.replace(string[index:index+6], replace_with)
            else:
                replace_with = string[index+1:index+3].decode('hex')
                string = string.replace(string[index:index+3], replace_with)
            return unescape(string)

Edit 1: added functionality to handle unicode characters.
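The function above depends on Python 2's str.decode, so it will not run on 3.x. A possible Python 3 equivalent is sketched below, assuming the input uses only the two escape forms the function handles, %XX and %uXXXX. The name `unescape_as3` is made up for the example. It uses a single regex pass instead of recursion, so a decoded "%" is never decoded a second time.

```python
import re

# Matches %uXXXX (four hex digits) or %XX (two hex digits).
ESCAPE_RE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")


def unescape_as3(text):
    """Hypothetical helper: decode %XX and %uXXXX escapes in one pass."""
    def replace(match):
        # Exactly one of the two groups is set; both hold hex digits.
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return ESCAPE_RE.sub(replace, text)


print(unescape_as3("Suzy%20%26%20John"))
```

Text with no escapes, or with a "%" that is not followed by valid hex digits, is returned unchanged.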

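I am not sure whether this is a built-in library, but it looks like what you need and it supports 3.1.

From: http://docs.python.org/3.1/library/xml.sax.utils.html?highlight=html%20unescape

xml.sax.saxutils.unescape(data, entities={})
    Unescape '&amp;', '&lt;', and '&gt;' in a string of data.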

Jacob
