Python type error with regular expressions

So, I have this code:

    url = 'http://google.com'
    linkregex = re.compile('<a\s*href=[\'|"](.*?)[\'"].*?>')
    m = urllib.request.urlopen(url)
    msg = m.read()
    links = linkregex.findall(msg)

But then Python returns this error:

    links = linkregex.findall(msg)
    TypeError: can't use a string pattern on a bytes-like object

What am I doing wrong?

> TypeError: can't use a string pattern on a bytes-like object
>
> What am I doing wrong??

You're using a string pattern on a bytes object. Use a bytes pattern instead:

    linkregex = re.compile(rb'<a\s*href=[\'"](.*?)[\'"].*?>')
    # Add the b there: it makes the pattern a bytes object.
    # The r keeps \s as a regex escape instead of an (invalid) string escape.

(PS:

    >>> from disclaimer include dont_use_regexp_on_html
    "Use BeautifulSoup or lxml instead."

)
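In the spirit of that PS, here is a minimal sketch of pulling links out of HTML without a regex. It swaps in the standard library's html.parser.HTMLParser instead of BeautifulSoup or lxml, so nothing extra has to be installed. The class LinkCollector, the function extract_links, and the sample HTML are hypothetical, made up for illustration. Note that the regex from the question finds nothing in this sample, because it is case-sensitive and expects href to come right after <a.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of (name, value) pairs
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)


def extract_links(html: str) -> list:
    """Feed a str of HTML to the parser and return the hrefs it saw, in order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample: uppercase tag, href after another attribute, and an anchor without href
page = '<p><A HREF="/about">About</A> <a class="x" href=\'/jobs\'>Jobs</a> <a name="top"></a></p>'

parser_links = extract_links(page)
regex_links = re.findall(r'<a\s*href=[\'"](.*?)[\'"].*?>', page)

print(parser_links)  # ['/about', '/jobs']
print(regex_links)   # []
```

The parser works on str, so bytes from urlopen() still have to be decoded before calling feed().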

If you're running Python 2.6, urllib has no "request" module, so the third line becomes:

 m = urllib.urlopen(url)

And in Python 3, you should use this:

 links = linkregex.findall(str(msg))

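because msg is a bytes object, not the string that findall() expects. Alternatively, decode it with the correct encoding. Be aware that str(msg) returns the repr of the bytes (it starts with b' and non-ASCII bytes appear as \xNN escapes), so decoding is the more reliable option. For example, if the encoding is latin1: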

 links = linkregex.findall(msg.decode("latin1"))
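A small offline sketch of the difference between the two options above. The sample bytes and the helper charset_from_headers are hypothetical. In real use, the headers of the object returned by urllib.request.urlopen() are an http.client.HTTPMessage, a subclass of email.message.Message, so get_content_charset() can supply the encoding; here a hand-built Message stands in for a live response.

```python
import re
from email.message import Message

LINK_RE = re.compile(r'<a\s*href=[\'"](.*?)[\'"].*?>')

# Made-up response body containing a UTF-8 encoded "é"
msg = b'<a href="http://example.com/caf\xc3\xa9">x</a>'

# str() on bytes gives the repr: non-ASCII bytes become literal \xNN text
str_links = LINK_RE.findall(str(msg))
print(str_links)   # ['http://example.com/caf\\xc3\\xa9']

# decode() with the right encoding gives the real characters
text_links = LINK_RE.findall(msg.decode('utf-8'))
print(text_links)  # ['http://example.com/café']


def charset_from_headers(headers, default='utf-8'):
    """Hypothetical helper: charset from Content-Type (lowercased), else a fallback."""
    return headers.get_content_charset() or default


# Offline stand-in for response.headers (assumed header value)
headers = Message()
headers['Content-Type'] = 'text/html; charset=ISO-8859-1'
print(charset_from_headers(headers))  # iso-8859-1
```

With a live response this would look like msg.decode(charset_from_headers(m.headers), errors='replace'), with the fallback covering servers that send no charset.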

Well, my version of Python doesn't have urllib with a request attribute, but if I use urllib.urlopen(url) I don't get back a string, I get an object. That's the type error.

The Google URL you gave didn't work for me, so I replaced it with http://www.google.com/ig?hl=en, which does.

Try this:

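    import re
    import urllib.request

    url = "http://www.google.com/ig?hl=en"
    # Raw string so \s reaches the regex engine; the class [\'"] matches either quote
    linkregex = re.compile(r'<a\s*href=[\'"](.*?)[\'"].*?>')
    m = urllib.request.urlopen(url)
    msg = m.read()
    # str(msg) turns the bytes into a str so the str pattern can be used
    links = linkregex.findall(str(msg))
    print(links)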

Hope this helps.

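The regular expression pattern and the string must be of the same type. If you're matching a regular string, you need a string pattern. If you're matching a byte string, you need a bytes pattern.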

In this case, m.read() returns a byte string, so you need a bytes pattern. In Python 3, regular strings are Unicode strings, and you need the b prefix to write a bytes literal:

    # b makes it a bytes pattern; r keeps \s as a regex escape
    linkregex = re.compile(rb'<a\s*href=[\'"](.*?)[\'"].*?>')
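A minimal offline sketch of the "same type" rule, using a made-up bytes snippet in place of m.read(). It shows the mismatch raising TypeError, a bytes pattern on bytes returning bytes, and the alternative of decoding first and keeping a str pattern. The exact wording of the error message varies between Python versions, so only the exception type is relied on here.

```python
import re

# Made-up stand-in for m.read(): urlopen(...).read() returns bytes
html = b'<a href="/about">About</a> <a href=\'/jobs\'>Jobs</a>'

str_pattern = re.compile(r'<a\s*href=[\'"](.*?)[\'"].*?>')
bytes_pattern = re.compile(rb'<a\s*href=[\'"](.*?)[\'"].*?>')

# Mismatched types: a str pattern applied to bytes raises TypeError
try:
    str_pattern.findall(html)
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print('TypeError:', exc)

# Matching types: bytes pattern on bytes, and the results are bytes too
byte_links = bytes_pattern.findall(html)
print(byte_links)  # [b'/about', b'/jobs']

# Or convert the data instead: decode to str and keep the str pattern
text_links = str_pattern.findall(html.decode('ascii'))
print(text_links)  # ['/about', '/jobs']
```

The reverse mismatch, a bytes pattern applied to a str, raises TypeError as well.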

This worked for me in Python 3. Hope this helps:

    import urllib.request
    import re

    urls = ["https://google.com", "https://nytimes.com", "http://CNN.com"]
    regex = '<title>(.+?)</title>'
    pattern = re.compile(regex)

    for url in urls:
        htmlfile = urllib.request.urlopen(url)
        htmltext = htmlfile.read()
        # str() converts the bytes so the str pattern can search it
        titles = re.search(pattern, str(htmltext))
        print(titles)

Before that, I had put a b in front of the regex to turn it into a bytes object, which also works:

    import urllib.request
    import re

    urls = ["https://google.com", "https://nytimes.com", "http://CNN.com"]
    regex = b'<title>(.+?)</title>'
    pattern = re.compile(regex)

    for url in urls:
        htmlfile = urllib.request.urlopen(url)
        htmltext = htmlfile.read()
        # Bytes pattern on bytes data: no conversion needed
        titles = re.search(pattern, htmltext)
        print(titles)
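Both loops print the match object (or None), not the title text itself. A minimal sketch of getting the text out, tested offline on made-up bytes: with a bytes pattern, group(1) is bytes and still needs decoding. The helper extract_title, its utf-8 default, and the extra IGNORECASE/DOTALL flags are assumptions added for illustration, not part of the answer above.

```python
import re
from typing import Optional

# Same bytes pattern as above; the flags are an added assumption so that
# <TITLE> and titles spanning several lines also match
TITLE_RE = re.compile(rb'<title>(.+?)</title>', re.IGNORECASE | re.DOTALL)


def extract_title(html: bytes, encoding: str = 'utf-8') -> Optional[str]:
    """Hypothetical helper: decoded text of the first <title>, or None."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # group(1) is bytes because the pattern is bytes; decode it to get text
    return match.group(1).decode(encoding, errors='replace').strip()


print(extract_title(b'<html><head><TITLE>\n  Example Domain\n</TITLE></head></html>'))  # Example Domain
print(extract_title(b'<html><body>no title here</body></html>'))  # None
```

Inside the loop this would replace print(titles) with print(extract_title(htmltext)).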
