How can I retrieve the page title of a webpage using Python?

How can I retrieve the page title (the title HTML tag) of a webpage using Python?

I always use lxml for tasks like this. You could use BeautifulSoup as well.

    import lxml.html

    # url holds the address of the page to fetch
    t = lxml.html.parse(url)
    print(t.find(".//title").text)

This is a simplified version of @Vinko Vrsalovic's answer:

    # Python 2 with BeautifulSoup 3; on Python 3, urllib2 became urllib.request
    import urllib2
    from BeautifulSoup import BeautifulSoup

    soup = BeautifulSoup(urllib2.urlopen("https://www.google.com"))
    print(soup.title.string)

Notes:

soup.title finds the first title element anywhere in the HTML document.
title.string assumes the element has only one child node, and that this child node is a string.

For BeautifulSoup 4.x, use a different import:

    from bs4 import BeautifulSoup

The mechanize Browser object has a title() method. So the code from this post can be rewritten as:

    from mechanize import Browser

    br = Browser()
    br.open("http://www.google.com/")
    print(br.title())

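This is probably overkill for such a simple task, but if you plan to do more than that, it is better to start with these tools (mechanize, BeautifulSoup) because they are much easier to use than the alternatives (urllib to fetch the content, plus regexes or some other parser to parse the HTML).

Links: BeautifulSoup, mechanize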

    #!/usr/bin/env python
    # coding: utf-8
    from BeautifulSoup import BeautifulSoup
    from mechanize import Browser

    # This retrieves the webpage content
    br = Browser()
    res = br.open("https://www.google.com/")
    data = res.get_data()

    # This parses the content
    soup = BeautifulSoup(data)
    title = soup.find('title')

    # This outputs the content :)
    print(title.renderContents())
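For comparison, the standard-library route mentioned above (urllib to fetch the content) needs one extra step that the higher-level tools hide: turning the downloaded bytes into text. The following is a minimal sketch for Python 3, not a definitive implementation. The helper names `decode_body` and `fetch_html`, the 10-second timeout, and the UTF-8 fallback are all assumptions made for illustration.

```python
from urllib.request import urlopen


def decode_body(raw_bytes, charset=None):
    """Decode response bytes to text (hypothetical helper).

    Falls back to UTF-8 when the server declares no charset (an assumption).
    """
    # errors="replace" substitutes malformed bytes instead of raising.
    return raw_bytes.decode(charset or "utf-8", errors="replace")


def fetch_html(url, timeout=10):
    """Download a page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        # get_content_charset() reads the charset from the Content-Type
        # header and returns None when the header does not name one.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

The text returned by `fetch_html("http://example.com/")` can then be handed to any of the parsers or the regex shown in the other answers.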

Using HTMLParser:

    from urllib.request import urlopen
    from html.parser import HTMLParser


    class TitleParser(HTMLParser):
        def __init__(self):
            HTMLParser.__init__(self)
            self.match = False
            self.title = ''

        def handle_starttag(self, tag, attributes):
            self.match = tag == 'title'

        def handle_data(self, data):
            if self.match:
                self.title = data
                self.match = False


    url = "http://example.com/"
    # Decode the bytes; str() on a bytes object would yield a "b'...'" literal
    html_string = urlopen(url).read().decode('utf-8', errors='replace')

    parser = TitleParser()
    parser.feed(html_string)
    print(parser.title)  # prints: Example Domain
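The parser above keeps only a single data chunk and has no notion of where the title element ends. A slightly more defensive variant might track the end tag, join all text chunks, and stop after the first title. This is only a sketch; the class name `FirstTitleParser` and the function `title_from_html` are made up for this example.

```python
from html.parser import HTMLParser


class FirstTitleParser(HTMLParser):
    """Collect the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <TITLE> matches as well.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True  # ignore later <title> tags, e.g. inside inline SVG

    def handle_data(self, data):
        if self.in_title:
            self.chunks.append(data)

    @property
    def title(self):
        # Join the chunks and collapse runs of whitespace, including newlines.
        return " ".join("".join(self.chunks).split())


def title_from_html(html_text):
    """Return the first title found in html_text, or '' if there is none."""
    parser = FirstTitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title
```

It takes an already decoded string, so it can be tried on a literal such as `title_from_html("<title>Example Domain</title>")` without any network access.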

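There is no need to import an extra parsing library. With requests, you can fetch the page and slice the title out of the response text: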

    >>> import requests
    >>> headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0'}
    >>> n = requests.get('http://www.imdb.com/title/tt0108778/', headers=headers)
    >>> al = n.text
    >>> al[al.find('<title>') + 7 : al.find('</title>')]
    u'Friends (TV Series 1994\u20132004) - IMDb'

Using regular expressions:

    import re

    # raw_html holds the page source as a string
    match = re.search('<title>(.*?)</title>', raw_html)
    title = match.group(1) if match else 'No title'
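The pattern above only matches a lower-case `<title>` tag without attributes whose text sits on a single line. A somewhat more tolerant sketch, still assuming reasonably well-formed markup, could look like this. The names `TITLE_RE` and `extract_title` are hypothetical, and a regex remains a shortcut rather than a real HTML parser.

```python
import html
import re

# Assumption: the first <title>...</title> pair in the text is the page title.
# IGNORECASE accepts <TITLE>; DOTALL lets the title span several lines.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def extract_title(raw_html, default="No title"):
    """Return the text of the first title element, or default if none is found."""
    match = TITLE_RE.search(raw_html)
    if match is None:
        return default
    # Decode entities such as &amp; and collapse internal whitespace.
    text = html.unescape(match.group(1))
    return " ".join(text.split())
```

For example, `extract_title('<TITLE lang="en">\n Tom &amp; Jerry\n</TITLE>')` gives `'Tom & Jerry'`.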

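soup.title.string actually returns a Unicode string. To convert it to a plain byte string (in Python 2), you need to do string = string.encode('ascii', 'ignore'). Note that this silently drops any non-ASCII characters.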
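To see what that conversion does in practice, here is a small Python 3 illustration using the IMDb title from the requests answer. In Python 3 every str is already Unicode and encode() returns bytes, so the variable names below are only for demonstration.

```python
title = "Friends (TV Series 1994\u20132004) - IMDb"

# 'ignore' silently drops every character that ASCII cannot represent,
# here the en dash between the two years.
ascii_ignored = title.encode("ascii", "ignore")

# 'replace' substitutes a question mark, so the loss stays visible.
ascii_replaced = title.encode("ascii", "replace")

# UTF-8 keeps every character; decoding round-trips to the original.
utf8_bytes = title.encode("utf-8")

print(ascii_ignored)    # b'Friends (TV Series 19942004) - IMDb'
print(ascii_replaced)   # b'Friends (TV Series 1994?2004) - IMDb'
print(utf8_bytes.decode("utf-8") == title)  # True
```

If the title is going to be stored or printed rather than fed to an ASCII-only system, keeping it as Unicode (or encoding it as UTF-8) avoids losing characters.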
