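What is the best way to get the HTTP response code from a URL?
I'm looking for a quick way to get the HTTP response code (i.e. 200, 404, etc.) from a URL. I'm not sure which library to use.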
Here is a solution that uses httplib.
    import httplib

    def get_status_code(host, path="/"):
        """
        This function retrieves the status code of a website by requesting
        HEAD data from the host. This means that it only requests the headers.
        If the host cannot be reached or something else goes wrong, it returns
        None instead.
        """
        try:
            conn = httplib.HTTPConnection(host)
            conn.request("HEAD", path)
            return conn.getresponse().status
        except StandardError:
            return None

    print get_status_code("stackoverflow.com")  # prints 200
    print get_status_code("stackoverflow.com", "/nonexistant")  # prints 404
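For readers on Python 3, where httplib was renamed http.client, the same HEAD approach might look like the sketch below. The function name get_status_code_py3 and the timeout parameter are assumptions added for illustration; they are not part of the original answer.

```python
import http.client


def get_status_code_py3(host, path="/", timeout=10):
    """Return the status code of a HEAD request, or None on failure.

    Hypothetical Python 3 port of the httplib snippet above; the name and
    the timeout default are illustrative assumptions.
    """
    # Creating the connection object does not open a socket yet.
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # OSError covers DNS failures, refused connections and timeouts;
        # HTTPException covers malformed or incomplete replies.
        return None
    finally:
        conn.close()
```

As in the original, host is a bare host name such as "stackoverflow.com" (optionally with ":port"), not a full URL.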
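Update: using the wonderful requests library. Note that we are using a HEAD request, which should be faster than a full GET or POST request because no response body is transferred.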
    import requests

    try:
        r = requests.head("http://stackoverflow.com")
        print(r.status_code)
        # prints the int of the status code. Find more at httpstatusrappers.com :)
    except requests.ConnectionError:
        print("failed to connect")
You should use urllib2, like this:
    import urllib2

    for url in ["http://entrian.com/", "http://entrian.com/does-not-exist/"]:
        try:
            connection = urllib2.urlopen(url)
            print connection.getcode()
            connection.close()
        except urllib2.HTTPError, e:
            print e.getcode()

    # Prints:
    # 200 [from the try block]
    # 404 [from the except block]
For future readers using Python 3 and later, here is another piece of code to find the response code.
    import urllib.request

    def getResponseCode(url):
        conn = urllib.request.urlopen(url)
        return conn.getcode()
The exception may not have a getcode() method. Use its code attribute instead.
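To illustrate reading the code attribute, here is a minimal Python 3 sketch that sends a HEAD request with urllib.request and returns the status even when urlopen raises HTTPError for a 4xx or 5xx reply. The function name get_response_code and the opener parameter are assumptions introduced here (opener only exists so the function can be tried without network access).

```python
import urllib.error
import urllib.request


def get_response_code(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for url, including 4xx/5xx codes.

    `opener` is a hypothetical injection point that defaults to urlopen.
    Connection-level failures (urllib.error.URLError) are not caught here.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for error statuses; the numeric
        # status is available as err.code.
        return err.code
```

With this shape, a missing page yields 404 as a return value rather than an exception the caller has to unpack.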
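Here is an httplib solution that behaves similarly to urllib2. You can give it a URL and it just works. There is no need to split your URL into host name and path; this function already does that.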
    import httplib
    import re
    import socket

    def get_link_status(url):
        """
        Gets the HTTP status of the url or returns an error associated with it.
        Always returns a string.
        """
        https = False
        url = re.sub(r'(.*)#.*$', r'\1', url)  # drop the #fragment, if any
        url = url.split('/', 3)
        if len(url) > 3:
            path = '/' + url[3]
        else:
            path = '/'
        if url[0] == 'http:':
            port = 80
        elif url[0] == 'https:':
            port = 443
            https = True
        if ':' in url[2]:
            # An explicit port in the URL overrides the scheme default.
            host = url[2].split(':')[0]
            port = int(url[2].split(':')[1])
        else:
            host = url[2]
        try:
            headers = {
                'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0',
                'Host': host,
            }
            if https:
                conn = httplib.HTTPSConnection(host=host, port=port, timeout=10)
            else:
                conn = httplib.HTTPConnection(host=host, port=port, timeout=10)
            conn.request(method="HEAD", url=path, headers=headers)
            response = str(conn.getresponse().status)
            conn.close()
        except socket.gaierror, e:
            response = "Socket Error (%d): %s" % (e[0], e[1])
        except StandardError, e:
            # Use the first source of detail that is available.
            if hasattr(e, 'getcode') and e.getcode() is not None:
                response = str(e.getcode())
            elif hasattr(e, 'message') and len(e.message) > 0:
                response = str(e.message)
            elif hasattr(e, 'msg') and len(e.msg) > 0:
                response = str(e.msg)
            elif isinstance(e, str):
                response = e
            else:
                response = "Exception occurred without a good error message. Manually check the URL to see the status. If it is believed this URL is 100% good then file an issue for a potential bug."
        return response
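The manual splitting of the URL into scheme, host, port and path could also be delegated to the standard library. The sketch below assumes Python 3 (urllib.parse); the helper name split_url and the DEFAULT_PORTS table are hypothetical and not part of the answer above.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes the original function handles.
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_url(url):
    """Split url into (scheme, host, port, path) for an HTTP(S) connection.

    The fragment is dropped and the query string is kept as part of the
    path, mirroring what the hand-written splitting does.
    """
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    if not parts.hostname:
        raise ValueError("no host in URL: %r" % url)
    # parts.port is an int when given explicitly, otherwise None.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.hostname, port, path
```

Unlike the original, this version rejects schemes other than http and https up front instead of leaving the port undefined.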