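How do I get the real URL after file_get_contents if a redirect occurs?
I'm using file_get_contents() to grab content from a website, and it works fine even when the URL I pass as a parameter redirects to another URL.
The problem is that I need to know the new URL. Is there a way to do that?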
You can make the request with cURL instead of file_get_contents().
Something like this should work:
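$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);           // include the response headers in the output
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);  // do not follow the redirect
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the response instead of printing it
$a = curl_exec($ch);
curl_close($ch);
// Header names are case-insensitive, so match with the "i" modifier, one line at a time ("m")
if (preg_match('#^Location:\s*(.*)$#mi', $a, $r)) {
    $l = trim($r[1]);  // the URL the request redirects to
}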
If you need to use file_get_contents() rather than curl, tell it not to follow redirects automatically:
$context = stream_context_create(
    array(
        'http' => array(
            'follow_location' => false
        )
    )
);
$html = file_get_contents('http://www.example.com/', false, $context);
var_dump($http_response_header);
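The dumped header array still has to be searched for the redirect target. A minimal sketch of that step follows; find_redirect_target() is a hypothetical helper name, and the sample array only imitates the shape that file_get_contents() leaves in $http_response_header, so no network request is made here.

```php
<?php
// Hypothetical helper (not part of PHP): return the value of the last
// Location header found in a list of raw header lines, or null if none.
function find_redirect_target(array $headers): ?string
{
    $target = null;
    foreach ($headers as $line) {
        // Header names are case-insensitive, hence the "i" modifier.
        if (preg_match('/^Location:\s*(.*)$/i', $line, $m)) {
            $target = trim($m[1]);
        }
    }
    return $target;
}

// Assumed sample data in the same shape as $http_response_header.
$sample = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.example.com/new',
    'Content-Length: 0',
);
var_dump(find_redirect_target($sample));
```

In real use you would pass $http_response_header itself right after the file_get_contents() call; a null result means the response was not a redirect.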
Answer inspired by: How do I ignore a moved header with file_get_contents in PHP?
Everything in one function:
function get_web_page( $url )
{
    $res = array();
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => false,    // do not return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_USERAGENT      => "spider", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    $res['content'] = $content;
    $res['url']     = $header['url'];  // last effective URL, after redirects
    return $res;
}

print_r(get_web_page("http://www.example.com/redirectfrom"));
A complete solution using bare file_get_contents (note the in-out $url parameter: it is passed by reference and holds the final URL afterwards):
function get_url_contents_and_final_url(&$url)
{
    do {
        // Disable automatic redirects so each Location header can be inspected
        $context = stream_context_create(
            array(
                "http" => array(
                    "follow_location" => false,
                ),
            )
        );
        $result = file_get_contents($url, false, $context);

        $pattern = "/^Location:\s*(.*)$/i";
        $location_headers = preg_grep($pattern, $http_response_header);

        if (!empty($location_headers) &&
            preg_match($pattern, array_values($location_headers)[0], $matches)) {
            $url = trim($matches[1]);  // follow the redirect manually
            $repeat = true;
        } else {
            $repeat = false;           // no Location header: $url is final
        }
    } while ($repeat);

    return $result;
}
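One caveat: the function above assigns the Location value to $url unchanged, which works only when the server sends an absolute URL. Servers may also send a relative value such as /login. The following is a rough sketch of how such a value could be resolved against the URL that produced it; resolve_location() is a hypothetical helper, and it deliberately skips details such as ../ segments, query-only references, and user:password@ parts.

```php
<?php
// Hypothetical helper (not part of PHP): resolve a Location header value
// against the URL of the request that returned it. Simplified sketch:
// "../" segments are not normalized.
function resolve_location(string $base, string $location): string
{
    // Already absolute: starts with a scheme such as http:// or https://
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) {
        return $location;
    }

    $parts     = parse_url($base);
    $scheme    = $parts['scheme'] ?? 'http';
    $authority = $parts['host'] ?? '';
    if (isset($parts['port'])) {
        $authority .= ':' . $parts['port'];
    }

    // Protocol-relative: //host/path
    if (strpos($location, '//') === 0) {
        return $scheme . ':' . $location;
    }

    // Root-relative: /path
    if (strpos($location, '/') === 0) {
        return $scheme . '://' . $authority . $location;
    }

    // Relative to the directory of the base path
    $path = $parts['path'] ?? '/';
    $pos  = strrpos($path, '/');
    $dir  = ($pos === false) ? '/' : substr($path, 0, $pos + 1);
    return $scheme . '://' . $authority . $dir . $location;
}

echo resolve_location('http://www.example.com/a/b.html', '/login'), "\n";
```

Inside the loop you could then write $url = resolve_location($url, trim($matches[1])); instead of assigning the raw header value. Adding a counter that stops after a fixed number of redirects would also protect against redirect loops.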