Get a website's title from a link

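Notice how Google News lists the sources at the bottom of each article excerpt.

The Guardian – ABC News – Reuters – Bloomberg

I am trying to imitate that.

For example, on submitting the URL http://www.washingtontimes.com/news/2010/dec/3/debt-panel-fails-test-vote/ I would like to return The Washington Times

How is this possible with PHP?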

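My answer expands on @AI W's answer of using the page title. Below is the code that does what he described.

    <?php
    function get_title($url)
    {
        $str = file_get_contents($url);
        if ($str !== false && strlen($str) > 0) {
            // Collapse whitespace so line breaks inside <title> are supported.
            $str = trim(preg_replace('/\s+/', ' ', $str));
            // The /i flag ignores case; (.*?) stops at the first </title>.
            if (preg_match('/<title>(.*?)<\/title>/i', $str, $title)) {
                return $title[1];
            }
        }
        return null; // fetch failed or no <title> found
    }

    // Example:
    echo get_title("http://www.washingtontimes.com/");
    ?>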

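Output:

Washington Times – Politics, Breaking News, US and World News

As you can see, this is not exactly what Google is using, which leads me to believe that they take a URL's hostname and match it against their own list.

http://www.washingtontimes.com/ => The Washington Times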
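A minimal sketch of that hostname-to-name idea follows. How Google actually does it is unknown; the `SOURCE_NAMES` table and the `source_name()` helper are hypothetical names invented for illustration, and the table entries are placeholders you would maintain yourself.

```php
<?php
// Hypothetical lookup table; the entries are illustrative, not Google's data.
const SOURCE_NAMES = [
    'washingtontimes.com' => 'The Washington Times',
    'reuters.com'         => 'Reuters',
];

/**
 * Return the publisher name for a URL, or null if the host is not listed.
 */
function source_name(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // malformed URL or no host component
    }
    // Normalise: lower-case the host and drop a leading "www."
    $host = preg_replace('/^www\./', '', strtolower($host));
    return SOURCE_NAMES[$host] ?? null;
}

echo source_name('http://www.washingtontimes.com/news/2010/dec/3/debt-panel-fails-test-vote/'), "\n";
```

For hosts that are not in the table, the function returns null, so a caller could fall back to the page title in that case.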

    $doc = new DOMDocument();
    // @ suppresses the warnings libxml raises on malformed real-world HTML.
    @$doc->loadHTMLFile('http://www.washingtontimes.com/news/2010/dec/3/debt-panel-fails-test-vote/');
    $xpath = new DOMXPath($doc);
    echo $xpath->query('//title')->item(0)->nodeValue . "\n";

Output:

Debt commission falls short on test vote - Washington Times

Obviously you should also implement basic error handling.
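As a rough idea of what basic error handling could look like with DOMDocument, here is a sketch. The helper names `title_from_html()` and `title_from_url()` are made up for this example, and returning null on every failure is just one possible convention.

```php
<?php
/**
 * Extract the <title> text from an HTML string, or return null if there is none.
 */
function title_from_html(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }
    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $node = (new DOMXPath($doc))->query('//title')->item(0);
    return $node === null ? null : trim($node->nodeValue);
}

/**
 * Fetch a page and return its title, or null on any failure.
 */
function title_from_url(string $url): ?string
{
    $html = @file_get_contents($url); // false on network or HTTP failure
    return $html === false ? null : title_from_html($html);
}

var_dump(title_from_html('<html><head><title> Example </title></head><body></body></html>'));
var_dump(title_from_html('<p>no title here</p>'));
```

The first call yields "Example" and the second yields null, because the fragment has no title element.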

You can fetch the contents of the URL and run a regular expression search for the contents of the title element.

    <?php
    $urlContents = file_get_contents("http://example.com/");
    preg_match("/<title>(.*)<\/title>/i", $urlContents, $matches);
    print($matches[1] . "\n"); // "Example Web Page"
    ?>

Or, if you don't want to use a regular expression (to match something very near the top of the document), you can use a DOMDocument object:

    <?php
    $urlContents = file_get_contents("http://example.com/");
    $dom = new DOMDocument();
    @$dom->loadHTML($urlContents);
    $title = $dom->getElementsByTagName('title');
    print($title->item(0)->nodeValue . "\n"); // "Example Web Page"
    ?>

I'll leave it up to you to decide which method you like best.
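To help with that decision, here is a small offline comparison of the two approaches. The HTML string is made up for the example: its title spans several lines and carries an attribute, which is where the simple regular expression above stops matching.

```php
<?php
$html = "<html><head><title lang=\"en\">\n  Example Web Page\n</title></head><body></body></html>";

// The regular expression from above finds nothing: the tag has an attribute,
// and "." does not match newlines without the /s flag.
$regexTitle = preg_match('/<title>(.*)<\/title>/i', $html, $m) ? $m[1] : null;

// DOMDocument parses the markup, so the same input is handled.
$dom = new DOMDocument();
@$dom->loadHTML($html);
$node = $dom->getElementsByTagName('title')->item(0);
$domTitle = $node === null ? null : trim($node->nodeValue);

var_dump($regexTitle); // NULL
var_dump($domTitle);   // string(16) "Example Web Page"
```

A more tolerant pattern such as `/<title\b[^>]*>(.*?)<\/title>/is` would also match this input, so the regular expression route remains workable if you prefer it.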

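Using get_meta_tags() on the domain's home page brings back, for NYT, something that might need truncating but could be useful.

    $b = "http://www.washingtontimes.com/news/2010/dec/3/debt-panel-fails-test-vote/";
    $url = parse_url($b);
    // Read the meta tags of the site's home page rather than of the article.
    $tags = get_meta_tags($url['scheme'] . '://' . $url['host']);
    var_dump($tags);

The result includes the description "The Washington Times delivers breaking news and commentary on the issues that affect the future of our nation."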

The PHP manual on cURL:

    <?php
    $ch = curl_init("http://www.example.com/");
    $fp = fopen("example_homepage.txt", "w");

    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_HEADER, 0);

    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    ?>

The PHP manual on Perl-compatible regular expression matching:

    <?php
    $subject = "abcdef";
    $pattern = '/^def/';
    preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3);
    print_r($matches);
    ?>

Putting the two together:

    <?php
    // create curl resource
    $ch = curl_init();

    // set url
    curl_setopt($ch, CURLOPT_URL, "example.com");

    // return the transfer as a string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    // $output contains the output string
    $output = curl_exec($ch);

    // close curl resource to free up system resources
    curl_close($ch);

    // capture everything between <title> and </title>
    $pattern = '/<title>([^<]*)<\/title>/i';
    preg_match($pattern, $output, $matches);
    print_r($matches);
    ?>

I can't guarantee this example will work, since I don't have PHP available here, but it should help you get started.

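If you're willing to use a third-party service for this, I just built one at www.runway7.net/radar

It gives you the title, description, and more. For example, try your example on Radar. ( http://radar.runway7.net/?url=http://www.washingtontimes.com/news/2010/dec/3/debt-panel-fails-test-vote/ )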

Alternatively, you can use Simple HTML DOM Parser:

    <?php
    require_once('simple_html_dom.php');

    $html = file_get_html('http://www.washingtontimes.com/news/2010/dec/3/debt-panel-fails-test-vote/');

    echo $html->find('title', 0)->innertext . "<br>\n";
    echo $html->find('div[class=entry-content]', 0)->innertext;

I wrote a function to handle it:

    function getURLTitle($url)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $content = curl_exec($ch);
        $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
        curl_close($ch);

        // 1. charset parameter of the HTTP Content-Type header
        $charset = '';
        if ($contentType && preg_match('/\bcharset=([\w.:-]+)/i', $contentType, $m)) {
            $charset = $m[1];
        }

        // Fall back to the URL itself when the fetch fails or no title is found.
        if (!is_string($content) || !preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $content, $m)) {
            return $url;
        }
        $title = trim($m[1]);

        if (!$charset && preg_match_all('/<meta\b[^>]*>/i', $content, $metas)) {
            // 2. <meta http-equiv="Content-Type" content="...; charset=...">
            foreach ($metas[0] as $meta) {
                if (stripos($meta, 'content-type') !== false
                    && preg_match('/\bcharset=([\w.:-]+)/i', $meta, $ms)) {
                    $charset = $ms[1];
                    break;
                }
            }
            // 3. <meta charset=utf-8> or <meta charset='utf-8'>
            if (!$charset) {
                foreach ($metas[0] as $meta) {
                    if (preg_match('/\bcharset=[\'"]?([\w.:-]+)/i', $meta, $ms)) {
                        $charset = $ms[1];
                        break;
                    }
                }
            }
        }

        return $charset ? iconv($charset, 'utf-8', $title) : $title;
    }

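It fetches the page content and tries to determine the document's character encoding from the following sources (from highest priority to lowest):

1. An HTTP "charset" parameter in a "Content-Type" field.
2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
3. The charset attribute set on an element that designates an external resource.

(See http://www.w3.org/TR/html4/charset.html )

It then uses iconv to convert the title to UTF-8.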
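The priority order can be tried offline with a sketch like the one below. `detect_charset()` and `to_utf8()` are hypothetical helper names, and the sample HTML is invented. Note one assumption: the third step here reads `<meta charset=...>` (the HTML5 form, as the function above does) rather than the HTML 4 rule as worded in the list.

```php
<?php
/**
 * Pick a charset using the priority order described above.
 * $contentType is the raw HTTP Content-Type header value, or null if unknown.
 */
function detect_charset(?string $contentType, string $html): ?string
{
    $token = '([A-Za-z0-9._:-]+)';

    // 1. charset parameter of the HTTP Content-Type header
    if ($contentType !== null
        && preg_match('/\bcharset=["\']?' . $token . '/i', $contentType, $m)) {
        return strtolower($m[1]);
    }
    if (preg_match_all('/<meta\b[^>]*>/i', $html, $metas)) {
        // 2. <meta http-equiv="Content-Type" content="text/html; charset=...">
        foreach ($metas[0] as $meta) {
            if (stripos($meta, 'http-equiv') !== false
                && preg_match('/\bcharset=' . $token . '/i', $meta, $m)) {
                return strtolower($m[1]);
            }
        }
        // 3. assumption: any remaining <meta charset="..."> declaration
        foreach ($metas[0] as $meta) {
            if (preg_match('/\bcharset=["\']?' . $token . '/i', $meta, $m)) {
                return strtolower($m[1]);
            }
        }
    }
    return null;
}

/**
 * Convert $text to UTF-8, leaving it untouched if the charset is unknown
 * or the conversion fails.
 */
function to_utf8(string $text, ?string $charset): string
{
    if ($charset === null || $charset === 'utf-8') {
        return $text;
    }
    $converted = @iconv($charset, 'UTF-8', $text);
    return $converted === false ? $text : $converted;
}

$html = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
      . "<title>Caf\xE9</title></head></html>";
$charset = detect_charset(null, $html);
echo $charset, "\n";                     // iso-8859-1
echo to_utf8("Caf\xE9", $charset), "\n"; // Café
```

Passing a header such as `text/html; charset=UTF-8` as the first argument overrides whatever the markup declares, which mirrors the priority order in the list.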

Get a website's title from a link and convert the title to UTF-8 character encoding:

https://gist.github.com/kisexu/b64bc6ab787f302ae838

    function getTitle($url)
    {
        // get html via url
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        $html = curl_exec($ch);
        curl_close($ch);

        // get title
        preg_match('/(?<=<title>).+(?=<\/title>)/iU', $html, $match);
        $title = empty($match[0]) ? 'Untitled' : $match[0];
        $title = trim($title);

        // convert title to utf-8 character encoding
        if ($title != 'Untitled') {
            preg_match('/(?<=charset\=).+(?=\")/iU', $html, $match);
            if (!empty($match[0])) {
                $charset = str_replace('"', '', $match[0]);
                $charset = str_replace("'", '', $charset);
                $charset = strtolower(trim($charset));
                if ($charset != 'utf-8') {
                    $title = iconv($charset, 'utf-8', $title);
                }
            }
        }

        return $title;
    }

I try to avoid regular expressions when they aren't necessary, so I wrote the function below to get a website's title using cURL and DOMDocument.

    function website_title($url)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // some websites like Facebook need a user agent to be set.
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36');
        $html = curl_exec($ch);
        curl_close($ch);

        $dom = new DOMDocument;
        @$dom->loadHTML($html);
        // item() expects an integer index
        $title = $dom->getElementsByTagName('title')->item(0)->nodeValue;
        return $title;
    }

    echo website_title('https://www.facebook.com/');

The above returns the following: Welcome to Facebook – Log In, Sign Up or Learn More
