Extract URLs from text in PHP

I have this text:

    $string = "this is my friend's website http://example.com I think it is coll";

How can I extract the link into another variable?

I know I should use regular expressions, specifically preg_match(), but I don't know how.

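Probably the safest way is to use the code from WordPress. Download the latest version (currently 3.1.1) and look at wp-includes/formatting.php. It has a function called make_clickable that takes plain text as its parameter and returns a formatted string. You can grab the code from there to extract URLs. It's fairly complex, though.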

This one-line regex might help:

    preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $string, $match);

However, this regex still won't filter out some malformed URLs (e.g. http://google:ha.ckers.org).
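One way to weed out some of those malformed matches, sketched here as an assumption rather than as part of the original answer, is to pass each regex match through PHP's built-in `filter_var()` with `FILTER_VALIDATE_URL`. The helper name `extract_valid_urls` is hypothetical. `FILTER_VALIDATE_URL` has its own limits (for example, it rejects internationalized, non-ASCII hostnames), so treat it as a second filter rather than a guarantee.

```php
<?php
// Hypothetical helper: match candidates with the regex above, then
// keep only those that PHP's URL validator accepts.
function extract_valid_urls(string $text): array
{
    $regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#';
    preg_match_all($regex, $text, $match);

    // filter_var() returns the URL on success and false on failure.
    $valid = array_filter($match[0], function ($url) {
        return filter_var($url, FILTER_VALIDATE_URL) !== false;
    });

    // Re-index so the result is a plain list.
    return array_values($valid);
}

print_r(extract_valid_urls("see http://example.com and http://google:ha.ckers.org"));
```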

See also: How to mimic StackOverflow's auto-link behavior

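I tried what Nobu suggested and used WordPress, but it depends heavily on other WordPress functions. Instead, I took Nobu's regex for preg_match_all() and turned it into a function using preg_replace_callback(); it now replaces every link in the text with a clickable link. It uses an anonymous function, so you need PHP 5.3, or you can rewrite the code to use a normal function.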

    <?php
    /**
     * Make clickable links from URLs in text.
     */
    function make_clickable($text)
    {
        $regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#';
        return preg_replace_callback($regex, function ($matches) {
            // Escape the URL before putting it into HTML
            $url = htmlspecialchars($matches[0], ENT_QUOTES);
            return "<a href=\"{$url}\">{$url}</a>";
        }, $text);
    }

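URLs have a fairly complex definition, so you have to decide what you want to capture first. A simple example that captures anything starting with http:// or https:// could be: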

    preg_match_all('!https?://\S+!', $string, $matches);
    $all_urls = $matches[0];

Note that this is very basic and could capture invalid URLs. I would recommend reading up on POSIX and PHP regular expressions to handle more complex cases.
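One common way the basic pattern goes wrong is that `\S+` also swallows punctuation that ends the sentence, as in "http://example.com,". A minimal sketch, assuming the URLs in your text rarely end in these characters, trims a few of them after matching; the helper name `extract_urls_trimmed` is hypothetical.

```php
<?php
// Hypothetical helper built on the basic https?://\S+ pattern.
function extract_urls_trimmed(string $text): array
{
    preg_match_all('!https?://\S+!', $text, $matches);
    // \S+ also captures sentence punctuation right after a URL;
    // strip a few common trailing characters (an assumption about
    // your input, since some real URLs do end in them).
    return array_map(function ($url) {
        return rtrim($url, '.,;:!?\'"');
    }, $matches[0]);
}

print_r(extract_urls_trimmed("Try http://example.com, or https://example.org/page."));
```

Closing parentheses are deliberately left alone, since URLs such as Wikipedia links can legitimately end in `)`.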

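If the text you are extracting URLs from is user-submitted and you are going to display the results as links anywhere, you have to be very careful to avoid XSS vulnerabilities: most notably "javascript:" protocol URLs, but also malformed URLs that may trick your regex and/or the displaying browser into executing them as JavaScript URLs. At the very least, you should only accept URLs that start with "http", "https" or "ftp".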
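The advice above can be sketched as a small helper that only emits a link when the scheme is on the allow-list and escapes the URL for HTML in every case. The name `safe_link` and the decision to fall back to plain escaped text are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: link an extracted URL only if its scheme is
// http, https or ftp; always HTML-escape what gets printed.
function safe_link(string $url): string
{
    // parse_url() yields null/false when there is no scheme or the URL is
    // malformed; the string cast turns both into "".
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    $escaped = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

    if (!in_array($scheme, ['http', 'https', 'ftp'], true)) {
        // Not an allowed scheme (e.g. "javascript:"): plain text only.
        return $escaped;
    }
    return '<a href="' . $escaped . '">' . $escaped . '</a>';
}

echo safe_link('http://example.com'), "\n";
echo safe_link('javascript:alert(1)'), "\n";
```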

Jeff also has a blog post describing some other problems with extracting URLs.

You could do it like this:

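    <?php
    $string = "this is my friend's website http://example.com I think it is coll";
    // Take everything from the first "http://" on, then cut at the first space.
    // Indexing the explode() result directly requires PHP 5.4+.
    echo explode(' ', strstr($string, 'http://'))[0]; // prints http://example.com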
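Since the question asks about `preg_match()` specifically: it stops at the first match, which makes it the regex counterpart of the `explode()` approach above. A minimal sketch, reusing the simple `https?://\S+` pattern from an earlier answer; unlike the `strstr()` version, it also finds `https://` links.

```php
<?php
$string = "this is my friend's website http://example.com I think it is coll";

// preg_match() returns 1 if the pattern matched and 0 if not;
// $match[0] holds the whole matched text.
if (preg_match('!https?://\S+!', $string, $match)) {
    $url = $match[0];
    echo $url; // prints http://example.com
}
```

Use `preg_match_all()` instead when the text can contain more than one URL.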

The code that works for me (especially if you have several links in your $string) is:

    $string = "this is my friend's website http://example.com I think it is cool, but this is cooler http://www.memelpower.com :)";
    $regex = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
    preg_match_all($regex, $string, $matches);
    $urls = $matches[0];
    // go over all links
    foreach ($urls as $url) {
        echo $url . '<br />';
    }

Hope it helps someone.

    preg_match_all('/[a-z]+:\/\/\S+/', $string, $matches);

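This is a simple approach that works in many cases, though not all. All the matches are put in $matches. Note that it doesn't handle links inside anchor elements (<a href="" ...), but those weren't in your example either.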

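    // Capture the href value of every <a> element in $var.
    // $matches is passed without "&": call-time pass-by-reference was removed in PHP 5.4.
    preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"\']+" . "(.*?)[\"\']+.*?>" . "([^<]+|.*?)?<\/a>/", $var, $matches);
    $matches = $matches[1];
    foreach ($matches as $var) {
        print($var . "<br>");
    }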
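If the input is HTML and you only need the `href` values of anchors, PHP's DOM extension is an alternative to the regex above that copes with attribute order, quoting style and extra attributes. A sketch, assuming the `dom` extension is available (it usually is); the `$html` string is made-up example input.

```php
<?php
// Collect the href of every <a> element using DOMDocument instead of a regex.
$html = '<p>See <a href="http://example.com">this</a> and <a class="x" href=\'https://example.org/a?b=1\'>that</a>.</p>';

$doc = new DOMDocument();
// HTML fragments often trigger libxml warnings; collect them silently.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$hrefs = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    if ($a->hasAttribute('href')) {
        $hrefs[] = $a->getAttribute('href');
    }
}
print_r($hrefs);
```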

You can try this to find links and turn them into hyperlinks (by adding an href):

    // The regular expression filter
    $reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
    // The text you want to filter for URLs
    $text = "The text you want to filter goes here. http://note.taable.com";
    // Check if there is a URL in the text
    if (preg_match($reg_exUrl, $text)) {
        // $0 is the whole match, so each URL links to itself
        echo preg_replace($reg_exUrl, '<a href="$0">$0</a> ', $text);
    } else {
        echo "No url in the text";
    }

See here: http://php.net/manual/en/function.preg-match.php

This regex works well for me; I have tested it against many kinds of URLs:

    <?php
    $string = "Thisregexfindurlhttp://www.rubular.com/r/bFHobduQ3n mixedwithstring";
    preg_match_all('/(https?|ssh|ftp):\/\/[^\s"]+/', $string, $url);
    $all_url = $url[0];    // array of all URLs found
    $one_url = $url[0][0]; // the first URL in that array
    ?>

The many URLs I checked it against can be found here: http://www.rubular.com/r/bFHobduQ3n

    // Declared "public" in the original (a class method); dropped here so it
    // also works as a plain function.
    function find_links($post_content)
    {
        $reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
        // Turn every URL into a nofollow hyperlink in one pass ($0 is the whole match).
        // A single pass avoids re-linking a URL that appears twice;
        // if there are no URLs, the text is returned unchanged.
        return preg_replace($reg_exUrl, '<a href="$0" rel="nofollow"> LINK </a>', $post_content);
    }
