How do I remove HTML special characters?

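I am creating an RSS feed file for my application. In that feed I want to remove HTML tags, which strip_tags does. However, strip_tags does not remove HTML special character codes such as: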

 &nbsp; &amp; &copy;

etc.

Is there a function I can use to remove these special character codes from my string?

Either decode them with html_entity_decode, or remove them with preg_replace:

 $Content = preg_replace("/&#?[a-z0-9]+;/i","",$Content);

(from here)

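Edit: an alternative, based on Jacco's comment:

It may be a good idea to replace the "+" with something like {2,8}. This limits the chance of removing a large chunk of text when an unencoded "&" appears.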

 $Content = preg_replace("/&#?[a-z0-9]{2,8};/i","",$Content);
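A minimal sketch comparing the two patterns above on the same input. The helper name removeEntities and its $maxLen parameter are hypothetical; passing null selects the unbounded "+" version. The sample text contains a bare "&" followed by a long word and a ";", which the "+" pattern wrongly treats as an entity.

```php
<?php
// Hypothetical helper: remove anything that looks like an HTML entity.
// $maxLen caps the entity name length ({2,$maxLen}); null uses the unbounded "+" pattern.
function removeEntities(string $content, ?int $maxLen = 8): string
{
    $quantifier = $maxLen === null ? '+' : '{2,' . $maxLen . '}';
    return preg_replace('/&#?[a-z0-9]' . $quantifier . ';/i', '', $content);
}

$text = 'Fish&nbsp;&amp;&#169; chips, R&Dexpenditures;';

echo removeEntities($text, null), "\n"; // Fish chips, R
echo removeEntities($text), "\n";       // Fish chips, R&Dexpenditures;
```

One trade-off of the {2,8} bound: single-digit numeric references such as &#9; no longer match, so they are left in the string.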

Use html_entity_decode to convert HTML entities.

You need to set the charset for it to work correctly.
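A minimal sketch of decoding with an explicit charset, assuming the feed text is UTF-8; decodeEntities is a hypothetical helper name. It also shows a detail that often surprises people: &nbsp; decodes to U+00A0 (NO-BREAK SPACE), not to an ordinary space.

```php
<?php
// Decode named and numeric entities into real UTF-8 characters.
// ENT_QUOTES also decodes quote entities; ENT_HTML5 selects the HTML5 entity table.
function decodeEntities(string $s): string
{
    return html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$decoded = decodeEntities('Tom&nbsp;&amp;&nbsp;Jerry &copy;');

// &nbsp; became U+00A0; replace it explicitly if plain spaces are wanted.
$plain = str_replace("\u{00A0}", ' ', $decoded);
echo $plain, "\n"; // Tom & Jerry ©
```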

In addition to the good answers above, PHP also has a very useful built-in filter function: filter_var.

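To strip HTML with it, use the following (note that FILTER_SANITIZE_STRING removes tags but does not decode entities such as &nbsp;, and it is deprecated as of PHP 8.1):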

$cleanString = filter_var($dirtyString, FILTER_SANITIZE_STRING);

More information:

function.filter-var
FILTER_SANITIZE_STRING

You may want to look at htmlentities() and html_entity_decode():

    $orig = "I'll \"walk\" the <b>dog</b> now";

    // ENT_COMPAT leaves the single quote alone (the default became ENT_QUOTES in PHP 8.1)
    $a = htmlentities($orig, ENT_COMPAT);
    $b = html_entity_decode($a);

    echo $a; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now
    echo $b; // I'll "walk" the <b>dog</b> now

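This removes every character that is not a letter, digit, underscore, dot, whitespace or hyphen. Note that it keeps the letters of an entity (&nbsp; becomes nbsp) and also drops non-ASCII letters: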

    // The hyphen goes last in the class: "_.-\s" is an invalid range in PCRE2 (PHP 7.3+)
    $modifiedString = preg_replace("/[^a-zA-Z0-9_.\s-]/", "", $content);

What I did was decode the entities with html_entity_decode first, and then remove the tags with strip_tags.
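A minimal sketch of that decode-then-strip order, assuming UTF-8 input; toPlainText is a hypothetical helper name. Because decoding happens first, markup that was itself encoded (&lt;b&gt;) is also removed by strip_tags(). The flip side is that text such as 1&lt;2 decodes to 1<2, which strip_tags() may then treat as the start of a tag.

```php
<?php
// Hypothetical helper: decode entities, then strip any resulting markup.
function toPlainText(string $html): string
{
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return strip_tags($decoded);
}

echo toPlainText('<p>Fish &amp; chips &lt;b&gt;today&lt;/b&gt;</p>'), "\n"; // Fish & chips today
```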

Try this:

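    <?php
    $str = "\x8F!!!";

    // Outputs an empty string: "\x8F" is not a valid UTF-8 sequence
    echo htmlentities($str, ENT_QUOTES, "UTF-8");

    // Outputs "!!!": ENT_IGNORE silently discards the invalid byte
    // (the PHP manual discourages ENT_IGNORE for security reasons)
    echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
    ?>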

A plain string-function way to do this, without using the preg regex engine:

    function remEntities($str) {
        // Find the first '&'
        $amp_pos = strpos($str, '&');
        while ($amp_pos !== false) {
            // Find the first ';' after that '&' (a ';' earlier in the string is ignored)
            $semi_pos = strpos($str, ';', $amp_pos);
            if ($semi_pos === false) {
                break; // no terminating ';', so nothing more to remove
            }
            // Treat "&...;" as an HTML entity and cut it out
            $str = substr($str, 0, $amp_pos) . substr($str, $semi_pos + 1);
            // Look for another entity from the same position
            $amp_pos = strpos($str, '&', $amp_pos);
        }
        return $str;
    }

It looks like what you really want is:

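    function xmlEntities($string) {
        $from = array();
        $to = array();
        $translationTable = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');
        foreach ($translationTable as $char => $entity) {
            $from[] = $entity;
            // mb_ord() returns the full code point; ord() would only return
            // the first byte of a multi-byte UTF-8 character
            $to[] = '&#' . mb_ord($char, 'UTF-8') . ';';
        }
        return str_replace($from, $to, $string);
    }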

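It replaces named entities with their numeric equivalents. This matters for RSS because XML only predefines &amp;, &lt;, &gt;, &quot; and &apos;: a named entity such as &nbsp; is undefined in an XML feed, whereas a numeric reference such as &#160; is always valid.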

    <?php
    function strip_only($str, $tags, $stripContent = false) {
        $content = '';
        if (!is_array($tags)) {
            // Accept either "font" or a "<font><b>" style list of tags
            $tags = (strpos($tags, '>') !== false
                ? explode('>', str_replace('<', '', $tags))
                : array($tags));
            if (end($tags) == '') {
                array_pop($tags);
            }
        }
        foreach ($tags as $tag) {
            if ($stripContent) {
                // Also remove everything up to and including the closing tag
                $content = '(.+</' . $tag . '[^>]*>|)';
            }
            $str = preg_replace('#</?' . $tag . '[^>]*>' . $content . '#is', '', $str);
        }
        return $str;
    }

    $str = '<font color="red">red</font> text';
    $tags = 'font';
    $a = strip_only($str, $tags);       // "red text"
    $b = strip_only($str, $tags, true); // " text"
    ?>

The code I use for this, incorporating the improvement suggested by schnaader:

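    // mysql_real_escape_string() was removed in PHP 7.0
    mysql_real_escape_string(
        preg_replace_callback(
            "/&#?[a-z0-9]+;/i",
            function ($m) {
                // $m[0] is the whole entity; the pattern has no capture group, so $m[1] does not exist
                return mb_convert_encoding($m[0], "UTF-8", "HTML-ENTITIES");
            },
            strip_tags($row['cuerpo'])
        )
    )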

This strips all HTML tags, converts the HTML entities into UTF-8 characters, and escapes the result so it is ready to be saved in MySQL.
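The HTML-ENTITIES pseudo-encoding used with mb_convert_encoding() above is deprecated as of PHP 8.2. A minimal sketch of the same strip-and-decode step using html_entity_decode() in the callback instead, assuming PHP 7.4+ (for the arrow function) and UTF-8 output; cleanBody is a hypothetical name. It deliberately stops before the database step, where a parameterized query (for example a PDO prepared statement) would replace mysql_real_escape_string().

```php
<?php
// Hypothetical helper: strip tags, then decode each entity match to UTF-8 text.
function cleanBody(string $html): string
{
    return preg_replace_callback(
        '/&#?[a-z0-9]+;/i',
        // $m[0] is the whole match, e.g. "&eacute;" or "&#38;"
        fn(array $m): string => html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8'),
        strip_tags($html)
    );
}

echo cleanBody('<p>Caf&eacute; &#38; bar</p>'), "\n"; // Café & bar
```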

You can try htmlspecialchars_decode($string). It works for me.

http://www.w3schools.com/php/func_string_htmlspecialchars_decode.asp
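Note that htmlspecialchars_decode() only converts the special-character entities (&amp;, &quot;, &#039;, &lt;, &gt;), so the &nbsp; and &copy; from the question are left untouched. A small sketch showing this; the flag is passed explicitly because the default flags changed in PHP 8.1.

```php
<?php
$s = 'Tom &amp; Jerry &quot;&lt;b&gt;&quot; &copy;&nbsp;2';

// Decodes &amp; &quot; &#039; &lt; &gt; only; other named entities remain
$out = htmlspecialchars_decode($s, ENT_QUOTES);
echo $out, "\n"; // Tom & Jerry "<b>" &copy;&nbsp;2
```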

To replace accented letters with plain ASCII letters, map them with strtr():

    $string = "äáčé";
    $convert = array(
        'ä' => 'a', 'Ä' => 'A', 'á' => 'a', 'Á' => 'A', 'à' => 'a', 'À' => 'A',
        'ã' => 'a', 'Ã' => 'A', 'â' => 'a', 'Â' => 'A', 'č' => 'c', 'Č' => 'C',
        'ć' => 'c', 'Ć' => 'C', 'ď' => 'd', 'Ď' => 'D', 'ě' => 'e', 'Ě' => 'E',
        'é' => 'e', 'É' => 'E', 'ë' => 'e',
    );
    $string = strtr($string, $convert);
    echo $string; // aace
