在PHP中检测文件编码

我有一个脚本，它将多个文件合并为一个文件，当其中一个文件具有UTF8编码时，它会中断。我想我应该在读取文件时使用utf8_decode()函数，但是我不知道如何判断需要解码。

我的代码基本上是：

 $output = ''; foreach ($files as $filename) { $output .= file_get_contents($filename) . "\n"; } file_put_contents('combined.txt', $output);

目前，在UTF8文件的开始处，它将这些字符添加到输出中： ï»¿

尝试使用mb_detect_encoding函数。这个函数将检查你的string，并试图“猜测”它的编码是什么。然后您可以根据需要转换它。但是， brulakbuild议，你可能最好转换为 UTF-8，而不是从保存你传输的数据。

为了确保输出是UTF-8，不pipe是什么样的input，我都使用这个检查：

 if(!mb_check_encoding($output, 'UTF-8') OR !($output === mb_convert_encoding(mb_convert_encoding($output, 'UTF-32', 'UTF-8' ), 'UTF-8', 'UTF-32'))) { $output = mb_convert_encoding($content, 'UTF-8', 'pass'); } // $output is now safely converted to UTF-8!

mb_detect_encoding函数应该是你的最后select。这可能会返回错误的编码。 Linux命令file -i /path/myfile.txt工作得很好。在PHP中，您可以使用：

 function _detectFileEncoding($filepath) { // VALIDATE $filepath !!! $output = array(); exec('file -i ' . $filepath, $output); if (isset($output[0])){ $ex = explode('charset=', $output[0]); return isset($ex[1]) ? $ex[1] : null; } return null; }

你将如何处理UTF-8或16或32文件中的非ASCII字符？

我问，因为我认为你可能在这里有一个devise问题。

我会将您的输出文件转换为UTF-8（或16或32）而不是其他方式。

那么你不会有这个问题。

你是否也考虑过转换转义UTF8代码可能产生的安全问题？看到这个评论：

检测多字节编码

找出你的源文件是什么编码，然后将其转换为UTF8，你应该很好去。

这是我的解决scheme，像一个魅力工作：

 //check string strict for encoding out of list of supported encodings $enc = mb_detect_encoding($str, mb_list_encodings(), true); if ($enc===false){ //could not detect encoding } else if ($enc!=="UTF-8"){ $str = mb_convert_encoding($str, "UTF-8", $enc); } else { //UTF-8 detected }

对于linux服务器，我使用这个命令：

 $file = 'your/file.ext' exec( "from=`file -bi $file | awk -F'=' '{print $2 }'` && iconv -f \$from -t utf-8 $file -o $file" );

我最近遇到了这个问题， mb_convert_encoding()函数的输出是UTF-8 。看了一下响应头，没有提及编码types，所以我发现设置http头为UTF-8的PHP ，其中提出了以下内容：

 <?php header('Content-Type: text/html; charset=utf-8');

添加到PHP文件的顶部之后，所有时髦的angular色都离开了，它应该像它应该。不知道是否这是原来的海报正在寻求的问题，但我发现这一点，试图自己解决这个问题，并认为我会分享。

扫描所有文件，从mb_list_encodings中find任何一种编码，性能良好..

  function detectFileEncoding($filePath){ $fopen=fopen($filePath,'r'); $row = fgets($fopen); $encodings = mb_list_encodings(); $encoding = mb_detect_encoding( $row, "UTF-8, ASCII, Windows-1252, Windows-1254" );//these are my favorite encodings if($encoding !== false) { $key = array_search($encoding, $encodings) !== false; if ($key !== false) unset($encodings[$key]); $encodings = array_values($encodings); } $encKey = 0; while ($row = fgets($fopen)) { if($encoding == false){ $encoding = $encodings[$encKey++]; } if(!mb_check_encoding($row, $encoding)){ $encoding =false; rewind($fopen); } } return $encoding; }

在PHP中检测文件编码

C / C ++为什么要使用二进制数据的无符号字符？

如何检测Latin1编码列中的UTF-8字符 – MySQL

Python：从ISO-8859-1 / latin1转换为UTF-8

设置默认的Java字符编码？

你如何在Bash中回显一个4位的Unicode字符？

检测编码，并使所有的UTF-8

什么是字符编码，为什么我应该打扰它

如何在Java中转换ISO-8859-1和UTF-8？

HMAC-SHA256签名计算algorithm

jQuery的AJAX字符编码