Writing UTF-16 to a file in binary mode

I'm trying to write a wstring to a file with ofstream in binary mode, but I think I'm doing something wrong. This is what I've tried:

    ofstream outFile("test.txt", std::ios::out | std::ios::binary);
    wstring hello = L"hello";
    outFile.write((char *) hello.c_str(), hello.length() * sizeof(wchar_t));
    outFile.close();

Opening test.txt in, for example, Firefox with the encoding set to UTF-16, it shows up as:

h�e�l�l�o�

Could anyone tell me why this happens?

EDIT:

Opening the file in a hex editor I get:

 FF FE 68 00 00 00 65 00 00 00 6C 00 00 00 6C 00 00 00 6F 00 00 00

Looks like I get two extra bytes between every character for some reason?

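I suspect that sizeof(wchar_t) is 4 in your environment, i.e. it is writing out UTF-32/UCS-4 instead of UTF-16. That is certainly what the hex dump looks like.

That's easy enough to test (just print out sizeof(wchar_t)), but I'm pretty sure that is what's going on.

To go from a UTF-32 wstring to UTF-16 you need to apply a proper encoding, because surrogate pairs come into play.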
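As a rough illustration of that encoding step, the sketch below turns UTF-32 code points into UTF-16LE bytes by hand, including the surrogate-pair case. The names to_utf16le and put_u16le are made up for this example, and replacing invalid code points with U+FFFD is an assumption rather than anything the question asks for.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append one 16-bit code unit as two bytes, low byte first (UTF-16LE).
static void put_u16le(std::string& out, std::uint16_t unit) {
    out.push_back(static_cast<char>(unit & 0xFF));
    out.push_back(static_cast<char>(unit >> 8));
}

// Hypothetical helper: encode UTF-32 code points as UTF-16LE bytes.
// Assumption: invalid code points (surrogates, values above U+10FFFF)
// are replaced by U+FFFD instead of being reported as errors.
std::string to_utf16le(const std::u32string& text, bool with_bom = true) {
    std::string out;
    if (with_bom) put_u16le(out, 0xFEFF);  // written as bytes FF FE
    for (char32_t cp : text) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        if (cp < 0x10000) {
            // Basic Multilingual Plane: a single code unit.
            put_u16le(out, static_cast<std::uint16_t>(cp));
        } else {
            // Supplementary plane: split the 20-bit offset into a surrogate pair.
            const std::uint32_t v = cp - 0x10000;
            put_u16le(out, static_cast<std::uint16_t>(0xD800 + (v >> 10)));
            put_u16le(out, static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    assert(out.size() % 2 == 0);  // every code unit contributes two bytes
    return out;
}
```

The resulting bytes can be written with outFile.write(bytes.data(), bytes.size()) on a binary ofstream like the one in the question.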

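Here we run into the little-used locale properties. If you output your string as a string (rather than as raw data), you can get the locale to do the appropriate conversion automatically.

N.B. This code does not take into account the endianness of the wchar_t character.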

    #include <locale>
    #include <fstream>
    #include <iostream>
    // See below for the facet
    #include "UTF16Facet.h"

    int main(int argc, char* argv[])
    {
        // Construct a custom Unicode facet and add it to a locale.
        UTF16Facet *unicodeFacet = new UTF16Facet();
        const std::locale unicodeLocale(std::cout.getloc(), unicodeFacet);

        // Create a stream and imbue it with the facet.
        std::wofstream saveFile;
        saveFile.imbue(unicodeLocale);

        // Now that the stream is imbued we can open it.
        // NB: If you open the file stream first, any attempt to imbue it
        // with a locale will silently fail.
        saveFile.open("output.uni");
        saveFile << L"This is my Data\n";

        return 0;
    }

File: UTF16Facet.h

    #include <locale>

    class UTF16Facet: public std::codecvt<wchar_t, char, std::char_traits<wchar_t>::state_type>
    {
        typedef std::codecvt<wchar_t, char, std::char_traits<wchar_t>::state_type> MyType;
        typedef MyType::state_type  state_type;
        typedef MyType::result      result;

        /* This function deals with converting data from the input stream into the internal stream. */
        /*
         * from, from_end:  Point to the beginning and end of the input that we are converting 'from'.
         * to,   to_limit:  Point to where we are writing the conversion 'to'.
         * from_next:       When the function exits this should have been updated to point at the next
         *                  location to read from (i.e. the first unconverted input character).
         * to_next:         When the function exits this should have been updated to point at the next
         *                  location to write to.
         *
         * status:          This indicates the status of the conversion. Possible values are:
         *                  error:   An error occurred; the bad file bit will be set.
         *                  ok:      Everything went to plan.
         *                  partial: Not enough input data was supplied to complete any conversion.
         *                  noconv:  No conversion was done.
         */
        virtual result do_in(state_type &s,
                             const char *from, const char *from_end, const char* &from_next,
                             wchar_t *to, wchar_t *to_limit, wchar_t* &to_next) const
        {
            // Loop while a complete 2-byte unit of input and one output slot remain.
            // (Checking only from < from_end would read past the end on an odd byte count.)
            for (; (from_end - from >= 2) && (to < to_limit); from += 2, ++to)
            {
                /* Input the data. */
                /* As the input 16 bits may not fill the wchar_t object,
                 * initialise it so that all its bits are zeroed out. This
                 * is important on systems with 32-bit wchar_t objects.
                 */
                (*to) = L'\0';

                /* Next read the data from the input stream into the
                 * wchar_t object. Remember that we need to copy
                 * into the bottom 16 bits no matter what size the
                 * wchar_t object is. (Copying the first two bytes only
                 * does that on a little-endian host.)
                 */
                reinterpret_cast<char*>(to)[0] = from[0];
                reinterpret_cast<char*>(to)[1] = from[1];
            }
            from_next = from;
            to_next   = to;

            // Unconverted input (a trailing odd byte, or no output space left) means partial.
            return (from == from_end) ? ok : partial;
        }

        /* This function deals with converting data from the internal stream to a C/C++ file stream. */
        /* The parameters and the returned status have the same meaning as for do_in. */
        virtual result do_out(state_type &state,
                              const wchar_t *from, const wchar_t *from_end, const wchar_t* &from_next,
                              char *to, char *to_limit, char* &to_next) const
        {
            // Loop while input remains and there is room for two more output bytes.
            // (Checking only to < to_limit could write one byte past the buffer.)
            for (; (from < from_end) && (to_limit - to >= 2); ++from, to += 2)
            {
                /* Output the data. */
                /* NB I am assuming the characters are encoded as UTF-16.
                 * This means they are 16 bits inside a wchar_t object.
                 * As the size of wchar_t varies between platforms I need
                 * to take this into consideration and only take the bottom
                 * 16 bits of each wchar_t object.
                 */
                to[0] = reinterpret_cast<const char*>(from)[0];
                to[1] = reinterpret_cast<const char*>(from)[1];
            }
            from_next = from;
            to_next   = to;

            // Input that did not fit into the output buffer means partial.
            return (from == from_end) ? ok : partial;
        }

        /* Every wchar_t is written as exactly two bytes, so report that to the
         * stream; otherwise it may size its conversion buffer for one byte per
         * character. (Use throw() instead of noexcept before C++11.)
         */
        virtual int do_encoding() const noexcept   { return 2; }
        virtual int do_max_length() const noexcept { return 2; }
    };

It is easy if you use the C++11 standard (because it adds a lot of extra includes, such as "utf8", which solve this problem for good).
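For reference, one C++11 facility that fits here is std::codecvt_utf16 from <codecvt>. The sketch below is only one possible way to use it: the helper name write_utf16le is hypothetical, the BOM is written by hand as the first character, and since <codecvt> was deprecated in C++17, newer compilers may emit warnings.

```cpp
#include <cassert>
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: write `text` to `path` as UTF-16LE preceded by a BOM.
bool write_utf16le(const char* path, const std::wstring& text) {
    assert(path != nullptr);
    std::wofstream out;
    // Imbue before opening. The locale takes ownership of the facet.
    out.imbue(std::locale(out.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10FFFF, std::little_endian>));
    // Binary mode keeps '\n' from being expanded on platforms that
    // translate line endings in text mode.
    out.open(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out << wchar_t(0xFEFF) << text;  // U+FEFF is encoded as bytes FF FE
    out.close();
    return !out.fail();
}
```

Calling write_utf16le("test.txt", L"hello") is meant to produce the BOM followed by two bytes per character, rather than the four bytes per character seen in the question's hex dump.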

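But if you want multi-platform code that works with older standards, you can use this method to write to streams:

Read the article about the UTF converter for streams
Add stxutif.h to your project from the sources above

Open the file in ANSI mode and add the BOM to the start of the file, like this: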

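    std::ofstream fs;
    fs.open(filepath, std::ios::out | std::ios::binary);

    // UTF-8 byte order mark
    const unsigned char smarker[3] = { 0xEF, 0xBB, 0xBF };

    // Use write() with an explicit length: operator<< would treat the array
    // as a NUL-terminated string and read past its end.
    fs.write(reinterpret_cast<const char*>(smarker), sizeof smarker);
    fs.close();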

Then open the file as UTF and write your content to it:

    std::wofstream fs;
    fs.open(filepath, std::ios::out | std::ios::app);

    std::locale utf8_locale(std::locale(), new utf8cvt<false>);
    fs.imbue(utf8_locale);

    fs << .. // Write anything you want...

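On Windows, using wofstream with the UTF-16 facet defined above fails, because wofstream converts every byte with the value 0A into the two bytes 0D 0A. This happens no matter how you pass the 0A byte in: '\x0A', L'\x0A', L'\x000A', '\n', L'\n' and std::endl all give the same result. On Windows you have to open the file with an ofstream (not a wofstream) in binary mode and write the output just as in the original post.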
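A minimal sketch of that approach, assuming the text is held in a std::u16string so that each code unit is 16 bits regardless of sizeof(wchar_t). The helper name write_u16_raw is hypothetical, and the bytes come out in the host's byte order (little-endian on typical x86 machines), which is why a BOM is written first.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: dump UTF-16 code units to `path` exactly as they sit
// in memory, preceded by a BOM so a reader can detect the byte order.
bool write_u16_raw(const char* path, const std::u16string& text) {
    assert(path != nullptr);
    // A narrow stream in binary mode performs no newline translation.
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) return false;
    const char16_t bom = 0xFEFF;
    out.write(reinterpret_cast<const char*>(&bom), sizeof bom);
    out.write(reinterpret_cast<const char*>(text.data()),
              static_cast<std::streamsize>(text.size() * sizeof(char16_t)));
    out.close();
    return !out.fail();
}
```

Because the stream is a plain ofstream opened with std::ios::binary, a u'\n' in the text should be stored as a single code unit without an inserted 0D byte.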

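The provided UTF16Facet didn't work in gcc for big strings; here is the version that worked for me. This way the file is saved as UTF-16LE. For UTF-16BE, simply swap the assignments in do_in and do_out, e.g. to[0] = from[1] and to[1] = from[0].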

    #include <locale>   // <bits/codecvt.h> is a libstdc++ internal header and is not needed

    class UTF16Facet: public std::codecvt<wchar_t, char, std::char_traits<wchar_t>::state_type>
    {
        typedef std::codecvt<wchar_t, char, std::char_traits<wchar_t>::state_type> MyType;
        typedef MyType::state_type  state_type;
        typedef MyType::result      result;

        /* This function deals with converting data from the input stream into the internal stream. */
        /*
         * from, from_end:  Point to the beginning and end of the input that we are converting 'from'.
         * to,   to_limit:  Point to where we are writing the conversion 'to'.
         * from_next:       When the function exits this should have been updated to point at the next
         *                  location to read from (i.e. the first unconverted input character).
         * to_next:         When the function exits this should have been updated to point at the next
         *                  location to write to.
         *
         * status:          This indicates the status of the conversion. Possible values are:
         *                  error:   An error occurred; the bad file bit will be set.
         *                  ok:      Everything went to plan.
         *                  partial: Not enough input data was supplied to complete any conversion.
         *                  noconv:  No conversion was done.
         */
        virtual result do_in(state_type &s,
                             const char *from, const char *from_end, const char* &from_next,
                             wchar_t *to, wchar_t *to_limit, wchar_t* &to_next) const
        {
            // Stop as soon as either the input or the output space runs out.
            for (; (from_end - from >= 2) && (to < to_limit); from += 2, ++to)
            {
                (*to) = L'\0';
                reinterpret_cast<char*>(to)[0] = from[0];
                reinterpret_cast<char*>(to)[1] = from[1];
            }
            // Point past the last converted element on both sides.
            from_next = from;
            to_next   = to;
            return (from == from_end) ? ok : partial;
        }

        /* This function deals with converting data from the internal stream to a C/C++ file stream. */
        /* The parameters and the returned status have the same meaning as for do_in. */
        virtual result do_out(state_type &state,
                              const wchar_t *from, const wchar_t *from_end, const wchar_t* &from_next,
                              char *to, char *to_limit, char* &to_next) const
        {
            // Never write unless two bytes of output space are available.
            for (; (from < from_end) && (to_limit - to >= 2); ++from, to += 2)
            {
                to[0] = reinterpret_cast<const char*>(from)[0];
                to[1] = reinterpret_cast<const char*>(from)[1];
            }
            from_next = from;
            to_next   = to;
            // Leftover input is reported as partial so the stream calls again.
            return (from == from_end) ? ok : partial;
        }

        /* Each wchar_t becomes exactly two bytes; reporting this lets the stream
         * size its buffer for long strings. (Use throw() before C++11.)
         */
        virtual int do_encoding() const noexcept   { return 2; }
        virtual int do_max_length() const noexcept { return 2; }
    };

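You should look at the output file in a hex editor such as WinHex so you can see the actual bits and bytes, to verify that the output is actually UTF-16. Post it here and let us know the result. That will tell us whether to blame Firefox or your C++ program.

But it looks to me like your C++ program works and Firefox is not interpreting your UTF-16 correctly. UTF-16 calls for two bytes for every character. But Firefox is printing twice as many characters as it should, so it is probably trying to interpret your string as UTF-8 or ASCII, which generally have just 1 byte per character.

When you say "Firefox with encoding set to UTF16", what do you mean? I'm skeptical that that works.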
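If no hex editor is at hand, a few lines of C++ can show the bytes instead. This is only a sketch: hex_dump is a hypothetical helper that formats bytes in the same style as the dump in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: format raw bytes as space-separated upper-case hex.
std::string hex_dump(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out.push_back(' ');
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    // Two hex digits per byte plus one separator between neighbours.
    assert(out.size() == (bytes.empty() ? 0 : bytes.size() * 3 - 1));
    return out;
}
```

Read the file with a std::ifstream opened in binary mode and pass its contents to hex_dump. Two bytes per character point to UTF-16, four bytes per character to UTF-32.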
