Why are std::fstreams so slow?

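I was working on a simple parser, and when profiling I observed that the bottleneck is in... file reading! I extracted a very simple test to compare the performance of fstreams and FILE* when reading a big blob of data: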

    #include <stdio.h>
    #include <chrono>
    #include <cstring>      // memset
    #include <fstream>
    #include <functional>
    #include <iostream>
    #include <string>

    // Runs `function` once and prints the elapsed wall-clock time in milliseconds.
    void measure(const std::string& test, std::function<void()> function)
    {
        auto start_time = std::chrono::high_resolution_clock::now();

        function();

        auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
        std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
    }

    #define BUFFER_SIZE (1024 * 1024 * 1024)

    int main(int argc, const char * argv[])
    {
        auto buffer = new char[BUFFER_SIZE];
        memset(buffer, 123, BUFFER_SIZE);

        measure("FILE* write", [buffer]()
        {
            FILE* file = fopen("test_file_write", "wb");
            fwrite(buffer, 1, BUFFER_SIZE, file);
            fclose(file);
        });
        // The *_read files are created beforehand, outside this program.
        measure("FILE* read", [buffer]()
        {
            FILE* file = fopen("test_file_read", "rb");
            fread(buffer, 1, BUFFER_SIZE, file);
            fclose(file);
        });
        measure("fstream write", [buffer]()
        {
            std::ofstream stream("test_stream_write", std::ios::binary);
            stream.write(buffer, BUFFER_SIZE);
        });
        measure("fstream read", [buffer]()
        {
            std::ifstream stream("test_stream_read", std::ios::binary);
            stream.read(buffer, BUFFER_SIZE);
        });

        delete[] buffer;
    }

The results of running this code on my machine are:

    FILE* write 1388.59 ms
    FILE* read 1292.51 ms
    fstream write 3105.38 ms
    fstream read 3319.82 ms

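fstream write/read is about 2 times slower than FILE* write/read! And this is while reading a big blob of data, without any parsing or other features of fstreams. I'm running the code on Mac OS, Intel I7 2.6GHz, 16GB 1600 MHz RAM, SSD drive. Please note that when the same code is run again, the time for FILE* read is very low (about 200 ms), probably because the file gets cached. This is why the files opened for reading are not created by the code itself.

Why is reading a blob of binary data using fstream so much slower than with FILE*?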

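EDIT 1: I updated the code and the timings. Sorry for the delay!

EDIT 2: I added the command line and new results (very similar to the previous ones!)

    $ clang++ main.cpp -std=c++11 -stdlib=libc++ -O3
    $ ./a.out
    FILE* write 1417.9 ms
    FILE* read 1292.59 ms
    fstream write 3214.02 ms
    fstream read 3052.56 ms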

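Here are the results of the second run:

    $ ./a.out
    FILE* write 1428.98 ms
    FILE* read 196.902 ms
    fstream write 3343.69 ms
    fstream read 2285.93 ms

It looks like the file gets cached when reading, for both FILE* and the stream, since the time drops by about the same amount for both of them.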

EDIT 3: I reduced the code to this:

    FILE* file = fopen("test_file_write", "wb");
    fwrite(buffer, 1, BUFFER_SIZE, file);
    fclose(file);

    std::ofstream stream("test_stream_write", std::ios::binary);
    stream.write(buffer, BUFFER_SIZE);

And I started the profiler. It seems like the stream spends a lot of time in the xsputn function, while the actual write calls take the same time (as they should, since it's the same function...)

    Running Time      Self     Symbol Name
    3266.0ms 66.9%    0,0      std::__1::basic_ostream<char, std::__1::char_traits<char> >::write(char const*, long)
    3265.0ms 66.9%    2145,0   std::__1::basic_streambuf<char, std::__1::char_traits<char> >::xsputn(char const*, long)
    1120.0ms 22.9%    7,0      std::__1::basic_filebuf<char, std::__1::char_traits<char> >::overflow(int)
    1112.0ms 22.7%    2,0      fwrite
    1127.0ms 23.0%    0,0      fwrite

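EDIT 4: For some reason this question was marked as a duplicate. I want to point out that I don't use printf at all; I only use std::cout to print the timings. The files used in the read part are the output of the write part, copied under a different name to avoid caching.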

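It would seem that, on Linux, for this large set of data, the implementation of fwrite is much more efficient, since it uses write rather than writev.

I'm not sure WHY writev is so much slower than write, but that appears to be where the difference is. And I see absolutely no real reason why fstream needs to use that construct in this case.

This can easily be seen by running strace ./a.out (where a.out is the program testing this).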

Output:

fstream:

    clock_gettime(CLOCK_REALTIME, {1411978373, 114560081}) = 0
    open("test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
    writev(3, [{NULL, 0}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824}], 2) = 1073741824
    close(3) = 0
    clock_gettime(CLOCK_REALTIME, {1411978386, 376353883}) = 0
    write(1, "fstream write 13261.8 ms\n", 25fstream write 13261.8 ms
    ) = 25

FILE*:

    clock_gettime(CLOCK_REALTIME, {1411978386, 930326134}) = 0
    open("test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
    write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824) = 1073741824
    clock_gettime(CLOCK_REALTIME, {1411978388, 584197782}) = 0
    write(1, "FILE* write 1653.87 ms\n", 23FILE* write 1653.87 ms
    ) = 23

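I don't have one of those fancy SSD drives, so my machine will be a bit slower on that, or something else is slower in my case.

As pointed out by Jan Hudec, I was misinterpreting the results. I just wrote this: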

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <sys/uio.h>
    #include <unistd.h>
    #include <iostream>
    #include <cstdlib>
    #include <cstring>
    #include <functional>
    #include <chrono>
    #include <string>

    // Runs `function` once and prints the elapsed wall-clock time in milliseconds.
    void measure(const std::string& test, std::function<void()> function)
    {
        auto start_time = std::chrono::high_resolution_clock::now();

        function();

        auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
        std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
    }

    #define BUFFER_SIZE (1024 * 1024 * 1024)

    int main()
    {
        auto buffer = new char[BUFFER_SIZE];
        memset(buffer, 0, BUFFER_SIZE);

        measure("writev", [buffer]()
        {
            // open() with O_CREAT requires a mode argument; 0660 is used here.
            int fd = open("test", O_CREAT|O_WRONLY, 0660);
            // Same shape as the writev call seen in the strace output: an empty
            // first element followed by the whole buffer.
            struct iovec vec[] =
            {
                { NULL, 0 },
                { (void *)buffer, BUFFER_SIZE }
            };
            writev(fd, vec, sizeof(vec)/sizeof(vec[0]));
            close(fd);
        });

        measure("write", [buffer]()
        {
            int fd = open("test", O_CREAT|O_WRONLY, 0660);
            write(fd, buffer, BUFFER_SIZE);
            close(fd);
        });

        delete[] buffer;
    }

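It is the actual fstream implementation that does something daft, probably copying the whole data in small chunks, somewhere and somehow, or something like that. I will try to find out more.

The result is pretty much identical for both cases, and faster than both the fstream and the FILE* variants in the question.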

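Edit:

It would seem that, on my machine, right now, if you add fclose(file) after the write, it takes approximately the same amount of time for both fstream and FILE*: on my system, around 13 seconds to write 1GB, on old-style spinning disk drives, not SSD.

I can however write MUCH faster using this code: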

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <sys/uio.h>
    #include <unistd.h>
    #include <iostream>
    #include <cstdlib>
    #include <cstring>
    #include <functional>
    #include <chrono>

    void measure(const std::string& test, std::function<void()> function)
    {
        auto start_time = std::chrono::high_resolution_clock::now();

        function();

        auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
        std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
    }

    #define BUFFER_SIZE (1024 * 1024 * 1024)

    int main()
    {
        auto buffer = new char[BUFFER_SIZE];
        memset(buffer, 0, BUFFER_SIZE);

        measure("writev", [buffer]()
        {
            int fd = open("test", O_CREAT|O_WRONLY, 0660);
            struct iovec vec[] =
            {
                { NULL, 0 },
                { (void *)buffer, BUFFER_SIZE }
            };
            writev(fd, vec, sizeof(vec)/sizeof(vec[0]));
            close(fd);
        });

        measure("write", [buffer]()
        {
            int fd = open("test", O_CREAT|O_WRONLY, 0660);
            write(fd, buffer, BUFFER_SIZE);
            close(fd);
        });
    }

This gives times of about 650-900 ms.
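One caveat about the raw write/writev tests above: POSIX allows write() to transfer fewer bytes than requested, and the tests ignore the return value. A minimal sketch of a retry loop follows; `write_all` is a hypothetical helper name, not something from the original answer, and it assumes a POSIX system.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not part of the original answer): keeps calling
// write() until all `size` bytes are written or a real error occurs.
// Returns true on success, false on error (errno is left as set by write()).
bool write_all(int fd, const char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::size_t done = 0;
    while (done < size) {
        ssize_t n = write(fd, data + done, size - done);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;                  // any other error: give up
        }
        done += static_cast<std::size_t>(n);  // short write: loop for the rest
    }
    return true;
}
```

In the benchmark, `write(fd, buffer, BUFFER_SIZE)` could be replaced by `write_all(fd, buffer, BUFFER_SIZE)` so that a short write does not silently make one variant look faster than it is.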

I can also edit the original program to get a time of approximately 1000 ms for fwrite: simply remove the fclose.

I also added this method:

    measure("fstream write (new)", [buffer]()
    {
        std::ofstream* stream = new std::ofstream("test", std::ios::binary);
        stream->write(buffer, BUFFER_SIZE);
        // Intentionally no delete.
    });

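and then it takes about 1000 ms here too.

So, my conclusion is that, somehow, sometimes, closing the file makes it flush to disk. In other cases, it doesn't. I still don't understand why...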
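If the uncertainty is whether the data has actually reached the disk when the timer stops, one way to make the runs comparable is to request the flush explicitly with fsync() and time it. The sketch below assumes a POSIX system; `timed_write_ms` is a hypothetical helper, not code from this answer.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: writes `size` bytes to `path` and returns the elapsed
// time in milliseconds, or -1.0 on error. With do_fsync == true the measured
// time includes asking the OS to push the data to the storage device.
double timed_write_ms(const char* path, const char* data, std::size_t size, bool do_fsync)
{
    assert(path != nullptr);
    auto start = std::chrono::steady_clock::now();

    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0660);
    if (fd < 0) return -1.0;

    std::size_t done = 0;
    while (done < size) {
        ssize_t n = write(fd, data + done, size - done);
        if (n <= 0) { close(fd); return -1.0; }
        done += static_cast<std::size_t>(n);
    }
    if (do_fsync && fsync(fd) != 0) { close(fd); return -1.0; }
    close(fd);

    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Comparing `timed_write_ms(path, buffer, BUFFER_SIZE, true)` against the same call with `false` should show how much of the time is the page cache absorbing the write and how much is the device itself.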

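Contrary to the other answers, a big issue with large file reads comes from buffering by the C standard library. Try using the low-level read/write calls in large chunks (1024KB) and see the performance jump.

File buffering by the C library is useful for reading or writing small chunks of data (smaller than the disk block size).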
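As a rough illustration of the chunked low-level approach, here is a sketch that reads a file with the POSIX read() call in 1 MiB chunks, bypassing the C library's buffering. It assumes a POSIX system (the answer itself is about Windows), and `read_file_chunked` is a hypothetical name chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: reads the whole file at `path` in chunks of
// `chunk_size` bytes using read() directly. Returns the total number of
// bytes read, or -1 on error.
long long read_file_chunked(const char* path, std::size_t chunk_size = 1024 * 1024)
{
    assert(chunk_size > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    std::vector<char> chunk(chunk_size);
    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, chunk.data(), chunk.size());
        if (n < 0) { close(fd); return -1; }  // read error
        if (n == 0) break;                    // end of file
        total += n;
        // The first n bytes of chunk hold the data; process them here.
    }
    close(fd);
    return total;
}
```

Whether this beats fread or ifstream::read on a given system is something to measure rather than assume; the chunk size is a tunable, and 1 MiB is just the value suggested above.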

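On Windows I got almost a 3x performance boost by dropping file buffering when reading and writing raw video streams.

I also opened the file using native OS (win32) API calls and told the OS not to cache the file, because that involves yet another copy.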

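Something is broken on the Mac: either an old implementation or the setup.

An old setup could cause the FILE to be written to the exe directory and the stream to the user directory. This shouldn't make any difference unless you have 2 disks or some other difference in settings.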

On my lousy Vista I get, with normal buffer + uncached:
C++ 201103
FILE* write 4756 ms
FILE* read 5007 ms
fstream write 5526 ms
fstream read 5728 ms

Normal buffer + cached:
C++ 201103
FILE* write 4747 ms
FILE* read 454 ms
fstream write 5490 ms
fstream read 396 ms

Large buffer + cached:
C++ 201103
5th run:
FILE* write 4760 ms
FILE* read 446 ms
fstream write 5278 ms
fstream read 369 ms

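This shows that FILE* writes are faster than fstream writes, but FILE* reads are slower than fstream reads. The differences are modest, though: within roughly 10-20% of each other.

Try adding more buffering to your stream to see if that helps.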

    const int MySize = 1024*1024;
    static char MrBuf[MySize];  // static: a 1 MiB array may not fit on the stack
    stream.rdbuf()->pubsetbuf(MrBuf, MySize);

The equivalent for a FILE is

    const int MySize = 1024*1024;
    // setvbuf returns 0 on success and nonzero on failure.
    if (setvbuf ( file , NULL , _IOFBF , MySize ) != 0)
        DieInDisgrace();
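For context, a fuller sketch of the FILE variant is shown below. It relies on two facts about setvbuf: it has to be called after the stream is opened but before any other operation on it, and passing a null buffer asks the library to allocate one of the given size. `write_with_stdio_buffer` is a hypothetical helper name used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes `size` bytes to `path` through a FILE* whose
// buffer has been enlarged to `buffer_size` bytes. Returns true on success.
bool write_with_stdio_buffer(const char* path, const char* data, std::size_t size,
                             std::size_t buffer_size = 1024 * 1024)
{
    assert(path != nullptr);
    std::FILE* file = std::fopen(path, "wb");
    if (!file) return false;

    // Must happen before the first read/write on the stream.
    // NULL buffer: the library allocates buffer_size bytes itself.
    if (std::setvbuf(file, nullptr, _IOFBF, buffer_size) != 0) {
        std::fclose(file);
        return false;
    }

    bool ok = std::fwrite(data, 1, size, file) == size;
    // fclose flushes the buffer, so its result matters too.
    return std::fclose(file) == 0 && ok;
}
```

Whether a larger stdio buffer helps for a single 1 GB fwrite is not obvious, since the library may bypass the buffer for large requests; it is more likely to matter when the data is written in many small pieces.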

TL;DR: Try adding this to your code before doing the writing:

    const size_t bufsize = 256*1024;
    char buf[bufsize];  // must stay alive for as long as mystream uses it
    mystream.rdbuf()->pubsetbuf(buf, bufsize);

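When working with large files with fstream, make sure to use a stream buffer.

Counterintuitively, disabling stream buffering dramatically reduces performance. At least the MSVC implementation copies 1 char at a time to the filebuf when no buffer was set (see streambuf::xsputn()), which can make your application CPU-bound, resulting in lower I/O rates.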
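A slightly fuller sketch of the same idea follows. The effect of pubsetbuf with a non-null buffer is implementation-defined: libstdc++, for example, only honors it when called before the file is opened, and other standard libraries may behave differently, so the gain should be measured on the target platform. `write_with_stream_buffer` is a hypothetical helper name, not from the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical helper: writes `size` bytes to `path` through an ofstream
// that has been handed a user-supplied buffer of `buffer_size` bytes.
// Returns true if the stream reported no error.
bool write_with_stream_buffer(const char* path, const char* data, std::size_t size,
                              std::size_t buffer_size = 256 * 1024)
{
    assert(path != nullptr);
    // Declared before the stream so that it is destroyed after it:
    // the buffer has to outlive every use by the stream.
    std::vector<char> buf(buffer_size);

    std::ofstream stream;
    // Install the buffer before opening (needed for libstdc++ to use it).
    stream.rdbuf()->pubsetbuf(buf.data(), static_cast<std::streamsize>(buf.size()));
    stream.open(path, std::ios::binary);
    if (!stream) return false;

    stream.write(data, static_cast<std::streamsize>(size));
    stream.close();  // flushes; sets failbit if the flush or close fails
    return !stream.fail();
}
```

The buffer is heap-allocated here rather than placed on the stack, which keeps the sketch safe for larger buffer sizes.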

Note: You can find a complete sample application here.
