In terms of performance, what is the point of wrapping a FileOutputStream with a BufferedOutputStream?
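I have a module that is responsible for reading, processing, and writing bytes to disk. The bytes come in over UDP and, after the individual datagrams are assembled, the final byte array that gets processed and written to disk is typically between 200 bytes and 500,000 bytes. Occasionally there are byte arrays that exceed 500,000 bytes after assembly, but these are relatively rare.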
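I'm currently using `FileOutputStream`'s `write(byte[])` method. I'm also experimenting with wrapping the `FileOutputStream` in a `BufferedOutputStream`, including using the constructor that accepts a buffer size as a parameter.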
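For reference, the two setups described above might look roughly like this. It is a minimal sketch, not the asker's actual code: `AssembledRecordWriter` is a hypothetical class name, and each record is assumed to be a fully assembled `byte[]` by the time it is written.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical writer for assembled records; the name and structure are illustrative only. */
public class AssembledRecordWriter implements AutoCloseable {

    private final OutputStream out;

    /** Unbuffered: every write(byte[]) goes straight to FileOutputStream. */
    public AssembledRecordWriter(String path) throws IOException {
        this.out = new FileOutputStream(path);
    }

    /** Buffered: uses the BufferedOutputStream(OutputStream, int) constructor. */
    public AssembledRecordWriter(String path, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            // Validate before opening the file so a bad size does not leak a file handle.
            throw new IllegalArgumentException("bufferSize must be positive: " + bufferSize);
        }
        this.out = new BufferedOutputStream(new FileOutputStream(path), bufferSize);
    }

    /** Writes one fully assembled record. */
    public void write(byte[] record) throws IOException {
        out.write(record);
    }

    @Override
    public void close() throws IOException {
        out.close(); // for the buffered variant this flushes pending bytes first
    }
}
```

Usage would be along the lines of `try (AssembledRecordWriter w = new AssembledRecordWriter("out.bin", 32 * 1024)) { w.write(record); }`, where the path and buffer size are placeholders.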
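It appears that using `BufferedOutputStream` tends to perform slightly better, but I've only just begun experimenting with different buffer sizes. I only have a limited set of sample data to work with (two data sets from sample runs that I can pipe through my application). Is there a general rule of thumb I can apply to calculate the optimal buffer size, so as to reduce the number of disk writes and maximize disk-write performance, given what I know about the data I'm writing?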
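`BufferedOutputStream` helps when the writes are smaller than the buffer size, e.g. 8 KB. For larger writes it doesn't help, nor does it make things noticeably worse. If ALL your writes are larger than the buffer size, or you always call `flush()` after every write, I would not use a buffer. However, if a good portion of your writes are smaller than the buffer size and you don't call `flush()` every time, it's worth having.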
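To make that rule of thumb concrete, here is a small sketch that counts how many write calls actually reach the wrapped stream. `CountingOutputStream` and `underlyingWrites` are hypothetical helpers written for this illustration, and a `ByteArrayOutputStream` stands in for the `FileOutputStream` so no disk I/O is involved; the record sizes mirror the 200 to 500,000 byte range from the question.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class SmallWriteDemo {

    /** Hypothetical helper: counts how many write calls reach the wrapped stream. */
    static final class CountingOutputStream extends OutputStream {
        private final OutputStream target;
        int calls;

        CountingOutputStream(OutputStream target) {
            this.target = target;
        }

        @Override
        public void write(int b) throws IOException {
            calls++;
            target.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            calls++;
            target.write(b, off, len);
        }
    }

    /** Writes {@code records} chunks of {@code recordSize} bytes; returns the number of underlying writes. */
    static int underlyingWrites(int records, int recordSize, int bufferSize, boolean flushEach) throws IOException {
        CountingOutputStream counter = new CountingOutputStream(new ByteArrayOutputStream());
        byte[] record = new byte[recordSize];
        try (BufferedOutputStream out = new BufferedOutputStream(counter, bufferSize)) {
            for (int i = 0; i < records; i++) {
                out.write(record);
                if (flushEach) {
                    out.flush(); // defeats the buffer: every record is pushed through on its own
                }
            }
        } // close() flushes whatever is still buffered
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("200-byte records, no flush:   " + underlyingWrites(100, 200, 8192, false));
        System.out.println("200-byte records, flush each: " + underlyingWrites(100, 200, 8192, true));
        System.out.println("500,000-byte records:         " + underlyingWrites(10, 500_000, 8192, false));
    }
}
```

With an 8 KB buffer, 100 writes of 200 bytes collapse into 3 underlying writes, flushing after each write brings it back to 100, and 10 writes of 500,000 bytes stay at 10 because each one bypasses the buffer.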
You may find that increasing the buffer size to 32 KB or more gives you a marginal improvement, or makes things worse. YMMV (your mileage may vary).
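Since the effect of a larger buffer depends on the data, the practical way to find out is to measure with your own record sizes. The following is a rough sketch of such a sweep; `BufferSizeSweep`, the uniformly distributed sizes produced by `sampleChunks`, and the candidate sizes of 8, 32 and 128 KB are all assumptions for illustration, so substitute the record sizes you actually capture from the UDP assembly step.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

public class BufferSizeSweep {

    /** Writes every chunk through a BufferedOutputStream of the given size; returns elapsed nanos. */
    static long timeWrite(Path file, byte[][] chunks, int bufferSize) throws IOException {
        long begin = System.nanoTime();
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file.toFile()), bufferSize)) {
            for (byte[] chunk : chunks) {
                out.write(chunk);
            }
        } // close() flushes any bytes still held in the buffer
        return System.nanoTime() - begin;
    }

    /** Hypothetical workload: sizes drawn uniformly from 200..500,000 bytes (an assumption). */
    static byte[][] sampleChunks(int count, long seed) {
        Random random = new Random(seed);
        byte[][] chunks = new byte[count][];
        for (int i = 0; i < count; i++) {
            chunks[i] = new byte[200 + random.nextInt(500_000 - 200 + 1)];
            random.nextBytes(chunks[i]);
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        byte[][] chunks = sampleChunks(100, 42L);
        Path file = Files.createTempFile("buffer-sweep", ".bin");
        try {
            // Repeat the sweep: the first round includes JIT warm-up and is not representative.
            for (int round = 0; round < 3; round++) {
                for (int size : new int[] {8 * 1024, 32 * 1024, 128 * 1024}) {
                    System.out.printf("round %d, buffer %7d B: %,d ns%n", round, size, timeWrite(file, chunks, size));
                }
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Two caveats: with a uniform distribution most records are larger than every candidate buffer, which is exactly the case where the buffer barely matters, and without an explicit sync the timings usually reflect copying into the operating system's page cache rather than physical disk writes.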
You might find the JDK source of `BufferedOutputStream.write` useful:
```java
/**
 * Writes <code>len</code> bytes from the specified byte array
 * starting at offset <code>off</code> to this buffered output stream.
 *
 * <p> Ordinarily this method stores bytes from the given array into this
 * stream's buffer, flushing the buffer to the underlying output stream as
 * needed. If the requested length is at least as large as this stream's
 * buffer, however, then this method will flush the buffer and write the
 * bytes directly to the underlying output stream. Thus redundant
 * <code>BufferedOutputStream</code>s will not copy data unnecessarily.
 *
 * @param      b     the data.
 * @param      off   the start offset in the data.
 * @param      len   the number of bytes to write.
 * @exception  IOException  if an I/O error occurs.
 */
public synchronized void write(byte b[], int off, int len) throws IOException {
    if (len >= buf.length) {
        /* If the request length exceeds the size of the output buffer,
           flush the output buffer and then write the data directly.
           In this way buffered streams will cascade harmlessly. */
        flushBuffer();
        out.write(b, off, len);
        return;
    }
    if (len > buf.length - count) {
        flushBuffer();
    }
    System.arraycopy(b, off, buf, count, len);
    count += len;
}
```
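The claim that buffered streams "cascade harmlessly" can be checked with a recording sink. In this sketch (the `CascadeDemo` and `RecordingOutputStream` names are hypothetical), two redundant `BufferedOutputStream` layers wrap a sink that logs the length of every write it receives and discards the data.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class CascadeDemo {

    /** Hypothetical sink that records the length of every write it receives and discards the data. */
    static final class RecordingOutputStream extends OutputStream {
        final List<Integer> writeLengths = new ArrayList<>();

        @Override
        public void write(int b) {
            writeLengths.add(1);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeLengths.add(len);
        }
    }

    public static List<Integer> run() throws IOException {
        RecordingOutputStream sink = new RecordingOutputStream();
        // Two redundant layers of buffering, each with an 8 KB buffer.
        try (OutputStream out = new BufferedOutputStream(new BufferedOutputStream(sink, 8192), 8192)) {
            out.write(new byte[20_000]); // len >= buf.length: passes through both layers in one call
            out.write(new byte[100]);    // len < buf.length: copied into the outer buffer
            out.write(new byte[20_000]); // flushes the pending 100 bytes first, then passes through
        }
        return sink.writeLengths;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run());
    }
}
```

Running `main` prints `[20000, 100, 20000]`: each 20,000-byte array reaches the sink in a single call, and the 100 pending bytes are flushed ahead of the second large write, so ordering is preserved.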
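I've recently been trying to explore IO performance. From what I've observed, writing directly to a `FileOutputStream` leads to better results, which I've attributed to `FileOutputStream`'s native call for `write(byte[], int, int)`. I've also observed that when the latency of `BufferedOutputStream` starts to converge towards that of a direct `FileOutputStream`, it fluctuates a lot, i.e. it can suddenly even double (I haven't yet found out why).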
PS: I'm using Java 8 and can't comment for now on whether my observations apply to earlier Java versions.
Here's the code I tested, where my input is a ~10 KB file:
```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class WriteCombinationsOutputStreamComparison {

    private static final Logger LOG = LogManager.getLogger(WriteCombinationsOutputStreamComparison.class);

    public static void main(String[] args) throws IOException {
        final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        // try-with-resources closes the input even if a read fails
        try (BufferedInputStream input = new BufferedInputStream(
                new FileInputStream("src/main/resources/inputStream1.txt"), 4 * 1024)) {
            int data = input.read();
            while (data != -1) {
                byteArrayOutputStream.write(data); // everything comes in memory
                data = input.read();
            }
        }
        final byte[] bytesRead = byteArrayOutputStream.toByteArray();

        /*
         * 1. WRITE USING A STREAM DIRECTLY with entire byte array --> FileOutputStream directly uses a native call and writes
         */
        try (OutputStream outputStream = new FileOutputStream("src/main/resources/outputStream1.txt")) {
            final long begin = System.nanoTime();
            outputStream.write(bytesRead);
            outputStream.flush();
            final long end = System.nanoTime();
            LOG.info("Total time taken for file write, writing entire array [nanos=" + (end - begin)
                    + "], [bytesWritten=" + bytesRead.length + "]");
            if (LOG.isDebugEnabled()) {
                LOG.debug("File reading result was: \n" + new String(bytesRead, StandardCharsets.UTF_8));
            }
        }

        /*
         * 2. WRITE USING A BUFFERED STREAM, write entire array
         */
        // changed the buffer size to different combinations --> write latency fluctuates a lot for same buffer size over multiple runs
        try (BufferedOutputStream outputStream = new BufferedOutputStream(
                new FileOutputStream("src/main/resources/outputStream1.txt"), 16 * 1024)) {
            final long begin = System.nanoTime();
            outputStream.write(bytesRead);
            outputStream.flush();
            final long end = System.nanoTime();
            LOG.info("Total time taken for buffered file write, writing entire array [nanos=" + (end - begin)
                    + "], [bytesWritten=" + bytesRead.length + "]");
            if (LOG.isDebugEnabled()) {
                LOG.debug("File reading result was: \n" + new String(bytesRead, StandardCharsets.UTF_8));
            }
        }
    }
}
```
Output:

```
2017-01-30 23:38:59.064 [INFO] [main] [WriteCombinationsOutputStream] - Total time taken for file write, writing entire array [nanos=100990], [bytesWritten=11059]
2017-01-30 23:38:59.086 [INFO] [main] [WriteCombinationsOutputStream] - Total time taken for buffered file write, writing entire array [nanos=142454], [bytesWritten=11059]
```
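One detail of the `BufferedOutputStream.write` source bears on these numbers: the 11,059-byte array is smaller than the 16 KB buffer used in case 2, so it is first copied into the buffer and only reaches the `FileOutputStream` on `flush()`. In both cases the underlying stream receives exactly one write call. The sketch below shows this; `SingleWriteDemo` and `CountingSink` are hypothetical names, and the counting sink stands in for `FileOutputStream`.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class SingleWriteDemo {

    /** Hypothetical sink that only counts write calls; it stands in for FileOutputStream. */
    static final class CountingSink extends OutputStream {
        int calls;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    /** Writes one array of {@code length} bytes; returns sink calls seen {before flush(), after flush()}. */
    static int[] callsBeforeAndAfterFlush(int length, int bufferSize) throws IOException {
        CountingSink sink = new CountingSink();
        try (BufferedOutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            out.write(new byte[length]);
            int before = sink.calls; // 0 if the bytes were only copied into the buffer
            out.flush();
            return new int[] {before, sink.calls};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] small = callsBeforeAndAfterFlush(11_059, 16 * 1024);
        int[] large = callsBeforeAndAfterFlush(11_059, 8 * 1024);
        System.out.println("16 KB buffer: " + small[0] + " before flush, " + small[1] + " after");
        System.out.println(" 8 KB buffer: " + large[0] + " before flush, " + large[1] + " after");
    }
}
```

This prints `16 KB buffer: 0 before flush, 1 after` and ` 8 KB buffer: 1 before flush, 1 after`. Copying 11 KB in memory is normally cheap, so the extra copy probably does not explain the whole gap between the two single-shot timings; measurement effects such as JIT warm-up are a plausible additional factor, although nothing in the measurements above confirms that.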