How can I efficiently get the number of lines in a file?

I have a large file. It contains roughly 3,000 to 20,000 lines. How can I get the number of lines in the file using Java?

    BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
    int lines = 0;
    while (reader.readLine() != null) lines++;
    reader.close();

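Update: To answer the performance question raised here, I made a measurement. First of all, 20,000 lines are too few to keep the program running for a noticeable time, so I created a text file with 5 million lines. This solution (started with java, without parameters such as -server or -XX options) took about 11 seconds on my box. The same with wc -l (the UNIX command-line tool for counting lines): 11 seconds. The solution that reads every single character and looks for '\n' took 104 seconds, 9 to 10 times as long.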

Use LineNumberReader.

Something like:

    public static int countLines(File aFile) throws IOException {
        LineNumberReader reader = null;
        try {
            reader = new LineNumberReader(new FileReader(aFile));
            // Read to the end; LineNumberReader tracks the line number as it goes.
            while ((reader.readLine()) != null);
            return reader.getLineNumber();
        } catch (Exception ex) {
            return -1; // signals failure to the caller
        } finally {
            if (reader != null)
                reader.close();
        }
    }

I found a solution that may be useful to you.

Below is a code snippet that counts the number of lines in a file.

    File file = new File("/mnt/sdcard/abc.txt");
    LineNumberReader lineNumberReader = new LineNumberReader(new FileReader(file));
    // Skipping to the end makes the reader pass over every line terminator.
    lineNumberReader.skip(Long.MAX_VALUE);
    int lines = lineNumberReader.getLineNumber();
    lineNumberReader.close();

Java 8+ has a very nice and short way to do this with NIO:

    Path path = Paths.get("./big_file.txt");
    long lineCount;
    try (Stream<String> lines = Files.lines(path)) { // closes the file afterwards
        lineCount = lines.count();
    }

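By reading the file and counting the number of newline characters. A simple way to read a file one line at a time in Java is the java.util.Scanner class.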
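A minimal sketch of the Scanner approach mentioned above. The class name `ScannerLineCounter` and the UTF-8 charset are assumptions for illustration, not part of the original answer; adjust the charset to match your file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;

// Hypothetical helper class, named here only for illustration.
final class ScannerLineCounter {

    // Counts lines by asking the Scanner for one line at a time.
    static long countLines(Path path) throws IOException {
        long lines = 0;
        // try-with-resources closes the Scanner and the underlying file.
        // UTF-8 is an assumption; pass the charset your file actually uses.
        try (Scanner scanner = new Scanner(path, StandardCharsets.UTF_8.name())) {
            while (scanner.hasNextLine()) {
                scanner.nextLine();
                lines++;
            }
            // Scanner swallows I/O errors, so surface them instead of
            // silently returning a count that is too small.
            if (scanner.ioException() != null) {
                throw scanner.ioException();
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(Paths.get(args[0])));
    }
}
```

Scanner is convenient rather than fast: it decodes and tokenizes the text, so for files much larger than the 20,000 lines in the question a BufferedReader or a byte-level count is likely the better choice.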

This is efficient: it does buffered binary reads with no string conversion.

    FileInputStream stream = new FileInputStream("/tmp/test.txt");
    byte[] buffer = new byte[8192];
    int count = 0;
    int n;
    // Read the file in 8 KB chunks and count the '\n' bytes in each chunk.
    while ((n = stream.read(buffer)) > 0) {
        for (int i = 0; i < n; i++) {
            if (buffer[i] == '\n') count++;
        }
    }
    stream.close();
    System.out.println("Number of lines: " + count);

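Do you need the exact number of lines or only an approximation? I happen to process large files in parallel, and usually I don't need the exact line count, so I fall back to sampling. Split the file into ten 1 MB chunks, count the lines in each chunk, and scale the total up (multiply by 10 in this setup) to get a fairly good estimate of the line count.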
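The sampling idea could be sketched as follows. This is an illustration under assumptions, not the answerer's code: the class `SampledLineEstimator` and its parameters are hypothetical, the chunks are spaced evenly through the file, and instead of a fixed factor of 10 the newline count is scaled by the ratio of file size to sampled bytes so that the estimate works for any file size.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper class, named here only for illustration.
final class SampledLineEstimator {

    // Estimates the line count from up to `chunks` evenly spaced samples
    // of `chunkSize` bytes each, scaled to the full file length.
    static long estimateLines(String fileName, int chunks, int chunkSize) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(fileName, "r")) {
            long length = file.length();
            if (length == 0) {
                return 0;
            }
            byte[] buffer = new byte[chunkSize];
            long sampledBytes = 0;
            long newlines = 0;
            // Distance between chunk starts; at least chunkSize so chunks never overlap.
            long stride = Math.max(chunkSize, length / chunks);
            for (int c = 0; c < chunks; c++) {
                long start = c * stride;
                if (start >= length) {
                    break;
                }
                file.seek(start);
                // Fill the buffer as far as the file allows.
                int filled = 0;
                while (filled < chunkSize) {
                    int n = file.read(buffer, filled, chunkSize - filled);
                    if (n < 0) {
                        break;
                    }
                    filled += n;
                }
                for (int i = 0; i < filled; i++) {
                    if (buffer[i] == '\n') {
                        newlines++;
                    }
                }
                sampledBytes += filled;
            }
            // Scale the sampled newline density up to the whole file.
            return Math.round((double) newlines * length / sampledBytes);
        }
    }

    public static void main(String[] args) throws IOException {
        // Ten 1 MB chunks, as suggested in the answer above.
        System.out.println(estimateLines(args[0], 10, 1 << 20));
    }
}
```

When the file is smaller than the total sample size, the chunks cover it completely and the result is the exact newline count. The estimate is only as good as the assumption that line lengths are similar across the file.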

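All the previous answers suggest reading the whole file and counting the newlines you find along the way. You called some of them "not efficient", but that is the only way you can do it. A "line" is nothing more than a simple character in the file, and to count that character you have to look at every character in the file.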

Sorry, you have no other choice 🙂

If the answers already posted are not fast enough, you will probably have to look for a solution specific to your particular problem.

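For example, if these text files are logs that are only ever appended to, and you regularly need to know the number of lines in them, you could create an index. The index would hold the number of lines in the file and the file size at the time it was last read. That lets you recompute the line count by skipping everything you have already seen and reading only the new lines.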
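A rough sketch of such an index, kept in memory for brevity. The class `AppendOnlyLineIndex` and its `refresh` method are hypothetical names; the sketch assumes the file is append-only with '\n' line terminators, and it recounts from the start if the file has shrunk. Persisting the two numbers between runs is left out.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical in-memory index for one append-only file.
final class AppendOnlyLineIndex {
    private long indexedBytes = 0; // how many bytes of the file have been scanned
    private long indexedLines = 0; // newlines found in those bytes

    // Scans only the bytes appended since the last call and returns the total line count.
    long refresh(String fileName) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(fileName, "r")) {
            if (file.length() < indexedBytes) {
                // The file was truncated or replaced, so the index is stale: start over.
                indexedBytes = 0;
                indexedLines = 0;
            }
            // Skip everything that was already counted.
            file.seek(indexedBytes);
            byte[] buffer = new byte[8192];
            int read;
            while ((read = file.read(buffer)) != -1) {
                for (int i = 0; i < read; i++) {
                    if (buffer[i] == '\n') {
                        indexedLines++;
                    }
                }
                indexedBytes += read;
            }
            return indexedLines;
        }
    }
}
```

A caller would keep one index object per log file and call `refresh` whenever it needs the current count; the cost of each call is proportional to the newly appended bytes rather than to the whole file.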

Quick and dirty, but it does the job:

    import java.io.*;

    public class Counter {

        public final static void main(String[] args) throws IOException {
            if (args.length > 0) {
                File file = new File(args[0]);
                System.out.println(countLines(file));
            }
        }

        public final static int countLines(File file) throws IOException {
            // Delegate to the external "wc -l" tool (Unix-like systems only).
            ProcessBuilder builder = new ProcessBuilder("wc", "-l", file.getAbsolutePath());
            Process process = builder.start();
            InputStream in = process.getInputStream();
            LineNumberReader reader = new LineNumberReader(new InputStreamReader(in));
            String line = reader.readLine();
            if (line != null) {
                // The output starts with the count, followed by the file name.
                return Integer.parseInt(line.trim().split(" ")[0]);
            } else {
                return -1;
            }
        }
    }

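This solution is about 3.6 times faster than the top-rated answer when tested on a file with 13.8 million lines. It simply reads bytes into a buffer and counts the \n characters. You can play with the buffer size, but on my machine anything above 8 KB did not make the code any faster.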

    private int countLines(File file) throws IOException {
        int lines = 0;
        FileInputStream fis = new FileInputStream(file);
        byte[] buffer = new byte[BUFFER_SIZE]; // BUFFER_SIZE = 8 * 1024
        int read;
        while ((read = fis.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == '\n') lines++;
            }
        }
        fis.close();
        return lines;
    }

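Try the Unix "wc" command. I don't mean use it; I mean download the source and see how they do it. It is probably written in C, but you can easily port the behavior to Java. The difficulty with rolling your own is accounting for the CR/LF line-ending problem.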
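To illustrate the CR/LF concern, here is one possible byte-level counter that treats "\n", "\r\n", and a lone "\r" each as a single line terminator and also counts a final line that has no terminator. This is a sketch, not a port of the wc source (wc -l itself counts newline characters only). The class name is hypothetical, and the code assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, named here only for illustration.
final class TerminatorAwareCounter {

    static long countLines(Path path) throws IOException {
        long lines = 0;
        boolean previousWasCr = false;  // lets a "\r\n" pair count once, even across buffers
        boolean lineHasContent = false; // true while inside a line that has no terminator yet
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(path)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                for (int i = 0; i < read; i++) {
                    byte b = buffer[i];
                    if (b == '\n') {
                        // The '\n' of a "\r\n" pair was already counted at the '\r'.
                        if (!previousWasCr) {
                            lines++;
                        }
                        previousWasCr = false;
                        lineHasContent = false;
                    } else if (b == '\r') {
                        lines++;
                        previousWasCr = true;
                        lineHasContent = false;
                    } else {
                        previousWasCr = false;
                        lineHasContent = true;
                    }
                }
            }
        }
        // A trailing line without a terminator still counts as a line.
        return lineHasContent ? lines + 1 : lines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(Paths.get(args[0])));
    }
}
```

Whether an unterminated last line should count is a design choice; dropping the final adjustment gives the terminator count instead.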

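Old post, but I have a solution that may be useful to the next person. Why not use the file length to know how far along you are? Of course, the lines need to be roughly the same size, but it works very well for large files: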

    public static void main(String[] args) throws IOException {
        File file = new File("yourfilehere");
        double fileSize = file.length();
        System.out.println("=======> File size = " + fileSize);
        InputStream inputStream = new FileInputStream(file);
        // ISO-8859-1 maps one byte to one char, so line.length() equals the bytes read.
        InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "iso-8859-1");
        BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
        long totalRead = 0; // long, so files over 2 GB do not overflow the counter
        try {
            while (bufferedReader.ready()) {
                String line = bufferedReader.readLine();
                // LINE PROCESSING HERE
                totalRead += line.length() + 1; // we add +1 byte for the newline char.
                System.out.println("Progress ===> " + ((totalRead / fileSize) * 100) + " %");
            }
        } finally {
            bufferedReader.close();
        }
    }

It lets you see the progress without doing a full read of the file beforehand. I know it depends on a lot of factors, but I hope it will be useful :).

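[Edit] Here is a version with an estimated time. I added some System.out calls to show the progress and the estimate. I found that the time estimate becomes quite accurate once enough lines have been processed (I tried with 10M lines, and after 1% had been processed the time estimate was about 95% accurate). I know some of the values should be pulled out into variables. This code was written quickly, but it has already been useful to me. I hope it will be for you too :).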

    long startProcessLine = System.currentTimeMillis();
    long totalRead = 0; // long, so files over 2 GB do not overflow the counter
    long progressTime = 0;
    double percent = 0;
    int i = 0;
    int j = 0;
    int fullEstimation = 0;
    try {
        while (bufferedReader.ready()) {
            String line = bufferedReader.readLine();
            totalRead += line.length() + 1;
            progressTime = System.currentTimeMillis() - startProcessLine;
            percent = (double) totalRead / fileSize * 100;
            // Report every 10,000 lines once more than 1% has been processed.
            if ((percent > 1) && i % 10000 == 0) {
                int estimation = (int) ((progressTime / percent) * (100 - percent));
                fullEstimation += progressTime + estimation;
                j++;
                System.out.print("Progress ===> " + percent + " %");
                System.out.print(" - current progress : " + (progressTime) + " milliseconds");
                System.out.print(" - Will be finished in ===> " + estimation + " milliseconds");
                System.out.println(" - estimated full time => " + (progressTime + estimation));
            }
            i++;
        }
    } finally {
        bufferedReader.close();
    }
    System.out.println("Ended in " + (progressTime) + " milliseconds");
    if (j > 0) { // no estimates were recorded for very small files
        System.out.println("Estimative average ===> " + (fullEstimation / j));
        System.out.println("Difference: " + ((((double) 100 / (double) progressTime)) * (progressTime - (fullEstimation / j))) + "%");
    }

If you think this is a good solution, feel free to improve the code.

Read the file line by line and increment a counter for each line until you have read the whole file.

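In my tests, the other answers took about 150-300 ms on a 118.5k-line file. The following takes 1 ms, but it is only approximate (it reported 117k lines) and relies on every line having a similar size.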

    private static void countSize(File file) {
        long fileLength = file.length();
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new FileReader(file));
            // Skip header as it is of different size
            reader.readLine();
            // Use the first data line as the typical line length.
            String text = reader.readLine();
            int lineLength = text.length();
            long lines = fileLength / lineLength;
            System.out.println(lines);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    // no-op
                }
            }
        }
    }

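The fastest solution in pure Java would probably be to read the file as bytes through an NIO channel into a large ByteBuffer, and then, based on what you know about the file's encoding scheme, count the encoded CR and/or NL bytes according to the relevant line-separator convention.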

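The keys to maximizing throughput are:

make sure you read the file in large chunks,
avoid copying the bytes from one buffer to another,
avoid copying or converting the bytes into characters, and
avoid allocating objects to represent the file's lines.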

The actual code is too complicated for me to write on the fly. Besides, the OP did not ask for the fastest solution.
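For readers who want a starting point, a simplified sketch of the channel-based approach might look like this. It is not a tuned or benchmarked implementation: the class name, the 1 MB buffer size, and the choice to count only '\n' bytes (which assumes LF or CRLF terminators in an ASCII-compatible encoding) are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, named here only for illustration.
final class ChannelLineCounter {

    // Assumed buffer size; worth measuring on the target machine.
    private static final int BUFFER_SIZE = 1 << 20;

    // Counts '\n' bytes without decoding characters or creating line objects.
    static long countNewlines(Path path) throws IOException {
        long count = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // One direct buffer is reused for every read.
            ByteBuffer buffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
            while (channel.read(buffer) != -1) {
                buffer.flip(); // switch from filling the buffer to draining it
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        count++;
                    }
                }
                buffer.clear(); // make the whole buffer available for the next read
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countNewlines(Paths.get(args[0])));
    }
}
```

The sketch follows the four points listed above: large reads, one reused buffer, no byte-to-character conversion, and no per-line objects.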

~~A buffered reader is overkill~~

    Reader r = new FileReader("f.txt");
    int count = 0;
    int nextchar = 0;
    while (nextchar != -1) {
        nextchar = r.read();
        // Compare with the character itself; Character.getNumericValue('\n') is -1.
        if (nextchar == '\n') {
            count++;
        }
    }
    r.close();

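In searching for a simple example I created one that is actually quite poor. Calling read() repeatedly to copy a single character at a time is not optimal. See here for examples and measurements.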
