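Get the last 10 lines of a very large text file (> 10 GB)

What is the most efficient way to display the last 10 lines of a very large text file (this particular file is over 10 GB)? I was thinking of just writing a simple C# application, but I'm not sure how to do this efficiently.

Read to the end of the file, then seek backwards until you find ten newlines, and then read forward to the end, taking the various encodings into account. Be sure to handle the case where the file has fewer than ten lines. Below is an implementation (in C#, as you tagged this), generalized to find the last numberOfTokens tokens in the file located at path and encoded in encoding, where the token separator is given by tokenSeparator; the result is returned as a string (this could be improved by returning an IEnumerable<string> that enumerates the tokens).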

    public static string ReadEndTokens(string path, Int64 numberOfTokens, Encoding encoding, string tokenSeparator)
    {
        int sizeOfChar = encoding.GetByteCount("\n");
        byte[] buffer = encoding.GetBytes(tokenSeparator);

        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            Int64 tokenCount = 0;

            // Step backwards from the end one character at a time.
            // position is a byte offset, so it is bounded by the length in bytes.
            for (Int64 position = sizeOfChar; position <= fs.Length; position += sizeOfChar)
            {
                fs.Seek(-position, SeekOrigin.End);
                fs.Read(buffer, 0, buffer.Length);

                if (encoding.GetString(buffer) == tokenSeparator)
                {
                    tokenCount++;
                    if (tokenCount == numberOfTokens)
                    {
                        // Everything after the separator just read is the result.
                        byte[] returnBuffer = new byte[fs.Length - fs.Position];
                        fs.Read(returnBuffer, 0, returnBuffer.Length);
                        return encoding.GetString(returnBuffer);
                    }
                }
            }

            // handle case where number of tokens in file is less than numberOfTokens
            fs.Seek(0, SeekOrigin.Begin);
            buffer = new byte[fs.Length];
            fs.Read(buffer, 0, buffer.Length);
            return encoding.GetString(buffer);
        }
    }

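I'd likely just open it as a binary stream, seek to the end, then back up looking for line breaks. Back up 10 (or 11, depending on whether the last line ends with a line break) to find your 10 lines, then just read to the end and use Encoding.GetString on what you read to get it into string form. Split as desired.

Tail? tail is a Unix command that displays the last few lines of a file. There is a Windows version in the Windows 2003 Server Resource Kit.

As others have suggested, you can go to the end of the file and effectively read backwards. However, it's slightly tricky, particularly because with a variable-length encoding (such as UTF-8) you need to be cunning about making sure you get "whole" characters.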
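For UTF-8 specifically, getting "whole" characters is manageable because continuation bytes always have the bit pattern 10xxxxxx. The following is only a minimal sketch of that idea, assuming the file really is UTF-8; the class and method names (Utf8Boundary, AlignToCharStart) are made up for illustration and are not part of any library.

```csharp
using System.IO;

public static class Utf8Boundary
{
    // Moves `position` backwards until it points at the first byte of a
    // UTF-8 character. In valid UTF-8 this steps back at most 3 bytes.
    // Assumption: the stream is seekable and contains UTF-8 text.
    public static long AlignToCharStart(Stream stream, long position)
    {
        while (position > 0 && position < stream.Length)
        {
            stream.Seek(position, SeekOrigin.Begin);
            int b = stream.ReadByte();

            // Continuation bytes look like 10xxxxxx; anything else starts a character.
            if (b < 0 || (b & 0xC0) != 0x80)
                break;

            position--;
        }
        return position;
    }
}
```

With this, you could seek to an arbitrary byte offset near the end, align it, and only then hand the bytes to Encoding.UTF8.GetString. Other variable-length encodings need their own rules.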

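You should be able to use FileStream.Seek() to move to the end of the file, then work your way backwards until you have found enough lines.

I'm not sure how efficient it will be, but in Windows PowerShell getting the last ten lines of a file is as easy as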
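Here is a rough sketch of that seek-and-scan-backwards approach, reading the file in blocks from the end rather than one byte at a time. It assumes an encoding in which the byte 0x0A can only mean a line feed (ASCII, UTF-8 and most single-byte code pages, but not UTF-16); TailSketch, ReadLastLines and the 64 KB block size are my own choices for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class TailSketch
{
    // Returns up to `lineCount` lines from the end of the file.
    // Assumption: the byte 0x0A always means '\n' in the given encoding.
    public static string[] ReadLastLines(string path, int lineCount, Encoding encoding)
    {
        const int ChunkSize = 64 * 1024; // arbitrary block size for backward reads
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            long length = fs.Length;
            if (length == 0 || lineCount <= 0) return new string[0];

            var chunk = new byte[ChunkSize];
            long start = 0;      // byte offset where the tail begins (0 = whole file)
            long pos = length;   // everything from pos to the end has been scanned
            int newlines = 0;
            bool found = false;

            while (pos > 0 && !found)
            {
                int toRead = (int)Math.Min(ChunkSize, pos);
                pos -= toRead;
                fs.Seek(pos, SeekOrigin.Begin);
                ReadFully(fs, chunk, toRead);

                for (int i = toRead - 1; i >= 0; i--)
                {
                    if (chunk[i] != (byte)'\n') continue;
                    // A line feed that is the very last byte only terminates the last line.
                    if (pos + i == length - 1) continue;
                    if (++newlines == lineCount)
                    {
                        start = pos + i + 1;
                        found = true;
                        break;
                    }
                }
            }

            // Read forward from the start of the tail to the end and decode once.
            fs.Seek(start, SeekOrigin.Begin);
            int tailSize = checked((int)(length - start));
            var tail = new byte[tailSize];
            ReadFully(fs, tail, tailSize);

            string text = encoding.GetString(tail);
            if (start == 0) text = text.TrimStart('\uFEFF'); // drop a BOM if the whole file was read
            if (text.EndsWith("\n")) text = text.Substring(0, text.Length - 1);

            string[] lines = text.Split('\n');
            for (int i = 0; i < lines.Length; i++) lines[i] = lines[i].TrimEnd('\r');
            return lines;
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until done.
    private static void ReadFully(Stream s, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int n = s.Read(buffer, offset, count - offset);
            if (n == 0) throw new EndOfStreamException();
            offset += n;
        }
    }
}
```

Usage would be something like TailSketch.ReadLastLines(path, 10, Encoding.UTF8). Only the last few blocks are touched, so the cost depends on the length of the last ten lines rather than on the size of the file.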

 Get-Content file.txt | Select-Object -last 10

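That is what the Unix tail command does. See http://en.wikipedia.org/wiki/Tail_(Unix)

There are plenty of open source implementations on the internet; here is one for Win32: Tail for Win32

I think the following code will solve the problem, with subtle changes to account for the encoding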

    StreamReader reader = new StreamReader(@"c:\test.txt"); //pick appropriate Encoding
    reader.BaseStream.Seek(0, SeekOrigin.End);
    int count = 0;
    while ((count < 10) && (reader.BaseStream.Position > 0))
    {
        reader.BaseStream.Position--;
        int c = reader.BaseStream.ReadByte();
        if (reader.BaseStream.Position > 0)
            reader.BaseStream.Position--;
        if (c == Convert.ToInt32('\n'))
        {
            ++count;
        }
    }
    string str = reader.ReadToEnd();
    string[] arr = str.Replace("\r", "").Split('\n');
    reader.Close();

You could use the Windows version of the tail command and simply redirect its output to a text file with the > symbol, or view it on screen, depending on your needs.

Here is my version. HTH

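    using (StreamReader sr = new StreamReader(path))
    {
        sr.BaseStream.Seek(0, SeekOrigin.End);
        int c;
        int count = 0;
        long pos = -1;

        // Also stop at the start of the file, so that files with
        // fewer than 10 lines do not cause a seek before the beginning.
        while (count < 10 && -pos <= sr.BaseStream.Length)
        {
            sr.BaseStream.Seek(pos, SeekOrigin.End);
            c = sr.Read();
            sr.DiscardBufferedData();

            if (c == Convert.ToInt32('\n'))
                ++count;
            --pos;
        }

        // Clamp so the final seek never lands before the start of the file.
        sr.BaseStream.Seek(Math.Max(pos, -sr.BaseStream.Length), SeekOrigin.End);
        string str = sr.ReadToEnd();
        string[] arr = str.Split('\n');
    }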

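If you open the file with FileMode.Append it will seek to the end of the file for you. Be aware, though, that FileMode.Append requires FileAccess.Write, so the stream can be neither read nor seeked backwards; to read the tail, open the file with FileMode.Open and seek to the end yourself, then seek back the number of bytes you want and read them. Whatever you do, it might not be fast, since that's a pretty massive file.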

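One useful property is FileInfo.Length. It gives the size of the file in bytes.

What structure is your file? Are you sure the last 10 lines will be near the end of the file? If you have a 10 GB file that contains only 12 lines of text, then looking at the end won't really be that fast. Then again, you might have to look through the whole file.

If you are sure that the file contains numerous short strings, each on its own line, seek to the end, then check backwards until you have counted 11 line endings. Then you can read forward the next 10 lines.

I think the other posters have all shown that there is no real shortcut.

You can either use a tool such as tail (or PowerShell), or you can write some dumb code that seeks to the end of the file and then looks back for n newlines.

There are plenty of implementations of tail out there on the web; take a look at the source code to see how they do it. Tail is pretty efficient (even on very large files), so they must have got it right when they wrote it!

Open the file and start reading lines. After you've read 10 lines, open another pointer, starting at the front of the file, so that the second pointer lags the first by 10 lines. Keep reading, moving the two pointers in unison, until the first reaches the end of the file. Then use the second pointer to read the result. It works with files of any size, including empty files and files shorter than the tail length, and it's easy to adjust for any length of tail. The downside, of course, is that you end up reading the entire file, and that may be exactly what you're trying to avoid.

If you have a file with a uniform format per line (such as data from a DAQ system), you just use a StreamReader to get the length of the file, then read one of the lines (ReadLine()).

Divide the total length by the length of that string. Now you have a general long number representing the number of lines in the file.

The key is that you call ReadLine() before reading your data into your array or whatever. This ensures that you start at the beginning of a new line and don't pick up any leftover data from the previous one.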

    StreamReader leader = new StreamReader(GetReadFile);
    leader.BaseStream.Position = 0;
    StreamReader follower = new StreamReader(GetReadFile);

    int count = 0;
    string tmper = null;
    while (count <= 12)
    {
        tmper = leader.ReadLine();
        count++;
    }

    long total = follower.BaseStream.Length; // get total length of file
    long step = tmper.Length;                // get length of 1 line
    long size = total / step;                // divide to get number of lines
    long go = step * (size - 12);            // get the byte offset to seek to

    long cut = follower.BaseStream.Seek(go, SeekOrigin.Begin); // Go to that location
    follower.BaseStream.Position = go;

    string led = null;
    string[] lead = null;
    List<string[]> samples = new List<string[]>();

    // Discard the (possibly partial) line at the seek position.
    follower.ReadLine();

    while (!follower.EndOfStream)
    {
        led = follower.ReadLine();
        lead = Tokenize(led);
        samples.Add(lead);
    }
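When every record really has the same length in bytes, including its line terminator, the seek offset can be computed directly instead of estimated. A small sketch of that arithmetic follows; FixedRecords and OffsetOfLastRecords are hypothetical names, and recordLength is assumed to be the full on-disk size of one line.

```csharp
using System;

public static class FixedRecords
{
    // Byte offset at which the last `count` fixed-length records begin.
    // Assumption: recordLength is the size of one line in bytes,
    // line terminator included, and every line has exactly that size.
    public static long OffsetOfLastRecords(long fileLength, int recordLength, int count)
    {
        long totalRecords = fileLength / recordLength;
        long firstWanted = Math.Max(0, totalRecords - count);
        return firstWanted * recordLength;
    }
}
```

For example, a 1200-byte file of 12-byte records holds 100 records, so the last 10 start at byte 1080. If the file holds fewer records than requested, the offset is simply 0.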

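Using Sisutil's answer as a starting point, you could read the file line by line and load the lines into a Queue<String>. It does read the file from the start, but it has the virtue of not trying to read the file backwards. As Jon Skeet pointed out, reading backwards can be really difficult if the file uses a variable-width encoding like UTF-8. It also makes no assumptions about line length.

I tested this against a 1.7 GB file (I didn't have a 10 GB one handy) and it took about 14 seconds. Of course, the usual caveats apply when comparing load and read times between computers.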

    int numberOfLines = 10;
    string fullFilePath = @"C:\Your\Large\File\BigFile.txt";
    var queue = new Queue<string>(numberOfLines);

    using (FileStream fs = File.Open(fullFilePath, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (BufferedStream bs = new BufferedStream(fs)) // May not make much difference.
    using (StreamReader sr = new StreamReader(bs))
    {
        while (!sr.EndOfStream)
        {
            if (queue.Count == numberOfLines)
            {
                queue.Dequeue();
            }

            queue.Enqueue(sr.ReadLine());
        }
    }

    // The queue now has our set of lines. So print to console, save to another file, etc.
    // A while loop (rather than do/while) avoids an exception when the file is empty.
    while (queue.Count > 0)
    {
        Console.WriteLine(queue.Dequeue());
    }

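I just had the same problem: a huge log file that should be accessed via a REST interface. Of course, loading it into memory and sending it complete via HTTP was no solution.

As Jon pointed out, this solution has a very specific use case. In my case, I know for sure (and check) that the encoding is UTF-8 (with BOM!), and can therefore profit from all the blessings of UTF. It is surely not a general-purpose solution.

Here is what worked for me extremely well and fast (I forgot to close the stream; fixed now):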

    private string tail(StreamReader streamReader, long numberOfBytesFromEnd)
    {
        Stream stream = streamReader.BaseStream;
        long length = streamReader.BaseStream.Length;
        if (length < numberOfBytesFromEnd)
            numberOfBytesFromEnd = length;
        stream.Seek(numberOfBytesFromEnd * -1, SeekOrigin.End);

        int LF = '\n';
        int CR = '\r';
        bool found = false;

        while (!found)
        {
            int c = stream.ReadByte();
            // Stop at the first line feed, or at end of stream (-1)
            // so the loop cannot spin forever when no line feed follows.
            if (c == LF || c == -1)
                found = true;
        }

        string readToEnd = streamReader.ReadToEnd();
        streamReader.Close();
        return readToEnd;
    }

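We first seek to a place somewhere near the end using the BaseStream, and once we have the right stream position, read to the end with the usual StreamReader.

This doesn't really allow you to specify the number of lines from the end, which isn't a good idea anyway, since the lines could be arbitrarily long and thus kill performance again. So I specify the number of bytes, read until we reach the first newline, and then comfortably read to the end. Theoretically you could look for the carriage return too, but in my case that was not necessary.

If we use this code, it will not disturb a writer thread: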

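    FileStream fileStream = new FileStream(
        filename,
        FileMode.Open,
        FileAccess.Read,
        FileShare.ReadWrite);

    StreamReader streamReader = new StreamReader(fileStream);

In case you need to read any number of lines in reverse from a text file, here's a LINQ-compatible class you can use. It focuses on performance and support for large files. You could read several lines and call Reverse() to get the last several lines in forward order:

Usage:

    var reader = new ReverseTextReader(@"C:\Temp\ReverseTest.txt");
    string line;
    // ReadLine() returns null once the beginning of the file has been reached.
    while ((line = reader.ReadLine()) != null)
        Console.WriteLine(line);

The ReverseTextReader class: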

    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    /// <summary>
    /// Reads a text file backwards, line-by-line.
    /// </summary>
    /// <remarks>This class uses file seeking to read a text file of any size in reverse order. This
    /// is useful for needs such as reading a log file newest-entries first.
    /// The line extraction assumes lines end in \r\n and that the file ends with a line break.</remarks>
    public sealed class ReverseTextReader : IEnumerable<string>
    {
        private const int BufferSize = 16384;   // The number of bytes read from the underlying stream.
        private readonly Stream _stream;        // Stores the stream feeding data into this reader
        private readonly Encoding _encoding;    // Stores the encoding used to process the file
        private byte[] _leftoverBuffer;         // Stores the leftover partial line after processing a buffer
        private readonly Queue<string> _lines;  // Stores the lines parsed from the buffer

        #region Constructors

        /// <summary>
        /// Creates a reader for the specified file.
        /// </summary>
        /// <param name="filePath"></param>
        public ReverseTextReader(string filePath)
            : this(new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read), Encoding.Default)
        { }

        /// <summary>
        /// Creates a reader using the specified stream.
        /// </summary>
        /// <param name="stream"></param>
        public ReverseTextReader(Stream stream)
            : this(stream, Encoding.Default)
        { }

        /// <summary>
        /// Creates a reader using the specified path and encoding.
        /// </summary>
        /// <param name="filePath"></param>
        /// <param name="encoding"></param>
        public ReverseTextReader(string filePath, Encoding encoding)
            : this(new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read), encoding)
        { }

        /// <summary>
        /// Creates a reader using the specified stream and encoding.
        /// </summary>
        /// <param name="stream"></param>
        /// <param name="encoding"></param>
        public ReverseTextReader(Stream stream, Encoding encoding)
        {
            _stream = stream;
            _encoding = encoding;
            _lines = new Queue<string>(128);

            // The stream needs to support seeking for this to work
            if (!_stream.CanSeek)
                throw new InvalidOperationException("The specified stream needs to support seeking to be read backwards.");
            if (!_stream.CanRead)
                throw new InvalidOperationException("The specified stream needs to support reading to be read backwards.");

            // Set the current position to the end of the file
            _stream.Position = _stream.Length;
            _leftoverBuffer = new byte[0];
        }

        #endregion

        #region Overrides

        /// <summary>
        /// Reads the next previous line from the underlying stream.
        /// </summary>
        /// <returns></returns>
        public string ReadLine()
        {
            // Are there lines left to read? If so, return the next one
            if (_lines.Count != 0) return _lines.Dequeue();

            // Are we at the beginning of the stream? If so, we're done
            if (_stream.Position == 0) return null;

            #region Read and Process the Next Chunk

            // Remember the current position
            var currentPosition = _stream.Position;
            var newPosition = currentPosition - BufferSize;

            // Are we before the beginning of the stream?
            if (newPosition < 0) newPosition = 0;

            // Calculate the buffer size to read
            var count = (int)(currentPosition - newPosition);

            // Set the new position
            _stream.Position = newPosition;

            // Make a new buffer but append the previous leftovers
            var buffer = new byte[count + _leftoverBuffer.Length];

            // Read the next buffer
            _stream.Read(buffer, 0, count);

            // Move the position of the stream back
            _stream.Position = newPosition;

            // And copy in the leftovers from the last buffer
            if (_leftoverBuffer.Length != 0)
                Array.Copy(_leftoverBuffer, 0, buffer, count, _leftoverBuffer.Length);

            // Look for CrLf delimiters
            var end = buffer.Length - 1;
            var start = buffer.Length - 2;

            // Search backwards for a line feed
            while (start >= 0)
            {
                // Is it a line feed?
                if (buffer[start] == 10)
                {
                    // Yes. Extract a line and queue it (but exclude the \r\n)
                    _lines.Enqueue(_encoding.GetString(buffer, start + 1, end - start - 2));

                    // And reset the end
                    end = start;
                }

                // Move to the previous character
                start--;
            }

            // What's left over is a portion of a line. Save it for later.
            _leftoverBuffer = new byte[end + 1];
            Array.Copy(buffer, 0, _leftoverBuffer, 0, end + 1);

            // Are we at the beginning of the stream?
            if (_stream.Position == 0)
                // Yes. Add the last line.
                _lines.Enqueue(_encoding.GetString(_leftoverBuffer, 0, end - 1));

            #endregion

            // If we have something in the queue, return it
            return _lines.Count == 0 ? null : _lines.Dequeue();
        }

        #endregion

        #region IEnumerator<string> Interface

        public IEnumerator<string> GetEnumerator()
        {
            string line;
            // So long as the next line isn't null...
            while ((line = ReadLine()) != null)
                // Read and return it.
                yield return line;
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            // Delegate to the generic enumerator instead of throwing.
            return GetEnumerator();
        }

        #endregion
    }

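Why not use File.ReadAllLines, which returns a string[]?

Then you can get the last 10 lines (or members of the array), which would be a trivial task.

This approach doesn't take any encoding issues into account, and I'm not sure about its exact efficiency (time taken to complete the method, etc.). It also loads the entire file into memory, so it is unlikely to be practical for a 10 GB file.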
