Read the contents of a tar file in a Python script without extracting it

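I have a tar file that contains several files. I need to write a Python script that reads the contents of those files and reports the total character count (letters, spaces, newlines, everything) without extracting the tar file.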

You can use getmembers():

    >>> import tarfile
    >>> tar = tarfile.open("test.tar")
    >>> tar.getmembers()
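getmembers() returns TarInfo objects built from the archive's headers, so some information is available without reading any file data. A minimal sketch, assuming you only need each regular file's name and size: `member_sizes` and the in-memory `make_sample_tar` demo archive are hypothetical names used for illustration. Keep in mind that TarInfo.size is a byte count, which equals the character count only for single-byte encodings such as ASCII.

```python
import io
import tarfile


def member_sizes(tar):
    """Return (name, size_in_bytes) for each regular file in an open TarFile.

    Hypothetical helper: only header data (TarInfo.size) is used, so no
    member contents are read and nothing is extracted to disk.
    """
    return [(m.name, m.size) for m in tar.getmembers() if m.isfile()]


def make_sample_tar():
    """Hypothetical demo helper: build a small uncompressed tar in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [("a.txt", b"hello world\n"), ("b.txt", b"x y\nz\n")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # With a real archive on disk, use tarfile.open("test.tar") instead.
    with tarfile.open(fileobj=make_sample_tar()) as tar:
        for name, size in member_sizes(tar):
            print(name, size)  # prints "a.txt 12", then "b.txt 6"
```

Summing the sizes gives the total byte count of the archive's files; for a true character count you still have to read and decode the data.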

After that, you can use extractfile() to get each member as a file object. Here is an example:

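    import os
    import tarfile

    os.chdir("/tmp/foo")  # directory that contains test.tar
    with tarfile.open("test.tar") as tar:
        for member in tar.getmembers():
            f = tar.extractfile(member)
            if f is None:  # directories and other non-file members have no data
                continue
            content = f.read()  # bytes in Python 3
            print("%s has %d newlines" % (member.name, content.count(b"\n")))
            print("%s has %d spaces" % (member.name, content.count(b" ")))
            # len() of bytes counts bytes; this equals characters only for ASCII text
            print("%s has %d characters" % (member.name, len(content)))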

In the example above, the file object "f" supports read(), readlines(), and so on.
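The question asks for totals over the whole archive, and counting characters rather than bytes means decoding the data first. A minimal sketch, assuming every file member is UTF-8 text (pass a different `encoding` if yours are not); `count_chars` and the in-memory `make_sample_tar` demo archive are hypothetical names. Members for which extractfile() returns None, such as directories, are skipped.

```python
import io
import tarfile


def count_chars(tar, encoding="utf-8"):
    """Total characters, spaces and newlines over all regular files.

    Assumes every file member is text in `encoding` (UTF-8 by default).
    Members without data (extractfile() returns None) are skipped.
    """
    totals = {"characters": 0, "spaces": 0, "newlines": 0}
    for member in tar.getmembers():
        f = tar.extractfile(member)
        if f is None:  # directory, device, etc.
            continue
        # Decode so that len() counts characters, not bytes.
        text = f.read().decode(encoding)
        totals["characters"] += len(text)
        totals["spaces"] += text.count(" ")
        totals["newlines"] += text.count("\n")
    return totals


def make_sample_tar():
    """Hypothetical demo helper: one directory and two UTF-8 text files, in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        folder = tarfile.TarInfo("docs")
        folder.type = tarfile.DIRTYPE
        tar.addfile(folder)
        # "h\u00e9llo w\u00f6rld\n" is 12 characters but 14 bytes in UTF-8.
        for name, text in [("docs/a.txt", "h\u00e9llo w\u00f6rld\n"), ("docs/b.txt", "x y\nz\n")]:
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    with tarfile.open(fileobj=make_sample_tar()) as tar:
        print(count_chars(tar))  # {'characters': 18, 'spaces': 2, 'newlines': 3}
```

For very large members you could decode incrementally instead of calling read() once, but reading the whole member keeps the sketch short.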

You need the tarfile module. Specifically, use an instance of the TarFile class to access the archive, and then call TarFile.getnames() to get the member names:

     |  getnames(self)
     |      Return the members of the archive as a list of their names. It has
     |      the same order as the list returned by getmembers().

If you want to read the contents, use this method:

     |  extractfile(self, member)
     |      Extract a member from the archive as a file object. `member' may be
     |      a filename or a TarInfo object. If `member' is a regular file, a
     |      file-like object is returned. If `member' is a link, a file-like
     |      object is constructed from the link's target. If `member' is none of
     |      the above, None is returned.
     |      The file-like object is read-only and provides the following
     |      methods: read(), readline(), readlines(), seek() and tell()
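As that docstring says, extractfile() takes either a name or a TarInfo object and returns None for members that are neither regular files nor links. A minimal sketch reading a member line by line by name; `read_lines`, `make_sample_tar` and the member names `docs` and `docs/notes.txt` are hypothetical, chosen only for the demo. In Python 3 the returned object yields bytes, not str.

```python
import io
import tarfile


def read_lines(tar, name):
    """Return the lines of member `name` (as bytes), or None if it has no data.

    extractfile() accepts either a member name or a TarInfo object and
    returns None for members that are neither regular files nor links.
    """
    f = tar.extractfile(name)  # raises KeyError if `name` is not in the archive
    if f is None:
        return None
    return f.readlines()


def make_sample_tar():
    """Hypothetical demo helper: a directory plus one text file, in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        folder = tarfile.TarInfo("docs")
        folder.type = tarfile.DIRTYPE
        tar.addfile(folder)
        data = b"first line\nsecond line\n"
        info = tarfile.TarInfo("docs/notes.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    with tarfile.open(fileobj=make_sample_tar()) as tar:
        print(read_lines(tar, "docs/notes.txt"))  # [b'first line\n', b'second line\n']
        print(read_lines(tar, "docs"))            # None (directory)
```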

An implementation of the approach @stefano-borini mentioned, accessing a tar archive member by its file name:

    # Python 3
    myFile = myArchive.extractfile(
        dict(zip(myArchive.getnames(), myArchive.getmembers()))['path/to/file']
    ).read()
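Because the extractfile() docstring quoted above says `member` may be a filename, the dict(zip(...)) lookup can probably be dropped and the name passed directly, which avoids building a dict on every call. A minimal sketch under that reading; `read_member` and `make_sample_tar` are hypothetical names, and the placeholder path `path/to/file` is reused only as a demo member name.

```python
import io
import tarfile


def read_member(tar, name):
    """Return the bytes of member `name`, passing the name straight to extractfile().

    Raises KeyError if the archive has no such member, and ValueError if the
    member carries no data (for example a directory).
    """
    f = tar.extractfile(name)
    if f is None:
        raise ValueError("%r has no file data" % name)
    return f.read()


def make_sample_tar():
    """Hypothetical demo helper: one member at the placeholder path, in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"some content\n"
        info = tarfile.TarInfo("path/to/file")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    with tarfile.open(fileobj=make_sample_tar()) as myArchive:
        # Should match the dict(zip(getnames(), getmembers())) lookup.
        print(read_member(myArchive, "path/to/file"))  # b'some content\n'
```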

Credits:

dict(zip(...)) from https://stackoverflow.com/a/209854/1695680
tarfile.getnames from https://stackoverflow.com/a/2018523/1695680
Also, for my use case of reading a tar archive from a buffer: How do I build a TarFile object in memory from a bytes buffer in Python 3?
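For the buffer case, one common approach is to wrap the bytes in io.BytesIO and pass it as `fileobj` to tarfile.open(). A minimal sketch, assuming the whole archive already sits in a bytes object (for example downloaded data); `open_tar_bytes` and `make_sample_bytes` are hypothetical names. Mode "r" is the same as "r:*", which tries the supported compression methods, so a gzip-compressed buffer opens without naming the format.

```python
import io
import tarfile


def open_tar_bytes(data):
    """Open a tar archive held in a bytes object without touching the disk.

    Mode "r" (same as "r:*") detects supported compression automatically.
    """
    return tarfile.open(fileobj=io.BytesIO(data), mode="r")


def make_sample_bytes():
    """Hypothetical demo helper: a gzip-compressed tar returned as bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"hello world\n"
        info = tarfile.TarInfo("a.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    with open_tar_bytes(make_sample_bytes()) as tar:
        print(tar.getnames())                   # ['a.txt']
        print(tar.extractfile("a.txt").read())  # b'hello world\n'
```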
