Reading a text file and splitting it into single words in Python

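So I have this text file made up of numbers and words, for example like this: 09807754 18 n 03 aristocrat 0 blue_blood 0 patrician, and I want to split it so that each word or number comes up on a new line.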

A whitespace separator would be ideal, as I would like the words with dashes to stay connected.

This is what I have so far:

    f = open('words.txt', 'r')
    for word in f:
        print(word)

Not really sure how to go on from here. I would like this to be the output:

    09807754
    18
    n
    3
    aristocrat
    ...

If there are no quotes around your data:

    with open('words.txt', 'r') as f:
        for line in f:
            for word in line.split():
                print(word)
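To see why a bare `split()` fits what the question asks for, here is a small sketch that runs it on the sample line from the question instead of on a file. The variable names are only illustrative, and the hyphenated word `well-known` is a made-up addition to show that a hyphen is not treated as a separator.

```python
# Sample line from the question, plus a made-up hyphenated word for illustration.
line = "09807754 18 n 03 aristocrat 0 blue_blood 0 patrician\twell-known\n"

# split() with no arguments splits on any run of whitespace (spaces, tabs,
# newlines) and never produces empty strings.
words = line.split()

for word in words:
    print(word)
```

Passing an explicit separator such as `split(' ')` behaves differently: it would leave the tab and the newline inside the words and produce empty strings for repeated spaces.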

If you want a nested list of the words in each line of the file:

    with open("words.txt") as f:
        words_by_line = [line.split() for line in f]  # keep the result

Or, if you want to flatten that into a single list of all the words in the file, you could do something like this:

    with open("words.txt") as f:
        all_words = [word for line in f for word in line.split()]  # keep the result
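The difference between the two comprehensions is easier to see side by side. The sketch below uses `io.StringIO` as a stand-in for `open("words.txt")`, which is an assumption made only so the example runs without a file on disk; the split of the sample data into two lines is also made up.

```python
import io

# io.StringIO stands in for open("words.txt"); the line break is made up.
sample = io.StringIO("09807754 18 n 03 aristocrat\n0 blue_blood 0 patrician\n")

# One inner list per line of the file.
nested = [line.split() for line in sample]

sample.seek(0)  # rewind so the same data can be read again

# One flat list containing every word in the file.
flat = [word for line in sample for word in line.split()]

print(nested)
print(flat)
```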

If you want a regex solution:

    import re

    with open("words.txt") as f:
        for line in f:
            for word in re.findall(r'\w+', line):
                print(word)  # word by word

Or, if you would like that to be a line-by-line generator with a regex:

    import re

    with open("words.txt") as f:
        words = (word for line in f for word in re.findall(r'\w+', line))
        print(list(words))  # consume the generator while the file is still open
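One thing to watch with the regex versions: `\w+` matches letters, digits and underscores only, so `blue_blood` stays in one piece but a word joined with a hyphen gets split. If the goal is to keep hyphenated words together, as the question asks, `\S+` (any run of non-whitespace) is probably closer to what you want. A quick sketch, with `well-known` as a made-up example word:

```python
import re

line = "aristocrat blue_blood well-known"  # "well-known" is a made-up example

# \w+ matches letters, digits and underscores, so the hyphen acts as a separator.
word_chars = re.findall(r'\w+', line)

# \S+ matches any run of non-whitespace characters, which mirrors line.split().
non_space = re.findall(r'\S+', line)

print(word_chars)
print(non_space)
```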

    with open('words.txt') as f:  # the with block closes the file afterwards
        for word in f.read().split():
            print(word)

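As a supplement, if you are reading a very large file and do not want to read all of its content into memory at once, you might consider using a buffer and then returning each word with yield: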

    def read_words(inputfile):
        with open(inputfile, 'r') as f:
            while True:
                buf = f.read(10240)
                if not buf:
                    break

                # make sure we end on a space (word boundary)
                while not str.isspace(buf[-1]):
                    ch = f.read(1)
                    if not ch:
                        break
                    buf += ch

                words = buf.split()
                for word in words:
                    yield word
            # handle the case where the file is empty
            # (note: this trailing '' is yielded for non-empty files too)
            yield ''

    if __name__ == "__main__":
        for word in read_words('./very_large_file.txt'):
            process(word)  # process() is a placeholder for your own handler
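For comparison, here is a rough variation on the same chunked idea, not the code from the answer above. Instead of reading one character at a time until it hits whitespace, it carries the possibly incomplete last word over to the next chunk, and it does not yield a trailing empty string. The name `iter_words` and the `chunk_size` parameter are made up for this sketch, and it takes an already open text-mode file object so it can be tried on an `io.StringIO`.

```python
import io

def iter_words(f, chunk_size=10240):
    """Yield whitespace-separated words from text file object f, chunk by chunk."""
    leftover = ""  # possibly incomplete word carried over from the previous chunk
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf = leftover + chunk
        words = buf.split()
        if words and not buf[-1].isspace():
            # The chunk ended in the middle of a word: hold it back for now.
            leftover = words.pop()
        else:
            leftover = ""
        yield from words
    if leftover:
        yield leftover  # last word of a file that does not end in whitespace

# A tiny chunk size forces words to straddle chunk boundaries.
sample = io.StringIO("alpha beta  gamma\ndelta")
result = list(iter_words(sample, chunk_size=4))
print(result)
```

Like the original, memory use is only bounded if the file actually contains whitespace; one enormous unbroken token would still end up in memory as a whole.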

Here is my totally functional approach, which avoids having to read and split lines. It makes use of the itertools module:

Note for Python 3: replace `itertools.imap` with `map`.

    import itertools

    def readwords(mfile):
        # Python 2 version; itertools.imap does not exist on Python 3
        byte_stream = itertools.groupby(
            itertools.takewhile(lambda c: bool(c),
                itertools.imap(mfile.read,
                    itertools.repeat(1))), str.isspace)

        return ("".join(group) for pred, group in byte_stream if not pred)

Sample usage:

    >>> import sys
    >>> for w in readwords(sys.stdin):
    ...     print (w)
    ...
    I really love this new method of reading words in python
    I
    really
    love
    this
    new
    method
    of
    reading
    words
    in
    python

    It's soo very Functional!
    It's
    soo
    very
    Functional!
    >>>

I guess in your case, this would be the way to use the function:

    with open('words.txt', 'r') as f:
        for word in readwords(f):
            print(word)
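Following the Python 3 note above, the same function might look like the sketch below once `itertools.imap` is swapped for the built-in `map`. The intermediate variable names are mine, `bool` is used in place of the lambda, and `io.StringIO` is only a stand-in for an open text-mode file so the example can run on its own.

```python
import io
import itertools

def readwords(mfile):
    # Read one character at a time until read() returns '' at end of file.
    chars = itertools.takewhile(bool, map(mfile.read, itertools.repeat(1)))
    # Group consecutive characters by whether they are whitespace.
    groups = itertools.groupby(chars, str.isspace)
    # Keep only the non-whitespace groups and join each into a word.
    return ("".join(group) for is_space, group in groups if not is_space)

sample = io.StringIO("I really love\nthis   method")  # stand-in for an open file
words = list(readwords(sample))
print(words)
```

This assumes the file is opened in text mode, since `str.isspace` expects `str` characters rather than bytes.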
