Convert a string to a list of words?

I'm trying to convert a string to a list of words using Python. I want to take something like the following:

string = 'This is a string, with words!'

and convert it to something like this:

 list = ['This', 'is', 'a', 'string', 'with', 'words']

Notice the omission of punctuation and spaces. What would be the fastest way of going about this?

Try this:

    import re

    mystr = 'This is a string, with words!'
    # Replace every non-word character with a space, then split on whitespace.
    wordList = re.sub(r"[^\w]", " ", mystr).split()

How it works:

From the docs:

 re.sub(pattern, repl, string, count=0, flags=0)

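Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged. repl can be a string or a function.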

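So in our case:

The pattern is any non-alphanumeric character.

\w matches any alphanumeric character and, for ASCII text, is equal to the character set [a-zA-Z0-9_]: a to z, A to Z, 0 to 9, and underscore. (In Python 3, \w in a str pattern also matches Unicode letters and digits unless the re.ASCII flag is passed.)

So we match any non-alphanumeric character and replace it with a space.

Then we split() it, which splits the string on whitespace and converts it to a list.

So 'Hello,world' becomes 'Hello world' with re.sub, and then ['Hello', 'world'] after split().

Let me know if any doubts come up.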
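The two steps described above can be sketched as a small helper. The function name `to_words` and the `ascii_only` switch are made up for illustration; only `re.sub`, `re.ASCII`, and `str.split` come from the standard library. It also shows, under the assumption that you are on Python 3, how the `re.ASCII` flag changes what counts as a word character.

```python
import re

def to_words(text, ascii_only=False):
    """Replace non-word characters with spaces, then split on whitespace."""
    # re.ASCII restricts \w to [a-zA-Z0-9_]; without it, \w is Unicode-aware.
    flags = re.ASCII if ascii_only else 0
    spaced = re.sub(r"[^\w]", " ", text, flags=flags)
    return spaced.split()

print(to_words("Hello,world"))                        # ['Hello', 'world']
print(to_words("caf\u00e9 au lait"))                  # the accented word stays whole
print(to_words("caf\u00e9 au lait", ascii_only=True)) # the accented letter is treated as a separator
```

Whether accented letters should count as part of a word depends on your data, so treat the flag as a choice rather than a fix.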

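I think this is the simplest way for anyone else stumbling on this post, although it leaves the punctuation attached to the words: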

    >>> string = 'This is a string, with words!'
    >>> string.split()
    ['This', 'is', 'a', 'string,', 'with', 'words!']

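Doing this properly is quite complex. For your research, it is known as word tokenization. You should look at NLTK if you want to see what others have done, rather than starting from scratch: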

    >>> import nltk
    >>> paragraph = u"Hi, this is my first sentence. And this is my second."
    >>> sentences = nltk.sent_tokenize(paragraph)
    >>> for sentence in sentences:
    ...     nltk.word_tokenize(sentence)
    ...
    [u'Hi', u',', u'this', u'is', u'my', u'first', u'sentence', u'.']
    [u'And', u'this', u'is', u'my', u'second', u'.']

The simplest way:

    >>> import re
    >>> string = 'This is a string, with words!'
    >>> re.findall(r'\w+', string)
    ['This', 'is', 'a', 'string', 'with', 'words']

Using string.punctuation for completeness:

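    import re
    import string

    s = 'This is a string, with words!'  # example input
    # re.escape keeps characters such as ] and \ literal inside the character class.
    x = re.sub('[' + re.escape(string.punctuation) + ']', '', s).split()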

This handles newlines as well.
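Here is a small sketch of why newlines are covered: `split()` with no arguments splits on any run of whitespace, including `\n` and `\t`. The sample text and variable names are made up for illustration. It also shows one thing to watch for: because the punctuation is deleted rather than replaced by a space, words separated only by punctuation get glued together.

```python
import re
import string

# Character class matching any single ASCII punctuation character.
pattern = "[" + re.escape(string.punctuation) + "]"

multiline = "First line,\nsecond line!\tTabbed; done."
words = re.sub(pattern, "", multiline).split()
print(words)   # ['First', 'line', 'second', 'line', 'Tabbed', 'done']

# Caveat: deleting punctuation joins words that had no whitespace between them.
glued = re.sub(pattern, "", "one,two").split()
print(glued)   # ['onetwo']
```

If inputs like "one,two" are possible, replacing the punctuation with a space instead of an empty string avoids the glued result.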

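A regular expression would give you the most control. You'll want to think carefully about how to deal with words containing dashes or apostrophes, like "I'm".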
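As one possible way to handle dashes and apostrophes, here is a sketch of a pattern that keeps them only when they sit between word characters. The pattern and the name `WORD` are my own choice for illustration, not something prescribed by the `re` module, and other policies are just as defensible.

```python
import re

# A word is a run of word characters, optionally followed by
# groups of (apostrophe or hyphen) + more word characters.
WORD = re.compile(r"\w+(?:['-]\w+)*")

text = "I'm a custom-built sentence - with 'quoted' words!"
words = WORD.findall(text)
print(words)
# ["I'm", 'a', 'custom-built', 'sentence', 'with', 'quoted', 'words']
```

The standalone dash and the surrounding quotes are dropped because they are not followed or preceded by a word character on both sides.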

Well, you could use

    import re

    list = re.sub(r'[.!,;?]', ' ', string).split()

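Note that list is the name of a built-in type and string is the name of a standard library module, so you probably don't want to use them as variable names.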
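To make the naming warning concrete, here is a minimal sketch of what goes wrong when a variable is called `list`. The function name `shadow_demo` is hypothetical; the shadowing is kept inside a function so it does not leak into the rest of the program.

```python
def shadow_demo():
    # Inside this function, the name `list` now refers to a list object,
    # not to the built-in type.
    list = "This is a string".split()
    try:
        list("abc")          # tries to call a list instance
    except TypeError as exc:
        return str(exc)
    return None

message = shadow_demo()
print(message)               # 'list' object is not callable

# Outside the function the built-in is untouched.
print(list("ab"))            # ['a', 'b']
```

Names such as `words` or `word_list` avoid the problem entirely.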

 list=mystr.split(" ",mystr.count(" "))

Inspired by @mtrw's answer, but improved to strip punctuation at word boundaries only:

    import re
    import string

    def extract_words(s):
        return [re.sub('^[{0}]+|[{0}]+$'.format(string.punctuation), '', w) for w in s.split()]

    >>> str = 'This is a string, with words!'
    >>> extract_words(str)
    ['This', 'is', 'a', 'string', 'with', 'words']

    >>> str = '''I'm a custom-built sentence with "tricky" words like https://stackoverflow.com/.'''
    >>> extract_words(str)
    ["I'm", 'a', 'custom-built', 'sentence', 'with', 'tricky', 'words', 'like', 'https://stackoverflow.com']

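This is from my attempt at a coding challenge that couldn't use regular expressions:

    # split() with no argument collapses runs of spaces, so no empty strings are returned.
    outputList = "".join((c if c.isalnum() or c == "'" else ' ') for c in inputStr).split()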

The handling of the apostrophe seems to be the interesting part.
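To see what the apostrophe rule actually changes, here is a sketch of the same no-regex idea with the kept characters made configurable. The function name `split_words` and the `keep` parameter are assumptions added for illustration.

```python
def split_words(text, keep="'"):
    """Keep alphanumerics plus any character in `keep`; everything else becomes a space."""
    cleaned = "".join(c if c.isalnum() or c in keep else " " for c in text)
    return cleaned.split()

sample = "Don't stop, it's 'fine'!"

print(split_words(sample))
# ["Don't", 'stop', "it's", "'fine'"]

print(split_words(sample, keep=""))
# ['Don', 't', 'stop', 'it', 's', 'fine']
```

Keeping the apostrophe preserves contractions but also leaves single quotes attached to quoted words, so neither setting is right for every input.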

This way removes every character that is not a letter of the alphabet:

    def wordsToList(strn):
        L = strn.split()
        cleanL = []
        abc = 'abcdefghijklmnopqrstuvwxyz'
        ABC = abc.upper()
        letters = abc + ABC
        for e in L:
            word = ''
            for c in e:
                if c in letters:
                    word += c
            if word != '':
                cleanL.append(word)
        return cleanL

    s = 'She loves you, yea yea yea! '
    L = wordsToList(s)
    print(L)  # ['She', 'loves', 'you', 'yea', 'yea', 'yea']

I'm not sure whether this is fast or optimal, or even the right way to program it.

You could try doing it like this:
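If speed matters, measuring is more reliable than guessing. Below is a rough sketch that uses `timeit` to compare three of the approaches from this thread on a made-up sample. The function names, the sample text, and the repeat count are all arbitrary assumptions, and the numbers will differ between machines and Python versions.

```python
import re
import string
import timeit

SAMPLE = "This is a string, with words! " * 200
LETTERS = set(string.ascii_letters)

def with_findall(text):
    return re.findall(r"\w+", text)

def with_sub_split(text):
    return re.sub(r"[^\w]", " ", text).split()

def with_letter_filter(text):
    # Same idea as the letter-by-letter loop: keep ASCII letters only, drop empty results.
    cleaned = ("".join(c for c in chunk if c in LETTERS) for chunk in text.split())
    return [word for word in cleaned if word]

candidates = [with_findall, with_sub_split, with_letter_filter]
timings = {
    func.__name__: timeit.timeit(lambda func=func: func(SAMPLE), number=50)
    for func in candidates
}

# Print the approaches from fastest to slowest on this machine.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f} s")
```

Note that the three functions only agree on input like this sample; the letter filter drops digits and underscores, which `\w` keeps.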

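    # Python 3: maketrans is a str method, and both arguments must have the same length.
    tryTrans = str.maketrans(",!", "  ")  # map ',' and '!' to spaces
    text = "This is a string, with words!"
    text = text.translate(tryTrans)
    listOfWords = text.split()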
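The same translation-table idea can be extended to all ASCII punctuation. This is a sketch assuming Python 3, where `str.maketrans` also accepts a third argument listing characters to delete; the variable names are made up for illustration.

```python
import string

# Table 1: map every punctuation character to a space.
to_space = str.maketrans(string.punctuation, " " * len(string.punctuation))
# Table 2: delete every punctuation character instead.
delete = str.maketrans("", "", string.punctuation)

sentence = "This is a string, with words!"
print(sentence.translate(to_space).split())
# ['This', 'is', 'a', 'string', 'with', 'words']

tricky = "I'm custom-built"
print(tricky.translate(to_space).split())   # ['I', 'm', 'custom', 'built']
print(tricky.translate(delete).split())     # ['Im', 'custombuilt']
```

Mapping to spaces splits contractions and hyphenated words apart, while deleting glues them together, so pick whichever matches what you want to count as a word.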
