How to remove stop words using NLTK or Python

So I have a dataset from which I would like to remove stop words using

stopwords.words('english')

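I'm struggling with how to use this in my code to simply take out these words. I already have a list of the words from this dataset; the part I'm struggling with is comparing against this list and removing the stop words. Any help is appreciated.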

    from nltk.corpus import stopwords
    # ...
    filtered_words = [word for word in word_list if word not in stopwords.words('english')]
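One thing to watch in the comprehension above: `stopwords.words('english')` is evaluated again for every word, and it returns a list, so each membership check is a linear scan. A minimal sketch of building a set once and reusing it follows. The small `STOP_WORDS` set and the helper name `remove_stop_words` are made up for illustration so the snippet runs without the NLTK corpus; in real use you would assign `set(stopwords.words('english'))` instead.

```python
# Stand-in stop-word set (an assumption for illustration). With NLTK installed
# and the stopwords corpus downloaded, you would use:
#   from nltk.corpus import stopwords
#   STOP_WORDS = set(stopwords.words('english'))
STOP_WORDS = {"a", "an", "the", "in", "is", "of"}


def remove_stop_words(word_list, stop_words=STOP_WORDS):
    """Return the words from word_list that are not in stop_words, in order."""
    # Membership tests against a set are constant time on average.
    return [word for word in word_list if word not in stop_words]


word_list = ["the", "cat", "is", "in", "the", "garden"]
filtered_words = remove_stop_words(word_list)
print(filtered_words)
```

The result is the same as the original one-liner; only the lookup cost changes, which starts to matter once `word_list` is large.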

You could also do a set difference, for example:

 list(set(nltk.regexp_tokenize(sentence, pattern, gaps=True)) - set(nltk.corpus.stopwords.words('english')))
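Note that a set difference like this discards duplicate tokens and does not preserve word order, which may or may not be what you want. The sketch below contrasts it with an order-preserving filter; the `stop_words` set and the `tokens` list are hypothetical stand-ins so that it runs without NLTK.

```python
# Hypothetical stand-ins for the NLTK stop-word list and the tokenized sentence.
stop_words = {"a", "an", "the", "in"}
tokens = ["the", "dog", "chased", "the", "dog", "in", "a", "park"]

# Set difference: each remaining word appears once, in no particular order.
unique_remaining = set(tokens) - stop_words

# List comprehension: duplicates and the original order are kept.
ordered_remaining = [t for t in tokens if t not in stop_words]

print(sorted(unique_remaining))
print(ordered_remaining)
```

If you only need the vocabulary, the set version is fine; if you need the filtered text itself, the list version is probably the safer choice.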

I suppose you have a list of words (word_list) from which you want to remove stop words. You could do something like this:

    filtered_word_list = word_list[:]  # make a copy of the word_list
    for word in word_list:  # iterate over word_list
        if word in stopwords.words('english'):
            filtered_word_list.remove(word)  # remove word from filtered_word_list if it is a stopword

Using filter:

    from nltk.corpus import stopwords
    # ...
    filtered_words = list(filter(lambda word: word not in stopwords.words('english'), word_list))

    # 1) Build a new list containing only the words that are not stop words
    print("enter the string from which you want to remove list of stop words")
    userstring = input().split(" ")
    stop_list = ["a", "an", "the", "in"]  # not named `list`, to avoid shadowing the built-in
    another_list = []
    for x in userstring:
        if x not in stop_list:  # keep the word only if it is not in the stop list
            another_list.append(x)
    for x in another_list:
        print(x, end=' ')

    # 2) If you would rather use .remove
    print("enter the string from which you want to remove list of stop words")
    userstring = input().split(" ")
    stop_list = ["a", "an", "the", "in"]
    for x in userstring[:]:  # iterate over a copy: removing from a list while looping over it skips items
        if x in stop_list:
            userstring.remove(x)
    for x in userstring:
        print(x, end=' ')

You can use this function; note that you need to lowercase all the words.

    from nltk.corpus import stopwords

    def remove_stopwords(word_list):
        processed_word_list = []
        for word in word_list:
            word = word.lower()  # in case they aren't all lowercased
            if word not in stopwords.words("english"):
                processed_word_list.append(word)
        return processed_word_list
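The function above also lowercases the words it returns. If you want a case-insensitive comparison but need to keep the original casing (for proper nouns, say), a variation along these lines might work. The function name `remove_stopwords_keep_case` and the small `stop_words` set are assumptions for illustration, not part of NLTK.

```python
def remove_stopwords_keep_case(word_list, stop_words):
    """Drop stop words, comparing in lowercase but returning words unchanged."""
    return [word for word in word_list if word.lower() not in stop_words]


# Hypothetical stand-in; with NLTK you would pass set(stopwords.words("english")).
stop_words = {"the", "in", "a"}
words = ["The", "Eiffel", "Tower", "in", "Paris"]
result = remove_stopwords_keep_case(words, stop_words)
print(result)
```

This relies on the stop-word list itself being all lowercase, which is the case for the small set used here.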

To exclude all types of stop words, including the NLTK stop words, you could do something like this:

    from many_stop_words import get_stop_words
    from nltk.corpus import stopwords

    stop_words = list(get_stop_words('en'))  # About 900 stopwords
    nltk_words = list(stopwords.words('english'))  # About 150 stopwords
    stop_words.extend(nltk_words)

    output = [w for w in word_list if w not in stop_words]
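Because the two sources overlap, extending one list with the other leaves duplicates in `stop_words`. A rough sketch of merging them into a single set instead is shown here. The helper `combine_stop_words` and the two short lists are hypothetical stand-ins for `get_stop_words('en')` and `stopwords.words('english')`, so the snippet runs without either package.

```python
def combine_stop_words(*sources):
    """Merge any number of stop-word collections into one set."""
    combined = set()
    for source in sources:
        combined.update(source)
    return combined


# Hypothetical stand-ins for the two real stop-word sources.
list_a = ["a", "the", "in"]
list_b = ["the", "of", "in"]

stop_words = combine_stop_words(list_a, list_b)
word_list = ["the", "king", "of", "spain"]
output = [w for w in word_list if w not in stop_words]
print(output)
```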
