Removing stop words from a list in Python using only NumPy


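I am removing stop words in Python using numpy. The stopwords file is imported as a list. Here is what I have tried:

Method 1: I tried looping through the stop words list and removing each one from tw_line.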

# loop through the stop words list, and remove each one from the split line list
for line in stopwords:
    if line in words:
        words.remove(line)
        continue
    print (tw_line)

Result: no stop words were removed.

0 my whole body feels itchy and like its on fire

Method 2: I tried looping through the words of the line and checking each one against the stop words list.

# loop through the line, and check with stop words, if not in stop words, add to clean_line
clean_line=[]
tw_line.split(" ")
for line in tw_line:
    if line not in stopwords:
        clean_line.append(line)
print(clean_line)

Result: all the words were split into characters.

['m', 'y', 'w', 'h', 'o', 'l', 'e', 'b', 'o', 'd', 'y', 'f', 'e', 'e', 'l', 's', 'i', 'c', 'h', 'y', 'a', 'n', 'd', 'l', 'i', 'k', 'e', 'i', 's', 'o', 'n', 'f', 'i', 'r', 'e']

Any help?

Try applying the following:

>>> str1 = "my whole body feels itchy and like its on fire"
>>> str1.split()
 ['my', 'whole', 'body', 'feels', 'itchy', 'and', 'like', 'its', 'on', 'fire']
>>>

Then remove the words that are in stopwords. By the way, I don't see any numpy here.
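
Since the question asks for NumPy specifically, the removal step could be sketched with np.isin. This is only one possible approach, not something the answer above proposes; the stop word list here is a made-up sample standing in for the contents of the real file.

```python
import numpy as np

# Hypothetical sample data; the real stop words would come from the file.
stopwords = ["my", "and", "its", "on"]
str1 = "my whole body feels itchy and like its on fire"

# Turn the sentence into an array of word strings.
words = np.array(str1.split())

# Boolean mask that is True where the word is NOT a stop word.
keep = np.isin(words, stopwords, invert=True)

clean_words = words[keep].tolist()
clean_line = " ".join(clean_words)
print(clean_line)  # whole body feels itchy like fire
```

The mask keeps the original word order, so joining the surviving words gives back a readable sentence.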


You should print words instead of tw_line, because words is the list you are removing the stop words from.

for line in stopwords:
    if line in words:
        words.remove(line)
# print once, after all stop words have been processed
print (words)
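
A small sketch of why printing tw_line shows no change: split() returns a new list, so removing items from that list never touches the original string. The sample stop words are an assumption for illustration, and the while loop is an addition here because list.remove() deletes only the first match.

```python
tw_line = "my whole body feels itchy and like its on fire"
stopwords = ["my", "and", "its", "on"]  # hypothetical sample list

# split() builds a new list; tw_line itself is left untouched.
words = tw_line.split()

for stop in stopwords:
    # remove() deletes only the first occurrence, so repeat until none remain.
    while stop in words:
        words.remove(stop)

print(tw_line)  # still the full original sentence
print(words)    # the list with the stop words removed
```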


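Method 2 is clearly what you want to do. However, you can improve the following:

As Paul Panzer said, split does not work in place, so you need to do
```
tw_list = tw_line.split(" ")
```

You can use a list comprehension instead of a loop (or even a generator, if you intend to join afterwards):

    clean_line = [word for word in tw_list if word not in stopwords]

I see from your code comment that stopwords is a list. For efficiency reasons, you may want to make it a set().
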
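
Put together, the three suggestions might look like the sketch below. The sample sentence is taken from the question; the stop words are a made-up stand-in for the real file contents.

```python
tw_line = "my whole body feels itchy and like its on fire"

# Hypothetical sample; a set gives fast membership tests compared with a list.
stopwords = {"my", "and", "its", "on"}

# Keep the return value of split(); it does not modify tw_line.
tw_list = tw_line.split(" ")

# List comprehension: keep every word that is not a stop word.
clean_line = [word for word in tw_list if word not in stopwords]
print(clean_line)

# If the goal is a string, a generator can be passed straight to join().
clean_text = " ".join(word for word in tw_list if word not in stopwords)
print(clean_text)  # whole body feels itchy like fire
```
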
What is the question? And how does numpy relate to this? It would be very helpful if you could give an example of what the data looks like. See how to create an example.

The .split member function used in Method 2 does not work "in place" (how could it? It produces a new type, a list from a string). You must assign its return value to tw_line or to a new variable.

numpy is the only library allowed, so I can't use built-in methods from other libraries.

I tried it, and on its own it works. When I put it into a function, it gives a weird result. print(stopwords[:10]) shows ['a', 'able', 'about', ...]. The function is:

    def remove_stopwords(tw):
        with open('stop_words.txt') as f:
            stopwords = f.readlines()
        for index, line in enumerate(stopwords):
            line = line.strip('\n')
            stopwords[index] = line
        remove_punc(stopwords)
        for index, line in enumerate(tw):
            clean_line = []
            clean_line = [word for word in line if word not in stopwords]
            line = string.join(clean_line)
            tw[index] = line  # store the line back into tw
        return tw[:5]

and it returns:

    ['0 t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t t\n', '0 t t t t t t t t t t t t t t t t\n', '0 t t t t t t t t t t t\n']

It looks like the words are being removed from a string and no longer from a list. But I clearly defined them as lists.

OK, I think I know what is going on here. You are trying to think line by line, where the simplest is probably to think in terms of strings. You should have a function remove_stopwords that takes a string and a list of stopwords as arguments and returns a string without the stopwords (so just return " ".join(clean_line), as I suggested). Then, if the file is too large, you can apply that function line by line, but frankly in most cases it can be done directly.
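
A minimal sketch of the function described in the last comment, assuming each entry of tw is a plain string. The helper load_stopwords and the in-memory "file" are hypothetical, added only so the example runs without stop_words.txt. The key difference from the function above is that each line is split into words before filtering; iterating over a string directly yields single characters.

```python
import io


def load_stopwords(lines):
    """Build a set of stop words from an iterable of lines (e.g. an open file)."""
    return {line.strip() for line in lines if line.strip()}


def remove_stopwords(text, stopwords):
    """Return text with every whitespace-separated word found in stopwords removed."""
    return " ".join(word for word in text.split() if word not in stopwords)


# Stand-in for open('stop_words.txt'); the contents are made up.
fake_file = io.StringIO("my\nand\nits\non\n")
stopwords = load_stopwords(fake_file)

# Apply the string-level function line by line.
tw = ["0 my whole body feels itchy and like its on fire\n"]
cleaned = [remove_stopwords(line, stopwords) for line in tw]
print(cleaned)  # ['0 whole body feels itchy like fire']
```

Because split() with no argument splits on any whitespace, the trailing newline of each line is dropped as a side effect.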