Why does it take so long to convert words from plural to singular in a for loop (Python 3)?


python pandas for-loop nlp


Here is my code, which reads text from a CSV file and converts all the words in one column from plural to singular:

import pandas as pd
from textblob import TextBlob as tb
data = pd.read_csv(r'path\to\data.csv')

for i in range(len(data)):
    blob = tb(data['word'][i])
    singular = blob.words.singularize()  # This makes singular a list
    data['word'][i] = ''.join(singular)  # Converting the list back to a string

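But this code has been running for several minutes already (and if I don't stop it, it might keep running for hours!). Why? When I check a few words individually, the conversion happens instantly; it takes no time at all. The file has only 1060 rows (words to convert).

Edit: it finished running in about 10-12 minutes.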

Here is some sample data:

Input:

word
development
investment
funds
slow
company
commit
pay
claim
finances
customers
claimed
insurance
comment
rapid
bureaucratic
affairs
reports
policyholders
detailed

Output:

word
development
investment
fund
slow
company
commit
pay
claim
finance
customer
claimed
insurance
comment
rapid
bureaucratic
affair
report
policyholder
detailed

How about this?

In [1]: import pandas as pd

In [2]: from textblob import Word

In [3]: s = pd.read_csv('text', squeeze=True, memory_map=True)

In [4]: type(s)
Out[4]: pandas.core.series.Series

In [5]: s = s.apply(lambda w: Word(w).singularize())

In [6]: s
Out[6]:
0      development
1       investment
2             fund
3             slow
4          company
5           commit
6              pay
7            claim
8          finance
9         customer
10         claimed
11       insurance
12         comment
13           rapid
14    bureaucratic
15          affair
16          report
17    policyholder
18        detailed
Name: word, dtype: object

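Here I use squeeze=True so that read_csv returns a Series instead of a DataFrame, since the word file has only one column. Also, if the word file is large, memory_map=True can be used.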
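Note that the squeeze keyword of read_csv was deprecated in pandas 1.4 and removed in 2.0. A minimal sketch of the equivalent on newer versions, assuming a one-column CSV with a word header (the inline sample text below is only a stand-in for the real file):

```python
import io

import pandas as pd

# Inline sample standing in for the real one-column CSV file (assumption).
csv_text = "word\nfunds\nslow\ncustomers\n"

# Read as a DataFrame first, then squeeze the single column into a Series.
s = pd.read_csv(io.StringIO(csv_text)).squeeze("columns")

print(type(s).__name__)  # Series
print(s.name)            # word
```

With a real file, the path would go where io.StringIO(csv_text) is, and the apply call from the session above works on the resulting Series unchanged.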

Could you test the performance with your data?
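One rough way to measure it is to wrap the apply call in time.perf_counter. The sketch below is only illustrative: naive_singular is a hypothetical stand-in so the snippet runs without textblob, and it is not a real singularizer. To time the actual approach, swap in lambda w: Word(w).singularize().

```python
import time

import pandas as pd


def naive_singular(word):
    # Hypothetical stand-in for Word(word).singularize(); NOT a real singularizer.
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


# 1060 rows, matching the size mentioned in the question (sample words repeated).
s = pd.Series(["funds", "slow", "customers", "reports"] * 265, name="word")

start = time.perf_counter()
result = s.apply(naive_singular)
elapsed = time.perf_counter() - start

print(f"{len(s)} words processed in {elapsed:.4f} s")
```

The absolute number depends on the machine and on the function being applied, so it is only meaningful when compared against the original loop on the same data.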

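You are iterating over a DataFrame. Performance will be terrible.

@RafaelC Oh! I didn't know that! Why is that? If not a DataFrame, what should I use to store the file? I find working with multidimensional lists in Python a pain in the a**; it's not as intuitive as in C.

Because you are constantly moving data across the Python/C boundary, which is very expensive. Also, .words is a fairly complex operation; .singularize is probably the fastest thing in your code.

Could you provide some sample input/output? Also, since you are doing data['word'][i], you are probably getting a warning that you are modifying a copy rather than the df?

@RafaelC Yes, it does throw that warning! I have edited the question to include part of the input and output.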
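The warning mentioned in the comments comes from the chained indexing in data['word'][i] = .... A minimal sketch of one way to avoid it is to assign the whole column in a single step. Here naive_singular is a hypothetical placeholder so the snippet runs without textblob; in the real code it would be replaced by something like lambda w: Word(w).singularize().

```python
import pandas as pd


def naive_singular(word):
    # Hypothetical stand-in for textblob's Word(word).singularize().
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


data = pd.DataFrame({"word": ["funds", "slow", "customers"]})

# One column-level assignment instead of data['word'][i] = ... per row:
# no chained indexing, and the column is processed in a single pass.
data["word"] = data["word"].map(naive_singular)

print(data["word"].tolist())  # ['fund', 'slow', 'customer']
```

If a single cell really has to be set, data.loc[i, 'word'] = value addresses the row and the column in one indexing operation and so writes to the DataFrame itself.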