Python: a fast way to create a DataFrame from paired data


python pandas tags


I have a large file of word/tag pairs, saved like this:

This/DT gene/NN called/VBN gametocide/NN

Now I want to put these pairs into a DataFrame with their counts, like this:

      DT | NN --
This|  1   0
Gene|  0   1
 :

I tried counting the pairs with a dict and then putting them into a DataFrame:

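from collections import defaultdict

import pandas as pd

# Read the whole file and split it into "word/tag" tokens
with open("data.txt", "r") as file:
    train = file.read()
words = train.split()

# Count how often each word/tag pair occurs
data = defaultdict(int)
for i in words:
    data[i] += 1

matrixB = pd.DataFrame()

# Fill the DataFrame one cell at a time
for elem, count in data.items():
    word, tag = elem.split('/')
    matrixB.loc[tag, word] = count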

But this takes a really long time (the file has 300,000 of these pairs). Is there a faster way to do this?

I think this is very similar to this question... Why did you post it twice?

>>> from collections import Counter
>>> import pandas as pd

>>> text = "This/DT gene/NN called/VBN gametocide/NN"

>>> pd.Series(Counter(tuple(pair.split('/')) for pair in text.split())).unstack().fillna(0)

            DT  NN  VBN
This         1   0    0
called       0   0    1
gametocide   0   1    0
gene         0   1    0
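
As a rough sketch of how the one-liner above could be wrapped for reuse: the function name `tag_count_frame` is hypothetical, and two details are assumptions rather than part of the answer. `rpartition` is used so that a word which itself contains a slash still splits on the last one, and `unstack(fill_value=0)` is used in place of `.fillna(0)` so the counts stay integers.

```python
from collections import Counter

import pandas as pd


def tag_count_frame(text):
    """Return a words x tags DataFrame of pair counts (hypothetical helper)."""
    # rpartition splits on the LAST '/', giving (word, '/', tag); [::2] drops the separator
    pairs = (tuple(token.rpartition("/")[::2]) for token in text.split())
    counts = Counter(pairs)
    # Tuple keys become a two-level index; unstack pivots the tag level into columns
    return pd.Series(counts).unstack(fill_value=0)


frame = tag_count_frame("This/DT gene/NN called/VBN gametocide/NN gene/NN")
print(frame)
```

For the file in the question, the same function could be called on the result of reading `data.txt`. The DataFrame is built in a single step from the finished counts instead of being enlarged one cell at a time.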

What happened to your answer?

Yields:

            DT  NN  VBN
This         1   0    0
called       0   0    1
gametocide   0   1    0
gene         0   1    0
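
A table of this shape can probably also be produced with `pd.crosstab`, which is a different technique from the `Counter` approach in the answer. This is only a sketch: the column names `word` and `tag` are made up for illustration, and it has not been timed against the 300,000-pair file.

```python
import pandas as pd

text = "This/DT gene/NN called/VBN gametocide/NN"

# One row per token, split on the last '/' into a word column and a tag column
pairs = pd.DataFrame(
    [token.rsplit("/", 1) for token in text.split()],
    columns=["word", "tag"],
)

# crosstab counts how often each (word, tag) combination occurs
table = pd.crosstab(pairs["word"], pairs["tag"])
print(table)
```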

Nothing, I was just still testing it before I saw your answer. This helped me a lot, thank you very much!

Great, glad it helped!