Pandas: create a function to count the number of POS tags per instance (pandas, nltk)


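I used NLTK to POS-tag sentences in a pandas DataFrame from an old Yelp competition. This returns a list of (word, POS) tuples. I'd like to count the number of each part of speech per example. For instance, how would I create a function that counts the number of verbs in each review? I know how to apply a function to a column, so that part is no problem. I just can't figure out how to count things inside the tuples and lists stored in a pandas column.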

The head is here, as a tsv: https://pastebin.com/FnnBq9rf

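There are many ways to do this. A very simple one is to map the list (or Series) of tuples to an indicator of whether each word is a verb, and then count the 1s.

Suppose you have something like this (correct me if I'm wrong, since you didn't provide an example):

You can do the following to map the Series and sum the indicators: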

a.map(lambda x: 1 if x[1]== "verb" else 0).sum()

This returns the number of elements whose tag is "verb".
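To make this concrete, here is a minimal sketch on a made-up Series. The words and the tag names ("verb", "pronoun", and so on) are assumptions for illustration only; NLTK's tagger emits Penn Treebank tags such as "VBD" instead.

```python
import pandas as pd

# Hypothetical sample: one (word, tag) tuple per element.
a = pd.Series([
    ("I", "pronoun"),
    ("ate", "verb"),
    ("and", "conjunction"),
    ("left", "verb"),
])

# Map each tuple to 1 if its tag is "verb", else 0, then add up the indicators.
verb_count = a.map(lambda x: 1 if x[1] == "verb" else 0).sum()
print(verb_count)  # 2
```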

I grabbed a sentence from the link you shared:

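import nltk
import pandas as pd

# Needs the NLTK tokenizer and tagger data, installed beforehand via nltk.download()
text = nltk.word_tokenize("My wife took me here on my birthday for breakfast and it was excellent.")
tag = nltk.pos_tag(text)  # list of (word, tag) tuples
a = pd.Series(tag)
a.map(lambda x: 1 if x[1] == "VBD" else 0).sum()
# this returns 2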
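Note that "VBD" matches past-tense verbs only. Assuming Penn Treebank tags, where every verb tag begins with "VB", a prefix test counts all verb forms. The tagged list in this sketch is written by hand for illustration and was not produced by running the tagger.

```python
import pandas as pd

# Hand-written (word, tag) pairs for illustration; real data would come from nltk.pos_tag.
tagged = [
    ("She", "PRP"), ("has", "VBZ"), ("eaten", "VBN"),
    ("and", "CC"), ("left", "VBD"), (".", "."),
]
a = pd.Series(tagged)

# Exact match: past-tense verbs only.
past_tense = a.map(lambda x: x[1] == "VBD").sum()

# Prefix match: any verb tag (VB, VBD, VBG, VBN, VBP, VBZ).
all_verbs = a.map(lambda x: x[1].startswith("VB")).sum()

print(past_tense, all_verbs)  # 1 3
```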

Thanks, 张玉林, for the help. Two days later, I've learned something very important (as a novice programmer!). Here is the solution:

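def NounCounter(x):
    # x is one review's list of (word, pos) tuples
    nouns = []
    for (word, pos) in x:
        # All noun tags (NN, NNS, NNP, NNPS) start with "NN"
        if pos.startswith("NN"):
            nouns.append(word)
    return nouns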

df["nouns"] = df["pos_tag"].apply(NounCounter)
df["noun_count"] = df["nouns"].str.len()
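The same idea extends to counting every part of speech per review, which is what the question originally asked for. The following is only a sketch: the small DataFrame is a made-up stand-in for the tagged Yelp data, and `count_tags` and `count_prefix` are hypothetical helper names.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for the tagged data: one list of (word, tag) tuples per review.
df = pd.DataFrame({
    "pos_tag": [
        [("My", "PRP$"), ("wife", "NN"), ("took", "VBD"), ("me", "PRP")],
        [("Great", "JJ"), ("food", "NN"), ("and", "CC"), ("service", "NN")],
        [],  # an empty review simply yields zero counts
    ]
})

def count_tags(tagged):
    """Return a Counter mapping each POS tag to its frequency in one review."""
    return Counter(pos for _word, pos in tagged)

def count_prefix(tagged, prefix):
    """Count the tuples in one review whose tag starts with the given prefix."""
    return sum(1 for _word, pos in tagged if pos.startswith(prefix))

df["tag_counts"] = df["pos_tag"].apply(count_tags)
df["noun_count"] = df["pos_tag"].apply(lambda t: count_prefix(t, "NN"))
df["verb_count"] = df["pos_tag"].apply(lambda t: count_prefix(t, "VB"))

print(df[["noun_count", "verb_count"]])
```

Counting directly avoids building the intermediate list of nouns when only the number is needed.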

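For example, for a DataFrame df, you can use the following code to save the noun count of the column "reviews" (raw review text) to a new column "noun_count":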

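from nltk import pos_tag, word_tokenize

def NounCount(x):
    # Tokenize and tag the raw review text, then count the tags that start with "NN"
    nounCount = sum(1 for word, pos in pos_tag(word_tokenize(x)) if pos.startswith('NN'))
    return nounCount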

df["noun_count"] = df["reviews"].apply(NounCount)

df.to_csv('./dataset.csv')

I got close with this, but it still gives me the same error: "list index out of range". Here is the code I'm using:

df["noun_count"] = df["pos_tag"].map(lambda x: 1 if x[1] == "NN" or "NNP" or "NNS" else 0).sum()

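@itsbryce If you could post the data you're using, that would help me identify the problem.

Thanks for the help, 伊伦. It's up above at [link]().

@itsbryce I don't get that error here. Could you update your question and show me a few entries of df["pos_tag"]? Each row should be a tuple containing two items.

Here is the tsv of the head of df["pos_tag"].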
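A likely explanation for the "list index out of range" error, assuming each row of df["pos_tag"] holds a whole list of (word, tag) tuples as described in the question: inside map, x is the list for one review, so x[1] is its second tuple, and the lookup fails for any review with fewer than two tokens. In addition, `x[1] == "NN" or "NNP" or "NNS"` is always truthy because a non-empty string counts as true, and the trailing .sum() collapses everything into a single number for the whole column. The sketch below uses made-up rows and a hypothetical helper name, `noun_count`.

```python
import pandas as pd

NOUN_TAGS = {"NN", "NNP", "NNS"}

# Made-up rows: each cell is one review's list of (word, tag) tuples.
df = pd.DataFrame({
    "pos_tag": [
        [("wife", "NN"), ("took", "VBD"), ("Yelp", "NNP")],
        [("Yum", "UH")],  # single token: x[1] raises IndexError on this row
        [],               # empty review
    ]
})

def noun_count(tagged):
    """Count the tuples in one review whose tag is one of the noun tags."""
    return sum(1 for _word, pos in tagged if pos in NOUN_TAGS)

# One count per row, with no .sum() over the whole column.
df["noun_count"] = df["pos_tag"].map(noun_count)
print(df["noun_count"].tolist())  # [2, 0, 0]
```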
