Python: How do I count the words from one DataFrame in another DataFrame?

python pandas

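I have two DataFrames: review and negative_word (which has one column containing some words). I selected one column of review, review['review Text'], and I want to count, for each row of review['review Text'], how many times the words in negative_word appear.
In fact, I tested it with a single word (for example "wonderful") and it worked.
But when I use a for loop to go through all the words in the DataFrame, it shows 0 for every row.
Here is my code: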
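If memory is not a concern (i.e. you have a manageable number of distinct words), you can use the following approach. If it is a concern, you will probably need to use a loop instead. If that is the case, I am happy to update my answer.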
import pandas as pd
import numpy as np

# Data frame
df = pd.DataFrame({'col1':[['a', 'b', 'c', 'c', 'd'], ['c', 'c', 'b', 'x', 'x'], ['x', 'x', 'y', 'y', 'y']]})

# Negative series
neg = pd.Series(['x', 'y', 'z'])

# Create a number of columns equal to the vocabulary size with their counts
df = pd.concat([df, df['col1'].apply(lambda x: pd.Series(x).value_counts())], axis=1)
# From that dataframe get the columns that intersect with values in negative and take the sum
df['neg_count'] = df[df.columns.intersection(neg)].sum(axis=1)
df.head()
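
If the vocabulary is large, building one column per distinct word may use more memory than you want. A possible alternative, sketched here under the assumption that each row holds a list of tokens as in the example above, is to flatten the lists with Series.explode and count the membership hits per original row. The names df_lists, neg_words and neg_count are made up for this illustration.

```python
import pandas as pd

# Same toy data as above, under illustrative names (not from the original post)
df_lists = pd.DataFrame({'col1': [['a', 'b', 'c', 'c', 'd'],
                                  ['c', 'c', 'b', 'x', 'x'],
                                  ['x', 'x', 'y', 'y', 'y']]})
neg_words = pd.Series(['x', 'y', 'z'])

# One row per token; the original row label is repeated in the index
tokens = df_lists['col1'].explode()

# True where the token is a negative word, then summed per original row
neg_count = tokens.isin(neg_words).groupby(level=0).sum()

df_lists['neg_count'] = neg_count
print(df_lists)
```

This keeps a single long Series of tokens instead of a wide table of counts, so the memory cost grows with the total number of tokens rather than with rows times vocabulary size.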

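For the DataFrame review, you can write a function that picks out only the negative words in a string and returns their count. This should be faster than looping over the words or creating many intermediate DataFrames, and it is also more readable.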
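import string
import pandas as pd

# example DataFrame
review = pd.DataFrame({'Item': ['Book A', 'Movie B', 'Restaurant C'],
                       'Review Text': ["It was great, I couldn't put it down.",
                                       "It was horribly boring.",
                                       "The food was delicious but the service was bad."]})
review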

Now apply this function to the 'Review Text' column:
review['Count of Negative Words'] = review['Review Text'].map(bad_count)
review

Can you provide a sample DataFrame and your desired output? We need your DataFrame or some example data. Your question is unclear.
Item                                      Review Text
0        Book A            It was great, I couldn't put it down.
1       Movie B                          It was horribly boring.
2  Restaurant C  The food was delicious but the service was bad.
import string

# example list of negative words
negative_word = ['bad', 'horrible', 'worst', 'hate', 'boring', '...more words...']

def bad_count(review):
    """Return the number of words in the review text that appear in the negative list"""
    # remove all punctuation (str.strip would only trim the ends of the string)
    review = review.translate(str.maketrans('', '', string.punctuation))
    # convert to lowercase & split on whitespace
    words = review.lower().split()
    # count the review words contained in the negative word list
    return sum(word in negative_word for word in words)

review['Count of Negative Words'] = review['Review Text'].map(bad_count)
review

Item                                      Review Text  Count of Negative Words
0        Book A            It was great, I couldn't put it down.                        0
1       Movie B                          It was horribly boring.                        2
2  Restaurant C  The food was delicious but the service was bad.                        1
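
Since the question describes negative_word as a DataFrame with a single column rather than a list, here is a rough sketch of a loop-free variant that builds one regular expression from that column and counts matches with Series.str.count. The column name 'word' and the five sample words are assumptions made for this illustration; the real column name and contents are not shown in the question.

```python
import re
import pandas as pd

review = pd.DataFrame({'Item': ['Book A', 'Movie B', 'Restaurant C'],
                       'Review Text': ["It was great, I couldn't put it down.",
                                       "It was horribly boring.",
                                       "The food was delicious but the service was bad."]})

# Assumed layout: one column of words; the name 'word' is hypothetical
negative_word = pd.DataFrame({'word': ['bad', 'horrible', 'worst', 'hate', 'boring']})

# One alternation pattern matching any listed word as a whole word
pattern = r'\b(?:' + '|'.join(re.escape(w) for w in negative_word['word']) + r')\b'

# Count case-insensitive matches in each review
review['Count of Negative Words'] = review['Review Text'].str.count(pattern, flags=re.IGNORECASE)
print(review)
```

Note that this only counts exact whole-word matches, so "horribly" is not counted by the entry 'horrible'; any inflected forms you care about would have to be added to the word list.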