Python: How do I count words from one DataFrame in another DataFrame?


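I have two DataFrames here: review and negative_word (which has one column containing some words). I selected one column of review, review['review Text'], and I want to count, for each row of review['review Text'], the total number of occurrences of all the words in negative_word.
In fact, I tested it with a single word (such as "wonderful") and it worked.
But when I use a for loop to go through all the words in the DataFrame, it shows all 0s.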

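Here is my code:

If memory is not a concern (i.e., you have a manageable number of words), you can use the following approach. If it is, you may need to use a loop instead; in that case, I am happy to update my answer.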

import pandas as pd
import numpy as np

# Data frame
df = pd.DataFrame({'col1':[['a', 'b', 'c', 'c', 'd'], ['c', 'c', 'b', 'x', 'x'], ['x', 'x', 'y', 'y', 'y']]})

# Negative series
neg = pd.Series(['x', 'y', 'z'])

# Create a number of columns equal to the vocabulary size with their counts
df = pd.concat([df, df['col1'].apply(lambda x: pd.Series(x).value_counts())], axis=1)
# From that dataframe get the columns that intersect with values in negative and take the sum
df['neg_count'] = df[df.columns.intersection(neg)].sum(axis=1)
df.head()
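
The loop alternative mentioned above can be sketched roughly as follows. This is only an illustration on the same toy data, not the answerer's own code: it avoids building one column per vocabulary word by counting, row by row, the tokens that belong to the negative list. The name neg_set is introduced here for illustration.

```python
import pandas as pd

# Same toy data as in the answer above
df = pd.DataFrame({'col1': [['a', 'b', 'c', 'c', 'd'],
                            ['c', 'c', 'b', 'x', 'x'],
                            ['x', 'x', 'y', 'y', 'y']]})
neg = pd.Series(['x', 'y', 'z'])

# neg_set is a hypothetical helper name; a set gives fast membership checks
neg_set = set(neg)

# Loop over the rows and count the tokens that appear in the negative set
df['neg_count'] = [sum(token in neg_set for token in tokens)
                   for tokens in df['col1']]
print(df)
```

Because no intermediate wide DataFrame is created, memory use stays proportional to the input rather than to the vocabulary size.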

For the DataFrame review, you can write a function that picks out only the negative words in a string and returns their count. This should be faster than looping or creating many DataFrames, and it is more readable.

import string
import pandas as pd

# example DataFrame
review = pd.DataFrame({'Item': ['Book A', 'Movie B', 'Restaurant C'],
                       'Review Text': ["It was great, I couldn't put it down.",
                                       'It was horribly boring.',
                                       'The food was delicious but the service was bad.']})
review

           Item                                      Review Text
0        Book A            It was great, I couldn't put it down.
1       Movie B                          It was horribly boring.
2  Restaurant C  The food was delicious but the service was bad.

# example list of negative words
negative_word = ['bad', 'horrible', 'worst', 'hate', 'boring', '...more words...']

def bad_count(review):
    """Return the number of words from negative list in review text"""
    # convert to lowercase & separate words
    words = review.lower().split()
    # strip punctuation from each word (stripping the whole string would only
    # remove punctuation at the very start and end of the review)
    words = [word.strip(string.punctuation) for word in words]
    # get list of review words contained in negative word list
    bad = [word for word in words if word in negative_word]
    # return length of list
    return len(bad)

Now apply this function to the 'Review Text' column:

review['Count of Negative Words'] = review['Review Text'].map(bad_count)
review

           Item                                      Review Text  Count of Negative Words
0        Book A            It was great, I couldn't put it down.                        0
1       Movie B                          It was horribly boring.                        2
2  Restaurant C  The food was delicious but the service was bad.                        1

Note that a count of 2 for Movie B requires a second word from that sentence, presumably 'horribly', to be in the full word list; with only the words shown above, the count would be 1.

Can you provide a sample DataFrame and your desired output?
We need your DataFrame or sample data. Your question is unclear.
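
As a possible vectorized alternative to bad_count, the whole-word matching can be pushed into a regular expression and counted with Series.str.count. This is a sketch under assumptions, not part of the original answer: the word list below is a shortened, hypothetical version of negative_word, and the name pattern is introduced only for illustration.

```python
import re
import pandas as pd

review = pd.DataFrame({'Item': ['Book A', 'Movie B', 'Restaurant C'],
                       'Review Text': ["It was great, I couldn't put it down.",
                                       'It was horribly boring.',
                                       'The food was delicious but the service was bad.']})

# Hypothetical, shortened word list used only for this illustration
negative_word = ['bad', 'horrible', 'worst', 'hate', 'boring']

# Build one pattern that matches any listed word as a whole word
pattern = r'\b(?:' + '|'.join(re.escape(w) for w in negative_word) + r')\b'

# Count case-insensitive whole-word matches in each review
review['Count of Negative Words'] = review['Review Text'].str.count(
    pattern, flags=re.IGNORECASE)
print(review)
```

With this shortened list, 'horribly' is not counted because only exact whole-word matches qualify, so Movie B gets a count of 1.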