Python: count the words in each row and save the result in a new column


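I'm new to Python and have started learning to work with data, but I've run into a problem.

I have a dataset (pandas) with one sentence in each row. I want to create a new column that counts the words in each sentence (each row).

If the sentence is "Hello World Hello dogs", the word counter would be:

{'Hello': 2, 'World': 1, 'dogs': 1}

I usually use graphlab and do it like this: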

dataset['new_column'] = graphlab.text_analytics.count_words(..)

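I have seen many similar solutions, but none that add the result to the dataset as a new column, and I have never really programmed in Python.

I would appreciate some guidance.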

I would advise against storing dictionaries in the cells of a DataFrame, but if there is no other way, you can use Counter:

import pandas as pd
from collections import Counter

dataset = pd.DataFrame([['Hello world dogs'], ['this is another sentence']], columns=['column_of_interest'])

# Build one Counter per row from the space-separated words of the sentence
dataset['new_column'] = dataset.column_of_interest.apply(lambda x: Counter(x.split(' ')))
dataset

         column_of_interest                                         new_column
0          Hello world dogs                {'dogs': 1, 'world': 1, 'Hello': 1}
1  this is another sentence  {'is': 1, 'sentence': 1, 'this': 1, 'another': 1}

Edit: per the comments below, if a cell does not contain a string, you may need to convert it to str before splitting:

lambda x: Counter(str(x).split(' '))
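As an illustration of the advice against keeping dictionaries in cells, here is a minimal sketch of one possible alternative: a separate "long" table with one record per (row, word) pair. The names `records` and `word_counts` and the column labels `row`, `word` and `count` are made up for this example; only `column_of_interest` comes from the answer above.

```python
from collections import Counter
import pandas as pd

dataset = pd.DataFrame(
    [['Hello world dogs'], ['this is another sentence']],
    columns=['column_of_interest'],
)

# One (row, word, count) record per distinct word in each sentence.
# `row` is the index label of the sentence in the original DataFrame.
records = [
    (row, word, count)
    for row, sentence in dataset['column_of_interest'].items()
    for word, count in Counter(sentence.split()).items()
]
word_counts = pd.DataFrame(records, columns=['row', 'word', 'count'])
```

A table like this can be filtered or aggregated with ordinary pandas operations, for example `word_counts.groupby('word')['count'].sum()` for totals across all rows.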

The accepted answer did the trick.

In case anyone wants it, here is an answer without pandas:

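def word_count(text):
    # Map each whitespace-separated word to the number of times it occurs
    counts = {}
    for word in text.split():
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
    return counts

# Assumes `data` is a DataFrame with a 'sentences' column
data['word_count'] = data['sentences'].apply(word_count)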

Test:

print(word_count("Hello Hello world"))

Output:

{'Hello': 2, 'world': 1}

I get an error: "'float' object has no attribute 'split'". It certainly works with your code, but not on my csv file.

If you could edit the original question to include an example of what the data input looks like and what the output should be, it may be easier to troubleshoot. Without seeing the data, I'm afraid any suggestion I make is a guess. That said, my guess is that some rows contain only numbers, which need to be converted to strings.
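Since the csv file was never posted, the following is only a sketch of how non-string cells might be handled. Besides purely numeric rows, another common source of floats is an empty csv cell, which pandas reads as NaN (a float). The helper name `count_words` and the sample data are hypothetical; the column name is borrowed from the answer above.

```python
from collections import Counter
import pandas as pd

def count_words(cell):
    """Count words in a cell, tolerating missing and non-string values."""
    if pd.isna(cell):
        # Missing value (e.g. an empty csv cell): no words to count
        return Counter()
    # Convert numbers and other non-strings to text before splitting
    return Counter(str(cell).split())

# Hypothetical data mixing a sentence, a missing value and a number
dataset = pd.DataFrame(
    {'column_of_interest': ['Hello World Hello dogs', float('nan'), 42]}
)
dataset['new_column'] = dataset['column_of_interest'].apply(count_words)
```

Checking for missing values first avoids counting the literal word 'nan', which is what a plain `str(x)` conversion would produce for an empty cell.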