How can I check the average word count in a dataset in Python?

python scikit-learn jupyter-notebook

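I have a dataset that has already been trained on and validated, and now I want to check the average number of words per tweet for each label. It is a Twitter dataset with five main emotion labels: happy, sadness, anger, fear, and love. I heard that count can be used to find the number of words used in a dataset, but I get an error: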

temp2 = 0

for row in df_h['clean']:
    temp2 = temp2 + count(row.split())
avg_h = temp2/len(df_h)

temp2 = 0
for row in df_s['clean']:
    temp2 = temp2 + count(row.split())
avg_s = temp2/len(df_s)

temp2 = 0
for row in df_a['clean']:
    temp2 = temp2 + count(row.split())
avg_a = temp2/len(df_a)

temp2 = 0
for row in df_f['clean']:
    temp2 = temp2 + count(row.split())
avg_f = temp2/len(df_f)

temp2 = 0
for row in df_l['clean']:
    temp2 = temp2 + count(row.split())
avg_l = temp2/len(df_l)

import matplotlib.pyplot as plt
import numpy as np

plt.rcdefaults()

kelas = ['happy', 'sadness', 'anger', 'fear', 'love']
y_pos = np.arange(len(kelas))
average = [avg_h, avg_s, avg_a, avg_f, avg_l]

plt.bar(y_pos, average, align='center', alpha=0.5, width=0.9)

plt.xticks(y_pos, kelas)
plt.ylabel('average tweets')
plt.title('characteristic of tweets')

# for a,b in zip(y_pos, kelas):
#    plt.text(a, b, str(b), horizontalalignment='center')
    
plt.savefig('average_tweet.png')
plt.show()

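However, I get an error saying that "count" is not defined. How do I define count so that I can calculate the average number of words for each label? Thanks, everyone!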

You need to use len(row.split()) instead of count(row.split()).

When you call split(), you get a list of strings. To find how many strings the split produced, just take the length of the list returned by row.split().
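As a rough sketch of how that fits into the averaging loop from the question: the helper name `average_word_count` and the sample tweets below are made up for illustration, and the list stands in for a column such as df_h['clean']. It assumes every row is a plain string.

```python
def average_word_count(texts):
    """Return the mean number of whitespace-separated words per text."""
    texts = list(texts)
    if not texts:
        # Avoid dividing by zero when a label has no tweets.
        return 0.0
    # len(text.split()) is the word count of a single tweet.
    total_words = sum(len(text.split()) for text in texts)
    return total_words / len(texts)


# Hypothetical sample standing in for df_h['clean'].
sample_happy = [
    "i feel great today",
    "so happy",
    "what a lovely morning it is",
]

avg_happy = average_word_count(sample_happy)
print(avg_happy)
```

The same helper could be called once per label (for example with df_s['clean'], df_a['clean'], and so on) instead of repeating the loop five times.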

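count does something different that doesn't really apply here: count() is a string method, and it returns the number of occurrences of a substring within a given string.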
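To make the difference concrete, here is a small comparison on a made-up string. Note that count only exists as a method (on strings and on lists), not as a built-in function, which is why calling a bare count(...) raises a NameError.

```python
row = "so sorry so sad"  # made-up example text

# str.count counts substring occurrences, including the "so" inside "sorry".
substring_hits = row.count("so")

# list.count counts how many list items are exactly equal to "so".
words = row.split()
word_hits = words.count("so")

# len gives the total number of words, which is what the question needs.
num_words = len(words)

print(substring_hits, word_hits, num_words)
```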