Python: Find the average text length in each group with pandas groupby


python pandas


I'm working with a corpus of Shakespeare's works:

    act literature_type scene   scene_text  scene_title speaker title
0   1   Comedy  1   In delivering my son from me, I bury a second ...   Rousillon. The COUNT's palace.  COUNTESS    All's Well That Ends Well
1   1   Comedy  1   And I in going, madam, weep o'er my father's d...   Rousillon. The COUNT's palace.  BERTRAM All's Well That Ends Well
2   1   Comedy  1   You shall find of the king a husband, madam; y...   Rousillon. The COUNT's palace.  LAFEU   All's Well That Ends Well
3   1   Comedy  1   What hope is there of his majesty's amendment?  Rousillon. The COUNT's palace.  COUNTESS    All's Well That Ends Well
4   1   Comedy  1   He hath abandoned his physicians, madam; under...   Rousillon. The COUNT's palace.  LAFEU   All's Well That Ends Well

I want to find the average scene_text length for each title.

I was thinking of something roughly like this:
all_works_by_speaker_df.groupby('title').apply(lambda x: np.mean(len(x)))

This only returns the number of scenes in each title, because len(x) on each group's DataFrame is its row count.
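To see why the attempt counts rows, here is a minimal sketch on a small made-up frame. The `toy` data and its two titles are hypothetical stand-ins, not the Shakespeare set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Shakespeare frame.
toy = pd.DataFrame({
    "title": ["A", "A", "B"],
    "scene_text": ["to be", "or not to be", "hello"],
})

# Each group x is a DataFrame, so len(x) is its number of rows,
# and np.mean of a single number is just that number.
rows_per_title = toy.groupby("title").apply(lambda x: np.mean(len(x)))
print(rows_per_title)
```

Title A has two rows and B has one, so the result holds row counts rather than text lengths.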

If you need the length in characters:
import numpy as np

# Mean number of characters in scene_text for each title
df = (all_works_by_speaker_df.groupby('title')['scene_text']
                            .apply(lambda x: np.mean(x.str.len()))
                            .reset_index(name='mean_len_text'))
print(df)

                       title  mean_len_text
0  All's Well That Ends Well           48.4
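The character-length approach can be tried on a small frame as a rough sketch. The `toy` data is made up for illustration, and `x.str.len().mean()` is used in place of `np.mean(...)` only to avoid the NumPy import.

```python
import pandas as pd

# Hypothetical toy data; the real frame has many more columns.
toy = pd.DataFrame({
    "title": ["A", "A", "B"],
    "scene_text": ["to be", "or not to be", "hello"],
})

# Select the text column first, so each group x is a Series of strings.
char_mean = (toy.groupby("title")["scene_text"]
                .apply(lambda x: x.str.len().mean())
                .reset_index(name="mean_len_text"))
print(char_mean)
```

Here "to be" has 5 characters and "or not to be" has 12, so title A averages 8.5, while B averages 5.0.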

If you need the length in words, use split, len and mean:
df.groupby('title').scene_text.apply(lambda x: x.str.split().str.len().mean())


title
All's Well That Ends Well    9.2
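As a sketch of the word-count variant on made-up data (the `toy` frame is hypothetical): `str.split()` with no arguments splits on runs of whitespace, so punctuation stays attached to words.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({
    "title": ["A", "A", "B"],
    "scene_text": ["to be", "or not to be", "hello"],
})

# split() turns each string into a list of words; str.len() then
# counts the list items, and mean() averages them within each title.
word_mean = toy.groupby("title")["scene_text"].apply(
    lambda x: x.str.split().str.len().mean()
)
print(word_mean)
```

Title A has scenes of 2 and 4 words, giving a mean of 3.0, and B has a single one-word scene.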

Get the string lengths from the column, then group them by an array (here, your play titles) and take the mean:
mean_len = df.scene_text.str.len().groupby(df.title).mean()

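Does len mean words or characters? It isn't quite clear.
This one is for characters; your solution is for word length. Thank you both.
@jezrael You seem to have mocked up an actual DF. I'd love to check whether our answers produce the same result and how the timings compare (if you have a minute :p)
I'm offline now, only on my phone... unfortunately I can't do that. (:
Length = number of words or characters?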
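On the question of whether the two character-length answers agree, a quick check on a small frame might look like the following. The `toy` data is hypothetical, so this only illustrates the comparison and says nothing about the real corpus or about timings, which could be measured separately with `timeit`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({
    "title": ["A", "A", "B"],
    "scene_text": ["to be", "or not to be", "hello"],
})

# Answer 1: group first, then compute lengths inside apply.
via_apply = toy.groupby("title")["scene_text"].apply(
    lambda x: np.mean(x.str.len())
)

# Answer 3: compute lengths first, then group the result by title.
via_series = toy["scene_text"].str.len().groupby(toy["title"]).mean()

# Same titles in the same order, and the same means.
same_index = via_apply.index.equals(via_series.index)
same_values = np.allclose(via_apply.to_numpy(), via_series.to_numpy())
print(same_index and same_values)
```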