Rolling sum on a string column in pandas
I am using Python 3 and pandas version '0.19.2'. My input looks like this:
chat_id line
1 'Hi.'
1 'Hi, how are you?.'
1 'I'm well, thanks.'
2 'Is it going to rain?.'
2 'No, I don't think so.'
I want to group by 'chat_id' and then do something like a rolling sum on 'line' to get the following:
chat_id line conversation
1 'Hi.' 'Hi.'
1 'Hi, how are you?.' 'Hi. Hi, how are you?.'
1 'I'm well, thanks.' 'Hi. Hi, how are you?. I'm well, thanks.'
2 'Is it going to rain?.' 'Is it going to rain?.'
2 'No, I don't think so.' 'Is it going to rain?. No, I don't think so.'
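I believe df.groupby('chat_id')['line'].cumsum() only works on numeric columns.
I also tried df.groupby(by=['chat_id'], as_index=False)['line'].apply(list) to get a list of all the lines in the full conversation, but I can't figure out how to unpack that list to build a 'rolling sum' style conversation column.
This works for me; if you need a separator, add a space: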
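# Append a space to each line, concatenate cumulatively within each chat, then strip the trailing space.
# group_keys=False keeps the original row index on newer pandas versions so the result aligns with df.
df['new'] = df.groupby('chat_id', group_keys=False)['line'].apply(lambda x: (x + ' ').cumsum().str.strip())
print(df)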
chat_id line new
0 1 Hi. Hi.
1 1 Hi, how are you?. Hi. Hi, how are you?.
2 1 I'm well, thanks. Hi. Hi, how are you?. I'm well, thanks.
3 2 Is it going to rain?. Is it going to rain?.
4 2 No, I don't think so. Is it going to rain?. No, I don't think so.
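The same running concatenation can also be written without relying on cumsum for strings. The following is only a sketch of an alternative, not the method used in the answer: it assumes the sample data from the question, and the helper name running_join and its sep parameter are hypothetical. It uses itertools.accumulate from the standard library inside groupby(...).transform.

```python
from itertools import accumulate

import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "chat_id": [1, 1, 1, 2, 2],
    "line": [
        "Hi.",
        "Hi, how are you?.",
        "I'm well, thanks.",
        "Is it going to rain?.",
        "No, I don't think so.",
    ],
})


def running_join(lines, sep=" "):
    """Hypothetical helper: cumulative concatenation of the strings in one group."""
    joined = accumulate(lines, lambda acc, cur: acc + sep + cur)
    # Reuse the group's index so the result lines up with the original rows.
    return pd.Series(list(joined), index=lines.index)


# transform calls running_join once per chat_id and returns a Series
# with the same length and row order as df.
df["conversation"] = df.groupby("chat_id")["line"].transform(running_join)
print(df)
```

Passing a different separator is possible through transform's extra arguments, for example transform(running_join, sep=" | ").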
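Interesting: cumsum works when called on a Series, but raises an error when called on a groupby object.
For me, this results in: ValueError: cannot reindex from a duplicate axis
What is your pandas version? Check with print(pd.show_versions()). I cannot reproduce your error: I tested duplicates in the values and duplicates in the index, and everything works correctly in version 0.19.2.
Sorry, you are right. I had to call reset_index() on the df, and then it worked.
If there is a NaN value between conversations (e.g. at index 1), how can I exclude it from the cumsum? Thanks.
@TotoLele - One idea: df['new'] = df.dropna(subset=['line']).groupby('chat_id')['line'].apply(lambda x: (x + ' ').cumsum().str.strip())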
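The two points from the comments (unique row labels and skipping NaN lines) can be combined in one small sketch. This is an illustration under assumptions, not the commenters' exact code: the data is made up by putting a NaN at index 1 of the question's sample, and itertools.accumulate stands in for the string cumsum.

```python
from itertools import accumulate

import numpy as np
import pandas as pd

# Assumed sample: the question's data with a missing line at index 1.
df = pd.DataFrame({
    "chat_id": [1, 1, 1, 2, 2],
    "line": [
        "Hi.",
        np.nan,
        "I'm well, thanks.",
        "Is it going to rain?.",
        "No, I don't think so.",
    ],
})

# Make the row labels unique first, as suggested in the comments.
df = df.reset_index(drop=True)

# Leave the NaN rows out before building the running text.
valid = df.dropna(subset=["line"])
running = valid.groupby("chat_id")["line"].transform(
    lambda s: pd.Series(
        list(accumulate(s, lambda acc, cur: acc + " " + cur)),
        index=s.index,
    )
)

# Assignment aligns on the index, so the skipped row stays NaN in 'new'.
df["new"] = running
print(df)
```

Whether the skipped row should stay NaN or repeat the previous conversation text depends on the use case; the sketch leaves it as NaN.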
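# Strip the surrounding single quotes from each line before concatenating.
df['line'] = df['line'].str.strip("'")
# Build the running text per chat, then wrap each result in single quotes again.
df['new'] = df.groupby('chat_id', group_keys=False)['line'].apply(lambda x: "'" + (x + ' ').cumsum().str.strip() + "'")
print(df)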
chat_id line \
0 1 Hi.
1 1 Hi, how are you?.
2 1 I'm well, thanks.
3 2 Is it going to rain?.
4 2 No, I don't think so.
new
0 'Hi.'
1 'Hi. Hi, how are you?.'
2 'Hi. Hi, how are you?. I'm well, thanks.'
3 'Is it going to rain?.'
4 'Is it going to rain?. No, I don't think so.'