Python: combine multiple rows in a DataFrame and group them by column (pandas groupby)


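This is what my pandas DataFrame looks like. I need to combine the Utterances column in Chat_Sequence_Number order for each User_Type, and group the results by Case_ID and Interaction_ID: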

       Case_ID    Interaction_ID  Chat_Sequence_Number User_Type        Utterances
          1          123                   3           Person1            are
          1          123                   4           Person1              you
          1          123                   1           Person1              Hello,
          1          123                   2           Person1              how
          1          123                   5           Person1              feeling?
          1          123                   6           Person2              I'm
          1          123                   6           Person2              fine.

Is there a way to create a new DataFrame that meets this requirement? My final output should look like this:

       Case_ID    Interaction_ID  User_Type        Utterances
          1          123           Person1          Hello, how are you feeling?
          1          123           Person2          I'm fine.

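You can do this in a few steps:

1. Sort by Chat_Sequence_Number.
2. Group by Case_ID, Interaction_ID and User_Type.
3. Use .apply() to join the strings.

All of this can be done in a single chained expression: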

import pandas as pd

# Create the dataframe from a dict; the scalar columns are broadcast to every row
df = pd.DataFrame({
    'Case_ID': 1,
    'Interaction_ID': 123,
    'Chat_Sequence_Number': [3, 4, 1, 2, 5, 6, 7],
    'User_Type': ['Person1'] * 5 + ['Person2'] * 2,
    'Utterances': ['are', 'you', 'Hello', 'how', 'feeling?', "I'm", 'fine.'],
})

# Sort by sequence number (mergesort is stable, so rows with the same number keep
# their original order), group by case, interaction and user type, then join
# each group's utterances with spaces
output = (
    df.sort_values('Chat_Sequence_Number', kind='mergesort')
      .groupby(['Case_ID', 'Interaction_ID', 'User_Type'])['Utterances']
      .apply(' '.join)
      .reset_index()
)
print(output)

Output:

   Case_ID  Interaction_ID User_Type                  Utterances
0        1             123   Person1  Hello how are you feeling?
1        1             123   Person2                    I'm fine.
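The three steps above can also be written out one at a time. This is a minimal sketch, assuming the input is exactly the table from the question, including the comma after "Hello," and the two rows that share Chat_Sequence_Number 6; the intermediate names `ordered` and `grouped` are only illustrative. It uses `.agg(' '.join)`, which for this purpose does the same job as the `.apply(' '.join)` call above, and a stable `mergesort` so that the tied rows stay in their original order.

```python
import pandas as pd

# Input transcribed from the question's table (note the tie at sequence number 6)
df = pd.DataFrame({
    "Case_ID": [1] * 7,
    "Interaction_ID": [123] * 7,
    "Chat_Sequence_Number": [3, 4, 1, 2, 5, 6, 6],
    "User_Type": ["Person1"] * 5 + ["Person2"] * 2,
    "Utterances": ["are", "you", "Hello,", "how", "feeling?", "I'm", "fine."],
})

# Step 1: sort by sequence number; mergesort is stable, so "I'm" stays before "fine."
ordered = df.sort_values("Chat_Sequence_Number", kind="mergesort")

# Step 2: group by case, interaction and speaker
grouped = ordered.groupby(["Case_ID", "Interaction_ID", "User_Type"])

# Step 3: join each group's utterances with spaces;
# groupby keeps the sorted row order within each group
output = grouped["Utterances"].agg(" ".join).reset_index()
print(output)
```

With the question's data this gives "Hello, how are you feeling?" for Person1 and "I'm fine." for Person2, which matches the desired output.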

Does this answer your question? Please try to improve the formatting of your input and output, and explain what you have tried or found.