Python: merge DataFrame rows based on the value in a specific column by concatenating the strings from the other columns into one row


I have a DataFrame that looks like this:

import pandas as pd

df1 = pd.DataFrame({
    "Business_Process_Activity": ["SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "PreparingAndSendingAgenda", "PreparingAndSendingAgenda"],
    "Case": [1, 1, 2, 2, 2, 3, 4],
    "Application": ["MicrosoftWord", "MicrosoftOutlook", "MicrosoftWord", "MicrosoftOutlook", "MicrosoftOutlook", "MicrosoftWord", "MicrosoftWord"],
    "Activity_of_the_User": ["SavingADocument", "SendingAnEmail", "SavingADocument", "SendingAnEmail", "SendingAnEmail", "SavingADocument", "SavingADocument"],
    "Receiver_email_root": ["None", "idatta91 adarandall larryjacob", "None", "idatta91 larryjacob", " vanessaHudgens prithakaur", "None", "None"],
    "Receiever_email_domains": ["None", "gmail yahoo", "None", "gmail", "gmail yahoo", "None", "None"],
    "Receiver_email_count_Catg": ["None", "Few", "None", "Double", "Double", "None", "None"],
    "Subject": ["None", "Activity Report", "None", "Project Progress Report", "Project Progress Report 2", "None", "None"]
})
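I want to merge the rows of the DataFrame based on the Case column. If two or more rows have the same number in the Case column, the strings in the other columns of those rows should be concatenated into a single row.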

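For rows with the same Case number, the values in the Business_Process_Activity column are also the same. For that column I don't want to concatenate the values; I only want to keep one of them, because that column is used for categorization. I want the final DataFrame to look like this: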

df2 = pd.DataFrame({
    "Case": [1, 2, 3, 4],
    "Business_Process_Activity": ["SendingReportToManager", "SendingReportToManager", "PreparingAndSendingAgenda", "PreparingAndSendingAgenda"],
    "Application": ["MicrosoftWord MicrosoftOutlook", "MicrosoftWord MicrosoftOutlook MicrosoftOutlook", "MicrosoftWord", "MicrosoftWord"],
    "Activity_of_the_User": ["SavingADocument SendingAnEmail", "SavingADocument SendingAnEmail SendingAnEmail", "SavingADocument", "SavingADocument"],
    "Receiver_email_root": ["idatta91 adarandall larryjacob", "idatta91 larryjacob vanessaHudgens prithakaur", "None", "None"],
    "Receiever_email_domains": ["gmail yahoo", "gmail gmail yahoo", "None", "None"],
    "Receiver_email_count_Catg": ["Few", "Double Double", "None", "None"],
    "Subject": ["Activity Report", "Project Progress Report Project Progress Report 2", "None", "None"]
})
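If a string is merged with a "None" value, the "None" string should be dropped, because the merged value is no longer empty. When rows are merged into one, the duplicate Case numbers should be removed.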


How can I do this? Thanks in advance.

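The idea is to remove both missing values (None/NaN) and "None" strings from each group, join what remains, and finally replace the resulting empty strings with None: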

df = (df1.groupby('Case')
         .agg(lambda x: ' '.join(x[x.ne('None') & x.notna()]))
         .where(lambda x: x.astype(bool), None)
         .reset_index())
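A quick check of the one-liner above on a hypothetical three-row subset of df1 (only the Business_Process_Activity, Case and Subject columns) shows the two points raised in the comments: Business_Process_Activity is joined like every other column, and an all-"None" group comes back as a missing value rather than the string "None". Whether that missing value is the None object or NaN may depend on the pandas version, so this sketch only checks it with pd.isna.

```python
import pandas as pd

# Hypothetical three-row subset of df1 (only three of its columns).
df1 = pd.DataFrame({
    "Business_Process_Activity": ["SendingReportToManager", "SendingReportToManager", "PreparingAndSendingAgenda"],
    "Case": [1, 1, 3],
    "Subject": ["None", "Activity Report", "None"],
})

# The answer's one-liner: drop NaN/"None" per group, join with spaces,
# then turn empty strings (groups that were all "None") into None.
df = (df1.groupby("Case")
         .agg(lambda x: " ".join(x[x.ne("None") & x.notna()]))
         .where(lambda x: x.astype(bool), None)
         .reset_index())
print(df)
```

For Case 1, Business_Process_Activity becomes "SendingReportToManager SendingReportToManager", which is what the question's update asks to avoid.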
Another solution with a custom function:

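def f(x):
    # keep only real strings: drop NaN/None and the literal 'None' placeholder
    y = x[x.ne('None') & x.notna()]
    # a group that was all 'None' becomes None; otherwise join the values with spaces
    return None if y.empty else ' '.join(y)

df = df1.groupby('Case').agg(f).reset_index()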
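If Business_Process_Activity is constant within each Case, as the question states, one option is to pass a per-column dict to agg: "first" for that column and a joining function for the others. This sketch assumes a trimmed copy of df1 (a subset of its columns) and a hypothetical helper, join_non_none, which returns the string "None" instead of the None object so the result matches the question's df2. It also strips each piece, because one value in df1 (" vanessaHudgens prithakaur") has a leading space that would otherwise produce a double space.

```python
import pandas as pd

# Trimmed copy of the question's df1 (hypothetical subset of its columns).
df1 = pd.DataFrame({
    "Business_Process_Activity": ["SendingReportToManager"] * 5 + ["PreparingAndSendingAgenda"] * 2,
    "Case": [1, 1, 2, 2, 2, 3, 4],
    "Application": ["MicrosoftWord", "MicrosoftOutlook", "MicrosoftWord", "MicrosoftOutlook",
                    "MicrosoftOutlook", "MicrosoftWord", "MicrosoftWord"],
    "Receiver_email_root": ["None", "idatta91 adarandall larryjacob", "None", "idatta91 larryjacob",
                            " vanessaHudgens prithakaur", "None", "None"],
    "Subject": ["None", "Activity Report", "None", "Project Progress Report",
                "Project Progress Report 2", "None", "None"],
})


def join_non_none(s: pd.Series) -> str:
    """Hypothetical helper: join the real strings of one group with spaces."""
    # Drop missing values and the literal "None" placeholder.
    kept = s[s.notna() & s.ne("None")]
    # All-"None" group: keep the "None" string, as in the question's df2.
    # Otherwise strip each piece so stray spaces don't become double spaces.
    return "None" if kept.empty else " ".join(kept.str.strip())


# Keep one Business_Process_Activity value per Case; join every other column.
aggs = {"Business_Process_Activity": "first"}
aggs.update({col: join_non_none for col in df1.columns
             if col not in ("Case", "Business_Process_Activity")})

df2 = df1.groupby("Case").agg(aggs).reset_index()
print(df2)
```

Using "first" keeps Case as the only grouping key, but it silently discards any other values if Business_Process_Activity ever differs within a Case.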

This is not a machine learning or NLP question; please don't add unrelated tags (removed).

Hi! Your answer was very helpful! But I forgot to add an important column named Business_Process_Activity to the sample DataFrame, and for that column I want something different to happen. I've updated the question again. Could you take another look? Thanks in advance! :)

The problem is that your answer produces multiple None values in columns where all of the merged rows had the value "None".

@Indy I think you want the string repr "None" for those values; I edited the answer.

@Indy - you can change
df1.groupby('Case')
to
df1.groupby(['Business_Process_Activity','Case'])
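The suggestion of grouping on both keys can be sketched as follows, again on a trimmed copy of df1 and with a hypothetical join_non_none helper that returns the string "None" for all-"None" groups. By default groupby sorts by the keys, which would put the PreparingAndSendingAgenda cases (3 and 4) first; sort=False keeps the groups in order of first appearance. This assumes each Case maps to a single Business_Process_Activity; otherwise that Case would be split across several output rows.

```python
import pandas as pd

# Trimmed copy of the question's df1 (hypothetical subset of its columns).
df1 = pd.DataFrame({
    "Business_Process_Activity": ["SendingReportToManager"] * 5 + ["PreparingAndSendingAgenda"] * 2,
    "Case": [1, 1, 2, 2, 2, 3, 4],
    "Application": ["MicrosoftWord", "MicrosoftOutlook", "MicrosoftWord", "MicrosoftOutlook",
                    "MicrosoftOutlook", "MicrosoftWord", "MicrosoftWord"],
    "Receiver_email_root": ["None", "idatta91 adarandall larryjacob", "None", "idatta91 larryjacob",
                            " vanessaHudgens prithakaur", "None", "None"],
    "Subject": ["None", "Activity Report", "None", "Project Progress Report",
                "Project Progress Report 2", "None", "None"],
})


def join_non_none(s: pd.Series) -> str:
    """Hypothetical helper: join the real strings of one group with spaces."""
    # Drop missing values and the literal "None" placeholder.
    kept = s[s.notna() & s.ne("None")]
    # Keep the "None" string for all-"None" groups; strip stray spaces otherwise.
    return "None" if kept.empty else " ".join(kept.str.strip())


# Both key columns are excluded from aggregation, so Business_Process_Activity
# keeps a single value; sort=False preserves the original Case order (1, 2, 3, 4).
df2 = (df1.groupby(["Business_Process_Activity", "Case"], sort=False)
          .agg(join_non_none)
          .reset_index())
print(df2)
```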