Python: merging DataFrame rows based on the value in a specific column by concatenating the strings from the other columns into one column

python pandas dataframe

I have a DataFrame that looks like this:

import pandas as pd

df1 = pd.DataFrame({
    "Business_Process_Activity": ["SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "SendingReportToManager", "PreparingAndSendingAgenda", "PreparingAndSendingAgenda"],
    "Case": [1, 1, 2, 2, 2, 3, 4],
    "Application": ["MicrosoftWord", "MicrosoftOutlook", "MicrosoftWord", "MicrosoftOutlook", "MicrosoftOutlook", "MicrosoftWord", "MicrosoftWord"],
    "Activity_of_the_User": ["SavingADocument", "SendingAnEmail", "SavingADocument", "SendingAnEmail", "SendingAnEmail", "SavingADocument", "SavingADocument"],
    "Receiver_email_root": ["None", "idatta91 adarandall larryjacob", "None", "idatta91 larryjacob", " vanessaHudgens prithakaur", "None", "None"],
    "Receiever_email_domains": ["None", "gmail yahoo", "None", "gmail", "gmail yahoo", "None", "None"],
    "Receiver_email_count_Catg": ["None", "Few", "None", "Double", "Double", "None", "None"],
    "Subject": ["None", "Activity Report", "None", "Project Progress Report", "Project Progress Report 2", "None", "None"]
})

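I want to merge the rows of the DataFrame based on the Case column. If two or more rows have the same number in the Case column, the strings in the other columns of those rows should be concatenated into a single row.

Rows with the same case number also have the same value in the Business_Process_Activity column. For that column I do not want to concatenate the values; I want to keep just one of them, because the column needs to be categorical. I want the final DataFrame to look like this: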

df2 = pd.DataFrame({"Case":[1,2,3,4],
               "Business_Process_Activity" : ["SendingReportToManager", "SendingReportToManager", "PreparingAndSendingAgenda", "PreparingAndSendingAgenda"],
               "Application":["MicrosoftWord MicrosoftOutlook", "MicrosoftWord MicrosoftOutlook MicrosoftOutlook", "MicrosoftWord", "MicrosoftWord"], 
               "Activity_of_the_User":["SavingADocument SendingAnEmail","SavingADocument SendingAnEmail SendingAnEmail", "SavingADocument", "SavingADocument"],
               "Receiver_email_root":["idatta91 adarandall larryjacob", "idatta91 larryjacob vanessaHudgens prithakaur", "None", "None"],
               "Receiever_email_domains":["gmail yahoo","gmail gmail yahoo", "None", "None"],
               "Receiver_email_count_Catg":["Few", "Double Double", "None", "None"],
               "Subject":["Activity Report", "Project Progress Report Project Progress Report 2", "None", "None"]
               })

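If a string is merged with a "None" entry, the "None" string should be dropped, because the value is no longer empty. When rows are merged into one, the duplicate numbers in the Case column should be dropped as well.

How can I do this? Thanks in advance.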

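The idea is to remove the None values and the "None" strings within each group, join what is left with spaces, and finally replace empty strings with None: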

df = (df1.groupby('Case')
         .agg(lambda x: ' '.join(x[x.ne('None') & x.notna()]))
         .where(lambda x: x.astype(bool), None)
         .reset_index())
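To make the filter inside the lambda easier to follow, here is a minimal sketch of what happens to one group's values in a single column. The Series `x` is made-up illustration data, not part of the question, and the last line mimics the effect of the `.where(...)` step for a single value.

```python
import pandas as pd

# Hypothetical values of one column within one Case group
x = pd.Series(["None", "idatta91 larryjacob", None, "prithakaur"])

# Keep entries that are neither the string "None" nor missing
mask = x.ne("None") & x.notna()
kept = x[mask]

# Joining yields "" when nothing survives the filter
joined = " ".join(kept)

# Same effect as the .where(...) step: an empty string becomes None
result = joined if joined else None
```

A group whose values are all "None" ends up with an empty string after the join, which is why the final replacement step is needed.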

Another solution, using a custom function:

def f(x):
   y = x[x.ne('None') & x.notna()]
   return None if y.empty else ' '.join(y)

df = df1.groupby('Case').agg(f).reset_index()

Comments:

Something like this is not a machine learning or NLP problem; please don't add irrelevant tags (removed).

Hi! Your answer was very helpful! But I forgot to add an important column named "Business_Process_Activity" to the sample DataFrame. For that column I want different behaviour. I have updated the question again. Could you take another look? Thanks in advance! :)

The problem is that your answer creates multiple "None" values in columns where all the values of the merged rows are "None".

@Indy I think you want the string repr "None" for those values; please edit the answer.

@Indy - you can change df1.groupby('Case') to df1.groupby(['Business_Process_Activity', 'Case'])
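The last comment's suggestion could be sketched as follows. This is only an illustration under stated assumptions: the DataFrame is a shortened stand-in for df1 with the same column names, `join_non_none` is a hypothetical helper name, and returning the string "None" for an all-empty group is an assumption based on the desired output in the question. `sort=False` keeps the groups in order of first appearance.

```python
import pandas as pd

# Shortened stand-in for df1 (same column names, fewer rows and columns)
df1 = pd.DataFrame({
    "Business_Process_Activity": ["SendingReportToManager"] * 3 + ["PreparingAndSendingAgenda"],
    "Case": [1, 1, 2, 3],
    "Application": ["MicrosoftWord", "MicrosoftOutlook", "MicrosoftWord", "MicrosoftWord"],
    "Subject": ["None", "Activity Report", "None", "None"],
})

def join_non_none(x):
    """Join a group's strings, skipping "None" strings and missing values."""
    y = x[x.ne("None") & x.notna()]
    # Assumption: a group with nothing left should stay the string "None"
    return "None" if y.empty else " ".join(y)

# Grouping by both keys keeps Business_Process_Activity as a single value
# per Case instead of concatenating it with itself
df = (df1.groupby(["Business_Process_Activity", "Case"], sort=False)
         .agg(join_non_none)
         .reset_index())
```

This relies on each Case having exactly one Business_Process_Activity value, as the question states; if a Case had two different activities, it would be split into two rows.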