Python: Merging columns of DataFrame rows that meet a condition while deleting those rows

python pandas dataframe

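Background

This question is closely related to a previous question of mine. Unfortunately, that one was only a generic example and was not specific enough to apply to my actual problem, which is why this question is more specific.

Example code snippet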

import pandas as pd
import numpy as np

inp = [{'ID_Code':1,'information 1':[10,22,44],'information 2':[1,0,1]},
       {'ID_Code':2,'information 1':[400,323],'information 2':[1,1]},
       {'ID_Code':2,'information 1':[243],'information 2':[0]},
       {'ID_Code':2,'information 1':[333,555],'information 2':[0]},
       {'ID_Code':3,'information 1':[12,27,43,54],'information 2':[1,0,1,1]},
       {'ID_Code':3,'information 1':[31,42,13,14],'information 2':[1,0,0,0]},
       {'ID_Code':3,'information 1':[14,24,34,14],'information 2':[1,0,1,1]},
       {'ID_Code':4,'information 1':[15,25,33,44],'information 2':[0,0,0,1]},
       {'ID_Code':5,'information 1':[12,12,13,14],'information 2':[1,1,1,0]},
       {'ID_Code':5,'information 1':[12,12,13,24],'information 2':[1,0,1,1]},
       {'ID_Code':5,'information 1':[21,22,23,14],'information 2':[1,1,1,1]},
       {'ID_Code':6,'information 1':[10,12,23,4],'information 2':[1,0,1,0]},
       {'ID_Code':7,'information 1':[112,212,143,124],'information 2':[0,0,0,0]},
       {'ID_Code':7,'information 1':[211,321],'information 2':[1]},
       {'ID_Code':7,'information 1':[431],'information 2':[1,0]},
       {'ID_Code':8,'information 1':[1,2,3,4],'information 2':[1,0,0,1]}]


df = pd.DataFrame(inp)

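# Collect the lists of each column per ID_Code; ID_Code becomes the index.
df1 = df.groupby("ID_Code")["information 1"].apply(list).to_frame()
df2 = df.groupby("ID_Code")["information 2"].apply(list).to_frame()
# Put both grouped columns side by side, aligned on the ID_Code index.
df3 = pd.concat([df1, df2], axis=1, sort=False)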


Output

ID_Code    information 1                                            information 2
1          [[10, 22, 44]]                                           [[1, 0, 1]]
2          [[400, 323], [243], [333, 555]]                          [[1, 1], [0], [0]]
3          [[12, 27, 43, 54], [31, 42, 13, 14], [14, 24, 34, 14]]   [[1, 0, 1, 1], [1, 0, 0, 0], [1, 0, 1, 1]]
4          [[15, 25, 33, 44]]                                       [[0, 0, 0, 1]]
5          [[12, 12, 13, 14], [12, 12, 13, 24], [21, 22, 23, 14]]   [[1, 1, 1, 0], [1, 0, 1, 1], [1, 1, 1, 1]]
6          [[10, 12, 23, 4]]                                        [[1, 0, 1, 0]]
7          [[112, 212, 143, 124], [211, 321], [431]]                [[0, 0, 0, 0], [1], [1, 0]]
8          [[1, 2, 3, 4]]                                           [[1, 0, 0, 1]]

Here ID_Code is no longer a column but the index. This is the point I did not spell out in my previous post.
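To confirm that ID_Code has moved into the index, it can help to inspect the grouped frame directly. The following is a minimal sketch on a shortened, made-up subset of the data (the three rows are illustrative only, not the full example):

```python
import pandas as pd

# Shortened illustrative data, not the full example from the question.
df = pd.DataFrame(
    [
        {"ID_Code": 1, "information 1": [10, 22, 44]},
        {"ID_Code": 2, "information 1": [400, 323]},
        {"ID_Code": 2, "information 1": [243]},
    ]
)

df3 = df.groupby("ID_Code")["information 1"].apply(list).to_frame()

# After the groupby, ID_Code is the index name and no longer a column.
print(df3.index.name)        # ID_Code
print(df3.columns.tolist())  # ['information 1']
```

This is why column-based selections such as df3['ID_Code'] fail on the grouped frame until the index is turned back into a column.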

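Task

For the given DataFrame "df3", I want to drop ID_Code=1 and store its information in ID_Code=3, and drop ID_Code=5 and ID_Code=7 and store their information in ID_Code=2, so that the DataFrame looks like this: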

ID_Code    information 1                                                                                                                       information 2
2          [[400, 323], [243], [333, 555], [12, 12, 13, 14], [12, 12, 13, 24], [21, 22, 23, 14], [112, 212, 143, 124], [211, 321], [431]]     [[1, 1], [0], [0], [1, 1, 1, 0], [1, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [1], [1, 0]]
3          [[12, 27, 43, 54], [31, 42, 13, 14], [14, 24, 34, 14], [10, 22, 44]]                                                               [[1, 0, 1, 1], [1, 0, 0, 0], [1, 0, 1, 1], [1, 0, 1]]
4          [[15, 25, 33, 44]]                                                                                                                 [[0, 0, 0, 1]]
6          [[10, 12, 23, 4]]                                                                                                                  [[1, 0, 1, 0]]
8          [[1, 2, 3, 4]]                                                                                                                     [[1, 0, 0, 1]]

It would be a huge help if someone could help me solve this.

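The answer given in the previous post worked for me, with a few modifications regarding the index.

As stated in the question, the problem is that 'ID_Code' is the index rather than a column. My solution was therefore to add a column containing the unique ID_Code values. I found two possible ways to do this.

Solution 1

Use .unique() together with pd.DataFrame(), because .unique() returns a numpy.ndarray that has to be converted back into a DataFrame.

df4 = pd.DataFrame(df['ID_Code'].unique(), columns=['ID_Code'], index=df['ID_Code'].unique())
df5 = pd.concat([df4, df3], axis=1)

col = 'ID_Code'
cond = [df5[col].eq(1), df5[col].isin([5, 7])]
outputs = [3, 2]

df5[col] = np.select(cond, outputs, default=df5[col])
df6 = df5.groupby(col).sum()

Solution 2

Use .reset_index() to move ID_Code out of the index into a separate column.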

df3 = df3.reset_index()

col = 'ID_Code'
cond = [df3[col].eq(1), df3[col].isin([5, 7])]
outputs = [3, 2]

df3[col] = np.select(cond, outputs, default=df3[col])
df4 = df3.groupby(col).sum()
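Both solutions rely on np.select to relabel the IDs before grouping. As a rough illustration of that step in isolation, the sketch below applies the same conditions to a bare Series; the name `ids` is a stand-in for the ID_Code column and is not part of the original answer:

```python
import numpy as np
import pandas as pd

# Stand-in for the ID_Code column after reset_index().
ids = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], name="ID_Code")

# Each condition is paired with the output at the same position:
# ID 1 is relabelled to 3, IDs 5 and 7 are relabelled to 2.
cond = [ids.eq(1), ids.isin([5, 7])]
outputs = [3, 2]

# Rows that match no condition keep their original ID via `default`.
relabelled = np.select(cond, outputs, default=ids)

print(relabelled.tolist())  # [3, 2, 3, 4, 2, 6, 2, 8]
```

Once the rows to be removed carry the ID of their target row, the subsequent groupby collapses them into that row.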


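What is the pattern for selecting the IDs and storing their information in another ID?

Unfortunately there is no pattern. In my original dataset I have about 15 "IDs" that need to be removed, and their information is stored in IDs that are somewhat "related" to the removed ones. I also have to choose the most closely related ID manually. That is why it needs to be code that can specifically select the rows whose information is to be extracted and deleted, and the rows the information has to be added to.

@RobertRedisch How do you choose that the row with ID_Code=1 is added to the row with ID_Code=3? What is the logic behind it?

As I wrote before, there is no rule that decides which datasets have to be merged; I have to select them manually in the original dataset. The column I call ID_Code here contains codes that are related to each other, for example postal codes. You can imagine that, for some reason, I need to remove some postal codes and have to add their information to a postal code that is geographically close, so the choice has to be made manually. In short: I need to be able to specifically select one row to extract and delete information from, and one row the information has to be added to.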
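Since the pairs are chosen by hand, one possible way to express that choice is a plain dictionary from each ID to be removed to the ID that should receive its information. The following is only a sketch under assumptions: `merge_into` and `concat_lists` are hypothetical names, the data is a shortened subset of the example, and the lists are concatenated explicitly rather than through .sum() as in the answer above.

```python
import pandas as pd

# Shortened subset of the example data.
inp = [
    {"ID_Code": 1, "information 1": [10, 22, 44], "information 2": [1, 0, 1]},
    {"ID_Code": 2, "information 1": [400, 323], "information 2": [1, 1]},
    {"ID_Code": 2, "information 1": [243], "information 2": [0]},
    {"ID_Code": 3, "information 1": [12, 27, 43, 54], "information 2": [1, 0, 1, 1]},
    {"ID_Code": 4, "information 1": [15, 25, 33, 44], "information 2": [0, 0, 0, 1]},
    {"ID_Code": 5, "information 1": [12, 12, 13, 14], "information 2": [1, 1, 1, 0]},
]
df = pd.DataFrame(inp)
# Grouped frame with ID_Code as the index, as in the question.
df3 = df.groupby("ID_Code")[["information 1", "information 2"]].agg(list)

# Hypothetical manual mapping: removed ID -> ID that receives its information.
merge_into = {1: 3, 5: 2}


def concat_lists(groups):
    """Concatenate the per-row lists of lists into one list of lists."""
    return [item for sub in groups for item in sub]


# Relabel the index, then collapse rows that now share the same ID.
relabelled = df3.rename(index=merge_into)
merged = relabelled.groupby(level=0).agg(
    {col: concat_lists for col in df3.columns}
)

print(merged)
```

Within each merged row the lists appear in the original row order, so the information from ID_Code=1 ends up before that of ID_Code=3 rather than after it as in the desired output above. If the position matters, the rows would need to be reordered before grouping.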