Python: merge two dataframes on a common value that appears as columns in one dataframe and as rows in the other
I have a dataframe (df1) whose hundreds of columns are client IDs, with a single row holding the total number of tickets for each client ID. (df1 is the result of several transformations of the original CSV file.)
The other dataframe (df2) has two columns, account_id and client_id, like this:
df2
+------------+-----------+
| account_id | client_id |
+------------+-----------+
| 4char | 4 |
+------------+-----------+
| 3char | 5 |
+------------+-----------+
| 2char | 30 |
+------------+-----------+
| 16char | 9 |
+------------+-----------+
| 17char | 100 |
+------------+-----------+
I would like to end up with a file that has 3 columns, account_id, client_id and total_tickets, like this:
df
+------------+-----------+---------------+
| account_id | client_id | total_tickets |
+------------+-----------+---------------+
| 4char      | 4         | null          |
+------------+-----------+---------------+
| 3char      | 5         | 40            |
+------------+-----------+---------------+
| 2char      | 30        | 122           |
+------------+-----------+---------------+
| 16char     | 9         | null          |
+------------+-----------+---------------+
| 17char     | 100       | 13            |
+------------+-----------+---------------+
This is as far as I have got:
I created a function that uses iterrows() on both dataframes and checks with isin() whether df2's client_id is found among df1's columns; it then uses assign() on df2 to add a new column, total_tickets.
# f1 = df1, f2 = df2
def populating_df(f1, f2):
    for org_nr in f2.iterrows():
        for col in f1.iterrows():
            matched_org_nr = f2.client_id.isin(f1.columns)
            if matched_org_nr.any() == True:
                sum_of_tickets_per_col = matched_org_nr
                # create a new column in f2 file with the values of total_tickets for each org number matched
                f2 = f2.loc[:].assign(Total_Tickets=sum_of_tickets_per_col)
    return f2
As a result, I got this table:
+------------+-----------+---------------+
| account_id | client_id | total_tickets |
+------------+-----------+---------------+
| 4char      | 4         | False         |
+------------+-----------+---------------+
| 3char      | 5         | True          |
+------------+-----------+---------------+
| 2char      | 30        | True          |
+------------+-----------+---------------+
| 16char     | 9         | False         |
+------------+-----------+---------------+
| 17char     | 100       | True          |
+------------+-----------+---------------+
I would be grateful for any suggestions on how to solve this.

You can use pd.merge:
df = pd.merge(df1, df2, on="client_id", how='outer')
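As written, this merge raises a KeyError against the df1 described in the question, because df1 has no client_id column: the client IDs are its column labels. The same layout explains the True/False result in the question: isin() only tests whether each client_id is among those labels and never reads the totals. A minimal sketch of both points, plus a lookup with map() that avoids reshaping, assuming df1 has exactly one row and integer column labels (the sample values are hypothetical, taken from the outputs printed in the answers):

```python
import pandas as pd

# Hypothetical sample data in the question's layout:
# df1 is wide (client IDs as column labels, a single row of ticket totals)
df1 = pd.DataFrame([[122, 40, 13]], columns=[30, 5, 100])
df2 = pd.DataFrame({
    "account_id": ["4char", "3char", "2char", "16char", "17char"],
    "client_id": [4, 5, 30, 9, 100],
})

# 1) A direct merge fails: "client_id" is not a column of df1
merge_error = None
try:
    pd.merge(df1, df2, on="client_id", how="outer")
except KeyError as err:
    merge_error = err
print(repr(merge_error))

# 2) isin() only says whether each client_id is among df1's column labels
mask = df2["client_id"].isin(df1.columns)
print(mask.tolist())  # [False, True, True, False, True]

# 3) Look the totals up instead: df1's only row is a Series indexed by
#    client ID, and map() uses that index as the lookup key
result = df2.assign(total_tickets=df2["client_id"].map(df1.iloc[0]))
print(result)
```

Clients with no column in df1 (4 and 9 here) get NaN, as in the desired output, and no iterrows() loop is needed because map() works on the whole column at once. Reshaping df1 from wide to long first, as in the melt and transpose answers, reaches the same table through merge.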
First, melt df1 so that each row holds a single observation; then an outer merge keeps the keys from both sides:
import pandas as pd

df_melt = pd.melt(df1, var_name='client_id', value_name='total_tickets')
# make sure the key dtypes are the same *before* merging
# (the melted labels may be strings if they came from a CSV header)
df_melt['client_id'] = df_melt['client_id'].astype(int)
df3 = pd.merge(df_melt, df2, on=['client_id'], how='outer')
df3 = df3[["account_id", "client_id", "total_tickets"]].sort_values(
    "account_id", ascending=False
)
print(df3)
  account_id  client_id  total_tickets
3      4char          4            NaN
1      3char          5           40.0
0      2char         30          122.0
2     17char        100           13.0
4     16char          9            NaN
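On the choice of how='outer': an outer merge also keeps client IDs that exist only in df1 (with NaN account_id), and its rows are not guaranteed to follow df2's order, hence the sort. If the goal is exactly one row per account in df2, how='left' may fit better, since pandas documents that a left merge preserves the left frame's key order. A sketch contrasting the two, with a hypothetical extra client 7 that appears only in df1 (all values are sample data):

```python
import pandas as pd

# Hypothetical wide df1; client 7 exists only here, not in df2
df1 = pd.DataFrame([[122, 40, 13, 8]], columns=[30, 5, 100, 7])
df2 = pd.DataFrame({
    "account_id": ["4char", "3char", "2char", "16char", "17char"],
    "client_id": [4, 5, 30, 9, 100],
})

# Wide -> long: one row per (client_id, total_tickets)
df_melt = pd.melt(df1, var_name="client_id", value_name="total_tickets")
# Harmless if the labels are already ints; needed if they are strings
df_melt["client_id"] = df_melt["client_id"].astype(int)

# how="left": exactly df2's rows, in df2's order; unmatched clients get NaN
left = df2.merge(df_melt, on="client_id", how="left")

# how="outer": additionally keeps client 7, which has no account_id
outer = df2.merge(df_melt, on="client_id", how="outer")

print(len(left), len(outer))  # 5 6
print(left)
```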
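merge is the key, but you first have to transpose the initial dataframe and make a few cosmetic changes, such as resetting its index and giving the columns relevant names.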
The transformation could be:
df1.rename({0: 'total_tickets'}).T.rename_axis('client_id').reset_index()
which gives:
   client_id  total_tickets
0         30            122
1          5             40
2        100             13
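Because df1 has a single row, the same long table can also be built from that row as a Series, without a transpose. A sketch, assuming df1 has exactly one row (sample values hypothetical), shown next to the transpose version for comparison:

```python
import pandas as pd

# Hypothetical one-row wide df1: client IDs as column labels
df1 = pd.DataFrame([[122, 40, 13]], columns=[30, 5, 100])

# Transpose-based version from the answer
transposed = df1.rename({0: "total_tickets"}).T.rename_axis("client_id").reset_index()

# Series-based version: df1.iloc[0] is a Series indexed by client ID
long_df = (
    df1.iloc[0]
    .rename_axis("client_id")           # name the index so it becomes a column
    .reset_index(name="total_tickets")  # name the values column
)

print(long_df)
```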
Once this is done, the merge becomes straightforward:
result = df2.merge(df1.rename({0: 'total_tickets'}).T.rename_axis('client_id').reset_index(),
                   on='client_id', how='left')
giving, as expected:
  account_id  client_id  total_tickets
0      4char          4            NaN
1      3char          5           40.0
2      2char         30          122.0
3     16char          9            NaN
4     17char        100           13.0
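To see which accounts found a match, and to guard against a client ID appearing twice on either side, merge() accepts indicator= and validate=. A sketch reusing the transformation above, with hypothetical sample values:

```python
import pandas as pd

# Hypothetical sample data in the question's layout
df1 = pd.DataFrame([[122, 40, 13]], columns=[30, 5, 100])
df2 = pd.DataFrame({
    "account_id": ["4char", "3char", "2char", "16char", "17char"],
    "client_id": [4, 5, 30, 9, 100],
})

long_df = df1.rename({0: "total_tickets"}).T.rename_axis("client_id").reset_index()

result = df2.merge(
    long_df,
    on="client_id",
    how="left",
    validate="one_to_one",  # raise MergeError if client_id repeats on either side
    indicator=True,         # add a "_merge" column: "both" or "left_only"
)

# Accounts whose client_id has no column in df1
unmatched = result.loc[result["_merge"] == "left_only", "client_id"].tolist()
print(unmatched)  # [4, 9]

result = result.drop(columns="_merge")
print(result)
```

Since the question asks for a file, result.to_csv(path, index=False) would then write the three columns without the index; path is whatever output location you choose.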
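For df1, is client_id the header?
Yes, @datanovel.
This doesn't work; you need to do some processing before the merge.
The merge doesn't happen; it gives me ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
@Denisa, did you see this line: df_melt['client_id'] = df_melt['client_id'].astype(int)? Run it and then run the merge; I wasn't sure whether your columns were originally strings or integers.
I got the df_melt result and it looks fine, but after the merge I get a KeyError, because you
Could you run print(df2.dtypes) and print(df_melt.dtypes) and post the results in your main question?
It prints the correct result; you may be using the wrong df. Check the last step, which prints df3.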
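The ValueError mentioned in the comments typically means the two client_id columns have different dtypes, for example when the IDs in df1 came from a CSV header as strings while df2's client_id is int64. A minimal reproduction and fix, assuming string labels in df1 (sample values hypothetical):

```python
import pandas as pd

# Hypothetical: df1's client-ID labels are strings (e.g. read from a CSV header)
df1 = pd.DataFrame([[122, 40, 13]], columns=["30", "5", "100"])
# df2's client_id is int64
df2 = pd.DataFrame({
    "account_id": ["4char", "3char", "2char", "16char", "17char"],
    "client_id": [4, 5, 30, 9, 100],
})

df_melt = pd.melt(df1, var_name="client_id", value_name="total_tickets")
print(df_melt.dtypes)  # client_id holds strings here
print(df2.dtypes)

# pandas refuses to merge a string-valued key with an integer key
merge_error = None
try:
    df2.merge(df_melt, on="client_id", how="left")
except ValueError as err:
    merge_error = err
print("merge failed:", merge_error)

# Align the key dtypes first, then merge
df_melt["client_id"] = df_melt["client_id"].astype(int)
result = df2.merge(df_melt, on="client_id", how="left")
print(result)
```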