Python: combine two columns to eliminate duplicate rows (pandas)


python pandas


My current situation is that I have a DataFrame that looks like this:

          id  tp    dt        amt
0          1   CR  2017    94678.0
1          1   CR  2018    13508.0
2          1   DR  2017    78671.0
3          1   DR  2018    13797.0
4          2   CR  2017   111417.0
5          2   CR  2018    21479.0
6          2   DR  2017    95266.0
7          2   DR  2018     1864.0

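What I am trying to achieve is to combine the values of the two columns tp and dt and use the result as column names for amt, so that the multiple rows sharing the same id collapse into one: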

          id     CR2017   CR2018   DR2017  DR2018
0          1    94678.0  13508.0  78671.0  13797.0
1          2   111417.0  21479.0  95266.0   1864.0

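Is this possible? I have been playing with reset_index, set_index, and pivot_table for an hour, but still no luck. Thanks in advance; any help would be greatly appreciated.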

Use set_index with unstack to concatenate the columns and reshape:

df = df.set_index(['id', df['tp'] + df['dt'].astype(str)])['amt'].unstack().reset_index()
print (df)
   id    CR2017   CR2018   DR2017   DR2018
0   1   94678.0  13508.0  78671.0  13797.0
1   2  111417.0  21479.0  95266.0   1864.0

Or create a new column first:

df['new'] = df['tp'] + df['dt'].astype(str)
df = df.set_index(['id', 'new'])['amt'].unstack().rename_axis(None, axis=1).reset_index()
print (df)
   id    CR2017   CR2018   DR2017   DR2018
0   1   94678.0  13508.0  78671.0  13797.0
1   2  111417.0  21479.0  95266.0   1864.0
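For anyone who wants to try the reshape end to end, here is a minimal self-contained sketch. The sample frame is typed in from the question; the names `key` and `wide` are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "tp": ["CR", "CR", "DR", "DR", "CR", "CR", "DR", "DR"],
    "dt": [2017, 2018, 2017, 2018, 2017, 2018, 2017, 2018],
    "amt": [94678.0, 13508.0, 78671.0, 13797.0,
            111417.0, 21479.0, 95266.0, 1864.0],
})

# Combined label such as "CR2017"; "tpdt" is an illustrative name.
key = (df["tp"] + df["dt"].astype(str)).rename("tpdt")

wide = (
    df.set_index(["id", key])["amt"]   # index on (id, combined label)
      .unstack()                       # move the combined label into the columns
      .rename_axis(None, axis=1)       # drop the leftover "tpdt" axis name
      .reset_index()                   # turn id back into a regular column
)
print(wide)
```

Keeping the result in a separate variable leaves the original long-format `df` untouched, which is convenient while experimenting.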

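However, if you get:

ValueError: Index contains duplicate entries, cannot reshape

it means the same id occurs more than once with the same joined tp/dt pair, like this: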

print (df)
   id  tp    dt       amt
0   1  CR  2017   94678.0 <-dupe 1 CR 2017
0   1  CR  2017   10000.0 <-dupe 1 CR 2017
1   1  CR  2018   13508.0
2   1  DR  2017   78671.0
3   1  DR  2018   13797.0
4   2  CR  2017  111417.0
5   2  CR  2018   21479.0
6   2  DR  2017   95266.0
7   2  DR  2018    1864.0

In that case the duplicates have to be aggregated, for example with pivot_table, whose default is aggfunc='mean':

df = df.pivot_table(index='id',columns=df['tp'] + df['dt'].astype(str), values= 'amt').reset_index()
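The difference between the failing unstack and an aggregating pivot_table can be sketched on a tiny frame. The data below is a hypothetical three-row cut of the duplicate example, and the variable names (`dup`, `mean_wide`, `sum_wide`) are illustrative only.

```python
import pandas as pd

# Hypothetical data: (id=1, CR, 2017) appears twice.
dup = pd.DataFrame({
    "id": [1, 1, 1],
    "tp": ["CR", "CR", "CR"],
    "dt": [2017, 2017, 2018],
    "amt": [94678.0, 10000.0, 13508.0],
})
dup["tpdt"] = dup["tp"] + dup["dt"].astype(str)

# unstack cannot put two values into one cell, so it raises ValueError.
try:
    dup.set_index(["id", "tpdt"])["amt"].unstack()
    raised = False
except ValueError:
    raised = True

# pivot_table aggregates the duplicates instead of failing.
# The default aggregation is the mean; sum is shown for comparison.
mean_wide = dup.pivot_table(index="id", columns="tpdt", values="amt")
sum_wide = dup.pivot_table(index="id", columns="tpdt", values="amt", aggfunc="sum")

print(raised)
print(mean_wide)
print(sum_wide)
```

Whether mean or sum is appropriate depends on what the duplicated rows represent; for amounts that should be totalled, aggfunc="sum" is probably the safer choice.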

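Adding some explanation. First create a column that concatenates the values of tp and dt. Then delete those two columns, since you no longer need them. After that, group by id and tpdt, which sums the amt values for each unique tp/dt pair. Finally, pivot on tpdt so that its values become the column headers.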

You could also explore unstack to achieve the same thing.
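A rough sketch of that unstack variant, assuming duplicates should be summed. The four-row frame and the names `long_df` and `wide_sum` are made up for illustration.

```python
import pandas as pd

# Hypothetical data with one duplicated (id, tp, dt) combination.
long_df = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "tp": ["CR", "CR", "DR", "CR"],
    "dt": [2017, 2017, 2017, 2017],
    "amt": [94678.0, 10000.0, 78671.0, 111417.0],
})
long_df["tpdt"] = long_df["tp"] + long_df["dt"].astype(str)

wide_sum = (
    long_df.groupby(["id", "tpdt"])["amt"].sum()  # one value per (id, tpdt)
           .unstack()                             # tpdt values become columns
           .rename_axis(None, axis=1)
           .reset_index()
)
print(wide_sum)
```

Combinations that never occur for an id (here DR2017 for id 2) come out as NaN after unstack.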

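One way is an all-in-one expression using pd.pivot_table (whose default aggfunc is numpy.mean) together with str.cat and reset_index. The expression and its output appear at the end of this page.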

This works great. Thanks for the explanation!
Even a one-line explanation would probably earn more votes.

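# Build the combined key, e.g. "CR2017", then drop the two source columns
df['tpdt'] = df['tp'].astype(str) + df['dt'].astype(str)
df = df.drop(columns=['tp', 'dt'])
# Sum amt for each unique (id, tpdt) pair so that pivot sees no duplicates
df = df.groupby(['id', 'tpdt'], as_index=False).sum()
# The tpdt values become the column headers
df = df.pivot(index='id', columns='tpdt', values='amt')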

pd.pivot_table(df,index='id', columns = df.tp.astype(str).str.cat(df.dt.astype(str)), values="amt").reset_index(col_level=1).rename_axis(None, axis=1)

   id    CR2017   CR2018   DR2017   DR2018
0   1   94678.0  13508.0  78671.0  13797.0
1   2  111417.0  21479.0  95266.0   1864.0