Python: efficiently merging two DataFrames column-to-row in pandas
I have a DataFrame of detailed transaction data, shown below:
import pandas as pd

df_col = pd.DataFrame({'SQ': [1, 1, 2],
                       'City': ['A', 'A', 'B'],
                       'Date': ['7-1-2020', '7-2-2020', '7-1-2020'],
                       'Loc 1': [40, 21, 27],
                       'Loc 2': [37, 40, 14],
                       'Loc 3': [49, 38, 36],
                       'Loc 4': [20, 14, 18],
                       'Loc 5': [48, 27, 36]})
+----+------+----------+-------+-------+-------+-------+-------+
| SQ | City | Date | Loc 1 | Loc 2 | Loc 3 | Loc 4 | Loc 5 |
+----+------+----------+-------+-------+-------+-------+-------+
| 1 | A | 7-1-2020 | 40 | 37 | 49 | 20 | 48 |
+----+------+----------+-------+-------+-------+-------+-------+
| 1 | A | 7-2-2020 | 21 | 40 | 38 | 14 | 27 |
+----+------+----------+-------+-------+-------+-------+-------+
| 2 | B | 7-1-2020 | 27 | 14 | 36 | 18 | 36 |
+----+------+----------+-------+-------+-------+-------+-------+
In addition, I have a separate DataFrame of location data, shown below:
+------------+--------------+
| LocationNo | LocationType |
+------------+--------------+
| Loc 1 | Class A |
+------------+--------------+
| Loc 2 | Class A |
+------------+--------------+
| Loc 3 | Class B |
+------------+--------------+
| Loc 4 | Class C |
+------------+--------------+
| Loc 5 | Class C |
+------------+--------------+
df_row = pd.DataFrame({'LocationNo': ['Loc 1', 'Loc 2', 'Loc 3', 'Loc 4', 'Loc 5'],
                       'LocationType': ['Class A', 'Class A', 'Class B', 'Class C', 'Class C']})
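Now my task is to match the columns of df_col against the rows of df_row and sum the values per LocationType, i.e. merge vertical to horizontal (columns to rows).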
My desired output is as follows:
+----+------+----------+---------+---------+---------+
| SQ | City | Date | Class A | Class B | Class C |
+----+------+----------+---------+---------+---------+
| 1 | A | 7-1-2020 | 77 | 49 | 68 |
+----+------+----------+---------+---------+---------+
| 1 | A | 7-2-2020 | 61 | 38 | 41 |
+----+------+----------+---------+---------+---------+
| 2 | B | 7-1-2020 | 41 | 36 | 54 |
+----+------+----------+---------+---------+---------+
I wrote the code below:
# setting the index
df_col.set_index(['SQ','City','Date'], inplace=True)
df_row.set_index('LocationNo', inplace=True)
# transpose df_col so its Loc columns become rows, merge with df_row on the index, then sum per LocationType
df_final = df_col.T.merge(df_row, left_index=True, right_index=True, how='left').groupby('LocationType').agg('sum').T
The code above produces the following output:
+------------------+-----------+-----------+-----------+
| LocationType | Class A | Class B | Class C |
+------------------+-----------+-----------+-----------+
| (1, A, 7-1-2020) | 77 | 49 | 68 |
+------------------+-----------+-----------+-----------+
| (1, A, 7-2-2020) | 61 | 38 | 41 |
+------------------+-----------+-----------+-----------+
| (2, B, 7-1-2020) | 41 | 36 | 54 |
+------------------+-----------+-----------+-----------+
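The values are correct. However, the first 3 columns are merged into a single column. I need them as separate columns, as in the desired output above.
How should I fix this, and what is an efficient way to do it?

Let us filter the Loc-like columns in df_col, then map these columns to LocationType based on LocationNo in df_row. Finally, group the DataFrame d by these mapped columns along axis=1 and aggregate with sum: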
d = df_col.filter(like='Loc')
g = d.columns.map(df_row.set_index('LocationNo')['LocationType'])
out = df_col[['SQ','City','Date']].join(d.groupby(g, axis=1).sum())
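A note on the snippet above: as far as I know, DataFrame.groupby(..., axis=1) is deprecated in newer pandas releases (2.1 onward). Below is a minimal sketch of the same idea without axis=1, transposing first and grouping the rows instead. It assumes the sample frames from the question; the name summed is only illustrative.

```python
import pandas as pd

df_col = pd.DataFrame({'SQ': [1, 1, 2],
                       'City': ['A', 'A', 'B'],
                       'Date': ['7-1-2020', '7-2-2020', '7-1-2020'],
                       'Loc 1': [40, 21, 27],
                       'Loc 2': [37, 40, 14],
                       'Loc 3': [49, 38, 36],
                       'Loc 4': [20, 14, 18],
                       'Loc 5': [48, 27, 36]})
df_row = pd.DataFrame({'LocationNo': ['Loc 1', 'Loc 2', 'Loc 3', 'Loc 4', 'Loc 5'],
                       'LocationType': ['Class A', 'Class A', 'Class B', 'Class C', 'Class C']})

# keep only the 'Loc N' columns
d = df_col.filter(like='Loc')

# one LocationType label per Loc column, in column order
g = d.columns.map(df_row.set_index('LocationNo')['LocationType'])

# transpose so the Loc columns become rows, group those rows by their
# LocationType, sum, and transpose back (replaces groupby(g, axis=1))
summed = d.T.groupby(g).sum().T

# put the identifying columns back in front
out = df_col[['SQ', 'City', 'Date']].join(summed)
print(out)
```

The transpose costs an extra copy of the numeric block, so on very wide or very long frames it may be worth comparing against the melt-based approach.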
One way is to use melt, merge and groupby:
print (df_col.melt(id_vars=["SQ", "City", "Date"], var_name="LocationNo")
.merge(df_row, how="left", on="LocationNo")
.groupby(["SQ", "City","LocationType", "Date"])["value"].sum()
.unstack("LocationType"))
LocationType      Class A  Class B  Class C
SQ City Date
1  A    7-1-2020       77       49       68
        7-2-2020       61       38       41
2  B    7-1-2020       41       36       54
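The result above still carries SQ, City and Date as a MultiIndex rather than as ordinary columns, which is what the question asked for. Below is a small sketch, assuming the same sample frames, of flattening it with reset_index. The rename_axis call only drops the leftover 'LocationType' label on the columns; wide and flat are illustrative names. The same reset_index call should also apply to df_final from the question, since its row index has the same three levels.

```python
import pandas as pd

df_col = pd.DataFrame({'SQ': [1, 1, 2],
                       'City': ['A', 'A', 'B'],
                       'Date': ['7-1-2020', '7-2-2020', '7-1-2020'],
                       'Loc 1': [40, 21, 27],
                       'Loc 2': [37, 40, 14],
                       'Loc 3': [49, 38, 36],
                       'Loc 4': [20, 14, 18],
                       'Loc 5': [48, 27, 36]})
df_row = pd.DataFrame({'LocationNo': ['Loc 1', 'Loc 2', 'Loc 3', 'Loc 4', 'Loc 5'],
                       'LocationType': ['Class A', 'Class A', 'Class B', 'Class C', 'Class C']})

# long format -> attach LocationType -> sum per key -> one column per type
wide = (df_col.melt(id_vars=["SQ", "City", "Date"], var_name="LocationNo")
        .merge(df_row, how="left", on="LocationNo")
        .groupby(["SQ", "City", "LocationType", "Date"])["value"].sum()
        .unstack("LocationType"))

# move the SQ / City / Date index levels back into regular columns
flat = wide.reset_index().rename_axis(columns=None)
print(flat)
```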
Accepted and upvoted. Just wondering: if the data is large, will map slow down?
@Tommy Not at all ;). Here we are only mapping the DataFrame df_col
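To make that reply concrete: in the answer's snippet the call is d.columns.map(...), so the mapping runs over the column labels, and its cost depends on how many Loc columns there are rather than on the number of rows. Below is a rough sketch with a made-up frame; big, n_rows and the random values are assumptions for illustration, not data from the question.

```python
import numpy as np
import pandas as pd

n_rows = 100_000  # hypothetical "large" row count
loc_cols = ['Loc 1', 'Loc 2', 'Loc 3', 'Loc 4', 'Loc 5']

# made-up transaction values, one column per location
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 50, size=(n_rows, len(loc_cols))),
                   columns=loc_cols)

df_row = pd.DataFrame({'LocationNo': loc_cols,
                       'LocationType': ['Class A', 'Class A', 'Class B', 'Class C', 'Class C']})

# the map touches only the column labels, whatever the row count is
g = big.columns.map(df_row.set_index('LocationNo')['LocationType'])
print(len(big), "rows,", len(g), "labels mapped")
```

The summation that follows the mapping does of course scale with the number of rows; only the map step itself stays this small.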