Python: Efficiently merging the columns and rows of two DataFrames in Pandas


I have a detailed transaction-data DataFrame, `df_col`, like this:

import pandas as pd

df_col = pd.DataFrame({'SQ':[1,1,2],
                       'City':['A','A','B'],
                       'Date':['7-1-2020','7-2-2020','7-1-2020'],
                       'Loc 1':[40,21,27],
                       'Loc 2':[37,40,14],
                       'Loc 3':[49,38,36],
                       'Loc 4':[20,14,18],
                       'Loc 5':[48,27,36]})


+----+------+----------+-------+-------+-------+-------+-------+
| SQ | City |   Date   | Loc 1 | Loc 2 | Loc 3 | Loc 4 | Loc 5 |
+----+------+----------+-------+-------+-------+-------+-------+
|  1 |   A  | 7-1-2020 |   40  |   37  |   49  |   20  |   48  |
+----+------+----------+-------+-------+-------+-------+-------+
|  1 |   A  | 7-2-2020 |   21  |   40  |   38  |   14  |   27  |
+----+------+----------+-------+-------+-------+-------+-------+
|  2 |   B  | 7-1-2020 |   27  |   14  |   36  |   18  |   36  |
+----+------+----------+-------+-------+-------+-------+-------+

In addition, I have a separate location-data DataFrame, `df_row`:

df_row = pd.DataFrame({'LocationNo':['Loc 1','Loc 2','Loc 3','Loc 4','Loc 5'],
                       'LocationType':['Class A','Class A','Class B','Class C','Class C']})

+------------+--------------+
| LocationNo | LocationType |
+------------+--------------+
|    Loc 1   |    Class A   |
+------------+--------------+
|    Loc 2   |    Class A   |
+------------+--------------+
|    Loc 3   |    Class B   |
+------------+--------------+
|    Loc 4   |    Class C   |
+------------+--------------+
|    Loc 5   |    Class C   |
+------------+--------------+

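Now my task is to merge the columns of `df_col` with the rows of `df_row` (vertical to horizontal, i.e. columns to rows) and sum the values: each `Loc` column is assigned to its `LocationType`, and the values of columns that share a type are added up.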

My desired output is:

+----+------+----------+---------+---------+---------+
| SQ | City |   Date   | Class A | Class B | Class C |
+----+------+----------+---------+---------+---------+
|  1 |   A  | 7-1-2020 |    77   |    49   |    68   |
+----+------+----------+---------+---------+---------+
|  1 |   A  | 7-2-2020 |    61   |    38   |    41   |
+----+------+----------+---------+---------+---------+
|  2 |   B  | 7-1-2020 |    41   |    36   |    54   |
+----+------+----------+---------+---------+---------+
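To make the target concrete: going by `df_row`, Class A = Loc 1 + Loc 2, Class B = Loc 3 and Class C = Loc 4 + Loc 5, so each class column is a row-wise sum. The sketch below hard-codes that mapping purely to check the numbers in the desired output; it is not a general solution, and the name `expected` is only illustrative.

```python
import pandas as pd

df_col = pd.DataFrame({'SQ': [1, 1, 2],
                       'City': ['A', 'A', 'B'],
                       'Date': ['7-1-2020', '7-2-2020', '7-1-2020'],
                       'Loc 1': [40, 21, 27],
                       'Loc 2': [37, 40, 14],
                       'Loc 3': [49, 38, 36],
                       'Loc 4': [20, 14, 18],
                       'Loc 5': [48, 27, 36]})

# Mapping read off df_row, hard-coded here only as a sanity check:
# Class A = Loc 1 + Loc 2, Class B = Loc 3, Class C = Loc 4 + Loc 5
expected = df_col[['SQ', 'City', 'Date']].assign(**{
    'Class A': df_col['Loc 1'] + df_col['Loc 2'],
    'Class B': df_col['Loc 3'],
    'Class C': df_col['Loc 4'] + df_col['Loc 5'],
})
print(expected)
```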
I wrote the following code:

# setting the index
df_col.set_index(['SQ','City','Date'], inplace=True)
df_row.set_index('LocationNo', inplace=True)


# Merge the Loc columns with df_row's rows: transpose df_col (df_col.T) so the Loc columns become rows
df_final = df_col.T.merge(df_row, left_index=True, right_index=True, how='left').groupby('LocationType').agg('sum').T
The code above produces the following output:

+------------------+---------+---------+---------+
|   LocationType   | Class A | Class B | Class C |
+------------------+---------+---------+---------+
| (1, A, 7-1-2020) |    77   |    49   |    68   |
+------------------+---------+---------+---------+
| (1, A, 7-2-2020) |    61   |    38   |    41   |
+------------------+---------+---------+---------+
| (2, B, 7-1-2020) |    41   |    36   |    54   |
+------------------+---------+---------+---------+
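The values are correct. However, the first three columns (`SQ`, `City`, `Date`) are merged into a single column. I need them as separate columns, as in the desired output above.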


How should I fix this, and what is an efficient way to do it?
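The tuple labels appear to come from the merge step, where a frame whose columns are a 3-level MultiIndex (`SQ`, `City`, `Date`) is merged with `df_row`, whose columns are ordinary. A minimal sketch, assuming the original un-indexed `df_col` and `df_row`, that keeps the transpose idea but skips the merge: group the transposed `Loc` rows directly by a `LocationNo` → `LocationType` Series, transpose back, and let `reset_index()` turn the MultiIndex into separate columns. `loc_type` and `out` are illustrative names.

```python
import pandas as pd

df_col = pd.DataFrame({'SQ': [1, 1, 2],
                       'City': ['A', 'A', 'B'],
                       'Date': ['7-1-2020', '7-2-2020', '7-1-2020'],
                       'Loc 1': [40, 21, 27],
                       'Loc 2': [37, 40, 14],
                       'Loc 3': [49, 38, 36],
                       'Loc 4': [20, 14, 18],
                       'Loc 5': [48, 27, 36]})
df_row = pd.DataFrame({'LocationNo': ['Loc 1', 'Loc 2', 'Loc 3', 'Loc 4', 'Loc 5'],
                       'LocationType': ['Class A', 'Class A', 'Class B', 'Class C', 'Class C']})

# Series 'Loc n' -> 'Class X'; groupby aligns it with the row labels of the transposed frame
loc_type = df_row.set_index('LocationNo')['LocationType']

out = (df_col.set_index(['SQ', 'City', 'Date'])  # keys to the index, only Loc columns remain
             .T                                  # rows: Loc 1..Loc 5, columns: (SQ, City, Date)
             .groupby(loc_type)                  # group the Loc rows by their LocationType
             .sum()
             .T                                  # back to one row per (SQ, City, Date)
             .reset_index())                     # index levels -> separate columns
print(out)
```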

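Let's `filter` the `Loc`-like columns in `df_col`, then `map` those columns to `LocationType` using `LocationNo` in `df_row`, and finally `groupby` the DataFrame `d` on these mapped columns along the columns (`axis=1`) and aggregate with `sum`: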

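# Assumes the original df_col/df_row (before the set_index calls in the question)
d = df_col.filter(like='Loc')                                      # only the 'Loc n' columns
g = d.columns.map(df_row.set_index('LocationNo')['LocationType'])  # 'Loc 1' -> 'Class A', ...
# Sum the columns sharing a LocationType (d.groupby(g, axis=1).sum() is deprecated since pandas 2.1)
out = df_col[['SQ','City','Date']].join(d.T.groupby(g).sum().T)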
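The same column-to-class mapping can also be expressed as a matrix product (an alternative, not part of the answer above): build a 0/1 matrix with one row per `LocationNo` and one column per `LocationType`, then multiply the `Loc` columns by it. A sketch using `pd.get_dummies` and `DataFrame.dot`, assuming the original frames; `onehot` and `sums` are illustrative names.

```python
import pandas as pd

df_col = pd.DataFrame({'SQ': [1, 1, 2],
                       'City': ['A', 'A', 'B'],
                       'Date': ['7-1-2020', '7-2-2020', '7-1-2020'],
                       'Loc 1': [40, 21, 27],
                       'Loc 2': [37, 40, 14],
                       'Loc 3': [49, 38, 36],
                       'Loc 4': [20, 14, 18],
                       'Loc 5': [48, 27, 36]})
df_row = pd.DataFrame({'LocationNo': ['Loc 1', 'Loc 2', 'Loc 3', 'Loc 4', 'Loc 5'],
                       'LocationType': ['Class A', 'Class A', 'Class B', 'Class C', 'Class C']})

d = df_col.filter(like='Loc')
# 0/1 matrix: index = LocationNo, columns = LocationType
# (get_dummies may return bool depending on the pandas version, so cast to int)
onehot = pd.get_dummies(df_row.set_index('LocationNo')['LocationType']).astype(int)
# (rows x Loc) . (Loc x Class): each cell sums the row's Loc values that belong to that class
sums = d.dot(onehot)
out = df_col[['SQ', 'City', 'Date']].join(sums)
print(out)
```

`DataFrame.dot` aligns the column labels of `d` with the index of `onehot` before multiplying, so the column order in `df_col` does not have to match the row order in `df_row`.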


One way, using `melt`, `merge` and `groupby`:

print (df_col.melt(id_vars=["SQ", "City", "Date"], var_name="LocationNo")
             .merge(df_row, how="left", on="LocationNo")
             .groupby(["SQ", "City","LocationType", "Date"])["value"].sum()
             .unstack("LocationType"))

LocationType      Class A  Class B  Class C
SQ City Date                               
1  A    7-1-2020       77       49       68
        7-2-2020       61       38       41
2  B    7-1-2020       41       36       54
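This result keeps `SQ`, `City` and `Date` in a MultiIndex. If they are wanted as ordinary columns, as in the question's desired output, appending `reset_index()` to the same chain should do it. A sketch of that variant, assuming the original frames (grouping keys reordered so `LocationType` comes last; `res` is an illustrative name):

```python
import pandas as pd

df_col = pd.DataFrame({'SQ': [1, 1, 2],
                       'City': ['A', 'A', 'B'],
                       'Date': ['7-1-2020', '7-2-2020', '7-1-2020'],
                       'Loc 1': [40, 21, 27],
                       'Loc 2': [37, 40, 14],
                       'Loc 3': [49, 38, 36],
                       'Loc 4': [20, 14, 18],
                       'Loc 5': [48, 27, 36]})
df_row = pd.DataFrame({'LocationNo': ['Loc 1', 'Loc 2', 'Loc 3', 'Loc 4', 'Loc 5'],
                       'LocationType': ['Class A', 'Class A', 'Class B', 'Class C', 'Class C']})

res = (df_col.melt(id_vars=['SQ', 'City', 'Date'], var_name='LocationNo')  # one row per (record, Loc)
             .merge(df_row, how='left', on='LocationNo')                  # attach LocationType
             .groupby(['SQ', 'City', 'Date', 'LocationType'])['value'].sum()
             .unstack('LocationType')                                     # LocationType values -> columns
             .reset_index())                                              # SQ/City/Date -> ordinary columns
res.columns.name = None  # drop the leftover 'LocationType' label on the column axis
print(res)
```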

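Accepted and upvoted. Just wondering: if the data is large, will `map` slow things down? @Tommy Not at all ;) Here we are only mapping the columns of the DataFrame `df_col`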
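To illustrate the point in the reply: `d.columns.map(...)` looks up one label per `Loc` column, so its cost depends on the number of columns, not the number of rows. A minimal sketch with a hypothetical 100,000-row frame of synthetic values (the frame and its size are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

df_row = pd.DataFrame({'LocationNo': ['Loc 1', 'Loc 2', 'Loc 3', 'Loc 4', 'Loc 5'],
                       'LocationType': ['Class A', 'Class A', 'Class B', 'Class C', 'Class C']})

# Hypothetical large input: 100,000 rows of synthetic values in the five Loc columns
n_rows = 100_000
d = pd.DataFrame(np.arange(n_rows * 5).reshape(n_rows, 5),
                 columns=['Loc 1', 'Loc 2', 'Loc 3', 'Loc 4', 'Loc 5'])

# map() is applied to the 5 column labels, not to the 100,000 rows
g = d.columns.map(df_row.set_index('LocationNo')['LocationType'])
print(len(g))  # 5

# The sums themselves are a vectorised groupby over the (transposed) columns
sums = d.T.groupby(g).sum().T
print(sums.shape)  # (100000, 3)
```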