Complex splitting, merging and pivoting of multiple DataFrames in Python pandas

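I have two pandas DataFrames that need to be merged and pivoted. In one of the DataFrames, a column holds comma-separated strings. The DataFrames are: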

import pandas as pd
import numpy as np

tableA = [(100, 'chocolate, sprinkles'),
     (101, 'chocolate, sprinkles'),
     (102, 'glazed')]
labels = ['product', 'tags']
dfA = pd.DataFrame.from_records(tableA, columns=labels)

tableB = [('A', 100),
       ('A', 101),
       ('B', 101),
       ('C', 100),
       ('C', 102),
       ('B', 101),
       ('A', 100),
       ('C', 102)]
labels = ['customer', 'product']
dfB = pd.DataFrame.from_records(tableB, columns=labels) 

dfA:
     product                  tags
 0      100  chocolate, sprinkles
 1      101  chocolate, sprinkles
 2      102                glazed
dfB:
   customer  product
 0        A      100
 1        A      101
 2        B      101
 3        C      100
 4        C      102
 5        B      101
 6        A      100
 7        C      102
The result should look like this:

 customer   sprinkles   chocolate   glazed
 A          ?            ?              ?
 B          ?            ?              ?   
 C          ?            ?              ?   
I have tried various functions, but all of them failed. Any suggestions would be greatly appreciated.

Here is some of my code. I know it doesn't work, but it should give you an idea of what I'm trying to do:

dfC=dfB.merge(dfA, left_on='product', right_on='product')
print(dfC)
This results in:

   customer  product                  tags
 0        A      100  chocolate, sprinkles
 1        C      100  chocolate, sprinkles
 2        A      100  chocolate, sprinkles
 3        A      101  chocolate, sprinkles
 4        B      101  chocolate, sprinkles
 5        B      101  chocolate, sprinkles
 6        C      102                glazed
 7        C      102                glazed
And then:

This results in:

     var1        var2
0     A   chocolate
1     A   sprinkles
2     C   chocolate
3     C   sprinkles
4     A   chocolate
5     A   sprinkles
6     A   chocolate
7     A   sprinkles
8     B   chocolate
9     B   sprinkles
10    B   chocolate
11    B   sprinkles
12    C      glazed
13    C      glazed
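The code that produced dfS is not shown in the question. As a hedged sketch, assuming pandas 0.25 or later (for DataFrame.explode), one way to get from dfA and dfB to this long var1/var2 format is to merge on product, split the tags string into a list, and explode the list into one row per tag. Splitting on ',' alone leaves a leading space on ' sprinkles', which is the hidden whitespace the answers below deal with.

```python
import pandas as pd

# Same data as in the question
dfA = pd.DataFrame.from_records(
    [(100, 'chocolate, sprinkles'),
     (101, 'chocolate, sprinkles'),
     (102, 'glazed')],
    columns=['product', 'tags'])
dfB = pd.DataFrame.from_records(
    [('A', 100), ('A', 101), ('B', 101), ('C', 100),
     ('C', 102), ('B', 101), ('A', 100), ('C', 102)],
    columns=['customer', 'product'])

# Attach the tags of each purchased product to the purchase row
dfC = dfB.merge(dfA, on='product')

# Split the comma-separated tags into lists, then give every tag its own row
dfS = (dfC.assign(tags=dfC['tags'].str.split(','))
          .explode('tags')
          .rename(columns={'customer': 'var1', 'tags': 'var2'})
          .loc[:, ['var1', 'var2']]
          .reset_index(drop=True))
print(dfS)
```

Using .str.split(', ') instead would avoid the stray space at this step, but only if the separator is always exactly a comma followed by one space.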

First, you need to strip the spaces from var2:

dfS['var2'] = dfS['var2'].str.strip()
Then you can create a column for each tag, for example:

dfS['chocolate'] = dfS['var2'].apply(lambda x: 1 if x == 'chocolate' else 0)
dfS['sprinkles'] = dfS['var2'].apply(lambda x: 1 if x == 'sprinkles' else 0)
dfS['glazed'] = dfS['var2'].apply(lambda x: 1 if x == 'glazed' else 0)
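Then you can group by var1 and sum the indicator columns (dropping the string column var2 first, so that it is not aggregated as well), for example:

dfS.drop(columns='var2').groupby('var1').sum().reset_index().rename(columns={'var1': 'customer'})
The output looks like this: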

  customer  chocolate  sprinkles  glazed
0        A          3          3       0
1        B          2          2       0
2        C          1          1       2
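The three apply lines above hard-code each tag name. A possible alternative, swapping in pd.get_dummies (not what this answer uses), builds one 0/1 column for whatever tags exist and then sums them per customer. This is a sketch that rebuilds dfS from the rows listed in the question, assuming var2 has already been stripped; the tag columns come out in alphabetical order rather than creation order.

```python
import pandas as pd

# dfS as listed in the question, with var2 already stripped
dfS = pd.DataFrame({
    'var1': ['A', 'A', 'C', 'C', 'A', 'A', 'A', 'A',
             'B', 'B', 'B', 'B', 'C', 'C'],
    'var2': ['chocolate', 'sprinkles'] * 6 + ['glazed', 'glazed'],
})

# One 0/1 indicator column per distinct tag, without hard-coding the tag names
dummies = pd.get_dummies(dfS['var2'], dtype=int)

# Add up the indicator columns for each customer
result = (dummies.groupby(dfS['var1']).sum()
                 .rename_axis('customer')
                 .reset_index())
print(result)
```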

Using the combined DataFrame dfS (with var2 already stripped), you can get each customer's tag counts with pd.crosstab:

pd.crosstab(dfS.var1, dfS.var2)

var2  chocolate  glazed  sprinkles
var1
A             3       0          3
B             2       0          2
C             1       2          1
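The question asks for customer as the first column and the tags in the order sprinkles, chocolate, glazed. A sketch of reshaping the crosstab result into that layout, assuming var2 has already been stripped; dfS is rebuilt here from the rows listed in the question so the snippet runs on its own.

```python
import pandas as pd

# dfS as listed in the question, with var2 already stripped
dfS = pd.DataFrame({
    'var1': ['A', 'A', 'C', 'C', 'A', 'A', 'A', 'A',
             'B', 'B', 'B', 'B', 'C', 'C'],
    'var2': ['chocolate', 'sprinkles'] * 6 + ['glazed', 'glazed'],
})

counts = pd.crosstab(dfS['var1'], dfS['var2'])

# Match the column order asked for in the question; a tag that never occurs gets 0s
counts = counts.reindex(columns=['sprinkles', 'chocolate', 'glazed'], fill_value=0)
counts.index.name = 'customer'   # becomes the first column after reset_index
counts.columns.name = None       # drop the leftover 'var2' axis label
result = counts.reset_index()
print(result)
```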

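Thanks, but as you can see, although chocolate and sprinkles occur equally often in the original DataFrame, only one of them has a value in the result. The same thing happened to me when I used pivot_table directly. Any suggestions?
Sorry, I have added strip() to remove the spaces in the var2 column (hidden spaces are always tricky for me).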
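To illustrate the hidden-space issue discussed in these comments, here is a small sketch on hypothetical sample data; tag_counts and the helper column n are illustrative names, not part of the original code. It shows that pivot_table does count every row, but before stripping, the sprinkles counts land under a separate ' sprinkles' label, so a lookup of 'sprinkles' finds nothing.

```python
import pandas as pd

# Hypothetical sample: split(',') left a leading space on ' sprinkles'
dfS = pd.DataFrame({
    'var1': ['A', 'A', 'B', 'B', 'C'],
    'var2': ['chocolate', ' sprinkles', 'chocolate', ' sprinkles', 'glazed'],
})

def tag_counts(df):
    # 'n' is a helper column of ones so pivot_table has something to sum
    return pd.pivot_table(df.assign(n=1), index='var1', columns='var2',
                          values='n', aggfunc='sum', fill_value=0)

before = tag_counts(dfS)
print(before.columns.tolist())  # [' sprinkles', 'chocolate', 'glazed']

# Strip the hidden whitespace, then pivot again
dfS['var2'] = dfS['var2'].str.strip()
after = tag_counts(dfS)
print(after.columns.tolist())   # ['chocolate', 'glazed', 'sprinkles']
```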