Python: subtracting attribute values in one DataFrame from another DataFrame

python python-3.x pandas

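This question involves three separate DataFrames. df1 holds the "Total" for products 1, 2 and 3, with columns "Value1" and "Value2". df2 holds "Customer1" for products 1, 2 and 3, with "Value1" and "Value2". df3 holds "Customer2" for products 1, 2 and 3, with "Value1" and "Value2".

df2 and df3 are essentially subsets of df1.

I want to create another DataFrame, df4, by subtracting df2 and df3 from df1. df4 should be labelled "Remaining Customers" in the "Market" column.

This is what I have done so far: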

import pandas as pd

d1 = {'Market': ['Total', 'Total', 'Total'],
      'Product Code': [1, 2, 3],
      'Value1': [10, 20, 30],
      'Value2': [5, 15, 25]}
df1 = pd.DataFrame(data=d1)
df1

d2 = {'Market': ['Customer1', 'Customer1', 'Customer1'],
      'Product Code': [1, 2, 3],
      'Value1': [3, 14, 10],
      'Value2': [2, 4, 6]}
df2 = pd.DataFrame(data=d2)
df2

d3 = {'Market': ['Customer2', 'Customer2', 'Customer2'],
      'Product Code': [1, 2, 3],
      'Value1': [3, 3, 4],
      'Value2': [2, 6, 10]}
df3 = pd.DataFrame(data=d3)
df3

This produces the following result:

  Market  Product Code  Value1  Value2
0  Total             1      10       5
1  Total             2      20      15
2  Total             3      30      25
      Market  Product Code  Value1  Value2
0  Customer1             1       3       2
1  Customer1             2      14       4
2  Customer1             3      10       6
      Market  Product Code  Value1  Value2
0  Customer2             1       3       2
1  Customer2             2       3       6
2  Customer2             3       4      10

To create df4, I tried the following code and got the error "TypeError: unsupported operand type(s) for -: 'str' and 'str'". Can anyone help?

df4 = df1-(df2+df3)

print(df4)
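
The error comes from applying the arithmetic to every column, including the string column Market: df2 + df3 concatenates the strings, and subtracting that result from 'Total' is not defined. A minimal sketch, rebuilding the frames from the question, that reproduces the failure and shows that restricting the operation to the numeric columns avoids it (the names value_cols, failed and numeric_diff are introduced here for illustration only):

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

# Whole-frame arithmetic also touches the string column 'Market'.
try:
    df1 - (df2 + df3)
    failed = False
except TypeError:
    failed = True

# Restricting the arithmetic to the numeric value columns works.
value_cols = ['Value1', 'Value2']  # assumed: the only columns to subtract
numeric_diff = df1[value_cols] - (df2[value_cols] + df3[value_cols])
```

The answers below all rest on this idea: keep Market (and Product Code) out of the subtraction and add them back afterwards.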

Here is one way:

cols = ['Value1', 'Value2']
df4 = df1[cols].subtract(df2[cols].add(df3[cols]))\
               .assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]})\
               .sort_index(axis=1)

#                Market  Product Code  Value1  Value2
# 0  RemainingCustomers             1       4       1
# 1  RemainingCustomers             2       3       5
# 2  RemainingCustomers             3      16       9

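Explanation

df1[cols].subtract(df2[cols].add(df3[cols]))

performs the calculation on the specified columns only.

assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]})

adds the extra columns required for the result DataFrame.

sort_index(axis=1)

reorders the columns for the desired output.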
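
The product codes in that snippet are hard-coded. A hedged variant, assuming all three frames list the same products in the same row order, copies the column from df1 instead and fixes the column order with insert rather than sort_index:

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

cols = ['Value1', 'Value2']

# Subtract only the value columns (rows are matched by position here).
df4 = df1[cols].subtract(df2[cols].add(df3[cols]))

# Re-attach the identifying columns at the front, taking the codes from df1.
df4.insert(0, 'Product Code', df1['Product Code'])
df4.insert(0, 'Market', 'RemainingCustomers')
```

This keeps the result correct if the list of products changes, as long as the row-order assumption still holds.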

Drop Market, set Product Code as the index, and let the arithmetic align on Product Code. After that, just reset the index and insert Market into the result.

# Drop the label column and index by product so the arithmetic aligns on Product Code
df1, df2, df3 = [
    df.drop(columns='Market').set_index('Product Code') for df in [df1, df2, df3]
]

df4 = (df1 - (df2 + df3)).reset_index()
df4.insert(0, 'Market', 'RemainingCustomers')

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
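
Index alignment matters when the frames do not share the same row order. A small sketch, in which df2 is deliberately reordered for illustration (the names total, cust1, cust2 and remaining are not from the original answer), suggests that arithmetic on frames indexed by Product Code matches rows by label rather than by position:

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

# Reorder the rows of df2 (products 3, 1, 2) to simulate mismatched row order.
df2_shuffled = df2.iloc[[2, 0, 1]]

total, cust1, cust2 = [
    df.drop(columns='Market').set_index('Product Code')
    for df in (df1, df2_shuffled, df3)
]

# Rows are paired by Product Code, so the shuffled order does not matter.
remaining = (total - (cust1 + cust2)).sort_index().reset_index()
remaining.insert(0, 'Market', 'RemainingCustomers')
```

A purely positional subtraction on the shuffled frame would pair product 1 of df1 with product 3 of df2, which is why indexing by Product Code is the safer choice.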

This is not exactly what the OP asked for, but in my opinion it may be a better way to manage the data:

df = pd.concat([df1, df2, df3]).set_index(['Product Code', 'Market'])

formula = 'RemainingCustomers = Total - Customer1 - Customer2'
df = df.unstack().stack(0).eval(formula).unstack()
df

Market       Customer1        Customer2         Total        RemainingCustomers       
                Value1 Value2    Value1 Value2 Value1 Value2             Value1 Value2
Product Code                                                                          
1                    3      2         3      2     10      5                  4      1
2                   14      4         3      6     20     15                  3      5
3                   10      6         4     10     30     25                 16      9
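
The chained reshaping can be hard to follow. A step-by-step sketch of the same idea, with intermediate names (long_df, wide_df, per_value, result) introduced here purely for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

# One row per (product, market) pair.
long_df = pd.concat([df1, df2, df3]).set_index(['Product Code', 'Market'])

# One row per product; columns are (value name, market) pairs.
wide_df = long_df.unstack()

# Move the value names into the row index so each market is a plain column.
per_value = wide_df.stack(0)

# With markets as columns, the formula can refer to them by name.
per_value = per_value.eval('RemainingCustomers = Total - Customer1 - Customer2')

# Back to one row per product, with (market, value name) column pairs.
result = per_value.unstack()
```

Keeping every market in one table means a further customer only needs another term in the formula.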

And, if we insist on the requested output:

df.stack(0).reset_index().query(
    'Market == "RemainingCustomers"').reindex(columns=df1.columns)

                Market  Product Code  Value1  Value2
2   RemainingCustomers             1       4       1
6   RemainingCustomers             2       3       5
10  RemainingCustomers             3      16       9

Or maybe we can use select_dtypes:

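# Subtract the numeric columns, discard the meaningless Product Code difference,
# then fill Market and Product Code back in from df1
((df1.select_dtypes(exclude='object')
  - df2.select_dtypes(exclude='object')
  - df3.select_dtypes(exclude='object'))
 .drop(columns='Product Code')
 .combine_first(df1)
 .assign(Market='remaining customers'))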
Out[133]: 
                Market  Product Code  Value1  Value2
0  remaining customers           1.0       4       1
1  remaining customers           2.0       3       5
2  remaining customers           3.0      16       9

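Works perfectly. Thanks!

This does work, but you should split the answer across multiple lines :)

The requested output can also be selected with xs from the reshaped df built with pd.concat above: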

df.stack(0).xs(
    'RemainingCustomers', level=1, drop_level=False
).reset_index().reindex(columns=df1.columns)

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
