Python: subtracting attribute values in one DataFrame from another DataFrame

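This question involves 3 separate DataFrames. df1 holds the "Total" for products 1, 2, 3 and contains "Value1" and "Value2". df2 holds "Customer1" for products 1, 2, 3 and contains "Value1" and "Value2". df3 holds "Customer2" for products 1, 2, 3 and contains "Value1" and "Value2".

df2 and df3 are essentially subsets of df1.

I would like to create another DataFrame, df4, by subtracting df2 and df3 from df1. I want df4 to show "Remaining Customers" in the "Market" column.

This is what I have so far: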

import pandas as pd


d1 = {'Market': ['Total', 'Total','Total'], 'Product Code': [1, 2, 3], 
'Value1':[10, 20, 30], 'Value2':[5, 15, 25]}
df1 = pd.DataFrame(data=d1)
df1



d2 = {'Market': ['Customer1', 'Customer1','Customer1'], 'Product Code': [1, 
2, 3], 'Value1':[3, 14, 10], 'Value2':[2, 4, 6]}
df2 = pd.DataFrame(data=d2)
df2


d3 = {'Market': ['Customer2', 'Customer2','Customer2'], 'Product Code': [1, 
2, 3], 'Value1':[3, 3, 4], 'Value2':[2, 6, 10]}
df3 = pd.DataFrame(data=d3)
df3
This produces the following:

  Market  Product Code  Value1  Value2
0  Total             1      10       5
1  Total             2      20      15
2  Total             3      30      25
      Market  Product Code  Value1  Value2
0  Customer1             1       3       2
1  Customer1             2      14       4
2  Customer1             3      10       6
      Market  Product Code  Value1  Value2
0  Customer2             1       3       2
1  Customer2             2       3       6
2  Customer2             3       4      10
To create df4 I tried the following code and got the error "TypeError: unsupported operand type(s) for -: 'str' and 'str'". Can anyone help?

df4 = df1-(df2+df3)

print(df4)
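
A minimal sketch of why the attempt above fails, assuming the same three frames as in the question: the "Market" column holds strings, so df2 + df3 concatenates the labels, and the subtraction then has no meaning for strings. The names summed, error and remaining are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

# Adding two string columns concatenates them, so this step succeeds
# and 'Market' becomes 'Customer1Customer2'.
summed = df2 + df3

# Subtracting strings is not defined, which is where the TypeError comes from.
try:
    df1 - summed
    error = None
except TypeError as exc:
    error = exc

# Restricting the arithmetic to the numeric columns avoids the problem.
numeric = ['Value1', 'Value2']
remaining = df1[numeric] - (df2[numeric] + df3[numeric])
```

Note that "Product Code" is numeric too, so it has to be left out of the arithmetic as well, otherwise it would be subtracted like any other value.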
Here is one way:

cols = ['Value1', 'Value2']
df4 = df1[cols].subtract(df2[cols].add(df3[cols]))\
               .assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]})\
               .sort_index(axis=1)

#                Market  Product Code  Value1  Value2
# 0  RemainingCustomers             1       4       1
# 1  RemainingCustomers             2       3       5
# 2  RemainingCustomers             3      16       9
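Explanation

  • df1[cols].subtract(df2[cols].add(df3[cols]))
    performs the calculation on the specified columns only
  • assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]})
    adds the extra columns needed in the resulting DataFrame
  • sort_index(axis=1)
    sorts the columns by label, which gives the desired column order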
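
The steps above can also be sketched without hard-coding the product codes. This is a variant, not the answer's original code: it assumes all three frames list the products in the same row order (the arithmetic aligns on the row index, not on "Product Code"), copies "Product Code" from df1, and uses reindex to restore df1's column order.

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

cols = ['Value1', 'Value2']
df4 = (
    df1[cols]
    .subtract(df2[cols].add(df3[cols]))           # numeric columns only
    .assign(**{'Market': 'RemainingCustomers',    # scalar is broadcast to every row
               'Product Code': df1['Product Code']})  # reuse codes from df1
    .reindex(columns=df1.columns)                 # same column order as df1
)
```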

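Drop Market, set Product Code as the index, and let the arithmetic align on the product code index. After that, just reset the index and insert Market into the result.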

df1, df2, df3 = [
      df.drop(columns='Market').set_index('Product Code') for df in [df1, df2, df3]
]

df4 = (df1 - (df2 + df3)).reset_index()
df4.insert(0, 'Market', 'RemainingCustomers')

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
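
A small sketch of what index alignment buys here, under the assumption (not in the question) that one customer frame lists its products in a different row order. The frames total, cust1, cust2 and the helper by_product are hypothetical names for illustration.

```python
import pandas as pd

total = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                      'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
# Same Customer1 data as in the question, but with the rows shuffled.
cust1 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [3, 1, 2],
                      'Value1': [10, 3, 14], 'Value2': [6, 2, 4]})
cust2 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                      'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})


def by_product(df):
    """Drop the label column and index the values by product code."""
    return df.drop(columns='Market').set_index('Product Code')


# Rows are matched on 'Product Code', not on their position in each frame.
remaining = (by_product(total) - (by_product(cust1) + by_product(cust2))).reset_index()
remaining.insert(0, 'Market', 'RemainingCustomers')
```

With purely positional arithmetic the shuffled rows would be subtracted from the wrong products, which is why setting the index first is the safer choice when row order is not guaranteed.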

This is not exactly what the OP asked for, but in my opinion it may be a better way to manage the data.

df = pd.concat([df1, df2, df3]).set_index(['Product Code', 'Market'])

formula = 'RemainingCustomers = Total - Customer1 - Customer2'
df = df.unstack().stack(0).eval(formula).unstack()
df

Market       Customer1        Customer2         Total        RemainingCustomers       
                Value1 Value2    Value1 Value2 Value1 Value2             Value1 Value2
Product Code                                                                          
1                    3      2         3      2     10      5                  4      1
2                   14      4         3      6     20     15                  3      5
3                   10      6         4     10     30     25                 16      9

If we insist on the requested output:

df.stack(0).reset_index().query(
    'Market == "RemainingCustomers"').reindex(columns=df1.columns)

                Market  Product Code  Value1  Value2
2   RemainingCustomers             1       4       1
6   RemainingCustomers             2       3       5
10  RemainingCustomers             3      16       9


Maybe we can use select_dtypes:

(df1.select_dtypes(exclude = 'object')
     -df2.select_dtypes(exclude = 'object')
       -df3.select_dtypes(exclude = 'object')).\
            drop(columns='Product Code').\
              combine_first(df1).\
               assign(Market='remaining customers')
Out[133]: 
                Market  Product Code  Value1  Value2
0  remaining customers           1.0       4       1
1  remaining customers           2.0       3       5
2  remaining customers           3.0      16       9

Works perfectly. Thanks!
This does work, but you should split the answer across multiple lines :)
df.stack(0).xs(
    'RemainingCustomers', level=1, drop_level=False
).reset_index().reindex(columns=df1.columns)

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9