Python: I can't correctly assign a value to df rows based on 3 conditions (checking the values in 3 other columns)


python pandas dataframe


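I am trying to assign a proportion value to a column for specific rows in a df. Each row represents the sales of one unique product in a specific month. The data frame (called testingAgain) looks like this: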

  Month       ProductID(SKU)        Family       Sales       ProporcionVenta
    1         1234                FISH      10000.0              0.0


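This row represents the sales of product 1234 in January. (It is an aggregate, so it represents every January in the DB.)

Now I am trying to find what proportion of its family's total monthly sales each unique product ID's monthly sales represents. For example, the FISH family sold 100000 in month 1, so in this specific case it would be computed as 10000/100000 (product ID monthly sales / family monthly sales).

This is what I am trying to do: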


for family in uniqueFamilies:
    for month in months:
        salesFamilyMonth = testingAgain[(testingAgain['Family']==family)&(testingAgain['Month']==month)]['Qty'].sum()
        for sku in uniqueSKU:
            salesSKUMonth = testingAgain[(testingAgain['Family']==family)&(testingAgain['Month']==month)&(testingAgain['SKU']==sku)]['Qty'].sum()
            proporcion = salesSKUMonth/salesFamilyMonth

            testingAgain[(testingAgain['SKU']==sku)&(testingAgain['Family']==familia)&(testingAgain['Month']==month)]['ProporcionVenta'] = proporcion

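The code runs fine. I even printed the proportions individually and calculated them in Excel, and they are correct. The problem is the last line: once the code finishes, I print testingAgain and all the proportions are listed as 0.0, even though they should have been assigned the new values.

I'm not entirely convinced by my approach, but I think it is decent.

Is there any way to fix this?

Thanks, much appreciated.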

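In general, in Pandas (and even NumPy), unlike general-purpose Python, analysts should avoid for loops, because there are many vectorized options for running conditional or grouped calculations. In your case, consider groupby().transform(), which returns inline aggregates (i.e., aggregate values without collapsing rows) or, as the docs put it, results broadcast to match the shape of the input array.

Currently, your code tries to assign a value to a subset slice of a data frame column, which should raise a SettingWithCopyWarning. Such an operation does not affect the original data frame. Your loop can use .loc for conditional assignment:
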
testingAgain.loc[(testingAgain['SKU']==sku) &
                 (testingAgain['Family']==familia) &
                 (testingAgain['Month']==month), 'ProporcionVenta'] = proporcion
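
A minimal sketch of the difference between the two forms. The three-row frame and its values are made up for illustration, and the SKU and Qty column names are taken from the question's code rather than from the printed table. The chained form writes into a temporary object, while a single .loc call writes into the frame itself:

```python
import warnings
import pandas as pd

# Hypothetical sample data; column names follow the question's code.
df = pd.DataFrame({
    "Month": [1, 1, 1],
    "SKU": [1234, 5678, 9999],
    "Family": ["FISH", "FISH", "MEAT"],
    "Qty": [10000.0, 90000.0, 500.0],
    "ProporcionVenta": [0.0, 0.0, 0.0],
})

mask = (df["SKU"] == 1234) & (df["Family"] == "FISH") & (df["Month"] == 1)

# Chained indexing: df[mask] builds a new object, so the assignment is lost.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the chained-assignment warning
    df[mask]["ProporcionVenta"] = 0.1
after_chained = df.loc[mask, "ProporcionVenta"].iloc[0]

# Single .loc call: rows and column are selected in one step on df itself.
df.loc[mask, "ProporcionVenta"] = 0.1
after_loc = df.loc[mask, "ProporcionVenta"].iloc[0]

print(after_chained, after_loc)
```

Here after_chained should still hold the initial 0.0, which matches the symptom described in the question, while after_loc holds the assigned value.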

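However, avoid the loop altogether, since transform works nicely for assigning new data frame columns. Also, div is the Series division method (functionally equivalent to the / operator).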
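
A sketch of what the loop-free version could look like, assuming the SKU, Family, Month and Qty column names used in the question's code. The sample rows are hypothetical, and the intermediate names sku_total and family_total are introduced here only for readability:

```python
import pandas as pd

# Hypothetical sample data; column names (SKU, Qty) follow the question's code.
testingAgain = pd.DataFrame({
    "Month": [1, 1, 1, 2],
    "SKU": [1234, 5678, 9999, 1234],
    "Family": ["FISH", "FISH", "MEAT", "FISH"],
    "Qty": [10000.0, 90000.0, 500.0, 2000.0],
})

# Per-row total for each SKU within its family and month (rows are not collapsed).
sku_total = testingAgain.groupby(["SKU", "Family", "Month"])["Qty"].transform("sum")

# Per-row total for the whole family in that month.
family_total = testingAgain.groupby(["Family", "Month"])["Qty"].transform("sum")

# Element-wise division; same result as sku_total / family_total.
testingAgain["ProporcionVenta"] = sku_total.div(family_total)

print(testingAgain)
```

Because transform returns a Series aligned with the original index, the result can be assigned directly as a new column, with no masks or nested loops.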
Thank you very much. Your second approach (using transform and div) is very clean and works perfectly. Much appreciated.