Python: How do I sum the values in one column based on the items in other columns? (Python, pandas)


python pandas


I have the following DataFrame:

    Course  Orders Ingredient 1 Ingredient 2  Ingredient 3
    starter 3      Fish         Bread         Mayonnaise
    starter 1      Olives       Bread   
    starter 5      Hummus       Pita    
    main    1      Pizza        
    main    6      Beef         Potato        Peas
    main    9      Fish         Peas    
    main    11     Bread        Mayonnaise    Beef
    main    4      Pasta        Bolognese     Peas
    desert  10     Cheese       Olives        Crackers
    desert  7      Cookies      Cream   
    desert  8      Cheesecake   Cream

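I want to count the number of orders for each ingredient within each course. It doesn't matter which column the ingredient appears in.

Here is the DataFrame I would like as output: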

Course  Ord Ing1       IngOrd1 Ing2      IngOrd2 Ing3     IngOrd3
starter 3   Fish       3       Bread     4       Mayo     3
starter 1   Olives     1       Bread     4
starter 5   Hummus     5       Pita      5
main    1   Pizza      1
main    6   Beef       17      Potato    6       Peas     21
main    9   Fish       9       Peas      21
main    11  Bread      11      Mayo      11      Beef     17
main    4   Pasta      4       Bolognese 4       Peas     21
desert  10  Cheese     10      Olives    10      Crackers 10
desert  7   Cookies    7       Cream     15
desert  8   Cheesecake 8       Cream     15

I tried using groupby().sum(), but that doesn't work because the ingredients are spread across three columns.

I also can't use a lookup, because in the full DataFrame I don't know in advance which ingredients I'm looking for.

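I don't believe there is a really slick way to do this with groupby or similar pandas methods, though I'd be happy to be proven wrong. In any case, the following isn't especially pretty, but it gives you what you want: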

import pandas as pd
from collections import defaultdict

# The data you provided
df = pd.read_csv('orders.csv')

# Group these labels for convenience
ingredients = ['Ingredient 1', 'Ingredient 2', 'Ingredient 3']
orders = ['IngOrd1', 'IngOrd2', 'IngOrd3']

# Interleave the two lists for final data frame
combined = [y for x in zip(ingredients, orders) for y in x]

# Restructure the data frame so we can group on ingredients
melted = pd.melt(df, id_vars=['Course', 'Orders'], value_vars=ingredients, value_name='Ingredient')

# This is a map that we can apply to each ingredient column to
# look up the correct order count
maps = defaultdict(lambda: defaultdict(int))

# Build the map. Each course maps to a dict of per-ingredient totals,
# e.g. maps['main']['Beef'] == 17
for index, group in melted.groupby(['Course', 'Ingredient']):
    course, ingredient = index
    maps[course][ingredient] += group.Orders.sum()

# Now apply the map to each ingredient column of the data frame
# to create the new count columns
for i, o in zip(ingredients, orders):
    df[o] = df.apply(lambda x: maps[x.Course][x[i]], axis=1)

# Reorder the columns so each ingredient sits next to its count
df = df[['Course', 'Orders'] + combined]

print(df)

     Course  Orders Ingredient 1  IngOrd1 Ingredient 2  IngOrd2 Ingredient 3  IngOrd3
0   starter       3         Fish        3        Bread        4   Mayonnaise        3
1   starter       1       Olives        1        Bread        4          NaN        0
2   starter       5       Hummus        5         Pita        5          NaN        0
3      main       1        Pizza        1          NaN        0          NaN        0
4      main       6         Beef       17       Potato        6         Peas       19
5      main       9         Fish        9         Peas       19          NaN        0
6      main      11        Bread       11   Mayonnaise       11         Beef       17
7      main       4        Pasta        4    Bolognese        4         Peas       19
8    desert      10       Cheese       10       Olives       10     Crackers       10
9    desert       7      Cookies        7        Cream       15          NaN        0
10   desert       8   Cheesecake        8        Cream       15          NaN        0

You will need to handle the NaN values and the zero counts if they are a problem, but that is a trivial task.
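The NaN and zero-count cleanup mentioned above could be sketched roughly as follows. The small DataFrame is a hypothetical stand-in for the answer's result, and blanking the count (rather than leaving 0) and showing an empty string for a missing ingredient are assumptions about what the asker wants, not something stated in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the answer's result: a missing ingredient
# shows up as NaN and its count column holds 0
df = pd.DataFrame({
    "Course": ["starter", "main"],
    "Orders": [1, 1],
    "Ingredient 1": ["Olives", "Pizza"],
    "IngOrd1": [1, 1],
    "Ingredient 2": ["Bread", np.nan],
    "IngOrd2": [4, 0],
})

ingredients = ["Ingredient 1", "Ingredient 2"]
orders = ["IngOrd1", "IngOrd2"]

for ing_col, ord_col in zip(ingredients, orders):
    # Blank the count wherever there is no ingredient, then switch to the
    # nullable integer dtype so the remaining counts stay integers
    df[ord_col] = df[ord_col].where(df[ing_col].notna()).astype("Int64")
    # Assumption: an empty string reads better than NaN for a missing ingredient
    df[ing_col] = df[ing_col].fillna("")

print(df)
```

The count is masked before the ingredient column is filled, because the mask depends on the NaN values still being present.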

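Can you explain how you calculated columns 1, 2 and 3?

@trolster Those columns are the sum of the orders that require a given ingredient. For example, "Beef" has six orders in the Ingredient 1 column, 0 orders in Ingredient 2 and 11 orders in Ingredient 3, so the Beef value in IngOrd1 and IngOrd3 is 17.

I understand how he summed it up; I just don't understand why Hamish wants the DataFrame displayed and organized this way. I think there are better ways to do this.

@Hamish Pegg It would be helpful to provide a working example that we can run and play with locally. That's good practice in general.

@tnknapp By that logic, Olives would have to be 11 in the IngOrd columns, right? And how are Peas and Bolognese calculated?

@trolster I admit defeat! You're right. I only considered a few of the items in the DataFrame. I still don't understand why Hamish wants to represent the data in such a complicated way. Alas, I doubt we'll hear back from him.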
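The arithmetic debated in the comments can be checked with a short sketch. It rebuilds the question's table inline (the variable names `rows`, `long`, `totals` and `overall` are illustrative, not from the thread) and sums orders per course and ingredient, then per ingredient across all courses.

```python
import pandas as pd

# The table from the question; None marks an empty ingredient cell
rows = [
    ("starter", 3, "Fish", "Bread", "Mayonnaise"),
    ("starter", 1, "Olives", "Bread", None),
    ("starter", 5, "Hummus", "Pita", None),
    ("main", 1, "Pizza", None, None),
    ("main", 6, "Beef", "Potato", "Peas"),
    ("main", 9, "Fish", "Peas", None),
    ("main", 11, "Bread", "Mayonnaise", "Beef"),
    ("main", 4, "Pasta", "Bolognese", "Peas"),
    ("desert", 10, "Cheese", "Olives", "Crackers"),
    ("desert", 7, "Cookies", "Cream", None),
    ("desert", 8, "Cheesecake", "Cream", None),
]
df = pd.DataFrame(
    rows,
    columns=["Course", "Orders", "Ingredient 1", "Ingredient 2", "Ingredient 3"],
)

# One row per (order line, ingredient), ignoring empty ingredient cells
long = df.melt(id_vars=["Course", "Orders"], value_name="Ingredient")
long = long.dropna(subset=["Ingredient"])

# Totals within each course, which is what the answer computes
totals = long.groupby(["Course", "Ingredient"])["Orders"].sum()
print(totals.loc[("main", "Beef")])   # 17
print(totals.loc[("main", "Peas")])   # 19

# Totals across all courses, ignoring the course
overall = long.groupby("Ingredient")["Orders"].sum()
print(overall["Olives"])              # 11
```

Under these assumptions, Olives only reaches 11 when the course is ignored (1 in starter plus 10 in desert), and Peas in main comes to 19, which agrees with the answer's output rather than the 21 shown in the question's expected table.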