Python: how do I sum the values in one column based on the items in other columns?
I have the following data frame:
Course   Orders  Ingredient 1  Ingredient 2  Ingredient 3
starter  3       Fish          Bread         Mayonnaise
starter  1       Olives        Bread
starter  5       Hummus        Pita
main     1       Pizza
main     6       Beef          Potato        Peas
main     9       Fish          Peas
main     11      Bread         Mayonnaise    Beef
main     4       Pasta         Bolognese     Peas
desert   10      Cheese        Olives        Crackers
desert   7       Cookies       Cream
desert   8       Cheesecake    Cream
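For anyone who wants to run this locally, here is a minimal sketch that rebuilds the table as a pandas DataFrame. It assumes the blank ingredient cells are missing values (NaN); the names `rows` and `df` are illustrative choices, not part of the original post.

```python
import numpy as np
import pandas as pd

# Rows copied from the table in the question.
# Assumption: blank ingredient cells are treated as NaN.
rows = [
    ("starter", 3, "Fish", "Bread", "Mayonnaise"),
    ("starter", 1, "Olives", "Bread", np.nan),
    ("starter", 5, "Hummus", "Pita", np.nan),
    ("main", 1, "Pizza", np.nan, np.nan),
    ("main", 6, "Beef", "Potato", "Peas"),
    ("main", 9, "Fish", "Peas", np.nan),
    ("main", 11, "Bread", "Mayonnaise", "Beef"),
    ("main", 4, "Pasta", "Bolognese", "Peas"),
    ("desert", 10, "Cheese", "Olives", "Crackers"),
    ("desert", 7, "Cookies", "Cream", np.nan),
    ("desert", 8, "Cheesecake", "Cream", np.nan),
]

df = pd.DataFrame(
    rows,
    columns=["Course", "Orders", "Ingredient 1", "Ingredient 2", "Ingredient 3"],
)
```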
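I want to compute the number of orders for each ingredient within each course. It doesn't matter which column the ingredient is in.
Here is the data frame I would like as output: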
Course   Ord  Ing1        IngOrd1  Ing2       IngOrd2  Ing3      IngOrd3
starter  3    Fish        3        Bread      4        Mayo      3
starter  1    Olives      1        Bread      4
starter  5    Hummus      5        Pita       5
main     1    Pizza       1
main     6    Beef        17       Potato     6        Peas      21
main     9    Fish        9        Peas       21
main     11   Bread       11       Mayo       11       Beef      17
main     4    Pasta       4        Bolognese  4        Peas      21
desert   10   Cheese      10       Olives     10       Crackers  10
desert   7    Cookies     7        Cream      15
desert   8    Cheesecake  8        Cream      15
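I have tried using groupby().sum(), but that doesn't work when the ingredients are spread across 3 columns.
I also can't use a lookup, because in the full data frame I don't know which ingredients I'm looking for.

I'm not convinced there is a really slick way to do this with groupby or similar pandas methods, though I'd be happy to be proven wrong. In any case, the following isn't especially pretty, but it will give you what you want: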
import pandas as pd
from collections import defaultdict
# The data you provided
df = pd.read_csv('orders.csv')
# Group these labels for convenience
ingredients = ['Ingredient 1', 'Ingredient 2', 'Ingredient 3']
orders = ['IngOrd1', 'IngOrd2', 'IngOrd3']
# Interleave the two lists for final data frame
combined = [y for x in zip(ingredients, orders) for y in x]
# Restructure the data frame so we can group on ingredients
melted = pd.melt(df, id_vars=['Course', 'Orders'], value_vars=ingredients, value_name='Ingredient')
# This is a map that we can apply to each ingredient column to
# look up the correct order count
maps = defaultdict(lambda: defaultdict(int))
# Build the map. Every course/ingredient pair is keyed to the total
# count for that pair, e.g. {(main, beef): 17, ...}
for index, group in melted.groupby(['Course', 'Ingredient']):
    course, ingredient = index
    maps[course][ingredient] += group.Orders.sum()
# Now apply the map to each ingredient column of the data frame
# to create the new count columns
for i, o in zip(ingredients, orders):
    df[o] = df.apply(lambda x: maps[x.Course][x[i]], axis=1)
# Reorder the columns so each count sits next to its ingredient
df = df[['Course', 'Orders'] + combined]
print(df)
    Course   Orders  Ingredient 1  IngOrd1  Ingredient 2  IngOrd2  Ingredient 3  IngOrd3
0   starter  3       Fish          3        Bread         4        Mayonnaise    3
1   starter  1       Olives        1        Bread         4        NaN           0
2   starter  5       Hummus        5        Pita          5        NaN           0
3   main     1       Pizza         1        NaN           0        NaN           0
4   main     6       Beef          17       Potato        6        Peas          19
5   main     9       Fish          9        Peas          19       NaN           0
6   main     11      Bread         11       Mayonnaise    11       Beef          17
7   main     4       Pasta         4        Bolognese     4        Peas          19
8   desert   10      Cheese        10       Olives        10       Crackers      10
9   desert   7       Cookies       7        Cream         15       NaN           0
10  desert   8       Cheesecake    8        Cream         15       NaN           0
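If it's a problem, you will need to deal with the NaNs and the 0 counts, but that is a trivial task.

Can you explain how you calculated columns 1, 2 and 3?

@trolster Those columns are the sums of the orders that require a particular ingredient. For example, in the Ingredient 1 column "Beef" has six orders, in Ingredient 2 Beef has 0 orders, and in Ingredient 3 Beef has 11 orders, so IngOrd1 and IngOrd3 show 17 for Beef.

I understand how he is summing; I just don't understand why Hamish wants the data frame displayed and organized this way. I think there is a better way to do this.

@Hamish Pegg It would be helpful to provide a working example that we can run and play with locally. That is good practice in general.

@tnknapp By that logic, wouldn't Olives have to be 11 in the IngOrd columns? And how are Peas and Bolognese calculated?

@trolster I admit defeat! You're right. I only considered a few items in the dataframe. I still don't understand why Hamish wants to represent the data in such a convoluted way. Alas, I doubt we will hear back from him.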
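
As a possible alternative to the row-wise apply in the answer, here is a minimal sketch that computes the per-course ingredient totals once with melt and groupby, then attaches them to each ingredient column with a left merge. It assumes blank ingredient cells are NaN and rebuilds the question's data inline so it runs on its own; the names `totals`, `lookup`, and `result` are illustrative, not from the original answer. Unlike the answer's output, empty ingredient slots end up as NaN rather than 0.

```python
import numpy as np
import pandas as pd

# Data from the question; blank ingredient cells are assumed to be NaN.
rows = [
    ("starter", 3, "Fish", "Bread", "Mayonnaise"),
    ("starter", 1, "Olives", "Bread", np.nan),
    ("starter", 5, "Hummus", "Pita", np.nan),
    ("main", 1, "Pizza", np.nan, np.nan),
    ("main", 6, "Beef", "Potato", "Peas"),
    ("main", 9, "Fish", "Peas", np.nan),
    ("main", 11, "Bread", "Mayonnaise", "Beef"),
    ("main", 4, "Pasta", "Bolognese", "Peas"),
    ("desert", 10, "Cheese", "Olives", "Crackers"),
    ("desert", 7, "Cookies", "Cream", np.nan),
    ("desert", 8, "Cheesecake", "Cream", np.nan),
]
ingredients = ["Ingredient 1", "Ingredient 2", "Ingredient 3"]
df = pd.DataFrame(rows, columns=["Course", "Orders"] + ingredients)

# Long format: one row per (Course, Orders, Ingredient); drop the empty slots.
melted = df.melt(
    id_vars=["Course", "Orders"],
    value_vars=ingredients,
    value_name="Ingredient",
).dropna(subset=["Ingredient"])

# Total orders for every (Course, Ingredient) pair, whichever column it came from.
totals = melted.groupby(["Course", "Ingredient"])["Orders"].sum()

result = df.copy()
ordered_columns = ["Course", "Orders"]
for n, col in enumerate(ingredients, start=1):
    count_col = f"IngOrd{n}"
    # Rename the lookup so its key column matches the ingredient column being filled.
    lookup = totals.rename(count_col).reset_index().rename(columns={"Ingredient": col})
    # A left merge keeps every original row in order; empty slots get NaN.
    result = result.merge(lookup, on=["Course", col], how="left")
    ordered_columns += [col, count_col]

# Interleave each ingredient column with its count column.
result = result[ordered_columns]
print(result)
```

With the question's data, Peas in the main course comes out as 19 (6 + 9 + 4), which matches the answer's output. If 0 is preferred over NaN for the empty slots, `result.fillna({"IngOrd2": 0, "IngOrd3": 0})` is one way to get it.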