Python pandas: computing new rows as the difference between rows that share a column value


python pandas


As a simplified example, suppose I have a DataFrame like this:

Group   Type   Value1   Value2
Red     A      13       24
Red     B      3        12
Blue    C      5        0
Red     C      8        9
Green   A      2        -1
Red     None   56       78
Blue    A      40       104
Green   B      1        -5

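What I want to compute is, for each Group, the difference in Value1 between the rows of Type A and Type B, and likewise the difference in Value2 between those rows.

Since Red and Green are the only groups that have entries for both Type A and Type B, new rows are computed only for those groups. The resulting DataFrame would therefore be: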

Group   Type   Value1   Value2
Red     A-B    10       12
Green   A-B    1        4

My initial idea was simply to use

df = df[df['Type'].isin(['A', 'B'])]

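to filter the rows whose Type is 'A' or 'B', then filter again to keep only the groups that have rows of both Type 'A' and Type 'B' (not sure how to do this), and then sort and apply diff().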
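For what it's worth, here is a minimal sketch of that filter, filter again, sort, diff() idea. It assumes the sample data is held in a DataFrame named df and that each group has at most one row per Type; the names ab, both, diffs and result are made up for illustration. The second filter uses groupby(...).filter with a check that both types are present in the group.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["Red", "Red", "Blue", "Red", "Green", "Red", "Blue", "Green"],
    "Type": ["A", "B", "C", "C", "A", None, "A", "B"],
    "Value1": [13, 3, 5, 8, 2, 56, 40, 1],
    "Value2": [24, 12, 0, 9, -1, 78, 104, -5],
})

# Step 1: keep only the rows whose Type is 'A' or 'B'.
ab = df[df["Type"].isin(["A", "B"])]

# Step 2: keep only the groups that contain both types
# (assumes at most one row per Group/Type pair).
both = ab.groupby("Group").filter(lambda g: g["Type"].nunique() == 2)

# Step 3: sort so that B comes before A inside each group; diff() then
# puts A - B on the second row of each group and NaN on the first.
both = both.sort_values(["Group", "Type"], ascending=[True, False])
diffs = both.groupby("Group")[["Value1", "Value2"]].diff().dropna()

# Step 4: attach the Group label and the new Type label to the differences.
result = (
    both.loc[diffs.index, ["Group"]]
    .assign(Type="A-B", Value1=diffs["Value1"], Value2=diffs["Value2"])
    .astype({"Value1": int, "Value2": int})
    .reset_index(drop=True)
)
print(result)
```

Blue drops out in step 2 because it has an 'A' row but no 'B' row, which is the case the plain isin() filter cannot handle on its own.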

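The code below creates a group for each Type, then subtracts each group's DataFrame from every other one, producing a final DataFrame with the subtracted values. With your input DataFrame as inp_df, the DataFrame you want will be final_df: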

import pandas as pd

# Index by Group first: Series subtraction aligns on the index, so rows are
# only paired up when they belong to the same Group.
grouped = inp_df.set_index('Group').groupby('Type')

# Getting the list of groups:
list_o_groups = list(grouped.groups.keys())

# Going through each pair of groups and subtracting one from the other:
sub_df_dict = {}
for first_idx, first_type in enumerate(list_o_groups):
    for second_idx, second_type in enumerate(list_o_groups):
        if second_idx <= first_idx:
            continue
        key = '%s-%s' % (first_type, second_type)
        first_df = grouped.get_group(first_type)
        second_df = grouped.get_group(second_type)
        sub_df = pd.DataFrame()
        sub_df['Value1'] = first_df['Value1'] - second_df['Value1']
        sub_df['Value2'] = first_df['Value2'] - second_df['Value2']
        sub_df['Type'] = key
        sub_df_dict[key] = sub_df

# Combining them into one df (DataFrame.append was removed in pandas 2.0):
final_df = pd.concat(sub_df_dict.values())

# Cleaning up the dataframe: a Group missing either Type comes out as NaN
final_df = final_df.dropna().reset_index()
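
As a hedged variation on the same pairwise idea, the nested index loop can be expressed with itertools.combinations. This is only a sketch: by_type and pieces are illustrative names, the sample data is rebuilt inline so the snippet runs on its own, and it assumes each Group has at most one row per Type so that the Group index is unique within each type.

```python
from itertools import combinations

import pandas as pd

inp_df = pd.DataFrame({
    "Group": ["Red", "Red", "Blue", "Red", "Green", "Red", "Blue", "Green"],
    "Type": ["A", "B", "C", "C", "A", None, "A", "B"],
    "Value1": [13, 3, 5, 8, 2, 56, 40, 1],
    "Value2": [24, 12, 0, 9, -1, 78, 104, -5],
})

# One frame per Type, indexed by Group so that subtraction pairs up rows
# from the same group. groupby skips rows whose Type is missing.
by_type = {
    t: g.set_index("Group")[["Value1", "Value2"]]
    for t, g in inp_df.groupby("Type")
}

pieces = []
for first, second in combinations(sorted(by_type), 2):
    # Groups present on only one side become NaN and are dropped.
    diff = (by_type[first] - by_type[second]).dropna()
    diff.insert(0, "Type", f"{first}-{second}")
    pieces.append(diff)

final_df = pd.concat(pieces).rename_axis("Group").reset_index()
print(final_df)
```

This yields rows for every pair of types (A-B, A-C, B-C); filtering final_df on Type == 'A-B' gives just the rows asked for in the question.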

import pandas as pd
from io import StringIO

# Read the data using StringIO
data = StringIO("""Group,Type,Value1,Value2
Red,A,13,24
Red,B,3,12
Blue,C,5,0
Red,C,8,9
Green,A,2,-1
Red,None,56,78
Blue,A,40,104
Green,B,1,-5""")
df = pd.read_csv(data)

# Create a tidyr spread-like operation
def spread(df, propcol, valcol):
    indcol = list(df.columns.drop(valcol))
    df = df.set_index(indcol).unstack(propcol).reset_index()
    df.columns = [i[1] if i[0] == valcol else i[0] for i in df.columns]
    return df

df = spread(df, 'Group', 'Type')

# Create filter conditions to remove 'C'. Could also do the opposite
notBlueC = df['Blue'] != 'C'
notGreenC = df['Green'] != 'C'
notRedC = df['Red'] != 'C'
clean_df = df[notBlueC & notGreenC & notRedC]
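Did you try it? What I've been trying to figure out is how to filter out the groups that don't have both Type 'A' and Type 'B'. For example, I can filter out the three rows whose Type is not 'A' or 'B'. But I also want to filter out the row where Group='Blue' and Type='A', because there is no corresponding row with Type='B', and that is where I'm stuck.

I would just use a spread()-style operation to create categorical columns from the types, so that each observation gets its own column, just like the tidyr function spread().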
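
To make the spread suggestion concrete, here is a rough sketch that uses DataFrame.pivot as the pandas counterpart of tidyr's spread(). It is one possible reading of the comment, not necessarily what the commenter had in mind. It assumes each (Group, Type) pair occurs at most once (pivot raises otherwise), and the names ab, wide and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["Red", "Red", "Blue", "Red", "Green", "Red", "Blue", "Green"],
    "Type": ["A", "B", "C", "C", "A", None, "A", "B"],
    "Value1": [13, 3, 5, 8, 2, 56, 40, 1],
    "Value2": [24, 12, 0, 9, -1, 78, 104, -5],
})

# Keep the two types of interest, then spread Type into columns:
# one row per Group, with columns (Value1, A), (Value1, B), (Value2, A), (Value2, B).
ab = df[df["Type"].isin(["A", "B"])]
wide = ab.pivot(index="Group", columns="Type", values=["Value1", "Value2"])

# Take the A and B slices and subtract. A Group missing either type has NaN
# in that slice, so its difference is NaN and dropna() removes it.
a_side = wide.xs("A", axis=1, level="Type")
b_side = wide.xs("B", axis=1, level="Type")
result = (a_side - b_side).dropna().astype(int)

result.insert(0, "Type", "A-B")
result = result.reset_index()
print(result)
```

Once the data is wide, the "only groups with both A and B" condition no longer needs a separate filter: Blue has no B value, so its row falls out at the dropna() step.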