Python: splitting a DataFrame by grouping on multiple columns

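I have a DataFrame like this, and I would like to split it into multiple DataFrames, one for each unique value found across several columns. DF:

I can do this based on a single column with:
df_list = [d for _, d in df.groupby(['a'])]
I was then able to do what I wanted with: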

for df in df_list:
    df["e"] = df.apply(lambda x: df.loc[x.name+1:,"c"].mean(),axis=1)
Output:


df_list
[       a      b  c  d    e
 2  black   grey  0  0  2.0
 5  black  brown  2  8  NaN,
        a      b  c  d    e
 1  brown    red  4  5  NaN,
        a      b  c  d    e
 4  green   blue  0  3  NaN,
       a       b  c  d    e
 0    red  green  1  2  5.0
 3    red   blue  6  1  4.0
 6    red   grey  4  6  NaN]
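
For reference, the single-column approach above can be written as a self-contained sketch. The sample data is an assumption here: it is inferred from the printed output (it matches the frame defined in the answer below), and the groups are copied so that adding a column does not touch the original df.

```python
import pandas as pd

# Sample data inferred from the output shown above (an assumption).
df = pd.DataFrame(
    [
        ["red", "green", 1, 2],
        ["brown", "red", 4, 5],
        ["black", "grey", 0, 0],
        ["red", "blue", 6, 1],
        ["green", "blue", 0, 3],
        ["black", "brown", 2, 8],
        ["red", "grey", 4, 6],
    ],
    columns=["a", "b", "c", "d"],
)

# One DataFrame per unique value of column "a"; groups come back sorted by key.
df_list = [group.copy() for _, group in df.groupby("a")]

for group in df_list:
    # For each row, average "c" over the rows that follow it within the same group.
    # row.name is the original index label, so .loc slices by label.
    group["e"] = group.apply(
        lambda row: group.loc[row.name + 1 :, "c"].mean(), axis=1
    )
```

The last row of every group has no following rows, so its mean is taken over an empty selection and comes out as NaN.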
But how can I do this with multiple columns?

Expected result for the value "red":


You can extract the unique values of columns a and b and use each one as a filter. For example,

import pandas as pd

df = pd.DataFrame(
    [
        ["red", "green", 1, 2],
        ["brown", "red", 4, 5],
        ["black", "grey", 0, 0],
        ["red", "blue", 6, 1],
        ["green", "blue", 0, 3],
        ["black", "brown", 2, 8],
        ["red", "grey", 4, 6],
    ],
    columns=["a", "b", "c", "d"]
)

colors = pd.unique(df[['a', 'b']].values.ravel('K'))

>>> colors
    array(['red', 'brown', 'black', 'green', 'grey', 'blue'], dtype=object)
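
The order of the colors depends on how the two columns are flattened: ravel('K') follows the array's memory layout. As a minimal sketch, assuming the same sample df, the column-first order (all of a, then all of b) can be requested explicitly with order="F" instead:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["red", "green", 1, 2],
        ["brown", "red", 4, 5],
        ["black", "grey", 0, 0],
        ["red", "blue", 6, 1],
        ["green", "blue", 0, 3],
        ["black", "brown", 2, 8],
        ["red", "grey", 4, 6],
    ],
    columns=["a", "b", "c", "d"],
)

# Flatten column by column: every value of "a" first, then every value of "b".
flat = df[["a", "b"]].to_numpy().ravel(order="F")

# pd.unique keeps values in order of first appearance (it does not sort).
colors = pd.unique(flat)
print(list(colors))  # ['red', 'brown', 'black', 'green', 'grey', 'blue']
```

Since the result is used only as a list of filter values, the order affects nothing but the position of each color's DataFrame in df_list.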
Iterate over each color and, after filtering, perform the operation on the resulting current_df:

df_list = []
for color in colors:
    current_df = df[(df.a == color) | (df.b == color)].copy().reset_index(drop=True)
    current_df["e"] = current_df.apply(
        lambda x: (
            current_df[(current_df.a == color)].loc[x.name + 1 :, "c"].sum()
            + current_df[(current_df.b == color)].loc[x.name + 1 :, "d"].sum()
        )
        / (current_df.shape[0] - x.name - 1),
        axis=1
    )
    df_list.append(current_df)
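Here (current_df.shape[0] - x.name - 1) is the number of values being added, because x.name is the row number and current_df.shape[0] is the total number of rows in the currently filtered df. This is equivalent to: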

df_list = []
for color in colors:
    current_df = df[(df.a == color) | (df.b == color)].copy()
    current_df["e"] = current_df.apply(
        lambda x: (
            current_df[(current_df.a == color)].loc[x.name + 1 :, "c"].sum()
            + current_df[(current_df.b == color)].loc[x.name + 1 :, "d"].sum()
        )
        / (
            current_df[(current_df.a == color)].loc[x.name + 1 :, "c"].size
            + current_df[(current_df.b == color)].loc[x.name + 1 :, "d"].size
        ),
        axis=1,
    )
    df_list.append(current_df)
Result for red:

>>> df_list[0]
           a      b  c  d    e
    0    red  green  1  2  5.0
    1  brown    red  4  5  5.0
    3    red   blue  6  1  4.0
    6    red   grey  4  6  NaN

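Great! But there is one problem: it takes the values from column "c", but for one "red" row the value should come from column "d", so the result for the first row should be based on 5+4+6, not 4+6+4.

You are right. Let me correct my answer.

Check it now, @charlesalakiss

Great! Thank you very much!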
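
As a cross-check on the result for red, the same column can be computed without a row-wise apply. This is an alternative sketch, not the method used in the answer: the helper name frame_for_color is hypothetical, and it assumes that no row carries the color in both a and b (true for the sample data). For each row it picks c when a matches and d otherwise, then uses a reversed cumulative sum to get the total of the rows below.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["red", "green", 1, 2],
        ["brown", "red", 4, 5],
        ["black", "grey", 0, 0],
        ["red", "blue", 6, 1],
        ["green", "blue", 0, 3],
        ["black", "brown", 2, 8],
        ["red", "grey", 4, 6],
    ],
    columns=["a", "b", "c", "d"],
)


def frame_for_color(df, color):
    """Hypothetical helper: rows containing `color` in a or b, plus column e."""
    sub = df[(df["a"] == color) | (df["b"] == color)].copy()
    # Value tied to the color: "c" when a matches, otherwise "d".
    # Assumption: a row never has the color in both columns.
    val = pd.Series(np.where(sub["a"] == color, sub["c"], sub["d"]), index=sub.index)
    # Sum of the values strictly below each row (reverse cumsum minus the row itself).
    tail_sum = val.iloc[::-1].cumsum().iloc[::-1] - val
    # Number of rows strictly below each row; 0 becomes NaN so the last row gets NaN.
    tail_cnt = pd.Series(np.arange(len(sub) - 1, -1, -1, dtype=float), index=sub.index)
    sub["e"] = tail_sum / tail_cnt.where(tail_cnt > 0)
    return sub


red = frame_for_color(df, "red")
print(red)
```

For red the picked values are 1, 5, 6, 4, so the first row gets (5 + 6 + 4) / 3 = 5.0, which agrees with the output above.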