Python 3.x: DataFrame manipulation without using loops


python-3.x pandas numpy


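Please find the input and expected output below. For each store id and period id there should be 11 items; if any item is missing, add it and fill that row with 0, without using loops.

Any help is much appreciated.

Input

Expected output

You can do the following: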

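import pandas as pd
from itertools import product

# Combine every (store_id, period_id) pair with every item number 1..11
pdindex = product(df.groupby(["store_id", "period_id"]).groups, range(1, 12))
pdindex = pd.MultiIndex.from_tuples(
    map(lambda x: (*x[0], x[1]), pdindex),
    names=["store_id", "period_id", "Item_x"],
)
df = df.set_index(["store_id", "period_id", "Item_x"])
# Empty frame over the full index, then copy the existing rows into it
res = pd.DataFrame(index=pdindex, columns=df.columns)
res.loc[df.index, df.columns] = df
res = res.fillna(0).reset_index()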

Note that this only works if you have no items outside the range [1, 11].
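As a quick guard for the caveat above, one could check for out-of-range items before building the full index. This is only a minimal sketch: the sample frame is hypothetical, and the column name Item_x is assumed to match your data.

```python
import pandas as pd

# Hypothetical sample data; the Item_x value 12 falls outside the expected 1..11 range.
df = pd.DataFrame({
    "store_id": [1, 1, 1],
    "period_id": [10, 10, 10],
    "Item_x": [1, 5, 12],
    "z": [3, 4, 5],
})

# Series.between is inclusive on both ends by default.
in_range = df["Item_x"].between(1, 11)
out_of_range = df.loc[~in_range]

if not in_range.all():
    print("Rows with items outside [1, 11]:")
    print(out_of_range)
```

Whether such rows should be dropped or the item range widened depends on your data, so the sketch only reports them.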
You can do it like this:

Sample df:
df = pd.DataFrame({'store_id':[1160962,1160962,1160962,1160962,1160962,1160962,1160962,1160962,1160962,1160962, 1160962],
                   'period_id':[1025,1025,1025,1025,1025,1025,1026,1026,1026,1026,1026],
                   'item_x':[1,4,5,6,7,8,1,2,5,6,7],
                  'z':[1,4,5,6,7,8,1,2,5,6,7]})

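Solution:
num = range(1, 12)

def f(x):
    # Reindex one (store_id, period_id) group to items 1..11, filling missing rows with 0,
    # then set store_id and period_id back to the group's values (the fill zeroes them on added rows).
    return x.reindex(num, fill_value=0)\
            .assign(store_id=x['store_id'].mode()[0], period_id=x['period_id'].mode()[0])

df.set_index('item_x').groupby(['store_id','period_id'], group_keys=False).apply(f).reset_index()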


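This is a simplification of @GrzegorzSkibinski's correct answer.
This answer does not modify the original DataFrame. It uses fewer variables to store intermediate data structures, and uses a list comprehension to simplify the use of map.
I also used reindex() rather than creating a new DataFrame with the generated index and filling it with the original data.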
import pandas as pd
import itertools

df.set_index(
    ["store_id", "period_id", "Item_x"]
).reindex(
    pd.MultiIndex.from_tuples([
        group + (item,)
        for group, item in itertools.product(
            df.groupby(["store_id", "period_id"]).groups, 
            range(1, 12),
        )],
        names=["store_id", "period_id", "Item_x"]
    ),
    fill_value=0,
).reset_index()

In testing, the output matches what you listed as expected.
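For anyone who wants to try the reindex approach end to end, here is a self-contained sketch. It reuses the sample values from the other answer in this thread, with the item column renamed to Item_x (an assumption made for consistency with the snippet above), and adds a simple row count per group as a sanity check.

```python
import itertools

import pandas as pd

# Sample data borrowed from the other answer; the column is named Item_x here by assumption.
df = pd.DataFrame({
    "store_id": [1160962] * 11,
    "period_id": [1025] * 6 + [1026] * 5,
    "Item_x": [1, 4, 5, 6, 7, 8, 1, 2, 5, 6, 7],
    "z": [1, 4, 5, 6, 7, 8, 1, 2, 5, 6, 7],
})

keys = ["store_id", "period_id"]

# Full index: every existing (store_id, period_id) pair crossed with items 1..11.
full_index = pd.MultiIndex.from_tuples(
    [
        group + (item,)
        for group, item in itertools.product(df.groupby(keys).groups, range(1, 12))
    ],
    names=keys + ["Item_x"],
)

# reindex keeps existing rows and fills the missing combinations with 0.
result = df.set_index(keys + ["Item_x"]).reindex(full_index, fill_value=0).reset_index()

# Sanity check: each (store_id, period_id) group should now have 11 rows.
rows_per_group = result.groupby(keys).size()
print(rows_per_group)
```

Note that reindex needs the (store_id, period_id, Item_x) combinations in the original frame to be unique; that holds for this sample but is worth confirming on real data.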

This doesn't seem correct: the item_x values of the first period are respected, but those of the second period are not (they are copied from the first period). It seems to me that using reindex on a groupby may be the source of the problem...
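One way to test a concern like this is to compare a filled result against the original frame. The helper below is a hypothetical sketch (check_filled and the tiny frames are made up for illustration, not part of any answer above): it checks that every group has exactly the expected items and that the original values survive unchanged.

```python
import pandas as pd


def check_filled(original, filled, keys, item_col, items):
    """Return True if `filled` has every item per group and preserves the original rows."""
    full_keys = keys + [item_col]
    # Every group must contain exactly the expected items.
    per_group = filled.groupby(keys)[item_col].apply(lambda s: sorted(s) == list(items))
    if not per_group.all():
        return False
    # Every original row must be present in `filled` with unchanged values.
    merged = original.merge(filled, on=full_keys, how="left", suffixes=("", "_filled"))
    value_cols = [c for c in original.columns if c not in full_keys]
    return all((merged[c] == merged[c + "_filled"]).all() for c in value_cols)


# Hypothetical data: two periods, expected items 1 and 2.
original = pd.DataFrame({
    "store_id": [1, 1, 1],
    "period_id": [10, 10, 11],
    "item_x": [1, 2, 2],
    "z": [5, 6, 7],
})
good = pd.DataFrame({
    "store_id": [1, 1, 1, 1],
    "period_id": [10, 10, 11, 11],
    "item_x": [1, 2, 1, 2],
    "z": [5, 6, 0, 7],
})
# A deliberately wrong result: period 11 repeats the values of period 10.
bad = good.assign(z=[5, 6, 5, 6])

good_ok = check_filled(original, good, ["store_id", "period_id"], "item_x", range(1, 3))
bad_ok = check_filled(original, bad, ["store_id", "period_id"], "item_x", range(1, 3))
print(good_ok, bad_ok)
```

Running the same check on the output of each answer with your own data should show whether the second period's values are actually being copied from the first.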