Group processing in Python


My question is about applying logic to a DataFrame that looks like this:

ID  yyyymm  value1  value2
1   201501  0       123
1   201502  1       113
1   201503  3       115
2   201506  0       0
2   201507  0       0
2   201508  1       115
2   201509  0       0
3   201503  0       0
3   201504  0       0
3   201505  0       0
Thanks a lot for your time.
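To experiment with the table in the question, it can be loaded straight from the pasted text. This is a minimal sketch that assumes the table is copied exactly as shown, with whitespace between columns; `raw` is just an illustrative variable name.

```python
import io

import pandas as pd

# The sample table from the question, pasted as whitespace-separated text
raw = """\
ID yyyymm value1 value2
1 201501 0 123
1 201502 1 113
1 201503 3 115
2 201506 0 0
2 201507 0 0
2 201508 1 115
2 201509 0 0
3 201503 0 0
3 201504 0 0
3 201505 0 0
"""

# sep=r"\s+" splits on any run of whitespace; every column parses as integers
data = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(data)
```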

Hi, the method you are looking for is groupby.

To do this, just do:

data.groupby(['ID','yyyymm']).sum()

This groups the data by the ID and yyyymm columns and runs sum on each group.
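Here is a small sketch of what the groupby suggestion returns on the sample data. Because every (ID, yyyymm) pair in the sample occurs only once, grouping by both columns leaves the values unchanged and only moves the keys into a MultiIndex. Grouping by ID alone is included for comparison.

```python
import pandas as pd

data = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
                     'yyyymm': [201501, 201502, 201503, 201506, 201507,
                                201508, 201509, 201503, 201504, 201505],
                     'value1': [0, 1, 3, 0, 0, 1, 0, 0, 0, 0],
                     'value2': [123, 113, 115, 0, 0, 115, 0, 0, 0, 0]})

# One group per (ID, yyyymm) pair; the result is indexed by a MultiIndex
per_month = data.groupby(['ID', 'yyyymm']).sum()

# Grouping by ID alone collapses each ID's months into a single total row
per_id = data.groupby('ID')[['value1', 'value2']].sum()
print(per_id)
```

Summing is therefore not enough when the goal is to pick out specific rows per ID. For that, you need to filter rows before or after grouping.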


Cheers,

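This sorts by date, groups by ID, finds the first row where value1 or value2 is > 0, saves that row into another DataFrame, and then moves on to the next ID group. If you want to save more than one row per ID, just remove the break.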

I don't know what values you need in the "time_value1" and "time_value2" columns of the final DataFrame, but you can easily change those assignments to whatever you want.

import pandas as pd

data = pd.DataFrame({'ID':[1,1,1,2,2,2,2,3,3,3],
                'yyyymm':[201501,201502,201503,201506,201507,201508,201509,201503,201504,201505],
                'value1':[0,1,3,0,0,1,0,0,0,0],
                'value2':[123,113,115,0,0,115,0,0,0,0]})

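# Result table: one row per ID that has a positive value1 or value2
final = pd.DataFrame(columns=["ID", "time_value1", "value1", "time_value2", "value2"])

def findTimes(df):
    # df holds the rows of a single ID, already sorted by yyyymm
    for index, row in df.iterrows():
        if row["value1"] > 0 or row["value2"] > 0:
            # Copy the first matching row into final, keyed by its original index.
            # time_value1/time_value2 are placeholders: set them to whatever you need.
            final.loc[index,"ID"] = row["ID"]
            final.loc[index,"time_value1"] = row["value1"]
            final.loc[index,"value1"] = row["value1"]
            final.loc[index,"time_value2"] = row["value2"]
            final.loc[index,"value2"] = row["value2"]

            # Stop after the first match; remove this to keep every match per ID
            break

# Sort by date, then process each ID group in turn. A plain loop is used instead of
# groupby(...).apply() because the function only has side effects, and newer pandas
# versions exclude the grouping column (ID) from the frames passed to apply.
for _, group in data.sort_values("yyyymm").groupby("ID"):
    findTimes(group)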
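The same first-match-per-ID selection can also be done without iterrows or a global DataFrame. This is a vectorized sketch, not the answer's original method. It returns the matching rows as they are, not in the five-column final layout, so mapping the columns is left to you.

```python
import pandas as pd

data = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
                     'yyyymm': [201501, 201502, 201503, 201506, 201507,
                                201508, 201509, 201503, 201504, 201505],
                     'value1': [0, 1, 3, 0, 0, 1, 0, 0, 0, 0],
                     'value2': [123, 113, 115, 0, 0, 115, 0, 0, 0, 0]})

# Keep only rows where either value is positive, in date order
hits = data.sort_values('yyyymm')
hits = hits[(hits['value1'] > 0) | (hits['value2'] > 0)]

# groupby(...).head(1) keeps the first matching row of each ID (the "break" in
# the loop); IDs with no positive value (ID 3 here) simply do not appear
first_hits = hits.groupby('ID').head(1)
print(first_hits)
```

To keep every matching row per ID, which is what removing the break does in the loop, use `hits` directly instead of `first_hits`.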

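Google groupby and pandas.
Thanks for your answer! I tried groupby, but it doesn't help me with the more complex logic, i.e. saving specific rows and getting the unique IDs (the "ID" variable in the data).
You can group by ID and yyyymm.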
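groupby can still handle saving specific rows per unique ID if you filter the rows first. This sketch rests on an assumption the question does not confirm: that time_value1 and time_value2 should hold the yyyymm of the first month in which value1 or value2, respectively, is positive. `first_positive` is a hypothetical helper written only for this illustration.

```python
import pandas as pd

data = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
                     'yyyymm': [201501, 201502, 201503, 201506, 201507,
                                201508, 201509, 201503, 201504, 201505],
                     'value1': [0, 1, 3, 0, 0, 1, 0, 0, 0, 0],
                     'value2': [123, 113, 115, 0, 0, 115, 0, 0, 0, 0]})


def first_positive(df, col):
    """Hypothetical helper: for each ID, the yyyymm and value of the first
    row where col > 0 (assumes that is what "time_<col>" should mean)."""
    hits = df[df[col] > 0].sort_values('yyyymm')
    first = hits.groupby('ID')[['yyyymm', col]].first()
    return first.rename(columns={'yyyymm': f'time_{col}'})


# Combine both lookups on the ID index; the outer join keeps IDs that have only
# one kind of positive value, and reindexing adds IDs with none (as NaN rows)
final = first_positive(data, 'value1').join(first_positive(data, 'value2'), how='outer')
final = final.reindex(data['ID'].unique())
print(final)
```

Each ID appears exactly once in the result. Because of the reindex, IDs without any positive value show up as NaN rows instead of being dropped.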