Group processing in Python

My question is about applying logic to a DataFrame that looks like this:

ID  yyyymm  value1  value2
1   201501  0       123
1   201502  1       113
1   201503  3       115
2   201506  0       0
2   201507  0       0
2   201508  1       115
2   201509  0       0
3   201503  0       0
3   201504  0       0
3   201505  0       0
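Thank you very much for your time.

Hi, the method you are looking for is groupby. Doing so is just

data.groupby(['ID', 'yyyymm']).sum()

This groups the data by the ID and yyyymm columns and runs sum on each group.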
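To make the suggestion above concrete, here is a minimal sketch run on the sample data from the question. The names per_month and per_id are only illustrative. Because every (ID, yyyymm) pair occurs once in this sample, grouping on both columns leaves the values unchanged, so grouping by ID alone is shown as well for comparison.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "yyyymm": [201501, 201502, 201503, 201506, 201507,
               201508, 201509, 201503, 201504, 201505],
    "value1": [0, 1, 3, 0, 0, 1, 0, 0, 0, 0],
    "value2": [123, 113, 115, 0, 0, 115, 0, 0, 0, 0],
})

# One row per (ID, yyyymm) pair, indexed by both columns.
per_month = data.groupby(["ID", "yyyymm"]).sum()

# Grouping by ID alone collapses each ID to a single row of totals.
per_id = data.groupby("ID")[["value1", "value2"]].sum()
print(per_id)
```

Note that an aggregation such as sum returns totals per group, not individual rows, which is why it does not by itself pick out a specific row for each ID.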
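Cheers

This sorts by date, groups by ID, finds the row where value1 or value2 > 0, saves that row to another DataFrame, and then moves on to the next ID group. If you want more than one row saved per ID, just remove the break. I don't know what values you need in the "time_value1" or "time_value2" columns of the final DataFrame, but you can easily edit those assignments to whatever you want.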
import pandas as pd

data = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
                     'yyyymm': [201501, 201502, 201503, 201506, 201507, 201508, 201509, 201503, 201504, 201505],
                     'value1': [0, 1, 3, 0, 0, 1, 0, 0, 0, 0],
                     'value2': [123, 113, 115, 0, 0, 115, 0, 0, 0, 0]})

# Rows that match are collected here, keyed by their original index.
final = pd.DataFrame(columns=["ID", "time_value1", "value1", "time_value2", "value2"])

def findTimes(df):
    # df holds the rows of one ID, already sorted by yyyymm.
    for index, row in df.iterrows():
        if row["value1"] > 0 or row["value2"] > 0:
            final.loc[index, "ID"] = row["ID"]
            # The time_* columns are placeholders; assign whatever you need here.
            final.loc[index, "time_value1"] = row["value1"]
            final.loc[index, "value1"] = row["value1"]
            final.loc[index, "time_value2"] = row["value2"]
            final.loc[index, "value2"] = row["value2"]
            break  # remove this to keep every matching row per ID

data.sort_values("yyyymm").groupby("ID").apply(findTimes)
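As a possible alternative to the row-by-row loop, the same selection (first row per ID, in date order, where value1 or value2 is positive) can be sketched with a boolean mask and drop_duplicates. This is a different technique from the answer's code, not a restatement of it, and the names hits and first_hits are only illustrative. It keeps the original columns and does not fill the time_value1 or time_value2 columns, since the intended content of those is not stated.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "yyyymm": [201501, 201502, 201503, 201506, 201507,
               201508, 201509, 201503, 201504, 201505],
    "value1": [0, 1, 3, 0, 0, 1, 0, 0, 0, 0],
    "value2": [123, 113, 115, 0, 0, 115, 0, 0, 0, 0],
})

# Keep only rows where either value is positive, in chronological order.
hits = data[(data["value1"] > 0) | (data["value2"] > 0)].sort_values("yyyymm")

# Keep the earliest qualifying row for each ID.
# IDs with no qualifying row (ID 3 in this sample) do not appear.
first_hits = hits.drop_duplicates("ID", keep="first").reset_index(drop=True)
print(first_hits)
```

To keep every qualifying row per ID instead of only the first, use hits directly and skip the drop_duplicates step.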
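Comments:

Google "groupby" and "pandas".

Thank you for your answer! I tried groupby, but it doesn't help me with the more complex logic, i.e. saving specific rows and getting the unique ID (the "ID" variable in the data).

Could you accept it as the correct answer? There is a tick button on the side.

You can group by ID and yyyymm.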