Python: how to store pandas groupby() results in the same variable with different indexes
Suppose I have a DataFrame df with three columns:
df=
id date value
A 02-04-2000 3
A 03-04-2000 8
B 04-04-2000 12
B 02-04-2000 7
C 03-04-2000 5
C 04-04-2000 2
I want to group the data by the df['id'] column and store the groups in a variable new, so that new[1] returns the rows for id = A without the id column, new[2] returns the rows for id = B, and so on.
Sample output:
new[1]=
date value
02-04-2000 3
03-04-2000 8
new[2]=
date value
04-04-2000 12
02-04-2000 7
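A minimal self-contained sketch that rebuilds the sample data and produces exactly the 1-based keys asked for. It assumes the groups should be numbered in sorted id order (the default order of groupby); using enumerate(..., start=1) is an illustrative choice here, not one of the approaches in the answer.

```python
import pandas as pd

# Sample data from the question (default RangeIndex 0..5 assumed)
df = pd.DataFrame({
    "id": ["A", "A", "B", "B", "C", "C"],
    "date": ["02-04-2000", "03-04-2000", "04-04-2000",
             "02-04-2000", "03-04-2000", "04-04-2000"],
    "value": [3, 8, 12, 7, 5, 2],
})

# groupby('id') yields (key, sub-DataFrame) pairs in sorted key order;
# enumerate(..., start=1) turns that order into the keys 1, 2, 3, ...
new = {i: g.drop(columns="id")
       for i, (_, g) in enumerate(df.groupby("id"), start=1)}

print(new[1])
print(new[2])
```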
All of the solutions below remove the id column with DataFrame.drop.
If indexing by 0, 1, … is acceptable, the output can be a list of DataFrames:
new = [g.drop('id', axis=1) for _, g in df.groupby('id')]
print (new[0])
date value
0 02-04-2000 3
1 03-04-2000 8
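A self-contained version of the list approach, assuming the question's sample data. It also shows that each piece keeps its original row labels from df, and uses reset_index(drop=True) (an optional extra, not part of the answer) to renumber them from 0.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A", "A", "B", "B", "C", "C"],
    "date": ["02-04-2000", "03-04-2000", "04-04-2000",
             "02-04-2000", "03-04-2000", "04-04-2000"],
    "value": [3, 8, 12, 7, 5, 2],
})

# One DataFrame per id, in sorted id order; list positions start at 0
new = [g.drop("id", axis=1) for _, g in df.groupby("id")]

# B's rows keep their labels 2 and 3 from df;
# reset_index(drop=True) renumbers them from 0 if that is preferred
b_renumbered = new[1].reset_index(drop=True)

print(new[1])
print(b_renumbered)
```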
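If the output should be a dictionary of DataFrames, the following version numbers consecutive runs of the same id as groups 1, 2, 3, …: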
new = {k: g.drop('id', axis=1)
for k, g in df.groupby(df['id'].ne(df['id'].shift()).cumsum())}
print (new[1])
date value
0 02-04-2000 3
1 03-04-2000 8
If the dictionary should instead be keyed by the id values themselves:
new1 = {k: g.drop('id', axis=1) for k, g in df.groupby('id')}
print (new1['A'])
date value
0 02-04-2000 3
1 03-04-2000 8
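Since the title asks about storing the groupby() object itself, here is a sketch (assuming the question's sample data) that keeps the GroupBy object in a variable and looks groups up with its get_group method; the result matches the dictionary keyed by id.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A", "A", "B", "B", "C", "C"],
    "date": ["02-04-2000", "03-04-2000", "04-04-2000",
             "02-04-2000", "03-04-2000", "04-04-2000"],
    "value": [3, 8, 12, 7, 5, 2],
})

# Store the GroupBy object and fetch one group by its id value;
# get_group returns all columns, so id is dropped afterwards
grouped = df.groupby("id")
a_rows = grouped.get_group("A").drop(columns="id")

# The same GroupBy object can also be iterated to build a dict keyed by id
new1 = {k: g.drop("id", axis=1) for k, g in grouped}

print(a_rows)
```

The GroupBy object is lazy, so keeping it avoids materializing every group up front; the dictionary is handier when the groups are accessed repeatedly.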
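The difference between consecutive groups and grouping by id only shows up when the same id occurs in separate runs, as in this DataFrame:
print (df)
  id        date  value
0  A  02-04-2000      3   <- group 1
1  A  03-04-2000      8   <- group 1
2  B  04-04-2000     12   <- group 2
3  A  02-04-2000      7   <- group 3
4  A  03-04-2000      5   <- group 3
5  C  04-04-2000      2   <- group 4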
new = {k: g.drop('id', axis=1)
for k, g in df.groupby(df['id'].ne(df['id'].shift()).cumsum())}
#first group
print (new[1])
date value
0 02-04-2000 3
1 03-04-2000 8
#third group
print (new[3])
date value
3 02-04-2000 7
4 03-04-2000 5
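A step-by-step sketch of how the consecutive-group key is built, using the second sample DataFrame from the answer (the intermediate names changed and run_id are only for illustration):

```python
import pandas as pd

# Second sample: 'A' appears in two separate runs
df = pd.DataFrame({
    "id": ["A", "A", "B", "A", "A", "C"],
    "date": ["02-04-2000", "03-04-2000", "04-04-2000",
             "02-04-2000", "03-04-2000", "04-04-2000"],
    "value": [3, 8, 12, 7, 5, 2],
})

# shift() moves each id down one row, so row i is compared with row i-1;
# the first row is compared with NaN and therefore counts as a change
changed = df["id"].ne(df["id"].shift())   # True where a new run starts
run_id = changed.cumsum()                  # running count of runs

new = {k: g.drop("id", axis=1) for k, g in df.groupby(run_id)}

print(pd.DataFrame({"id": df["id"], "changed": changed, "run_id": run_id}))
```

Because cumsum counts True values, each new run of an id gets the next integer, which is why the second run of A becomes group 3 rather than joining group 1.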
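A similar solution without consecutive groups, on the same data: pd.factorize gives each distinct id one number, so all rows with the same id form one group regardless of their position: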
print (df)
id date value
0 A 02-04-2000 3
1 A 03-04-2000 8
2 B 04-04-2000 12
3 A 02-04-2000 7
4 A 03-04-2000 5
5 C 04-04-2000 2
new = {k: g.drop('id', axis=1)
for k, g in df.groupby(pd.factorize(df['id'])[0]+1)}
#all A rows form the first group
print (new[1])
date value
0 02-04-2000 3
1 03-04-2000 8
3 02-04-2000 7
4 03-04-2000 5
#all C rows form the third group
print (new[3])
date value
5 04-04-2000 2
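A sketch showing what pd.factorize returns for the second sample DataFrame and how adding 1 produces the 1-based group numbers used above:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A", "A", "B", "A", "A", "C"],
    "date": ["02-04-2000", "03-04-2000", "04-04-2000",
             "02-04-2000", "03-04-2000", "04-04-2000"],
    "value": [3, 8, 12, 7, 5, 2],
})

# factorize returns (codes, uniques); codes number each distinct id
# by order of first appearance, starting at 0
codes, uniques = pd.factorize(df["id"])
labels = codes + 1                       # shift to 1-based group numbers

new = {k: g.drop("id", axis=1) for k, g in df.groupby(labels)}

print(codes, list(uniques))
```

Note that factorize numbers ids by first appearance, while groupby('id') orders groups by sorted id; the two agree here only because A, B, C first appear in sorted order.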