Pandas: replicate rows in a DataFrame with sequential changes in a column value
I want to replicate the rows of a df in chronological order using forward fill. Original df:
A B C Year
0 ABC 0 A 1950
1 CDE 1 A 1950
2 XYZ 1 B 1954
3 123 1 C 1954
4 X12 1 B 1956
5 123 1 D 1956
6 124 1 D 1956
Desired df:
A B C Year
0 ABC 0 A 1950
1 CDE 1 A 1950
2 ABC 0 A 1951
3 CDE 1 A 1951
4 ABC 0 A 1952
5 CDE 1 A 1952
6 ABC 0 A 1953
7 CDE 1 A 1953
8 XYZ 1 B 1954
9 123 1 C 1954
10 XYZ 1 B 1955
11 123 1 C 1955
12 X12 1 B 1956
13 123 1 D 1956
14 124 1 D 1956
I tried converting the Year column to datetime and resampling by year with forward fill.
That doesn't work, because resampling by year gives only one row per year:
df.resample('YS').first().ffill().reset_index()
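A minimal sketch of one way to get the desired df without `resample`, assuming `Year` holds integer years and that every year between the first and last observed year should receive a copy of all rows from the most recent observed year. The helper name `expand_years` is hypothetical, not a pandas API. It builds a "target year to source year" mapping with `reindex(...).ffill()` and merges that mapping onto the original rows, so every row of a source year is kept no matter how many there are.

```python
import pandas as pd

# Original data from the question
df = pd.DataFrame({
    "A": ["ABC", "CDE", "XYZ", "123", "X12", "123", "124"],
    "B": [0, 1, 1, 1, 1, 1, 1],
    "C": ["A", "A", "B", "C", "B", "D", "D"],
    "Year": [1950, 1950, 1954, 1954, 1956, 1956, 1956],
})


def expand_years(df: pd.DataFrame, year_col: str = "Year") -> pd.DataFrame:
    """Hypothetical helper: repeat each year's rows for every missing year
    up to the next observed year."""
    observed = sorted(df[year_col].unique())
    # For each year in the full range, forward-fill the most recent observed year
    source = (
        pd.Series(observed, index=observed)
        .reindex(range(observed[0], observed[-1] + 1))
        .ffill()
        .astype(int)
    )
    # Mapping table: target year -> year whose rows should be copied
    mapping = pd.DataFrame({"target": source.index, year_col: source.values})
    # A left merge keeps the target-year order and all matching source rows
    out = mapping.merge(df, on=year_col, how="left")
    return (
        out.drop(columns=year_col)
        .rename(columns={"target": year_col})[df.columns]
        .reset_index(drop=True)
    )


print(expand_years(df))
```

Because the merge is driven by a per-year mapping rather than by resampling, years with one, two or three rows are all handled the same way.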
You can try the following:
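import pandas as pd
# Turn each (A, B, C) combination into its own column, upsample the yearly index
# with forward fill ('A' is spelled 'YE' in pandas >= 2.2), then stack back to long format
df_out = df.set_index([pd.to_datetime(df['Year'], format='%Y'), 'A', 'B', 'C'])\
    .unstack([1, 2, 3]).resample('A').ffill()\
    .stack([1, 2, 3]).reset_index([1, 2, 3])
# Overwrite the carried-over Year values with the year of each resampled row
df_out = df_out.assign(Year=pd.to_datetime(df_out.index).year).reset_index(drop=True)
df_out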
Output:
A B C Year
0 ABC 0 A 1950
1 CDE 1 A 1950
2 ABC 0 A 1951
3 CDE 1 A 1951
4 ABC 0 A 1952
5 CDE 1 A 1952
6 ABC 0 A 1953
7 CDE 1 A 1953
8 123 1 C 1954
9 XYZ 1 B 1954
10 123 1 C 1955
11 XYZ 1 B 1955
12 123 1 D 1956
13 X12 1 B 1956
I think this is a problem.
I took a different approach: pivot and melt. It seems to work. Does anyone see a problem with it?
import pandas as pd

data = {'year': ['2000', '2000', '2005', '2005', '2007', '2007', '2007', '2009'],
        'country': ['UK', 'US', 'FR', 'US', 'UK', 'FR', 'US', 'UK'],
        'sales': [10, 21, 20, 10, 12, 20, 10, 12],
        'rep': ['john', 'john', 'claire', 'claire', 'kyle', 'kyle', 'kyle', 'amy']
        }
df = pd.DataFrame(data)
year country sales rep
0 2000 UK 10 john
1 2000 US 21 john
2 2005 FR 20 claire
3 2005 US 10 claire
4 2007 UK 12 kyle
5 2007 FR 20 kyle
6 2007 US 10 kyle
7 2009 UK 12 amy
First, do a pivot:
dfp=pd.pivot_table(df,index=['country','rep'],values=['sales'],columns=['year']).fillna(0)
dfp=dfp.xs('sales', axis=1, drop_level=True)
year 2000 2005 2007 2009
country rep
FR claire 0.0 20.0 0.0 0.0
kyle 0.0 0.0 20.0 0.0
UK amy 0.0 0.0 0.0 12.0
john 10.0 0.0 0.0 0.0
kyle 0.0 0.0 12.0 0.0
US claire 0.0 10.0 0.0 0.0
john 21.0 0.0 0.0 0.0
kyle 0.0 0.0 10.0 0.0
Then use a bit of logic to copy each year's column into the missing years:
cols = dfp.columns.astype(int).values
dft = dfp.copy()
# Copy each observed year's column into every year up to (not including) the next observed year
for i, col in enumerate(cols[:-1]):
    for newcol in range(col + 1, cols[i + 1]):
        dft[str(newcol)] = dft[str(col)]
year 2000 2005 2007 2009 2001 2002 2003 2004 2006 2008
country rep
FR claire 0.0 20.0 0.0 0.0 0.0 0.0 0.0 0.0 20.0 0.0
kyle 0.0 0.0 20.0 0.0 0.0 0.0 0.0 0.0 0.0 20.0
UK amy 0.0 0.0 0.0 12.0 0.0 0.0 0.0 0.0 0.0 0.0
john 10.0 0.0 0.0 0.0 10.0 10.0 10.0 10.0 0.0 0.0
kyle 0.0 0.0 12.0 0.0 0.0 0.0 0.0 0.0 0.0 12.0
US claire 0.0 10.0 0.0 0.0 0.0 0.0 0.0 0.0 10.0 0.0
john 21.0 0.0 0.0 0.0 21.0 21.0 21.0 21.0 0.0 0.0
kyle 0.0 0.0 10.0 0.0 0.0 0.0 0.0 0.0 0.0 10.0
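The nested loop can probably be replaced by reindexing the column axis with `method='ffill'`, which gives each missing year the column of the closest earlier observed year. A sketch, assuming the year labels are converted to integers first (`method` requires sorted labels); unlike the loop, the resulting columns come out in year order instead of having the new years appended at the end.

```python
import pandas as pd

data = {'year': ['2000', '2000', '2005', '2005', '2007', '2007', '2007', '2009'],
        'country': ['UK', 'US', 'FR', 'US', 'UK', 'FR', 'US', 'UK'],
        'sales': [10, 21, 20, 10, 12, 20, 10, 12],
        'rep': ['john', 'john', 'claire', 'claire', 'kyle', 'kyle', 'kyle', 'amy']}
df = pd.DataFrame(data)

# Same pivot as above; a scalar `values` avoids the extra 'sales' column level
dfp = pd.pivot_table(df, index=['country', 'rep'], values='sales',
                     columns='year').fillna(0)
# Integer year labels so the column axis can be reindexed over a numeric range
dfp.columns = dfp.columns.astype(int)

# Each new year label takes the column of the previous observed year
all_years = range(dfp.columns.min(), dfp.columns.max() + 1)
dft = dfp.reindex(columns=all_years, method='ffill')
print(dft)
```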
Then melt it back into the original format:
dfm = dft.reset_index()
dfm = dfm.melt(id_vars=['country', 'rep'], value_vars=dfm.columns.values[2:], var_name='Year', value_name='sales')
# Keep only cells that held a sale (the pivot filled missing combinations with 0)
dfm = dfm.loc[dfm.sales > 0].reset_index(drop=True)
country rep Year sales
0 UK john 2000 10.0
1 US john 2000 21.0
2 FR claire 2005 20.0
3 US claire 2005 10.0
4 FR kyle 2007 20.0
5 UK kyle 2007 12.0
6 US kyle 2007 10.0
7 UK amy 2009 12.0
8 UK john 2001 10.0
9 US john 2001 21.0
10 UK john 2002 10.0
11 US john 2002 21.0
12 UK john 2003 10.0
13 US john 2003 21.0
14 UK john 2004 10.0
15 US john 2004 21.0
16 FR claire 2006 20.0
17 US claire 2006 10.0
18 FR kyle 2008 20.0
19 UK kyle 2008 12.0
20 US kyle 2008 10.0
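One possible issue with this approach: `fillna(0)` followed by `dfm.sales > 0` treats a genuine zero sale the same as "no record", so such rows would be silently dropped. A sketch that keeps NaN for missing cells and drops only those after melting; the UK/claire/2005 row with sales 0 is a hypothetical addition used only to show the difference.

```python
import pandas as pd

# Question's data plus ONE HYPOTHETICAL row with a genuine zero sale (UK, claire, 2005)
data = {'year': ['2000', '2000', '2005', '2005', '2005', '2007', '2007', '2007', '2009'],
        'country': ['UK', 'US', 'FR', 'US', 'UK', 'UK', 'FR', 'US', 'UK'],
        'sales': [10, 21, 20, 10, 0, 12, 20, 10, 12],
        'rep': ['john', 'john', 'claire', 'claire', 'claire', 'kyle', 'kyle', 'kyle', 'amy']}
df = pd.DataFrame(data)

# Pivot WITHOUT fillna(0): NaN means "no record", while 0 stays a real value
dfp = pd.pivot_table(df, index=['country', 'rep'], values='sales', columns='year')
dfp.columns = dfp.columns.astype(int)
# Copy each observed year's column into the following missing years
dft = dfp.reindex(columns=range(dfp.columns.min(), dfp.columns.max() + 1), method='ffill')

# Melt back to long format and drop only the "no record" cells
dfm = (dft.reset_index()
          .melt(id_vars=['country', 'rep'], var_name='Year', value_name='sales')
          .dropna(subset=['sales'])
          .reset_index(drop=True))
print(dfm)
```

With the `sales > 0` filter, the two UK/claire rows (2005 and 2006, both 0) would disappear; with `dropna` they are kept.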
Why is there only one row for 1955?
> It was a typo :-) I just corrected it.
Do the rows for each year always come in pairs?
> No, it can be any number.