Python: expanding a frequency table back into rows
Suppose you have a pandas DataFrame containing frequency information like this:
import pandas as pd

data = [[1,1,2,3],
        [1,2,3,5],
        [2,1,6,1],
        [2,2,2,4]]
df = pd.DataFrame(data, columns=['id', 'time', 'CountX1', 'CountX2'])
#    id  time  CountX1  CountX2
# 0   1     1        2        3
# 1   1     2        3        5
# 2   2     1        6        1
# 3   2     2        2        4
I'm looking for a simple command (for example, using pd.pivot or pd.melt()) to turn these frequencies back into one row per observation, like this:
id time variable
0 1 X1
0 1 X1
0 1 X2
0 1 X2
0 1 X2
1 1 X1
1 1 X1
1 1 X1
1 1 X2 ... # 5x repeated
2 1 X1 ... # 6x repeated
2 1 X2 ... # 1x repeated
2 2 X1 ... # 2x repeated
2 2 X2 ... # 4x repeated
You need:
a = df.set_index(['id','time']).stack()
df = a.loc[a.index.repeat(a)].reset_index().rename(columns={'level_2':'a'}).drop(0, axis=1)
print(df)
id time a
0 1 1 CountX1
1 1 1 CountX1
2 1 1 CountX2
3 1 1 CountX2
4 1 1 CountX2
5 1 2 CountX1
6 1 2 CountX1
7 1 2 CountX1
8 1 2 CountX2
9 1 2 CountX2
10 1 2 CountX2
11 1 2 CountX2
12 1 2 CountX2
13 2 1 CountX1
14 2 1 CountX1
15 2 1 CountX1
16 2 1 CountX1
17 2 1 CountX1
18 2 1 CountX1
19 2 1 CountX2
20 2 2 CountX1
21 2 2 CountX1
22 2 2 CountX2
23 2 2 CountX2
24 2 2 CountX2
25 2 2 CountX2
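The one-liner above packs three steps together, so here is a minimal sketch that spells them out. It assumes the same example frame as the question; the names counts, repeated_index and expanded are illustrative only, and it builds the result from the repeated index directly rather than through .loc, which is a small variation on the answer and not the answer's exact code.

```python
import pandas as pd

data = [[1, 1, 2, 3],
        [1, 2, 3, 5],
        [2, 1, 6, 1],
        [2, 2, 2, 4]]
df = pd.DataFrame(data, columns=['id', 'time', 'CountX1', 'CountX2'])

# Step 1: one count per (id, time, original column name).
counts = df.set_index(['id', 'time']).stack()

# Step 2: repeat every index entry as many times as its count.
repeated_index = counts.index.repeat(counts.to_numpy())

# Step 3: turn the repeated index levels into ordinary columns.
expanded = repeated_index.to_frame(index=False)
expanded.columns = ['id', 'time', 'variable']

print(expanded.head())
```

Because the counts themselves are no longer needed once the index has been repeated, there is no value column to drop afterwards.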
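My first solution, which I had initially deleted because it returned the rows in a different order:
a = df.melt(['id','time'])
df = (a.loc[a.index.repeat(a['value'])]
        .drop(columns='value')  # the positional axis argument is no longer accepted by recent pandas
        .sort_values(['id', 'time'])
        .reset_index(drop=True))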
You can use melt + repeat:
v = df.melt(['id', 'time'])
r = v.pop('value')
df = pd.DataFrame(
v.values.repeat(r, axis=0), columns=v.columns
)\
.sort_values(['id', 'time'])\
.reset_index(drop=True)
id time variable
0 1 1 CountX1
1 1 1 CountX1
2 1 1 CountX2
3 1 1 CountX2
4 1 1 CountX2
5 1 2 CountX1
6 1 2 CountX1
7 1 2 CountX1
8 1 2 CountX2
9 1 2 CountX2
10 1 2 CountX2
11 1 2 CountX2
12 1 2 CountX2
13 2 1 CountX1
14 2 1 CountX1
15 2 1 CountX1
16 2 1 CountX1
17 2 1 CountX1
18 2 1 CountX1
19 2 1 CountX2
20 2 2 CountX1
21 2 2 CountX1
22 2 2 CountX2
23 2 2 CountX2
24 2 2 CountX2
25 2 2 CountX2
This produces the row order described in the question.
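The question's desired output labels the variable as X1/X2 rather than CountX1/CountX2. As a hedged sketch, assuming the column names always start with the literal prefix 'Count', the prefix can be stripped after expanding, and the result can be checked by counting the rows back into the original table. The names long, tidy and restored are illustrative.

```python
import pandas as pd

data = [[1, 1, 2, 3],
        [1, 2, 3, 5],
        [2, 1, 6, 1],
        [2, 2, 2, 4]]
df = pd.DataFrame(data, columns=['id', 'time', 'CountX1', 'CountX2'])

# Expand: melt to long form, then repeat each row by its count.
long = df.melt(['id', 'time'])
counts = long.pop('value')
tidy = (long.loc[long.index.repeat(counts)]
            .assign(variable=lambda d: d['variable'].str.replace('Count', '', regex=False))
            .sort_values(['id', 'time'])
            .reset_index(drop=True))

# Round trip: counting the rows per group should give back the original frequencies.
restored = (tidy.groupby(['id', 'time', 'variable']).size()
                .unstack('variable')
                .add_prefix('Count')
                .reset_index())
restored.columns.name = None

print(restored)
```

The round trip is only a sanity check; it would not recover zero counts as rows of tidy, since a count of zero produces no rows to group.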
Performance
df = pd.concat([df] * 100, ignore_index=True)
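OK, I added my first solution back. Good luck!
After fixing your answer, it comes to 100 loops, best of 3: 6.84 ms per loop. If you don't believe it, you can time it yourself. :)
@Cᴏʟᴅsᴘᴇᴇᴅ - no problem, please add it to the timings ;)
For tidyr >= 0.8, the R equivalent would be uncount(df, freq).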
# in this answer
%%timeit
v = df.melt(['id', 'time'])
r = v.pop('value')
pd.DataFrame(
v.values.repeat(r, axis=0), columns=v.columns
)\
.sort_values(['id', 'time'])\
.reset_index(drop=True)
100 loops, best of 3: 4.65 ms per loop
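The %%timeit magic only works inside IPython or Jupyter. For a plain Python script, a rough sketch with the standard timeit module could look like the following. The function names melt_loc and melt_numpy are made up here to wrap the two melt-based snippets from this thread, and the numbers it prints will depend on your machine and pandas version, so none are quoted.

```python
import timeit

import pandas as pd

data = [[1, 1, 2, 3],
        [1, 2, 3, 5],
        [2, 1, 6, 1],
        [2, 2, 2, 4]]
small = pd.DataFrame(data, columns=['id', 'time', 'CountX1', 'CountX2'])
big = pd.concat([small] * 100, ignore_index=True)


def melt_loc(df):
    # melt, then repeat rows through .loc on the repeated index
    a = df.melt(['id', 'time'])
    return (a.loc[a.index.repeat(a['value'])]
             .drop(columns='value')
             .sort_values(['id', 'time'])
             .reset_index(drop=True))


def melt_numpy(df):
    # melt, then repeat the underlying NumPy array directly
    v = df.melt(['id', 'time'])
    r = v.pop('value')
    return (pd.DataFrame(v.values.repeat(r, axis=0), columns=v.columns)
              .sort_values(['id', 'time'])
              .reset_index(drop=True))


for fn in (melt_loc, melt_numpy):
    # best of 3 runs, 10 calls per run, reported per call
    best = min(timeit.repeat(lambda: fn(big), number=10, repeat=3)) / 10
    print(f'{fn.__name__}: {best * 1e3:.2f} ms per call')
```

Note that the NumPy variant returns object-dtype columns because v.values mixes integers and strings, which may matter if the result is used for further numeric work.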