Python: Reverting a frequency table

python pandas

Suppose you have a pandas DataFrame that contains frequency information like this:

import pandas as pd

data = [[1,1,2,3],
        [1,2,3,5],
        [2,1,6,1],
        [2,2,2,4]]
df = pd.DataFrame(data, columns=['id', 'time', 'CountX1', 'CountX2'])

#    id  time  CountX1  CountX2
# 0   1     1        2        3
# 1   1     2        3        5
# 2   2     1        6        1
# 3   2     2        2        4

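I am looking for a simple command (e.g. using pd.pivot or pd.melt()) to revert these frequencies to tidy data with one row per observation, like the following: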

id time variable
1   1   X1
1   1   X1
1   1   X2
1   1   X2
1   1   X2
1   2   X1
1   2   X1
1   2   X1
1   2   X2 ...  # 5x repeated
2   1   X1 ...  # 6x repeated
2   1   X2 ...  # 1x repeated
2   2   X1 ...  # 2x repeated
2   2   X2 ...  # 4x repeated

You need stack, and then repeat each index entry by its count:

a = df.set_index(['id','time']).stack()
df = a.loc[a.index.repeat(a)].reset_index().rename(columns={'level_2':'a'}).drop(0, axis=1)
print(df)
    id  time        a
0    1     1  CountX1
1    1     1  CountX1
2    1     1  CountX2
3    1     1  CountX2
4    1     1  CountX2
5    1     2  CountX1
6    1     2  CountX1
7    1     2  CountX1
8    1     2  CountX2
9    1     2  CountX2
10   1     2  CountX2
11   1     2  CountX2
12   1     2  CountX2
13   2     1  CountX1
14   2     1  CountX1
15   2     1  CountX1
16   2     1  CountX1
17   2     1  CountX1
18   2     1  CountX1
19   2     1  CountX2
20   2     2  CountX1
21   2     2  CountX1
22   2     2  CountX2
23   2     2  CountX2
24   2     2  CountX2
25   2     2  CountX2
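
The stack-based idea above can also be written step by step. What follows is only a sketch, not the answerer's exact code: the names freq, counts, keys and tidy are made up for illustration, and instead of selecting with .loc it converts the repeated MultiIndex directly into columns with to_frame. On pandas 2.1 and later, stack() may emit a FutureWarning about its new implementation.

```python
import pandas as pd

data = [[1, 1, 2, 3],
        [1, 2, 3, 5],
        [2, 1, 6, 1],
        [2, 2, 2, 4]]
freq = pd.DataFrame(data, columns=['id', 'time', 'CountX1', 'CountX2'])

# Step 1: move the count columns into a third index level, giving one
# count per (id, time, column name) key.
counts = freq.set_index(['id', 'time']).stack()

# Step 2: repeat every key as many times as its count.
keys = counts.index.repeat(counts.to_numpy())

# Step 3: name the levels and turn the repeated keys into ordinary columns.
tidy = keys.set_names(['id', 'time', 'variable']).to_frame(index=False)
print(tidy.head())
```

Because the counts are stacked row by row, the rows for each (id, time) pair already sit next to each other, so no extra sort is needed here.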

My first solution was initially deleted because it produced a different row order:

a = df.melt(['id','time'])
df = (a.loc[a.index.repeat(a['value'])]
       .drop(columns='value')
       .sort_values(['id', 'time'])
       .reset_index(drop=True))
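
One way to check that the expansion is correct is to count the rows again and compare the result with the original table. The sketch below does that; the names freq, long_form, expanded and recovered are made up for illustration and do not appear in the answer.

```python
import pandas as pd

data = [[1, 1, 2, 3],
        [1, 2, 3, 5],
        [2, 1, 6, 1],
        [2, 2, 2, 4]]
freq = pd.DataFrame(data, columns=['id', 'time', 'CountX1', 'CountX2'])

# Expand: one row per observation, as in the melt-based solution above.
long_form = freq.melt(['id', 'time'])
expanded = (long_form.loc[long_form.index.repeat(long_form['value'])]
                     .drop(columns='value')
                     .sort_values(['id', 'time'])
                     .reset_index(drop=True))

# Collapse again: count rows per (id, time, variable) and spread the
# variable level back into columns.
recovered = (expanded.groupby(['id', 'time', 'variable'])
                     .size()
                     .unstack('variable', fill_value=0)
                     .reset_index())
recovered.columns.name = None

print(recovered.equals(freq))
```

Note that a count of zero produces no rows at all, so an (id, time) pair whose counts are all zero would disappear in the round trip.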

You can use melt followed by repeat:

v = df.melt(['id', 'time'])
r = v.pop('value')

df = pd.DataFrame(
        v.values.repeat(r, axis=0),  columns=v.columns
)\
       .sort_values(['id', 'time'])\
       .reset_index(drop=True)

   id time variable
0   1    1  CountX1
1   1    1  CountX1
2   1    1  CountX2
3   1    1  CountX2
4   1    1  CountX2
5   1    2  CountX1
6   1    2  CountX1
7   1    2  CountX1
8   1    2  CountX2
9   1    2  CountX2
10  1    2  CountX2
11  1    2  CountX2
12  1    2  CountX2
13  2    1  CountX1
14  2    1  CountX1
15  2    1  CountX1
16  2    1  CountX1
17  2    1  CountX1
18  2    1  CountX1
19  2    1  CountX2
20  2    2  CountX1
21  2    2  CountX1
22  2    2  CountX2
23  2    2  CountX2
24  2    2  CountX2
25  2    2  CountX2

This produces the ordering described in the question.
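
One thing to be aware of: v.values on a frame that mixes integers and strings is an object array, so the rebuilt id and time columns come back with object dtype. A possible variant, sketched here with made-up names (freq, pos, out), repeats row positions instead of raw values so each column keeps its dtype. It also strips the "Count" prefix so the labels read X1 and X2 as in the question; that renaming step is an addition, not part of the answer.

```python
import numpy as np
import pandas as pd

data = [[1, 1, 2, 3],
        [1, 2, 3, 5],
        [2, 1, 6, 1],
        [2, 2, 2, 4]]
freq = pd.DataFrame(data, columns=['id', 'time', 'CountX1', 'CountX2'])

v = freq.melt(['id', 'time'])
r = v.pop('value')

# Repeat integer row positions, then select with iloc; the columns keep
# their original dtypes because no mixed-type array is built.
pos = np.repeat(np.arange(len(v)), r.to_numpy())
out = (v.iloc[pos]
        .sort_values(['id', 'time'])
        .reset_index(drop=True))

# Optional: turn 'CountX1' into 'X1' to match the labels in the question.
out['variable'] = out['variable'].str.replace('Count', '', regex=False)
print(out.head())
```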

Performance

df = pd.concat([df] * 100, ignore_index=True)

# the melt + repeat solution in this answer

%%timeit
v = df.melt(['id', 'time'])
r = v.pop('value')

pd.DataFrame(
        v.values.repeat(r, axis=0),  columns=v.columns
)\
       .sort_values(['id', 'time'])\
       .reset_index(drop=True)

100 loops, best of 3: 4.65 ms per loop

OK, I added the first solution. Good luck!

After fixing the answer, it runs at 100 loops, best of 3: 6.84 ms per loop. If you don't believe it, you can time it yourself. :)

@Cᴏʟᴅsᴘᴇᴇᴅ - No problem, please add it to the timings ;)

For the record, with tidyr >= 0.8 the R code would go via uncount(df, freq).
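
%%timeit is an IPython magic and does not run in a plain Python script. As a rough sketch, the two melt-based variants could be timed with the standard timeit module instead. The function names loc_repeat and values_repeat and the names freq, big and timings are made up for illustration, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

data = [[1, 1, 2, 3],
        [1, 2, 3, 5],
        [2, 1, 6, 1],
        [2, 2, 2, 4]]
freq = pd.DataFrame(data, columns=['id', 'time', 'CountX1', 'CountX2'])
big = pd.concat([freq] * 100, ignore_index=True)


def loc_repeat(df):
    """melt, then repeat index labels and select with .loc."""
    a = df.melt(['id', 'time'])
    return (a.loc[a.index.repeat(a['value'])]
             .drop(columns='value')
             .sort_values(['id', 'time'])
             .reset_index(drop=True))


def values_repeat(df):
    """melt, then repeat the underlying NumPy array row-wise."""
    v = df.melt(['id', 'time'])
    r = v.pop('value')
    return (pd.DataFrame(v.values.repeat(r.to_numpy(), axis=0), columns=v.columns)
              .sort_values(['id', 'time'])
              .reset_index(drop=True))


# Best of 3 repeats, 10 calls each, reported as seconds per call.
timings = {
    fn.__name__: min(timeit.repeat(lambda: fn(big), number=10, repeat=3)) / 10
    for fn in (loc_repeat, values_repeat)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per loop")
```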