Python: create a matrix (like a two-way table) from a 3-column dataframe


python pandas performance numpy


I have a dataframe like this:

               datetime   id     value
0   2021-02-21 15:43:00  154  0.102677
1   2021-02-21 15:57:00  215  0.843945
2   2021-02-21 00:31:00  126  0.402851
3   2021-02-21 16:38:00   61  0.138945
4   2021-02-21 05:11:00  124  0.865435
..                  ...  ...       ...
115 2021-02-21 21:54:00  166  0.108299
116 2021-02-21 17:39:00  192  0.129267
117 2021-02-21 01:56:00  258  0.300448
118 2021-02-21 20:35:00  401  0.119043
119 2021-02-21 09:16:00  192  0.587173

I can generate it by running:

import pandas as pd
from numpy import random

# all minutes of the day, ordered, unique
d = pd.date_range("2021-02-21 00:00:00", "2021-02-21 23:59:59", freq="1min")

# 120 random minutes (with replacement), random ids and random values
d2 = pd.Series(d).sample(120, replace=True)
ids = random.randint(1, 500, size=d2.shape[0])
df = pd.DataFrame({'datetime': d2, 'id': ids, 'value': random.random(size=d2.shape[0])})
df.reset_index(inplace=True, drop=True)

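I want to put it into a matrix where one index is the minute of the day and the other is the id, so that the matrix has shape

1440 × (number of unique ids), i.e. 1440 * unique(id).shape[0]

Note that even if some minutes do not appear in the dataframe, the output matrix should still have 1440 rows.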

I can do it like this:

import numpy as np

# all ids, unique
uniqueIds = df.id.unique()
idsN = uniqueIds.shape[0]  # number of unique ids (ids.shape[0] would count every row)
objectiveMatrix = np.zeros([1440, idsN])
for index, row in df.iterrows():
    a = np.where(row.id == uniqueIds)[0]  # column: position of this id
    b = np.where(row.datetime == d)[0]    # row: position of this minute in the full-day range d
    objectiveMatrix[b, a] = row.value

But this takes a long time. How can I do it better?
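If a plain NumPy array is what is ultimately needed, the iterrows() loop can be replaced by a single vectorized assignment. This is a minimal sketch that is not from the thread: to_matrix_numpy is a hypothetical helper name, and it assumes each (minute, id) pair occurs at most once, because, like the loop, it does not aggregate duplicates. Columns follow np.unique, so ids come out sorted rather than in first-seen order as with df.id.unique().

```python
import numpy as np
import pandas as pd

def to_matrix_numpy(df: pd.DataFrame):
    """Hypothetical helper: scatter df['value'] into a (1440, n_ids) array.

    Assumes at most one row per (minute, id) pair; duplicates are not aggregated.
    """
    # Minute of the day as an integer row index: 00:00 -> 0, 23:59 -> 1439
    rows = (df['datetime'].dt.hour * 60 + df['datetime'].dt.minute).to_numpy()
    # Sorted unique ids, plus each row's column position within that sorted array
    unique_ids, cols = np.unique(df['id'].to_numpy(), return_inverse=True)
    out = np.zeros((1440, unique_ids.shape[0]))
    # One fancy-indexing assignment replaces the per-row Python loop
    out[rows, cols] = df['value'].to_numpy()
    return out, unique_ids
```

Calling matrix, col_ids = to_matrix_numpy(df) gives the matrix plus the id that labels each column.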

This is called pivoting. Pandas has pivot, pivot_table, and set_index/unstack for this. For more details, see …. As a starter, you can try:

# this extracts the time of day as an 'HH-MM' string
df['minute'] = df['datetime'].dt.strftime('%H-%M')

output = df.pivot_table(index='minute', columns='id', values='value')
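To see what the starter produces, here is a toy frame with made-up values (not from the question). pivot_table averages duplicate (minute, id) pairs by default (aggfunc='mean') and leaves missing pairs as NaN; set_index/unstack builds the same table but raises an error if a (minute, id) pair is duplicated.

```python
import pandas as pd

# Toy data (hypothetical): two ids at 15:43, one id at 00:31
df = pd.DataFrame({
    'datetime': pd.to_datetime(['2021-02-21 15:43', '2021-02-21 15:43', '2021-02-21 00:31']),
    'id': [154, 61, 154],
    'value': [0.1, 0.2, 0.4],
})
df['minute'] = df['datetime'].dt.strftime('%H-%M')

# pivot_table: one row per minute string, one column per id; absent pairs become NaN
wide = df.pivot_table(index='minute', columns='id', values='value')

# set_index/unstack: same table, valid only when every (minute, id) pair is unique
wide2 = df.set_index(['minute', 'id'])['value'].unstack()
```

Only the two minutes that actually occur ('00-31' and '15-43') appear as rows, which is why the full 1440-row grid needs an extra step.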

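Nice, I hoped there was a way, but I didn't know the term. Really helpful. Now that I see it more clearly, there is still one difference from what I want: my dataframe may not have data for every minute of the day, but I want the matrix to have 1440 rows, one for every minute, whether it is filled or not. OK, got it. I used an integer minute to make it simpler:

df['minute'] = (df['datetime'].dt.hour*60 + df['datetime'].dt.minute).astype(int)

Then, the same as you:

output = df.pivot_table(index='minute', columns='id', values='value')

Finally, just reindex the final "matrix-shaped" dataframe over all 1440 minutes of the day (0 to 1439):

output_full = output.reindex(np.arange(1440), fill_value=0)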
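Putting the thread's steps together (integer minute, pivot_table, reindex) gives roughly the following sketch. minute_id_matrix is a hypothetical name, and the trailing .fillna(0) is an addition not in the thread: reindex(..., fill_value=0) only fills the rows it adds, so a minute that exists but lacks a given id would otherwise stay NaN instead of 0 as in the question's np.zeros matrix.

```python
import numpy as np
import pandas as pd

def minute_id_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: 1440 x n_ids frame, rows = minute of day, columns = ids."""
    # Integer minute of the day: 00:00 -> 0, 23:59 -> 1439
    minute = (df['datetime'].dt.hour * 60 + df['datetime'].dt.minute).astype('int64')
    # One row per observed minute, one column per id (duplicates are averaged)
    wide = df.assign(minute=minute).pivot_table(index='minute', columns='id', values='value')
    # Add the minutes that never occur (filled with 0), then zero the remaining NaN cells
    return wide.reindex(np.arange(1440), fill_value=0).fillna(0)
```

minute_id_matrix(df).to_numpy() then yields a plain array of shape (1440, df.id.nunique()), with columns in sorted id order.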