Python: How to efficiently create a DataFrame from a list of dictionaries (with nested lists and grouping by time)

python pandas performance dictionary

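I am currently having trouble efficiently creating a large DataFrame from nested dictionaries. I know this has been asked many times, but most of the solutions use nested for loops, which is very inefficient in my case. Maybe I am just doing it wrong. I am quite new to Python and pandas, so any help would be greatly appreciated.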

Anyway, my initial data structure looks like this:

documents = [
               {
                  'date': '2019-01-01', 'data': [
                                               {'time': '08:32', 'boxId': 153},
                                               {'time': '08:48', 'boxId': 323}, ...
                                            ]
               },
               {
                   'date': '2019-01-02', 'data': [...]
               }
            ]
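
For reference, one way to flatten a structure like this without writing the nested loop by hand is `pd.json_normalize`. The sketch below is only an illustration: the sample values are made up, and it assumes `date` is stored as a plain string as shown above.

```python
import pandas as pd

# Hypothetical sample in the shape shown above (values are made up).
documents = [
    {"date": "2019-01-01", "data": [{"time": "08:32", "boxId": 153},
                                    {"time": "08:48", "boxId": 323}]},
    {"date": "2019-01-02", "data": [{"time": "09:05", "boxId": 153}]},
]

# One row per entry of each document's "data" list;
# the parent document's "date" is repeated on every row.
flat = pd.json_normalize(documents, record_path="data", meta=["date"])
print(flat)
```

The result has the columns `time`, `boxId` and `date`, which is the same flat layout that the generator expression passed to `from_records` produces.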

I need to aggregate the data per boxId. That means I have to count all entries for each id and group them by hour, like this:

time              153    152    323
2019-01-01 09:00  2.0    3.0    6.0
2019-01-01 10:00  7.0    5.0    4.0
2019-01-01 11:00  1.0    0.0    8.0
      .
      .
      .
2020-01-01 00:00  3.0    1.0    5.0
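
A table of this shape can be sketched with `pd.crosstab` on timestamps floored to the hour. This is a minimal illustration with made-up rows, assuming the data is already flat and `time` is already a datetime column. Note that it labels each bucket with the start of the hour and only lists hours that actually occur in the data.

```python
import pandas as pd

# Hypothetical flat data: one row per entry, "time" already parsed.
flat = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-01-01 08:32", "2019-01-01 08:48",
        "2019-01-01 09:10", "2019-01-01 09:40",
    ]),
    "boxId": [153, 323, 153, 153],
})

# Rows: hour buckets; columns: boxId; cells: number of entries.
hourly = pd.crosstab(flat["time"].dt.floor("h"), flat["boxId"])
print(hourly)
```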

My current solution is very slow. This is how I do it:

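import pandas as pd

def format_to_df(documents, grouper_freq):
   # Flatten: one record per entry, with the parent document's date attached
   df = pd.DataFrame.from_records(
         dict({'date': doc['date'].strftime('%Y-%m-%d')}, **entry)
         for doc in documents for entry in doc['data']
   )

   # Build one timestamp per row from the date and time strings (row-wise apply)
   df['time'] = df[['date', 'time']].apply(lambda x: pd.to_datetime(' '.join(x)), axis=1)

   # Count entries per time bucket and boxId; the count column is named 0
   df = (df.groupby([pd.Grouper(key='time', freq=grouper_freq), 'boxId'])
           .size()
           .reset_index())

   df['time'] = df['time'].astype(str)

   # Reshape from long to wide: one column per boxId
   df = (df.pivot(index='time', values=0, columns='boxId')
           .rename_axis(None, axis=1))
   return df.fillna(0)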

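This takes a very long time and produces about 3,500 rows with my current data set. I guess that is because of the bad way I access df['time'], right? Also, I think the pivot should be completely unnecessary, but without it I get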

time              boxId  0
2019-01-01 09:00  323    5.0 
2019-01-01 10:00  153    2.0
2019-01-01 11:00  153    1.0
2019-01-01 11:00  152    3.0
      .
      .
      .
2020-01-01 00:00  323    8.0

which is not what I want.
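
The long layout above is what `groupby(...).size()` returns, so some reshaping step is needed to get one column per boxId. One option, sketched here on the four visible rows of that output, is `Series.unstack` with `fill_value=0`, which fills the missing combinations directly instead of producing NaN. The time labels are kept as plain strings purely for illustration.

```python
import pandas as pd

# Counts indexed by (time, boxId), like the result of groupby(...).size().
long_counts = pd.Series(
    [5, 2, 1, 3],
    index=pd.MultiIndex.from_tuples(
        [("2019-01-01 09:00", 323), ("2019-01-01 10:00", 153),
         ("2019-01-01 11:00", 153), ("2019-01-01 11:00", 152)],
        names=["time", "boxId"],
    ),
)

# Move the boxId level into the columns; absent pairs become 0.
wide = long_counts.unstack("boxId", fill_value=0)
print(wide)
```

Because the gaps are filled during the reshape, the counts stay integers and no separate `fillna(0)` is needed.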

So how can I improve my code? How can I use pandas in a smart way?
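
One likely candidate for the slowdown is the row-wise `apply`, which calls `pd.to_datetime` once per row. A vectorized alternative is to concatenate the two string columns and parse the whole column in a single call. The sketch below uses made-up rows and assumes the `%Y-%m-%d` and `%H:%M` formats seen in the question; it has not been timed on the real data set.

```python
import pandas as pd

# Hypothetical flat data with separate date and time strings.
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-02"],
    "time": ["08:32", "08:48", "09:05"],
    "boxId": [153, 323, 153],
})

# Concatenate column-wise, then parse the whole column at once.
# Passing an explicit format spares pandas from guessing it.
df["time"] = pd.to_datetime(df["date"] + " " + df["time"],
                            format="%Y-%m-%d %H:%M")
print(df.dtypes)
```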