Python: a way to aggregate data by hour from records that store multiple datetimes
The data below stores datetimes and scores:
data = [
{'datetime': '2016-07-16 01:00:00+00:00', 'score': 100},
{'datetime': '2016-07-16 01:00:00+00:00', 'score': 314},
{'datetime': '2016-07-16 01:00:00+00:00', 'score': 1312},
{'datetime': '2016-07-16 01:30:00+00:00', 'score': 135},
{'datetime': '2016-07-16 01:30:00+00:00', 'score': 594},
{'datetime': '2016-07-16 01:30:00+00:00', 'score': 542},
{'datetime': '2016-07-16 02:00:00+00:00', 'score': 1431},
{'datetime': '2016-07-16 02:00:00+00:00', 'score': 431},
{'datetime': '2016-07-16 02:00:00+00:00', 'score': 89},
{'datetime': '2016-07-16 02:30:00+00:00', 'score': 1340},
{'datetime': '2016-07-16 02:30:00+00:00', 'score': 433},
{'datetime': '2016-07-16 02:30:00+00:00', 'score': 594},
{'datetime': '2016-07-17 01:00:00+00:00', 'score': 100},
{'datetime': '2016-07-17 01:00:00+00:00', 'score': 594},
{'datetime': '2016-07-17 01:00:00+00:00', 'score': 100},
{'datetime': '2016-07-17 01:30:00+00:00', 'score': 594},
{'datetime': '2016-07-17 01:30:00+00:00', 'score': 100},
{'datetime': '2016-07-17 01:30:00+00:00', 'score': 600},
{'datetime': '2016-07-17 02:00:00+00:00', 'score': 500},
{'datetime': '2016-07-17 02:00:00+00:00', 'score': 400},
{'datetime': '2016-07-17 02:00:00+00:00', 'score': 300},
{'datetime': '2016-07-17 02:30:00+00:00', 'score': 400},
{'datetime': '2016-07-17 02:30:00+00:00', 'score': 900},
{'datetime': '2016-07-17 02:30:00+00:00', 'score': 1100},
{'datetime': '2016-07-18 01:00:00+00:00', 'score': 140},
{'datetime': '2016-07-18 01:00:00+00:00', 'score': 150},
{'datetime': '2016-07-18 01:00:00+00:00', 'score': 160},
{'datetime': '2016-07-18 01:30:00+00:00', 'score': 170},
{'datetime': '2016-07-18 01:30:00+00:00', 'score': 180},
{'datetime': '2016-07-18 01:30:00+00:00', 'score': 190},
{'datetime': '2016-07-18 02:00:00+00:00', 'score': 200},
{'datetime': '2016-07-18 02:00:00+00:00', 'score': 120},
{'datetime': '2016-07-18 02:00:00+00:00', 'score': 190},
{'datetime': '2016-07-18 02:30:00+00:00', 'score': 500},
{'datetime': '2016-07-18 02:30:00+00:00', 'score': 600},
{'datetime': '2016-07-18 02:30:00+00:00', 'score': 700},
]
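I want to compute the average score for each hour from this data. The summary I have in mind has one entry per hour; the score values in it are just samples.
Is there a good way to aggregate the data like this?
I think pandas would work well, but I can't come up with the concrete code.
I'd appreciate any guidance.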
pandas solution:
import pandas as pd

# create DataFrame
df = pd.DataFrame(data)
# convert to datetimes
df['datetime'] = pd.to_datetime(df['datetime'])
# group by hour of day and aggregate the mean
df = (df.groupby(df['datetime'].dt.strftime('%H:00').rename('hour'))['score']
        .mean()
        .reset_index(name='average_score'))
print(df)
hour average_score
0 01:00 337.500000
1 02:00 568.222222
# convert to list of dictionaries
summary = df.to_dict(orient='records')
print(summary)
[{'hour': '01:00', 'average_score': 337.5},
{'hour': '02:00', 'average_score': 568.2222222222222}]
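Note that grouping on `strftime('%H:00')` pools the same hour from every date. If you instead want one average per hour on each day, a minimal sketch is to truncate the timestamps with `dt.floor` and group on that. The names `sample`, `frame`, `hour_start` and `per_day_hour` are made up for illustration, and the sample values are hypothetical, not the question's data.

```python
import pandas as pd

# Small hypothetical sample in the same shape as the question's data.
sample = [
    {'datetime': '2016-07-16 01:00:00+00:00', 'score': 100},
    {'datetime': '2016-07-16 01:30:00+00:00', 'score': 300},
    {'datetime': '2016-07-16 02:00:00+00:00', 'score': 50},
    {'datetime': '2016-07-17 01:00:00+00:00', 'score': 500},
]

frame = pd.DataFrame(sample)
frame['datetime'] = pd.to_datetime(frame['datetime'])

# Truncate each timestamp to the start of its hour, so the date is kept
# and 01:00 on the 16th stays separate from 01:00 on the 17th.
hour_start = frame['datetime'].dt.floor('h').rename('hour_start')

per_day_hour = (frame.groupby(hour_start)['score']
                     .mean()
                     .reset_index(name='average_score'))
print(per_day_hour)
```

Here the two readings in the 01:00 hour of 2016-07-16 are averaged together, while the 01:00 reading on 2016-07-17 gets its own row.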
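One approach is to use the two-digit hour as the key, which can be extracted directly from a fixed position in the datetime string. Accumulate the scores for each hour into lists in a defaultdict, then compute the averages once all items have been grouped.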
from collections import defaultdict
from pprint import pprint
from statistics import mean

# collect the scores for each two-digit hour
d = defaultdict(list)
for item in data:
    hour = item['datetime'][11:13]
    d[hour].append(item['score'])

# average each hour's scores once everything is grouped
summary = [{'hour': '{}:00'.format(hour), 'average_score': mean(d[hour])} for hour in d]
pprint(summary)
Output:
[{'average_score': 337.5, 'hour': '01:00'},
{'average_score': 568.2222222222222, 'hour': '02:00'}]
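The fixed-position slice depends on every string having exactly this layout. If that is not guaranteed, a possible variant is to parse each string with `datetime.fromisoformat` (Python 3.7+) and group on the parsed hour, sorting the keys so the summary comes out in order. This is only a sketch under those assumptions; `sample`, `scores_by_hour` and `hourly_summary` are illustrative names, and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample in the same shape as the question's data,
# deliberately out of order.
sample = [
    {'datetime': '2016-07-16 02:00:00+00:00', 'score': 50},
    {'datetime': '2016-07-16 01:00:00+00:00', 'score': 100},
    {'datetime': '2016-07-17 01:30:00+00:00', 'score': 300},
]

scores_by_hour = defaultdict(list)
for item in sample:
    # Parse the string instead of relying on fixed character positions.
    parsed = datetime.fromisoformat(item['datetime'])
    scores_by_hour[parsed.hour].append(item['score'])

# Sort the integer hours so the summary is in chronological order.
hourly_summary = [
    {'hour': '{:02d}:00'.format(hour), 'average_score': mean(scores)}
    for hour, scores in sorted(scores_by_hour.items())
]
print(hourly_summary)
```

The hour is taken in the timestamp's own UTC offset, which is `+00:00` for every record in the question.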