Python: creating a plot with pandas that directly shows output similar to Matplotlib
I ran a query that outputs a list of data consisting of date strings and counts:
date_cnts = [(u'2014-06-27', 1),
(u'2014-06-29', 3),
(u'2014-06-30', 1),
(u'2014-07-01', 1),
(u'2014-07-02', 1),
(u'2014-07-09', 1),
(u'2014-07-10', 3),
(u'2014-07-11', 1),
(u'2014-07-12', 2),
(u'2014-07-14', 1),
(u'2014-07-15', 2),
(u'2014-07-17', 3),
(u'2014-07-18', 1),
(u'2014-07-20', 1),
(u'2014-07-21', 1),
(u'2014-07-23', 2),
(u'2014-07-26', 2),
(u'2014-07-27', 2),
(u'2014-07-28', 7),
(u'2014-07-29', 3),
(u'2014-07-31', 2),
(u'2014-08-01', 1),
(u'2014-08-05', 4),
(u'2014-08-07', 2),
(u'2014-08-08', 1),
(u'2014-08-13', 1),
(u'2014-08-14', 3),
(u'2014-08-15', 1),
(u'2014-08-16', 6),
(u'2014-08-17', 1),
(u'2014-08-18', 1),
(u'2014-08-20', 1),
(u'2014-08-24', 1),
(u'2014-08-25', 3),
(u'2014-08-29', 1),
(u'2014-08-30', 1),
(u'2014-09-03', 3),
(u'2014-09-13', 1),
(u'2014-09-14', 1),
(u'2014-09-24', 3),
(u'2014-10-20', 1),
(u'2014-10-24', 1),
(u'2014-11-05', 3),
(u'2014-11-09', 1),
(u'2014-11-12', 1),
(u'2014-11-13', 1),
(u'2014-11-14', 1),
(u'2014-11-18', 1),
(u'2014-11-19', 4),
(u'2014-11-22', 1),
(u'2014-11-26', 3),
(u'2014-11-28', 3),
(u'2014-12-01', 2),
(u'2014-12-02', 2),
(u'2014-12-04', 2),
(u'2014-12-05', 1),
(u'2014-12-06', 5),
(u'2014-12-11', 1),
(u'2014-12-15', 10)]
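Note that there are date gaps in this dataset; a missing date means the value for that day is 0.
My working (non-pandas) version of the code looks like this: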
from datetime import datetime
from matplotlib import pyplot as plt
x_val = [datetime.strptime(x[0],'%Y-%m-%d') for x in date_cnts]
y_val = [x[1] for x in date_cnts]
plt.bar(x_val, y_val)
plt.grid(True)
plt.show()
This outputs the following image:
Now, if I convert the query result to a pandas DataFrame
Date Count
0 2014-06-27 1
1 2014-06-29 3
2 2014-06-30 1
3 2014-07-01 1
4 2014-07-02 1
5 2014-07-09 1
6 2014-07-10 3
7 2014-07-11 1
8 2014-07-12 2
9 2014-07-14 1
10 2014-07-15 2
11 2014-07-17 3
12 2014-07-18 1
13 2014-07-20 1
14 2014-07-21 1
15 2014-07-23 2
16 2014-07-26 2
17 2014-07-27 2
18 2014-07-28 7
19 2014-07-29 3
20 2014-07-31 2
21 2014-08-01 1
22 2014-08-05 4
23 2014-08-07 2
24 2014-08-08 1
25 2014-08-13 1
26 2014-08-14 3
27 2014-08-15 1
28 2014-08-16 6
29 2014-08-17 1
30 2014-08-18 1
31 2014-08-20 1
32 2014-08-24 1
33 2014-08-25 3
34 2014-08-29 1
35 2014-08-30 1
36 2014-09-03 3
37 2014-09-13 1
38 2014-09-14 1
39 2014-09-24 3
40 2014-10-20 1
41 2014-10-24 1
42 2014-11-05 3
43 2014-11-09 1
44 2014-11-12 1
45 2014-11-13 1
46 2014-11-14 1
47 2014-11-18 1
48 2014-11-19 4
49 2014-11-22 1
50 2014-11-26 3
51 2014-11-28 3
52 2014-12-01 2
53 2014-12-02 2
54 2014-12-04 2
55 2014-12-05 1
56 2014-12-06 5
57 2014-12-11 1
58 2014-12-15 10
and plot it with the simple pandas wrapper:
plt.figure()
df.plot(kind='bar', grid=True, legend=False, x='Date', y=u'Count')
plt.show()
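I get this result. Note that the missing days do not show up in this chart.
How can I get the gaps (and their 0 values) to appear for dates that do not exist in the DataFrame?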
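The reason I want to use pandas is to take advantage of some of its other features (most importantly, rolling averages).
I wrote a working version. It may not be the best, but it gets the job done. It works by reindexing the original data into a DataFrame that has one sample per day.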
import pandas as pd
import matplotlib.pyplot as plt
#%% make data
df = pd.DataFrame(date_cnts)
df.columns = ['Date', 'Count']
#%% make dataframe with everyday sampling
df.index = pd.to_datetime(df['Date'])
startdate = df.index[0]
enddate = df.index[-1]
df_new = df.reindex(pd.date_range(startdate, enddate, freq='1D'))
#%% plot the results
df_new['Count'].plot(kind='bar')
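# show fewer x ticks: keep every 10th one together with its own date label
locs, labels = plt.xticks()
plt.xticks(locs[1:-1:10], [label.get_text() for label in labels[1:-1:10]])
plt.show()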
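Since the question states that a missing date means a count of 0, and mentions rolling averages as the main reason for using pandas, here is a minimal sketch of both steps. It assumes the data has the same shape as date_cnts; the `sample` list, the `daily_counts` helper, and the 3-day window are made up for illustration and are not part of the original post.

```python
import pandas as pd

# Hypothetical sample in the same (date string, count) shape as date_cnts
sample = [('2014-06-27', 1), ('2014-06-29', 3), ('2014-06-30', 1), ('2014-07-02', 2)]


def daily_counts(pairs):
    """Return counts indexed by every calendar day, with missing days set to 0."""
    s = pd.Series(
        [count for _, count in pairs],
        index=pd.to_datetime([date for date, _ in pairs]),
        name='Count',
    )
    # One entry per day between the first and the last observed date
    full_range = pd.date_range(s.index.min(), s.index.max(), freq='D')
    # fill_value=0 writes an explicit 0 instead of NaN for the added days
    return s.reindex(full_range, fill_value=0)


counts = daily_counts(sample)

# Rolling mean over an arbitrary 3-day window; min_periods=1 avoids NaN at the start
rolling = counts.rolling(window=3, min_periods=1).mean()

print(counts)
print(rolling)
```

Filling with 0 rather than leaving NaN matters for the rolling mean: NaN days would be skipped in the average, while explicit zeros pull it down as the question intends. The resulting series can be plotted the same way as df_new['Count'] above.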
For further formatting of the xticks, I suggest looking at the answers to the following question: