Python: how can I use pandas and matplotlib to generate discrete data to pass into a contour plot?

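I have two sets of continuous data that I want to pass into a contour plot. The x-axis is time, the y-axis is mass, and the z-axis is frequency (the number of times a data point occurs). However, most of the data points are not identical, only very similar, so I think the easiest approach is to discretize the x and y axes.

Here is the data I have so far: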

Input

import pandas as pd
df = pd.read_excel('data.xlsx')
df['Date'].head(5)
df['Mass'].head(5)
Output

13   2003-05-09
14   2003-09-09
15   2010-01-18
16   2010-11-21
17   2012-06-29
Name: Date, dtype: datetime64[ns]
13    2500.0
14    3500.0
15    4000.0
16    4500.0
17    5000.0
Name: Mass, dtype: float64
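I would like to transform the data so that data points are grouped by year (e.g. all data points collected in 2003) and by mass level (e.g. all data points between 3000 and 4000 kg). The code would then count how many data points fall into each block and pass that count as the z-axis.

Ideally, I would also like to be able to adjust the slicing levels, e.g. group every 100 kg instead of every 1000 kg, or pass a custom list of unevenly spaced levels. How can I do this?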

I think the function you are looking for is pd.cut

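import pandas as pd
import numpy as np
import datetime

n = 10         # number of sample points
scale = 1e3    # width of each mass bin
Min = 0        # lower edge of the mass range
Max = 1e4      # upper edge of the mass range

np.random.seed(6)

# Build n dates spaced 180 days apart, starting on 2000-01-01
base = datetime.datetime(2000, 1, 1)
Dates = np.array([base + datetime.timedelta(days=i*180) for i in range(n)])
Mass = np.random.rand(n)*10000
df = pd.DataFrame(index = Dates, data = {'Mass':Mass})

print(df)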
This gives you:

                   Mass
2000-01-01  8928.601514
2000-06-29  3319.798053
2000-12-26  8212.291231
2001-06-24   416.966257
2001-12-21  1076.566799
2002-06-19  5950.520642
2002-12-16  5298.173622
2003-06-14  4188.074286
2003-12-11  3354.078493
2004-06-08  6225.194322
If you want to group the masses in steps of (say) 1000, or define your own custom bins, you can do the following:

# Equal-width bins, each labelled by its midpoint
Bins,Labels=np.arange(Min,Max+.1,scale),(np.arange(Min,Max,scale))+(scale)/2
EqualBins = pd.cut(df['Mass'],bins=Bins,labels=Labels)
df.insert(1,'Equal Bins',EqualBins)

# Custom, unevenly spaced bins with named labels
Bins,Labels=[0,1000,5000,10000],['Small','Medium','Big']
CustomBins = pd.cut(df['Mass'],bins=Bins,labels=Labels)
df.insert(2,'Custom Bins',CustomBins)
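To switch between step sizes or an uneven list of levels, the bin edges can be passed in as a parameter. The following is only a minimal sketch: `count_by_bins` is a hypothetical helper name, the five mass values are taken from the question's sample output, and `right=False` is an assumed choice that makes each bin include its lower edge.

```python
import numpy as np
import pandas as pd

# The five mass values shown in the question's sample output
mass = pd.Series([2500.0, 3500.0, 4000.0, 4500.0, 5000.0], name="Mass")

def count_by_bins(values, edges):
    """Count how many values fall into each [left, right) bin (hypothetical helper)."""
    binned = pd.cut(values, bins=edges, right=False)
    # Bins with no data are kept and reported with a count of 0
    return binned.value_counts(sort=False)

# Evenly spaced edges: change the step (1000) to 100 for finer slices
even = count_by_bins(mass, np.arange(0, 6001, 1000))

# Unevenly spaced, hand-picked edges
uneven = count_by_bins(mass, [0, 3000, 4200, 6000])

print(even)
print(uneven)
```

Any increasing sequence of edges works here, so the same call covers both the "every 100 kg" case and a custom list of levels.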
If you only want the year, month, etc., that is very simple:

df['Year'] = df.index.year
df['Month'] = df.index.month
But you can also define custom date ranges if you prefer:

# Bin edges given as dates; each label names the interval between two edges
Bins=[datetime.datetime(1999, 12, 31),datetime.datetime(2000, 9, 1),
      datetime.datetime(2002, 1, 1),datetime.datetime(2010, 9, 1)]

Labels = ['Early','Middle','Late']
CustomDateBins = pd.cut(df.index,bins=Bins,labels=Labels)
df.insert(3,'Custom Date Bins',CustomDateBins)

print(df)
This produces something like what you want:

                   Mass Equal Bins Custom Bins Custom Date Bins  Year  Month
2000-01-01  8928.601514     8500.0         Big            Early  2000      1
2000-06-29  3319.798053     3500.0      Medium            Early  2000      6
2000-12-26  8212.291231     8500.0         Big           Middle  2000     12
2001-06-24   416.966257      500.0       Small           Middle  2001      6
2001-12-21  1076.566799     1500.0      Medium           Middle  2001     12
2002-06-19  5950.520642     5500.0         Big             Late  2002      6
2002-12-16  5298.173622     5500.0         Big             Late  2002     12
2003-06-14  4188.074286     4500.0      Medium             Late  2003      6
2003-12-11  3354.078493     3500.0      Medium             Late  2003     12
2004-06-08  6225.194322     6500.0         Big             Late  2004      6
The .groupby function may also be of interest to you:

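# numeric_only=True skips the categorical bin columns when averaging
yeargroup = df.groupby(df.index.year).mean(numeric_only=True)
# observed=False keeps empty bins in the result, with a count of 0
massgroup = df.groupby(df['Equal Bins'], observed=False).count()
print(yeargroup)
print(massgroup)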

             Mass    Year     Month
2000  6820.230266  2000.0  6.333333
2001   746.766528  2001.0  9.000000
2002  5624.347132  2002.0  9.000000
2003  3771.076389  2003.0  9.000000
2004  6225.194322  2004.0  6.000000
            Mass  Custom Bins  Custom Date Bins  Year  Month
Equal Bins                                                  
500.0          1            1                 1     1      1
1500.0         1            1                 1     1      1
2500.0         0            0                 0     0      0
3500.0         2            2                 2     2      2
4500.0         1            1                 1     1      1
5500.0         2            2                 2     2      2
6500.0         1            1                 1     1      1
7500.0         0            0                 0     0      0
8500.0         2            2                 2     2      2
9500.0         0            0                 0     0      0
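Putting the pieces together for the contour plot itself, one possible approach is to bin both axes with pd.cut, count the points in every (mass bin, year bin) cell, and hand the resulting grid to matplotlib. This is a rough sketch under assumptions, not a tested solution for the real spreadsheet: the data is randomly generated, the column names `Date` and `Mass` follow the question, and the names `year_bin`, `mass_bin` and `counts` are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the real data (assumed columns: 'Date' and 'Mass')
rng = np.random.default_rng(0)
n = 500
days = pd.to_timedelta(rng.integers(0, 3650, n), unit="D")
df = pd.DataFrame({
    "Date": pd.Timestamp("2000-01-01") + days,
    "Mass": rng.uniform(0, 10_000, n),
})

# Bin edges: one bin per calendar year, 1000 kg mass bins.
# Any increasing list of edges can be used instead for uneven levels.
year_edges = np.arange(2000, 2011)             # 2000, 2001, ..., 2010
mass_edges = np.arange(0, 10_000 + 1, 1000)    # 0, 1000, ..., 10000
mass_mid = (mass_edges[:-1] + mass_edges[1:]) / 2

df["year_bin"] = pd.cut(df["Date"].dt.year, bins=year_edges,
                        right=False, labels=year_edges[:-1])
df["mass_bin"] = pd.cut(df["Mass"], bins=mass_edges,
                        right=False, labels=mass_mid)

# Count points per cell; observed=False keeps empty cells as zeros
counts = (
    df.groupby(["mass_bin", "year_bin"], observed=False)
      .size()
      .unstack(fill_value=0)
)

# Rows of counts follow the mass bins (y), columns follow the years (x)
fig, ax = plt.subplots()
cs = ax.contourf(year_edges[:-1], mass_mid, counts.to_numpy())
fig.colorbar(cs, ax=ax, label="Number of data points")
ax.set_xlabel("Year")
ax.set_ylabel("Mass (kg)")
# plt.show()  # uncomment to display the figure interactively
```

Changing `mass_edges` (for example to a 100 kg step) is enough to change the resolution; `mass_mid` and the plot follow from it.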

Take a look at np.histogram2d.
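As a rough illustration of that suggestion, np.histogram2d does the binning and counting in a single call and accepts explicit, possibly uneven, edges per axis. The sketch below uses made-up random data; representing time as a fractional year is an assumption for simplicity (with real data, something like `df['Date'].dt.year` could be used instead), and the uneven mass edges are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data: time as a fractional year, mass in kg
rng = np.random.default_rng(1)
n = 500
years = rng.uniform(2000, 2010, n)
mass = rng.uniform(0, 10_000, n)

year_edges = np.arange(2000, 2011)                          # one bin per year
mass_edges = np.array([0, 500, 1000, 2500, 5000, 10_000])   # uneven levels

# H[i, j] is the number of points in year bin i and mass bin j
H, xedges, yedges = np.histogram2d(years, mass, bins=[year_edges, mass_edges])

# contourf expects Z with shape (len(y), len(x)), hence the transpose
x_mid = (xedges[:-1] + xedges[1:]) / 2
y_mid = (yedges[:-1] + yedges[1:]) / 2

fig, ax = plt.subplots()
cs = ax.contourf(x_mid, y_mid, H.T)
fig.colorbar(cs, ax=ax, label="Number of data points")
ax.set_xlabel("Year")
ax.set_ylabel("Mass (kg)")
# plt.show()  # uncomment to display the figure interactively
```

Compared with pd.cut plus groupby, this skips the intermediate DataFrame columns, at the cost of working with plain arrays rather than labelled bins.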