Linear interpolation of values in a pandas DataFrame with Python

python pandas

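I have a pandas DataFrame containing hourly values for January 2015, but some hours are missing entirely, both the index entry and the value. Ideally the DataFrame, which has columns named "dates" and "values", should have 744 rows. However, 10 hours are missing at random, so it has only 734 rows. I want to interpolate the missing hours of the month to produce the desired DataFrame with 744 "dates" and 744 "values".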

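Edit:

I am new to Python, so I am struggling to implement this idea:

Create a DataFrame whose first column holds all the hours of January 2015
Create a second column of the same size as the first, filled with NaN
Fill the second column with the available values, so that the missing hours contain NaN
Use the pandas interpolate function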

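Edit 2:

I was looking for hints in the form of code snippets. Based on the suggestions below, I was able to write the following code, but it fails to fill in the values at the start of the month, i.e. hours 1 to 5 of January 1.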

import pandas as pd

st_dt = '2015-01-01'
en_dt = '2015-01-31'
DateTimeHour = pd.date_range(pd.Timestamp(st_dt).date(), pd.Timestamp(en_dt).date(), freq='H')
Pwr.index = pd.DatetimeIndex(Pwr.index)  # Pwr is the original DataFrame
Pwr = Pwr.reindex(DateTimeHour, fill_value=0)
Pwr2 = pd.Series(Pwr.values)
Pwr2.interpolate(limit_direction='both')

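In general, interpolation works like this:

If the key exists:

return its value

Otherwise:

find the first key before and the first key after the desired key, compute the distance from the desired key to each of them (using whatever metric you like), and take the weighted average of the two values, weighted by those distances (the closer key gets the higher weight)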
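
The steps above can be sketched in plain Python. This is only an illustration, assuming numeric keys stored in a dict; the function name `interpolate_at` and the sample data are made up for the example and are not part of pandas.

```python
from bisect import bisect_left


def interpolate_at(points, key):
    """Linearly interpolate a value at `key` from a dict of known points."""
    # If the key exists, return its value directly.
    if key in points:
        return points[key]

    # Otherwise locate the nearest known keys before and after `key`.
    keys = sorted(points)
    pos = bisect_left(keys, key)
    if pos == 0 or pos == len(keys):
        raise ValueError("key lies outside the known range; cannot interpolate")
    before, after = keys[pos - 1], keys[pos]

    # Weight each neighbour by how close it is: the closer key gets more weight.
    weight_after = (key - before) / (after - before)
    return points[before] * (1 - weight_after) + points[after] * weight_after


known = {0: 10.0, 4: 30.0}
print(interpolate_at(known, 1))  # one quarter of the way from 10.0 to 30.0
```

Keys outside the known range raise an error here, because there is no neighbour on one side to average with.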

What you need is a combination of this technique and the pandas function pandas.Series.interpolate. From what you describe, the "linear" method is the one you want.
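
As a small illustration of pandas.Series.interpolate with the "linear" method, here is a sketch on made-up numbers (the values are not the asker's data). Note that "linear" ignores the index and treats the rows as equally spaced.

```python
import numpy as np
import pandas as pd

# Toy series with gaps marked as NaN.
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 8.0])

# Each NaN is replaced by a value on the straight line between its
# nearest non-NaN neighbours.
filled = s.interpolate(method="linear")
print(filled.tolist())
```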

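Edit:

If data points are missing at the very beginning of the time series, interpolation will not fill them by default, because there is no earlier point to interpolate from. One idea is to use pandas.Series.fillna with "backfill" after interpolating. Also, do not set fill_value to 0 when calling reindex.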
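
A minimal sketch of that advice, on hypothetical toy data rather than the asker's `Pwr` frame: reindex without fill_value so the gaps become NaN, interpolate, then backfill whatever is still missing at the start. It uses `Series.bfill()`, which does the same job as `fillna` with "backfill".

```python
import pandas as pd

# Six consecutive hours; pd.offsets.Hour() is the hourly frequency object.
full_index = pd.date_range("2015-01-01 00:00", periods=6, freq=pd.offsets.Hour())

# Hypothetical observations: hours 00:00, 01:00 and 04:00 are missing.
observed = pd.Series([3.0, 4.0, 6.0], index=full_index[[2, 3, 5]])

# No fill_value: the missing hours become NaN instead of 0.
expanded = observed.reindex(full_index)

# Interior gaps are filled; the two leading NaNs are left untouched.
interpolated = expanded.interpolate(method="linear")

# Backfill copies the first valid value into the leading gap.
result = interpolated.bfill()
print(result)
```

With fill_value=0 there would be no NaN left for interpolate to replace, and the gaps would simply stay at 0.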

Use asfreq to expand the DataFrame to an hourly frequency. NaN is inserted for the missing values:

df = df.asfreq('H')

Then replace the NaNs using (linear) interpolation based on the DatetimeIndex and the nearest non-NaN values:

df = df.interpolate(method='time')

For example,

import numpy as np
import pandas as pd

N, M = 744, 734
# freq='H' means hourly; pandas 2.2+ deprecates it in favor of lowercase 'h'.
index = pd.date_range('2015-01-01', periods=N, freq='H')
idx = np.random.choice(np.arange(N), M, replace=False)
idx.sort()
index = index[idx]

# This creates a toy DataFrame with 734 non-null rows:
df = pd.DataFrame({'values': np.random.randint(10, size=(M,))}, index=index)

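# This expands the DataFrame to hourly frequency, with NaN in the inserted rows
# (744 rows only if the first and last hours of the month survived the sampling):
df = df.asfreq('H')

# This fills every NaN, so all rows of `df` are non-null:
df = df.interpolate(method='time')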
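
To tie this back to the question's layout, here is a sketch that assumes the data arrives as a DataFrame with "dates" and "values" columns. The names `raw`, `full_hours` and `filled` and the random data are made up for the example. It reindexes against the full month instead of calling asfreq, so the result has 744 rows even if the first or last hour is among the missing ones, and it passes limit_direction="both" so such edge gaps are filled with the nearest valid value.

```python
import numpy as np
import pandas as pd

# All 744 hours of January 2015.
full_hours = pd.date_range("2015-01-01", periods=744, freq=pd.offsets.Hour())

# Hypothetical stand-in for the asker's data: 734 of the 744 hours survive.
rng = np.random.default_rng(0)
keep = np.sort(rng.choice(744, size=734, replace=False))
raw = pd.DataFrame({
    "dates": full_hours[keep],
    "values": rng.integers(0, 10, size=734).astype(float),
})

filled = (
    raw.set_index("dates")
    .reindex(full_hours)                                 # missing hours become NaN rows
    .interpolate(method="time", limit_direction="both")  # fill interior and edge gaps
    .rename_axis("dates")                                # restore the index name
    .reset_index()                                       # back to a "dates" column
)
print(len(filled), int(filled["values"].isna().sum()))
```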

What exactly are you looking for? A complete working solution? Hints on how to write the code yourself? The idea you have in mind seems reasonable. Do you have anything specific in mind? Do you want to know whether your approach is the best one?