How to interpolate in both directions with pandas when the first and last values are missing?


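I ran into a small problem while trying to interpolate electricity demand data with pandas. I have forecasts of my country's annual electricity consumption for 2020, 2025, 2030, 2035 and 2040. What I need to do is extend these forecasts back to 2017 and forward to 2050. My initial DataFrame looks like this: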

Year    Demand1 (TWh)   Demand2 (TWh)   Demand3 (TWh)
2017    NaN             NaN             NaN
2018    NaN             NaN             NaN
2019    NaN             NaN             NaN
2020    305.0           305.0           305.0
2021    NaN             NaN             NaN
2022    NaN             NaN             NaN
2023    NaN             NaN             NaN
2024    NaN             NaN             NaN
2025    366.0           370.0           373.0
2026    NaN             NaN             NaN
......
2030    427.0           440.0           450.0
......
2035    485.0           507.0           527.0
......
2040    545.0           591.0           636.0

So, basically, I am trying to fill these NaN values. However, when I apply interpolation with the code below, I do not get the result I expect:

demand['Demand1 (TWh)'] = demand['Demand1 (TWh)'].interpolate(method="linear",limit_direction='both')
demand['Demand2 (TWh)'] = demand['Demand2 (TWh)'].interpolate(method="linear",limit_direction='both')
demand['Demand3 (TWh)'] = demand['Demand3 (TWh)'].interpolate(method="linear",limit_direction='both')
demand



Year    Demand1 (TWh)   Demand2 (TWh)   Demand3 (TWh)
2017    305.0             305.0         305.000000 
2018    305.0             305.0         305.000000
2019    305.0             305.0         305.000000 
2020    305.0             305.0         305.000000
2021    317.2             318.0         317.683429 
2022    329.4             331.0         330.825143
2023    341.6             344.0         344.425143 
2024    353.8             357.0         358.483429
2025    366.0             370.0         373.000000 
2026    378.2             384.0         387.974857
2027    390.4             398.0         403.408000 
2028    402.6             412.0         419.201143
2029    414.8             426.0         434.764571 
2030    427.0             440.0         450.000000
2031    438.6             453.4         464.907429 
2032    450.2             466.8         479.486857
2033    461.8             480.2         493.968000 
2034    473.4             493.6         509.729143
2035    485.0             507.0         527.000000 
2036    497.0             523.8         545.780571
2037    509.0             540.6         566.070857 
2038    521.0             557.4         587.870857
2039    533.0             574.2         611.180571 
2040    545.0             591.0         636.000000
2041    545.0             591.0         636.000000
2042    545.0             591.0         636.000000
2043    545.0             591.0         636.000000
2044    545.0             591.0         636.000000
2045    545.0             591.0         636.000000
2046    545.0             591.0         636.000000
2047    545.0             591.0         636.000000
2048    545.0             591.0         636.000000
2049    545.0             591.0         636.000000
2050    545.0             591.0         636.000000

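As you can see, the values for 2017, 2018 and 2019 are identical to the 2020 value, and from 2041 to 2050 every value is identical to the 2040 value. I don't understand why this happens or how to fix it. Any help would be appreciated. Thanks.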

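I think the problem here is that between 2017 and 2020 there is only one data point (2020) to interpolate from, and likewise between 2040 and 2050 there is only the 2040 point. Interpolation needs a known value on each side of a gap, so at the edges pandas can only repeat the nearest known value. If you don't want those values to all be the same, you either have to accept this limitation of the model or add data at or before 2017 and at or after 2050.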
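The edge behaviour can be reproduced on a toy series. This is a minimal sketch, assuming a made-up series s (not the question's data): with method="linear" pandas fills the gap between two known points, while limit_direction="both" only allows it to fill the leading and trailing NaNs, which it does by repeating the nearest known value rather than continuing the trend.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: two known points with gaps on both ends.
s = pd.Series(
    [np.nan, np.nan, 10.0, np.nan, 20.0, np.nan],
    index=[2018, 2019, 2020, 2021, 2022, 2023],
)

filled = s.interpolate(method="linear", limit_direction="both")

# The interior gap (2021) is interpolated; the edges repeat 10.0 and 20.0.
print(filled)
```

The flat stretches for 2017-2019 and 2041-2050 in the question follow the same pattern.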

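To extrapolate over those ranges, we can drop down a level and use scipy's interp1d; it has a fill_value parameter, which we can set to "extrapolate":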

from scipy.interpolate import interp1d

df = df.set_index("Year")

df.apply(lambda col: interp1d(*zip(*col.dropna().items()), 
                              fill_value="extrapolate")(col.index))

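We first set Year as the index so that it is ignored in the interpolation calculation, and then pass each column's values, with the NaNs dropped, to interp1d, which returns a function. We then immediately call this function with the column's index, i.e. all the years. The *zip(* there turns each column into the two arrays, index and values, that scipy uses for interpolation. This happens for every column through apply.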

This gives (I reproduced only the 2017-2026 part of the data):


      Demand1 (TWh)  Demand2 (TWh)  Demand3 (TWh)
Year
2017          268.4          266.0          264.2
2018          280.6          279.0          277.8
2019          292.8          292.0          291.4
2020          305.0          305.0          305.0
2021          317.2          318.0          318.6
2022          329.4          331.0          332.2
2023          341.6          344.0          345.8
2024          353.8          357.0          359.4
2025          366.0          370.0          373.0
2026          378.2          383.0          386.6
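
The one-liner above can also be unpacked into explicit steps over the full 2017-2050 range. The following is a sketch under stated assumptions: it rebuilds the frame from the five forecast years given in the question, uses a hypothetical helper named extrapolate_linear, and wraps each result in a pd.Series so that the Year index is kept explicitly.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

# Known forecasts taken from the question; every other year becomes NaN.
known = pd.DataFrame(
    {
        "Demand1 (TWh)": [305.0, 366.0, 427.0, 485.0, 545.0],
        "Demand2 (TWh)": [305.0, 370.0, 440.0, 507.0, 591.0],
        "Demand3 (TWh)": [305.0, 373.0, 450.0, 527.0, 636.0],
    },
    index=pd.Index([2020, 2025, 2030, 2035, 2040], name="Year"),
)
demand = known.reindex(pd.RangeIndex(2017, 2051, name="Year"))


def extrapolate_linear(col: pd.Series) -> pd.Series:
    """Hypothetical helper: piecewise-linear fit, extended past both ends."""
    valid = col.dropna()
    # x = years that have data, y = their values
    # (the same two sequences that *zip(*col.dropna().items()) yields).
    f = interp1d(
        valid.index.to_numpy(),
        valid.to_numpy(),
        kind="linear",
        fill_value="extrapolate",
    )
    # Evaluate at every year, including those outside the known range.
    return pd.Series(f(col.index.to_numpy()), index=col.index, name=col.name)


result = demand.apply(extrapolate_linear)
print(result.loc[[2017, 2020, 2040, 2050]])
```

Before 2020 the values continue the 2020-2025 slope, and after 2040 they continue the 2035-2040 slope, so Demand1 comes out at roughly 665 TWh in 2050. Whether carrying the last slope forward for ten years is reasonable is a modelling decision, not something the interpolation can settle.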