Python: Using a Kalman filter to improve a simulation, but getting worse results


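I have a question about the behavior I see when applying a Kalman filter (KF) to the following forecasting problem. I have included a simple code example.

Goal: I would like to know whether a KF is suitable for improving the forecast/simulation result for a day ahead (t+24 hours) using the measurement obtained now (at t). The goal is to bring the forecast as close as possible to the measurement.

Assumptions: We assume the measurement is perfect (i.e. if we can get a forecast that matches the measurement perfectly, we are happy).

We have a single measured variable (z, the actual wind speed) and a single simulated variable (x, the predicted wind speed).

The simulated wind speed x is produced by NWP (numerical weather prediction) software from a variety of meteorological data (a black box to me). A simulation file is generated every day and contains data for every half hour.

I attempted to correct the t+24 hour forecast with a scalar Kalman filter, using the measurement I obtain now and the forecast data for now (generated 24 hours ago). For reference, I used the scalar Kalman filter tutorial linked in the code comments.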
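For orientation, the textbook scalar Kalman recursion can be written in the notation of the code (a, h, Q, R, p_j, k) as follows. This is the standard form with the control input omitted. It is offered as a reference point, not as a claim about what the NWP software does internally.

```latex
\begin{aligned}
\hat{x}^-_j &= a\,\hat{x}_{j-1} \\
p^-_j &= a^2\,p_{j-1} + Q \\
k_j &= \frac{h\,p^-_j}{h^2\,p^-_j + R} \\
\hat{x}_j &= \hat{x}^-_j + k_j\,\bigl(z_j - h\,\hat{x}^-_j\bigr) \\
p_j &= p^-_j\,(1 - h\,k_j)
\end{aligned}
```

Note that in this form the residual compares the measurement with the a priori estimate of the same step.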

Code:

#! /usr/bin/python

import numpy as np
import pylab

import os


def main():

    # x = 336 data points of simulated wind speed: 7 days * 24 hours * 2 (one every half hour)
    # Imagine that at time t we get an x_t value for t+48, i.e. 24 hours later.
    x = load_x()

    # this is a list that will contain 336 data points of our corrected data
    x_sample_predict_list = []

    # z = 336 data points for 7 days * 24 hour * 2 of actual measured wind speed (every half an hour)
    z = load_z()

    # Here is the setup of the scalar kalman filter
    # reference: http://www.swarthmore.edu/NatSci/echeeve1/Ref/Kalman/ScalarKalman.html
    # state transition matrix (we simply have a scalar)
    # what you need to multiply the last time's state to get the newest state
    # we get the x_t+1 = A * x_t, since we get the x_t+1 directly for simulation
    # we will have a = 1
    a = 1.0

    # observation matrix
    # what you need to multiply to the state, convert it to the same form as incoming measurement 
    # both state and measurements are wind speed, so set h = 1
    h = 1.0

    Q = 16.0    # expected process variance of predicted Wind Speed
    R = 9.0 # expected measurement variance of Wind Speed

    p_j = Q # process covariance is equal to the initial process covariance estimate

    # Kalman gain is equal to k = hp-_j / (hp-_j + R).  With perfect measurement
    # R = 0, k reduces to k=1/h which is 1
    k = 1.0

    # one week data
    # original R2 = 0.183
    # with delay = 6, R2 = 0.295
    # with delay = 12, R2 = 0.147   
    # with delay = 48, R2 = 0.075
    delay = 6 

    # Kalman loop
    for t, x_sample in enumerate(x):

        if t <= delay:          
            # for the first day of the forecast,
            # we don't have forecast data and measurement 
            # from a day before to do correction
            x_sample_predict = x_sample             
        else: # t > delay
            # for a priori estimate we take x_sample as is
            # x_sample = x^-_j = a x^-_j_1 + b u_j
            # Inside the NWP (numerical weather prediction, 
            # the x_sample should be on x_sample_j-1 (assumption)

            x_sample_predict_prior = a * x_sample

            # we use the measurement from t-delay (ie. could be a day ago)
            # and forecast data from t-delay, to produce a leading residual that can be used to
            # correct the forecast.
            residual = z[t-delay] - h * x_sample_predict_list[t-delay]


            p_j_prior = a**2 * p_j + Q

            k = h * p_j_prior / (h**2 * p_j_prior + R)

            # we update our prediction based on the residual
            x_sample_predict = x_sample_predict_prior + k * residual

            p_j = p_j_prior * (1 - h * k)

            #print k
            #print p_j_prior
            #print p_j
            #raw_input()

        x_sample_predict_list.append(x_sample_predict)

    # initial goodness of fit
    R2_val_initial = calculate_regression(x,z)
    R2_string_initial = "R2 initial: {0:10.3f}, ".format(R2_val_initial)    
    print R2_string_initial     # R2_val_initial = 0.183

    # final goodness of fit
    R2_val_final = calculate_regression(x_sample_predict_list,z)
    R2_string_final = "R2 final: {0:10.3f}, ".format(R2_val_final)  
    print R2_string_final       # R2_val_final = 0.117, which is worse


    timesteps = xrange(len(x))      
    pylab.plot(timesteps,x,'r-', timesteps,z,'b:', timesteps,x_sample_predict_list,'g--')
    pylab.xlabel('Time')
    pylab.ylabel('Wind Speed')
    pylab.title('Simulated Wind Speed vs Actual Wind Speed')
    pylab.legend(('predicted','measured','kalman'))
    pylab.show()


def calculate_regression(x, y):         
    R2 = 0  
    A = np.array( [x, np.ones(len(x))] )
    model, resid = np.linalg.lstsq(A.T, y)[:2]  
    R2_val = 1 - resid[0] / (y.size * y.var())          
    return R2_val

def load_x():
    return np.array([2, 3, 3, 5, 4, 4, 4, 5, 5, 6, 5, 7, 7, 7, 8, 8, 8, 9, 9, 10, 10, 10, 11, 11,
     11, 10, 8, 8, 8, 8, 6, 3, 4, 5, 5, 5, 6, 5, 5, 5, 6, 5, 5, 6, 6, 7, 6, 8, 9, 10,
     12, 11, 10, 10, 10, 11, 11, 10, 8, 8, 9, 8, 9, 9, 9, 9, 8, 9, 8, 11, 11, 11, 12,
     12, 13, 13, 13, 13, 13, 13, 13, 14, 13, 13, 12, 13, 13, 12, 12, 13, 13, 12, 12, 
     11, 12, 12, 19, 18, 17, 15, 13, 14, 14, 14, 13, 12, 12, 12, 12, 11, 10, 10, 10, 
     10, 9, 9, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 7, 7, 8, 8, 8, 6, 5, 5, 
     5, 5, 5, 5, 6, 4, 4, 4, 6, 7, 8, 7, 7, 9, 10, 10, 9, 9, 8, 7, 5, 5, 5, 5, 5, 5, 
     5, 5, 6, 5, 5, 5, 4, 4, 6, 6, 7, 7, 7, 7, 6, 6, 5, 5, 4, 2, 2, 2, 1, 1, 1, 2, 3,
     13, 13, 12, 11, 10, 9, 10, 10, 8, 9, 8, 7, 5, 3, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6,
     7, 7, 7, 6, 6, 6, 7, 6, 6, 5, 4, 4, 3, 3, 3, 2, 2, 1, 5, 5, 3, 2, 1, 2, 6, 7, 
     7, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 8, 8, 8, 8, 7, 7, 
     7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 11, 11, 11, 11, 10, 10, 9, 10, 10, 10, 2, 2,
     2, 3, 1, 1, 3, 4, 5, 8, 9, 9, 9, 9, 8, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7,
     7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 7, 5, 5, 5, 5, 5, 6, 5])

def load_z():
    return np.array([3, 2, 1, 1, 1, 1, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 1, 1, 2, 2, 2,
     2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 4, 4, 4, 5, 4, 4, 5, 5, 5, 6, 6,
     6, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 8, 8, 8, 8, 8, 8, 9, 10, 9, 9, 10, 10, 9,
     9, 10, 9, 9, 10, 9, 8, 9, 9, 7, 7, 6, 7, 6, 6, 7, 7, 8, 8, 8, 8, 8, 8, 7, 6, 7,
     8, 8, 7, 8, 9, 9, 9, 9, 10, 9, 9, 9, 8, 8, 10, 9, 10, 10, 9, 9, 9, 10, 9, 8, 7, 
     7, 7, 7, 8, 7, 6, 5, 4, 3, 5, 3, 5, 4, 4, 4, 2, 4, 3, 2, 1, 1, 2, 1, 2, 1, 4, 4,
     4, 4, 4, 3, 3, 3, 1, 1, 1, 1, 2, 3, 3, 2, 3, 3, 3, 2, 2, 5, 4, 2, 5, 4, 1, 1, 1, 
     1, 1, 1, 1, 2, 2, 1, 1, 3, 3, 3, 3, 3, 4, 3, 4, 3, 4, 4, 4, 4, 3, 3, 4, 4, 4, 4,
     4, 4, 5, 5, 5, 4, 3, 3, 3, 3, 3, 3, 3, 3, 1, 2, 2, 3, 3, 1, 2, 1, 1, 2, 4, 3, 1,
     1, 2, 0, 0, 0, 2, 1, 0, 0, 2, 3, 2, 4, 4, 3, 3, 4, 5, 5, 5, 4, 5, 4, 4, 4, 5, 5, 
     4, 3, 3, 4, 4, 4, 3, 3, 3, 4, 4, 4, 5, 5, 5, 4, 5, 5, 5, 5, 6, 5, 5, 8, 9, 8, 9,
     9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 9, 10, 9, 8, 8, 9, 8, 9, 9, 10, 9, 9, 9,
     7, 7, 9, 8, 7, 6, 6, 5, 5, 5, 5, 3, 3, 3, 4, 6, 5, 5, 6, 5])

if __name__ == '__main__': main()  # this avoids executing main on import your_module

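I updated my test scalar implementation so that it no longer assumes a perfect measurement, which is what reduces the Kalman gain to a constant value of 1 (R is now 1). I now see an improvement in the time series, with a reduced RMSE.
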
#! /usr/bin/python

import numpy as np
import pylab

import os

# RMSE improved
def main():

    # x = 336 data points of simulated wind speed: 7 days * 24 hours * 2 (one every half hour)
    # Imagine that at time t we get an x_t value for t+48, i.e. 24 hours later.
    x = load_x()

    # this is a list that will contain 336 data points of our corrected data
    x_sample_predict_list = []

    # z = 336 data points for 7 days * 24 hour * 2 of actual measured wind speed (every half an hour)
    z = load_z()

    # Here is the setup of the scalar kalman filter
    # reference: http://www.swarthmore.edu/NatSci/echeeve1/Ref/Kalman/ScalarKalman.html
    # state transition matrix (we simply have a scalar)
    # what you need to multiply the last time's state to get the newest state
    # we get the x_t+1 = A * x_t, since we get the x_t+1 directly for simulation
    # we will have a = 1
    a = 1.0

    # observation matrix
    # what you need to multiply to the state, convert it to the same form as incoming measurement 
    # both state and measurements are wind speed, so set h = 1
    h = 1.0

    Q = 1.0     # expected process noise of predicted Wind Speed    
    R = 1.0     # expected measurement noise of Wind Speed

    p_j = Q # process covariance is equal to the initial process covariance estimate

    # Kalman gain is equal to k = hp-_j / (hp-_j + R).  With perfect measurement
    # R = 0, k reduces to k=1/h which is 1
    k = 1.0

    # one week data
    # original R2 = 0.183
    # with delay = 6, R2 = 0.295
    # with delay = 12, R2 = 0.147   
    # with delay = 48, R2 = 0.075
    delay = 6 

    # Kalman loop
    for t, x_sample in enumerate(x):

        if t <= delay:          
            # for the first day of the forecast,
            # we don't have forecast data and measurement 
            # from a day before to do correction
            x_sample_predict = x_sample             
        else: # t > delay
            # for a priori estimate we take x_sample as is
            # x_sample = x^-_j = a x^-_j_1 + b u_j
            # Inside the NWP (numerical weather prediction, 
            # the x_sample should be on x_sample_j-1 (assumption)

            x_sample_predict_prior = a * x_sample

            # we use the measurement from t-delay (ie. could be a day ago)
            # and forecast data from t-delay, to produce a leading residual that can be used to
            # correct the forecast.
            residual = z[t-delay] - h * x_sample_predict_list[t-delay]

            p_j_prior = a**2 * p_j + Q

            k = h * p_j_prior / (h**2 * p_j_prior + R)

            # we update our prediction based on the residual
            x_sample_predict = x_sample_predict_prior + k * residual

            p_j = p_j_prior * (1 - h * k)

            #print k
            #print p_j_prior
            #print p_j
            #raw_input()

        x_sample_predict_list.append(x_sample_predict)

    # initial goodness of fit
    R2_val_initial = calculate_regression(x,z)
    R2_string_initial = "R2 original: {0:10.3f}, ".format(R2_val_initial)   
    print(R2_string_initial)    # R2_val_original = 0.183

    original_RMSE = (((x-z)**2).mean())**0.5
    print("original_RMSE")
    print(original_RMSE)
    print("\n")

    # final goodness of fit
    R2_val_final = calculate_regression(x_sample_predict_list,z)
    R2_string_final = "R2 final: {0:10.3f}, ".format(R2_val_final)
    print(R2_string_final)      # R2_val_final = 0.267, which is better

    # use the full corrected series here, not only the last corrected sample
    final_RMSE = (((np.array(x_sample_predict_list)-z)**2).mean())**0.5
    print("final_RMSE")
    print(final_RMSE)
    print("\n")


    timesteps = range(len(x))
    pylab.plot(timesteps,x,'r-', timesteps,z,'b:', timesteps,x_sample_predict_list,'g--')
    pylab.xlabel('Time')
    pylab.ylabel('Wind Speed')
    pylab.title('Simulated Wind Speed vs Actual Wind Speed')
    pylab.legend(('predicted','measured','kalman'))
    pylab.show()


def calculate_regression(x, y):         
    R2 = 0  
    A = np.array( [x, np.ones(len(x))] )
    model, resid = np.linalg.lstsq(A.T, y)[:2]  
    R2_val = 1 - resid[0] / (y.size * y.var())          
    return R2_val

def load_x():
    return np.array([2, 3, 3, 5, 4, 4, 4, 5, 5, 6, 5, 7, 7, 7, 8, 8, 8, 9, 9, 10, 10, 10, 11, 11,
     11, 10, 8, 8, 8, 8, 6, 3, 4, 5, 5, 5, 6, 5, 5, 5, 6, 5, 5, 6, 6, 7, 6, 8, 9, 10,
     12, 11, 10, 10, 10, 11, 11, 10, 8, 8, 9, 8, 9, 9, 9, 9, 8, 9, 8, 11, 11, 11, 12,
     12, 13, 13, 13, 13, 13, 13, 13, 14, 13, 13, 12, 13, 13, 12, 12, 13, 13, 12, 12, 
     11, 12, 12, 19, 18, 17, 15, 13, 14, 14, 14, 13, 12, 12, 12, 12, 11, 10, 10, 10, 
     10, 9, 9, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 7, 7, 8, 8, 8, 6, 5, 5, 
     5, 5, 5, 5, 6, 4, 4, 4, 6, 7, 8, 7, 7, 9, 10, 10, 9, 9, 8, 7, 5, 5, 5, 5, 5, 5, 
     5, 5, 6, 5, 5, 5, 4, 4, 6, 6, 7, 7, 7, 7, 6, 6, 5, 5, 4, 2, 2, 2, 1, 1, 1, 2, 3,
     13, 13, 12, 11, 10, 9, 10, 10, 8, 9, 8, 7, 5, 3, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6,
     7, 7, 7, 6, 6, 6, 7, 6, 6, 5, 4, 4, 3, 3, 3, 2, 2, 1, 5, 5, 3, 2, 1, 2, 6, 7, 
     7, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 8, 8, 8, 8, 7, 7, 
     7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 11, 11, 11, 11, 10, 10, 9, 10, 10, 10, 2, 2,
     2, 3, 1, 1, 3, 4, 5, 8, 9, 9, 9, 9, 8, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7,
     7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 7, 5, 5, 5, 5, 5, 6, 5])

def load_z():
    return np.array([3, 2, 1, 1, 1, 1, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 1, 1, 2, 2, 2,
     2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 4, 4, 4, 5, 4, 4, 5, 5, 5, 6, 6,
     6, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 8, 8, 8, 8, 8, 8, 9, 10, 9, 9, 10, 10, 9,
     9, 10, 9, 9, 10, 9, 8, 9, 9, 7, 7, 6, 7, 6, 6, 7, 7, 8, 8, 8, 8, 8, 8, 7, 6, 7,
     8, 8, 7, 8, 9, 9, 9, 9, 10, 9, 9, 9, 8, 8, 10, 9, 10, 10, 9, 9, 9, 10, 9, 8, 7, 
     7, 7, 7, 8, 7, 6, 5, 4, 3, 5, 3, 5, 4, 4, 4, 2, 4, 3, 2, 1, 1, 2, 1, 2, 1, 4, 4,
     4, 4, 4, 3, 3, 3, 1, 1, 1, 1, 2, 3, 3, 2, 3, 3, 3, 2, 2, 5, 4, 2, 5, 4, 1, 1, 1, 
     1, 1, 1, 1, 2, 2, 1, 1, 3, 3, 3, 3, 3, 4, 3, 4, 3, 4, 4, 4, 4, 3, 3, 4, 4, 4, 4,
     4, 4, 5, 5, 5, 4, 3, 3, 3, 3, 3, 3, 3, 3, 1, 2, 2, 3, 3, 1, 2, 1, 1, 2, 4, 3, 1,
     1, 2, 0, 0, 0, 2, 1, 0, 0, 2, 3, 2, 4, 4, 3, 3, 4, 5, 5, 5, 4, 5, 4, 4, 4, 5, 5, 
     4, 3, 3, 4, 4, 4, 3, 3, 3, 4, 4, 4, 5, 5, 5, 4, 5, 5, 5, 5, 6, 5, 5, 8, 9, 8, 9,
     9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 9, 10, 9, 8, 8, 9, 8, 9, 9, 10, 9, 9, 9,
     7, 7, 9, 8, 7, 6, 6, 5, 5, 5, 5, 3, 3, 3, 4, 6, 5, 5, 6, 5])

if __name__ == '__main__': main()  # this avoids executing main on import your_module
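
The difference between the two runs comes down to Q and R, because in a scalar filter with constant a, h, Q and R the gain does not depend on the data at all. The following is a minimal sketch, assuming the same covariance recursion and the same initialisation (p_j = Q) as the code above. The function name `kalman_gain_sequence` is made up for illustration.

```python
def kalman_gain_sequence(q, r, a=1.0, h=1.0, steps=50):
    """Return the Kalman gain at each step of the scalar covariance recursion.

    Hypothetical helper: it mirrors the p_j / k update used in the post,
    but leaves out the state update, since the gain never looks at x or z.
    """
    p = q  # same initialisation as the post: p_j starts at Q
    gains = []
    for _ in range(steps):
        p_prior = a ** 2 * p + q                   # a priori covariance
        k = h * p_prior / (h ** 2 * p_prior + r)   # Kalman gain
        p = p_prior * (1.0 - h * k)                # a posteriori covariance
        gains.append(k)
    return gains


if __name__ == "__main__":
    # (Q, R) pairs: first version, updated version, perfect measurement
    for q, r in [(16.0, 9.0), (1.0, 1.0), (16.0, 0.0)]:
        print(q, r, round(kalman_gain_sequence(q, r)[-1], 3))
```

With R = 0 the gain is exactly 1 at every step, with Q = R = 1 it settles near 0.618, and with Q = 16, R = 9 it settles near 0.71. So both versions apply a fixed fraction of the residual after a few steps, and only the ratio of Q to R changes that fraction.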

The residual line in the loop does not follow the scalar Kalman filter reference. In my opinion, you should have done:
 residual = z[t -delay] - h * x_sample_predict_prior
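
As a rough sketch of this suggestion, the loop could look like the following. It assumes the same a, h, Q, R and delay conventions as the question. The function name `correct_forecast` is hypothetical, and nothing here has been checked against the question's data, so it should not be read as a confirmed improvement.

```python
import numpy as np


def correct_forecast(x, z, delay=6, q=1.0, r=1.0, a=1.0, h=1.0):
    """Correct forecast x with delayed measurements z (illustrative sketch).

    The residual is taken against the current a priori estimate, as the
    answer suggests, instead of against an earlier corrected value.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    p = q  # initial covariance estimate, as in the question
    corrected = np.empty_like(x)
    for t in range(len(x)):
        prior = a * x[t]  # a priori estimate: the NWP forecast itself
        if t <= delay:
            # same pass-through rule as the question for the first samples
            corrected[t] = prior
            continue
        residual = z[t - delay] - h * prior        # the suggested residual
        p_prior = a ** 2 * p + q
        k = h * p_prior / (h ** 2 * p_prior + r)
        corrected[t] = prior + k * residual
        p = p_prior * (1.0 - h * k)
    return corrected
```

With h = 1 the update reduces to `(1 - k) * x[t] + k * z[t - delay]`, a weighted blend of the forecast and the delayed measurement. In the limit R = 0 (k = 1) the output is simply the measurement from `delay` steps earlier, which is worth keeping in mind when judging whether the filter adds anything beyond persistence.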

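This is not a solution, so I am writing it as a comment: time series analysis can be done easily in Matlab or R. Python may be less developed than they are in this area. As evidence, there are KF packages for Matlab and R: Matlab - R - If you really want to use Python, try: or

@Mai. Thanks Mai, I have tried other implementations of Kalman filter libraries, such as the code found in . But it is the application to the problem of correcting forecast data that confuses me, and my simplified code is an illustration of this conceptual problem. I have a more complex example that uses the full Kalman filter class from the link above with the same input data (but where the state vector is a cubic polynomial of the predicted wind speed), and there I also see a decrease in the goodness of fit of the corrected data.

I am not an expert in filtering, so I will not try to explain observations 2 and 3. But I have some knowledge of mathematics and physics. Weather data is conceptually continuous but discretely sampled. If we say T = f(x), where x is a vector of parameters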