Python: Why is rolling apply so slow?

I have a large DataFrame with 100,000,000 rows and 50 columns (about 4 GB).

I want to compute a weighted average over rolling windows like this:

# df has shape (100,000,000, 50)
from functools import partial
import numpy as np

window_sizes = [1, 2, 3, 4, 5, 6]
for i in window_sizes:
    # Weighted average of the last i differences of 'mid', with weights 0..i-1
    df['triangle_mv_%d' % i] = df['mid'].diff(1).rolling(i).apply(partial(np.average, weights=range(i)))
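I found this quite slow: a single iteration of the loop takes more than 15 minutes.

I can't understand this, because rolling(i).mean() is quite fast. I am only calling apply with a weighted average, so why is it so slow?

I have also searched a lot, and some references suggested rewriting the weighted mean as a rolling method like this: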

import pandas as pd
import numpy as np
from pandas.core.window.rolling import _flex_binary_moment, _Rolling_and_Expanding

def weighted_mean(self, weights, **kwargs):
    weights = self._shallow_copy(weights)
    window = self._get_window(weights)

    def _get_weighted_mean(X, Y): 
        X = X.astype('float64')
        Y = Y.astype('float64')
        sum_f = lambda x: x.rolling(window, self.min_periods, center=self.center).sum(**kwargs)
        print(X)
        print(Y)
        return sum_f(X * Y) / sum_f(Y)

    return _flex_binary_moment(self._selected_obj, weights._selected_obj,
                               _get_weighted_mean, pairwise=True)

_Rolling_and_Expanding.weighted_mean = weighted_mean

df = pd.DataFrame(np.reshape(range(25), (5,5)))

# This is wrong: the result should have 4 valid values, but the output has only one, like [NaN, 4.333, NaN, NaN, NaN]
print(df[1].rolling(2).weighted_mean(pd.Series([1, 2])))

Can anyone help? How can I implement this so that it runs fast? Why is the apply method so slow?

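apply is really just a convenience function... it is basically as slow as a for loop.

@JoranBeasley OK... I thought it would be efficient enough, but rolling.mean is quite fast, which confuses me. Reading the pandas source code is too hard for me.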
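
Following up on the comment above: rolling(i).apply calls a Python function once per window, which for 100,000,000 rows means that many Python-level calls, whereas rolling(i).mean() runs in compiled code. For small fixed windows like the ones in the question, one possible workaround is to build the weighted average from a few shifted copies of the series, so all arithmetic stays vectorized. This is only a sketch, not a pandas API: rolling_weighted_mean is a hypothetical helper name, and the weights [1, 2] and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd


def rolling_weighted_mean(series, weights):
    """Weighted mean over a trailing window, built from shifted copies.

    Hypothetical helper (not part of pandas). weights[0] applies to the
    oldest value in the window and weights[-1] to the newest, matching
    np.average(window, weights=weights).
    """
    weights = np.asarray(weights, dtype="float64")
    total = weights.sum()
    if total == 0:
        raise ValueError("weights must not sum to zero")
    window = len(weights)
    acc = pd.Series(0.0, index=series.index)
    for k, w in enumerate(weights):
        # shift(window - 1 - k) lines up the k-th oldest value of each
        # window with the row where that window ends
        acc = acc + w * series.shift(window - 1 - k)
    # Rows whose window is incomplete stay NaN because shift() pads with NaN
    return acc / total


# Illustrative data, compared against the slow apply-based version
s = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0])
fast = rolling_weighted_mean(s, [1, 2])
slow = s.rolling(2).apply(lambda x: np.average(x, weights=[1, 2]), raw=True)
print(np.allclose(fast.to_numpy(), slow.to_numpy(), equal_nan=True))
```

For the loop in the question this would be called as rolling_weighted_mean(df['mid'].diff(1), weights) once per window size, which costs one pass over the column per weight. Note that np.average rejects weights that sum to zero, so weights=range(i) cannot work for i = 1; something like range(1, i + 1) may be what was intended, but that is an assumption.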