Apply a function in Python to each feature column of a DataFrame indexed by id and timestamp

python pandas filter

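Hi all, I have a DataFrame with 5 columns:

ID (integer) | time (integer) | humidity | temperature | pressure

ID = room
time = Unix timestamp in seconds
humidity/temperature/pressure = sensor values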

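What I need

I want to apply a filter (signal.lfilter) to humidity/temperature/pressure separately for each ID, for example:

For ID=1
run lfilter on the humidity values sorted by time ascending
run lfilter on the temperature values sorted by time ascending
run lfilter on the pressure values sorted by time ascending

For ID=2
run lfilter on the humidity values sorted by time ascending
run lfilter on the temperature values sorted by time ascending
run lfilter on the pressure values sorted by time ascending

For ID=n
run lfilter on the humidity values sorted by time ascending
run lfilter on the temperature values sorted by time ascending
run lfilter on the pressure values sorted by time ascending

How can I do this quickly? Today I use two for loops: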

for i in df.id.unique():
    for column in ['humidity','temperature','pressure']:
        # use == for the comparison and .loc so the assignment reaches df itself
        df.loc[df.id == i, column] = ... lfilter ...

But this is too slow. Any help?

It's not super clean, but try the following. Is this what you are trying to do with the signal.lfilter function?

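Edit: oops, I forgot the time requirement. Before running the code below, just sort the frame first (sort_values returns a new frame, so assign the result back):

df = df.sort_values(['ID', 'TIME'], ascending=True)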

import pandas as pd
from scipy import signal
import numpy as np

np.random.seed(1618)

df = pd.DataFrame({'ID': [1,1,1,2,2,2], 
                   'humidity': np.random.random(6), 
                   'temperature': np.random.random(6), 
                   'pressure': np.random.random(6)})

#  >>> df
#     ID  humidity  pressure  temperature
#  0   1  0.605160  0.194984     0.450019
#  1   1  0.301108  0.077726     0.691227
#  2   1  0.197976  0.144978     0.155231
#  3   2  0.733884  0.458959     0.785704
#  4   2  0.457377  0.647681     0.092045
#  5   2  0.021497  0.417326     0.551941

tmp = df.groupby('ID').apply(lambda x: signal.lfilter(x['humidity'], x['pressure'], x['temperature']))
# this produces a vector for each ID.
# we have to unstack the vectors and append them to the original df

df['filtered']  = tmp.apply(lambda x: pd.Series(x)).stack().reset_index()[0]

# >>> df
#    ID  humidity  pressure  temperature  filtered
# 0   1  0.605160  0.194984     0.450019  1.396696
# 1   1  0.301108  0.077726     0.691227  2.283506
# 2   1  0.197976  0.144978     0.155231  0.057383
# 3   2  0.733884  0.458959     0.785704  1.256354
# 4   2  0.457377  0.647681     0.092045 -0.842783
# 5   2  0.021497  0.417326     0.551941  1.058038
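
If the goal is instead to filter each sensor column on its own within each ID, a minimal sketch along the following lines might be closer. It is only an illustration under assumptions: the question never says which filter coefficients are used, so the `b` and `a` values here (a 3-point moving average) are hypothetical, as are the sample data and the `_filtered` column suffix.

```python
import numpy as np
import pandas as pd
from scipy import signal

# Hypothetical coefficients (not from the question): 3-point moving average.
b = np.ones(3) / 3
a = [1.0]

# Made-up sample data, deliberately not in time order.
df = pd.DataFrame({
    'ID':          [1, 1, 1, 2, 2, 2],
    'TIME':        [30, 10, 20, 20, 10, 30],
    'humidity':    [3.0, 1.0, 2.0, 5.0, 4.0, 6.0],
    'temperature': [9.0, 7.0, 8.0, 2.0, 1.0, 3.0],
    'pressure':    [0.3, 0.1, 0.2, 0.5, 0.4, 0.6],
})
cols = ['humidity', 'temperature', 'pressure']

# Sort once so that every ID's rows are in ascending time order.
df = df.sort_values(['ID', 'TIME']).reset_index(drop=True)

for col in cols:
    # transform calls the function once per ID and aligns the output
    # with the original rows, so the filter state restarts for each ID.
    df[col + '_filtered'] = df.groupby('ID')[col].transform(
        lambda s: signal.lfilter(b, a, s.to_numpy())
    )
```

This still loops over the three columns, but the per-ID work is delegated to groupby rather than to repeated boolean masks over the whole frame, which is usually where the time goes in the two-loop version.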

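You haven't provided enough information to solve this. Depending on what lfilter does, we might be able to do some really fast vectorization, but you forgot to tell us what it is. Try following the advice here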
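
As one illustration of the kind of vectorization that might be possible: scipy.signal.lfilter accepts N-dimensional input and an `axis` argument, so if every ID happens to have the same number of samples, all IDs and all columns could be filtered in a single call. The sketch below rests on that equal-length assumption, which the question does not confirm, and uses hypothetical coefficients and random data.

```python
import numpy as np
import pandas as pd
from scipy import signal

# Hypothetical coefficients (not from the question): 3-point moving average.
b = np.ones(3) / 3
a = [1.0]

# Assumption: every ID has exactly n_samples rows.
n_ids, n_samples = 4, 5
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'ID': np.repeat(np.arange(1, n_ids + 1), n_samples),
    'TIME': np.tile(np.arange(n_samples), n_ids),
    'humidity': rng.random(n_ids * n_samples),
    'temperature': rng.random(n_ids * n_samples),
    'pressure': rng.random(n_ids * n_samples),
})
cols = ['humidity', 'temperature', 'pressure']

# Sort so rows are grouped by ID and ordered by time within each ID.
df = df.sort_values(['ID', 'TIME']).reset_index(drop=True)

# View the values as (ID, time, column) and filter along the time axis.
block = df[cols].to_numpy().reshape(n_ids, n_samples, len(cols))
filtered = signal.lfilter(b, a, block, axis=1)

# Flatten back to one row per sample and attach the results.
flat = filtered.reshape(-1, len(cols))
for j, col in enumerate(cols):
    df[col + '_filtered'] = flat[:, j]
```

If the groups have different lengths, this reshape does not apply and a per-ID approach is needed instead.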