Python: maximum variation within 1 second for each row of a DataFrame

python pandas dataframe

I am having trouble with a pandas calculation and would appreciate some help.

The DataFrame is created with the following code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'B': [0, 2, 1, np.nan, 4, 1, 3, 10, np.nan, 3, 6]},
                  index = [pd.Timestamp('20130101 09:31:23.999'),
                           pd.Timestamp('20130101 09:31:24.200'),
                           pd.Timestamp('20130101 09:31:24.250'),
                           pd.Timestamp('20130101 09:31:25.000'),
                           pd.Timestamp('20130101 09:31:25.375'),
                           pd.Timestamp('20130101 09:31:25.850'),
                           pd.Timestamp('20130101 09:31:26.100'),
                           pd.Timestamp('20130101 09:31:27.150'),
                           pd.Timestamp('20130101 09:31:28.050'),
                           pd.Timestamp('20130101 09:31:28.850'),
                           pd.Timestamp('20130101 09:31:29.200')])

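For each row, I would like to compute the maximum variation of B within the following second.

For example, for the first row you have to look at how much B changes relative to the second and third rows, which both fall within the one-second window, and take the difference to the maximum value.

In this case the maximum is in the second row, at 09:31:24.200, and the maximum variation is 2 - 0.

We would then create a new column holding this maximum variation for every row.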

df

|                         | B    | Maximum Variation  |
|-------------------------|------|--------------------|
| 2013-01-01 09:31:23.999 | 0.0  | 2.0                |
| 2013-01-01 09:31:24.200 | 2.0  | 1.0                |
| 2013-01-01 09:31:24.250 | 1.0  | 0.0                |
| 2013-01-01 09:31:25.000 | NaN  | 4.0                |
| 2013-01-01 09:31:25.375 | 4.0  | -3.0               |
| 2013-01-01 09:31:25.850 | 1.0  | 2.0                |
| 2013-01-01 09:31:26.100 | 3.0  | 0.0                |
| 2013-01-01 09:31:27.150 | 10.0 | 0.0                |
| 2013-01-01 09:31:28.050 | NaN  | 3.0                |
| 2013-01-01 09:31:28.850 | 3.0  | 3.0                |
| 2013-01-01 09:31:29.200 | 6.0  | 0.0                |
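The first-row example described above can be checked directly. This is only a minimal sketch of one reading of the question, assuming the window is inclusive at both ends and that NaN values are skipped when taking the maximum; the names `t0`, `window` and `first_row_variation` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

index = pd.to_datetime([
    "2013-01-01 09:31:23.999", "2013-01-01 09:31:24.200",
    "2013-01-01 09:31:24.250", "2013-01-01 09:31:25.000",
    "2013-01-01 09:31:25.375", "2013-01-01 09:31:25.850",
    "2013-01-01 09:31:26.100", "2013-01-01 09:31:27.150",
    "2013-01-01 09:31:28.050", "2013-01-01 09:31:28.850",
    "2013-01-01 09:31:29.200",
])
df = pd.DataFrame({"B": [0, 2, 1, np.nan, 4, 1, 3, 10, np.nan, 3, 6]}, index=index)

# Label-based slicing on a sorted DatetimeIndex includes both endpoints,
# so this selects every row from t0 up to and including t0 + 1 second.
t0 = df.index[0]
window = df.loc[t0 : t0 + pd.Timedelta(seconds=1), "B"]

# Maximum of B inside the window minus the value of B at t0.
first_row_variation = window.max() - df.at[t0, "B"]
```

Here `window` holds the first three rows (the row at 09:31:25.000 lies just outside 09:31:24.999), so the variation for the first row is 2 - 0.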

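I hope this is clear enough.

A solution has already been found and shared in the answers, but an efficiency improvement (one that does not need a loop over every row of df) would be very welcome.

I finally found a solution: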

df = pd.DataFrame({'B': [0, 1, 2, 8, 6, 1, 3, 10, np.nan, 3, 6]},
                  index = [pd.Timestamp('20130101 09:31:23.999'),
                           pd.Timestamp('20130101 09:31:24.200'),
                           pd.Timestamp('20130101 09:31:24.250'),
                           pd.Timestamp('20130101 09:31:25.000'),
                           pd.Timestamp('20130101 09:31:25.375'),
                           pd.Timestamp('20130101 09:31:25.850'),
                           pd.Timestamp('20130101 09:31:26.100'),
                           pd.Timestamp('20130101 09:31:27.150'),
                           pd.Timestamp('20130101 09:31:28.050'),
                           pd.Timestamp('20130101 09:31:28.850'),
                           pd.Timestamp('20130101 09:31:29.200')])

df = df.reset_index()

df = df.rename(columns={"index": "start_date"})

df['duration_in_seconds'] = 1

df['end_date'] = df['start_date'] + pd.to_timedelta(df['duration_in_seconds'], unit='s')

df['max'] = np.nan

for index, row in df.iterrows():
    start = row['start_date']
    end = row['end_date']
    maxi = df[(df['start_date'] >= start ) & (df['start_date'] <= end)]['B'].max()
    df.iloc[index, df.columns.get_loc('max')] = maxi

df['Maximum Variation'] = df['max'] - df['B']

df

|    | start_date              | B    | duration_in_seconds | end_date                | max  | Maximum Variation |
|----|-------------------------|------|---------------------|-------------------------|------|-------------------|
| 0  | 2013-01-01 09:31:23.999 | 0.0  | 1                   | 2013-01-01 09:31:24.999 | 2.0  | 2.0               |
| 1  | 2013-01-01 09:31:24.200 | 1.0  | 1                   | 2013-01-01 09:31:25.200 | 8.0  | 7.0               |
| 2  | 2013-01-01 09:31:24.250 | 2.0  | 1                   | 2013-01-01 09:31:25.250 | 8.0  | 6.0               |
| 3  | 2013-01-01 09:31:25.000 | 8.0  | 1                   | 2013-01-01 09:31:26.000 | 8.0  | 0.0               |
| 4  | 2013-01-01 09:31:25.375 | 6.0  | 1                   | 2013-01-01 09:31:26.375 | 6.0  | 0.0               |
| 5  | 2013-01-01 09:31:25.850 | 1.0  | 1                   | 2013-01-01 09:31:26.850 | 3.0  | 2.0               |
| 6  | 2013-01-01 09:31:26.100 | 3.0  | 1                   | 2013-01-01 09:31:27.100 | 3.0  | 0.0               |
| 7  | 2013-01-01 09:31:27.150 | 10.0 | 1                   | 2013-01-01 09:31:28.150 | 10.0 | 0.0               |
| 8  | 2013-01-01 09:31:28.050 | NaN  | 1                   | 2013-01-01 09:31:29.050 | 3.0  | NaN               |
| 9  | 2013-01-01 09:31:28.850 | 3.0  | 1                   | 2013-01-01 09:31:29.850 | 6.0  | 3.0               |
| 10 | 2013-01-01 09:31:29.200 | 6.0  | 1                   | 2013-01-01 09:31:30.200 | 6.0  | 0.0               |

A more time-efficient solution is still welcome.

A more efficient solution:

df = df.reset_index()

df = df.rename(columns={"index": "start_date"})

df['duration_in_seconds'] = 1

df['end_date'] = df['start_date'] + pd.to_timedelta(df['duration_in_seconds'], unit='s')

df['max'] = np.nan

df["max"] = df.apply(lambda row : df.loc[(df["start_date"] >= row['start_date']) & (df["start_date"] <=row['end_date'])]["B"].max(), axis = 1)

df['Maximum Variation'] = df['max'] - df['B']
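Note that `df.apply(..., axis=1)` still evaluates the boolean mask once per row, so it is unlikely to be much faster than the `iterrows` loop. As a hedged sketch of a loop-free alternative (not taken from the original answers), the forward-looking window can be turned into a backward-looking one by mirroring the time axis, after which the time-based `rolling` in pandas does the work. The helper name `forward_window_max` is hypothetical, and the sketch assumes a `DatetimeIndex` sorted in ascending order.

```python
import numpy as np
import pandas as pd

def forward_window_max(s: pd.Series, window: str = "1s") -> pd.Series:
    """Max of s over [t, t + window] for each timestamp t; NaN values are skipped.

    Assumes s has a DatetimeIndex sorted in ascending order.
    """
    anchor = s.index.max()
    # Mirror time around the last timestamp: t -> anchor + (anchor - t).
    # Reversing the row order keeps the mirrored index ascending.
    mirrored_index = anchor + (anchor - s.index[::-1])
    mirrored = pd.Series(s.to_numpy()[::-1], index=mirrored_index)
    # closed="both" gives the window [t - window, t] in mirrored time,
    # i.e. [t, t + window] in original time, matching the >= / <= mask.
    rolled = mirrored.rolling(window, closed="both").max()
    # Undo the reversal and restore the original index.
    return pd.Series(rolled.to_numpy()[::-1], index=s.index)

index = pd.to_datetime([
    "2013-01-01 09:31:23.999", "2013-01-01 09:31:24.200",
    "2013-01-01 09:31:24.250", "2013-01-01 09:31:25.000",
    "2013-01-01 09:31:25.375", "2013-01-01 09:31:25.850",
    "2013-01-01 09:31:26.100", "2013-01-01 09:31:27.150",
    "2013-01-01 09:31:28.050", "2013-01-01 09:31:28.850",
    "2013-01-01 09:31:29.200",
])
df = pd.DataFrame({"B": [0, 1, 2, 8, 6, 1, 3, 10, np.nan, 3, 6]}, index=index)

df["max"] = forward_window_max(df["B"])
df["Maximum Variation"] = df["max"] - df["B"]
```

On the sample data this should give the same `max` and `Maximum Variation` columns as the loop-based answer, without the helper columns `duration_in_seconds` and `end_date`.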

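Why is the maximum variation for 2013-01-01 09:31:24.250 equal to 0 rather than -1? There is only one value within its 1-second window, namely 2013-01-01 09:31:25.000, whose B is NaN. Assuming NaN is zero, the difference would be -1.

Do you want the maximum difference within each one-second interval, e.g. among the 2 values at 09:31:24? Or are you interested in a time delta, such as 1 second or 0.5 seconds starting from 09:31:24.250?

If we have a NaN value, we do not treat it as zero; we consider that we do not know what happened at that point.

If there were a value of 1 at second 24.600 and a value of 5 at second 25.200, your solution would not compute this difference. We need to compute the maximum difference for each row over a full second. For that row, we would then need the maximum difference between the value at second 24.600 and the maximum value up to second 25.600.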
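The NaN handling discussed in these comments can be illustrated in isolation. The following is a small sketch of the default pandas behaviour rather than anything specific to the answers: `Series.max()` skips NaN by default, while arithmetic with NaN yields NaN. The variable names are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# A window whose first value is missing, like the row at 09:31:28.050.
window = pd.Series([np.nan, 3.0])

# max() skips NaN by default (skipna=True), so the NaN is not treated as zero.
window_max = window.max()

# Subtracting the (missing) value of the row itself propagates the NaN.
variation = window_max - window.iloc[0]
```

This is why the row at 09:31:28.050 in the answer's output has a `max` of 3.0 but a `Maximum Variation` of NaN.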
