Python 3.x: trapz integration over a DataFrame index after grouping


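I have some data. I want to first group it by fixed-width intervals of the Target column, and then integrate the Target column using the index spacing.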

import numpy as np
import pandas as pd
from scipy import integrate


df = pd.DataFrame({'A': np.array([100, 105.4, 108.3, 111.1, 113, 114.7, 120, 125, 129, 130, 131, 133,135,140, 141, 142]),
                   'B': np.array([11, 11.8, 12.3, 12.8, 13.1,13.6, 13.9, 14.4, 15, 15.1, 15.2, 15.3, 15.5, 16, 16.5, 17]),
                   'C': np.array([55, 56.3, 57, 58, 59.5, 60.4, 61, 61.5, 62, 62.1, 62.2, 62.3, 62.5, 63, 63.5, 64]),
                   'Target': np.array([4000, 4200.34, 4700, 5300, 5800, 6400, 6800, 7200, 7500, 7510, 7530, 7540, 7590,
                                      8000, 8200, 8300])})

df['y'] = df.groupby(pd.cut(df.iloc[:, 3], np.arange(0, max(df.iloc[:, 3]) + 100, 100))).sum().apply(lambda g: integrate.trapz(g.Target, x = g.index))

The code above gives me:

AttributeError: ("'Series' object has no attribute 'Target'", 'occurred at index A')

If I try this instead:

colNames = ['A', 'B', 'C', 'Target']
df['z'] = df.groupby(pd.cut(df.iloc[:, 3], np.arange(0, max(df.iloc[:, 3]) + 100, 100))).sum().apply(lambda g: integrate.trapz(g[colNames[3]], x = g.index))

I get:

TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

....
KeyError: ('Target', 'occurred at index A')

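There are a few problems with your code:

You create a copy of the DataFrame whose index is categorical (it is made of intervals), which integrate.trapz cannot handle.

With apply, you apply integrate.trapz to each column of the summed frame in turn, not to the frame as a whole. Each column arrives as a Series (hence "occurred at index A" in the error), so g.Target does not exist. That makes no sense here. For this reason I asked you in the comments whether you need, in each row, the integral from 0 up to the target value.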
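To see what apply actually hands to the lambda, here is a minimal sketch. The frame `small` and the helper `record` are hypothetical names used only for illustration; they are not part of the original code.

```python
import pandas as pd

# Hypothetical two-column frame standing in for the grouped sums.
small = pd.DataFrame({"A": [1.0, 2.0], "Target": [10.0, 20.0]})

seen = {}

def record(col):
    # With the default axis=0, apply passes one column at a time.
    # Each column is a Series whose .name is the column label.
    seen[col.name] = type(col).__name__
    return col.sum()

totals = small.apply(record)
print(seen)
print(totals)
```

Because each `col` is a Series indexed by the row labels, looking up `col.Target` or `col['Target']` on it fails, which is consistent with the AttributeError and KeyError shown in the question.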

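If you want to integrate the data in intervals of 100 over the 'Target' column, starting from 0, you can first get the sums of the 'Target' column within each interval of width 100: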

>>> i_df = df.groupby(pd.cut(df.iloc[:, 3], np.arange(0, max(df.iloc[:, 3]) + 100, 100))).sum()

Then take the trapezoidal integral of the 'Target' column with a spacing of 100:

>>> integrate.trapz(i_df['Target'], dx=100)
10242034.0
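The same two steps can be sketched as a self-contained script. Two things here are assumptions on my part rather than part of the answer: newer SciPy versions expose this function as `integrate.trapezoid` (with `trapz` being the older name), so the sketch picks whichever exists, and `observed=False` is passed explicitly so that empty bins are kept as zeros regardless of the pandas default.

```python
import numpy as np
import pandas as pd
from scipy import integrate

# Assumption: prefer the newer name, fall back to the older one if needed.
trapezoid = getattr(integrate, "trapezoid", None) or integrate.trapz

target = pd.Series(
    [4000, 4200.34, 4700, 5300, 5800, 6400, 6800, 7200,
     7500, 7510, 7530, 7540, 7590, 8000, 8200, 8300],
    name="Target",
)

# Bin edges 0, 100, ..., 8300, as in the question.
edges = np.arange(0, target.max() + 100, 100)

# Sum of Target inside each bin; empty bins are kept and sum to 0.
sums = target.groupby(pd.cut(target, edges), observed=False).sum()

# Trapezoidal rule over the binned sums with a constant spacing of 100.
area = trapezoid(sums.to_numpy(), dx=100)
print(area)  # should be close to the 10242034.0 reported above
```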

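You cannot use x=i_df.index, because subtraction (which trapz uses internally) is not defined for intervals, and you have created an index of intervals. If you need to use the DataFrame index, you have to reset it: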

>>> i_df = df.groupby(pd.cut(df.iloc[:, 3], np.arange(0, max(df.iloc[:, 3]) + 100, 100))).sum().reset_index(drop=True)
>>> integrate.trapz(i_df['Target'], x=100*i_df.index)
10242034.0
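As an alternative to resetting the index, the numeric bin edges can be pulled out of the intervals and passed as x. This is only a sketch of that idea, not something the answer proposes; using the right edge of each interval is my assumption, as are the `trapezoid` fallback and the explicit `observed=False`.

```python
import numpy as np
import pandas as pd
from scipy import integrate

# Assumption: prefer the newer name, fall back to the older one if needed.
trapezoid = getattr(integrate, "trapezoid", None) or integrate.trapz

target = pd.Series(
    [4000, 4200.34, 4700, 5300, 5800, 6400, 6800, 7200,
     7500, 7510, 7530, 7540, 7590, 8000, 8200, 8300],
    name="Target",
)
edges = np.arange(0, target.max() + 100, 100)
sums = target.groupby(pd.cut(target, edges), observed=False).sum()

# Each index entry is a pandas Interval; its .right attribute is a plain
# number, so the resulting array supports the subtraction trapz needs.
x = np.array([interval.right for interval in sums.index], dtype=float)

area = trapezoid(sums.to_numpy(), x=x)
print(area)  # same spacing of 100, so this should match the dx=100 result
```

Since the right edges are evenly spaced by 100, this gives the same value as dx=100; the approach would also carry over to bins of unequal width.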

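Comments:

Because a Series has only one column, the column name is more or less irrelevant. You can use g.values to get the values in the Series, or g.index.values to get the values of its index.

For clarity, can you post the expected output? Are you trying to get, in each row, the integral between the first interval and the current interval? Also, you are grouping by the 'Target' column, not by 'A' as you said. Before the apply, part of your data is:

                          A     B     C  Target
    (0.0, 100.0]        0.0   0.0   0.0     0.0
    (100.0, 200.0]      0.0   0.0   0.0     0.0
    ...
    (8100.0, 8200.0]  141.0  16.5  63.5  8200.0
    (8200.0, 8300.0]  142.0  17.0  64.0  8300.0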