Python: iterate over a DataFrame and select n rows and columns at a time
I have a dataset that looks like this:
# Example
    0  1   2   3  4   5
0  18  1 -19 -16 -5  19
1  18  0 -19 -17 -6  19
2  17 -1 -20 -17 -6  19
3  18  1 -19 -16 -5  20
4  18  0 -19 -16 -5  20
Actual data:
[{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
{0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
{0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
{0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
{0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
{0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
{0: 17, 1: 0, 2: -20, 3: -16, 4: -5, 5: 19},
{0: 17, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -15, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -14, 4: -3, 5: 22},
{0: 18, 1: 1, 2: -18, 3: -14, 4: -4, 5: 22}]
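The shape of the above is (20, 6).
What I want to achieve is to apply a custom function to every column over 4 rows at a time.
For example:
f() applied to df.ix[0:3] for all columns
f() applied to df.ix[4:7] for all columns
and so on
In effect, what I need is a rolling window of size 4 with a stride of 4.
With the data above, the result would be a DataFrame of shape (5, 6). For the sake of discussion, you can assume the custom function takes the mean of those 4 rows for each column.
What I have tried so far: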
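import pandas as pd

curr = 0
res = []
while curr < df_to_look_at2.shape[0]:
    # .ix was removed in pandas 1.0; .iloc[curr:curr + 4] selects the same 4 rows by position
    look_at = df_to_look_at2.iloc[curr:curr + 4]
    curr += 4
    res.append(look_at.mean().values.tolist())
pd.DataFrame(res)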
An additional thought: what if, instead of the mean, it were min(), max(), or some other custom function?
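Rolling would be the right tool if you wanted to consider a row more than once, in more than one window. However, your windows do not overlap, so what you are really asking is how to group the rows according to your stride. You can use arange and floor division to build the grouping key:
import numpy as np

window_size = 4
# Integer-divide the row positions: rows 0-3 get key 0, rows 4-7 get key 1, and so on
grouper = np.arange(df.shape[0]) // window_size
df.groupby(grouper).mean()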
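To cover the follow-up about min(), max(), or a custom function, the same grouping key can be reused with other reductions. The following is only a sketch: the small frame and the `value_range` helper are made up for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (made-up values, not the question's data).
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8],
                   "b": [8, 7, 6, 5, 4, 3, 2, 1]})

window_size = 4
grouper = np.arange(len(df)) // window_size  # keys: 0,0,0,0,1,1,1,1


def value_range(col):
    """Hypothetical custom function: spread of one column within one window."""
    return col.max() - col.min()


# Built-in reductions are available directly on the groupby object.
mins = df.groupby(grouper).min()
maxs = df.groupby(grouper).max()

# A custom function is called once per column per window via agg.
ranges = df.groupby(grouper).agg(value_range)
print(ranges)
```

Each result has one row per window and one column per original column, so the question's (20, 6) frame would again give a (5, 6) result.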
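I think repeated computations of this kind really are numpy's territory. You can use reshape to get the underlying array into the required shape and then perform whatever computation you need on the array.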
inp = [{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
{0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
{0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
{0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
{0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
{0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
{0: 18, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
{0: 17, 1: 0, 2: -20, 3: -16, 4: -5, 5: 19},
{0: 17, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -15, 4: -4, 5: 20},
{0: 18, 1: 0, 2: -19, 3: -14, 4: -3, 5: 22},
{0: 18, 1: 1, 2: -18, 3: -14, 4: -4, 5: 22}]
import pandas as pd
df = pd.DataFrame(inp)
temp = df.values.reshape(-1, 4, df.shape[-1])
out = pd.DataFrame(temp.mean(axis=1))
Output:
       0     1      2      3     4      5
0  17.75  0.25 -19.25 -16.50 -5.50  19.25
1  18.25  0.25 -19.00 -16.00 -5.25  19.50
2  17.75  0.25 -19.25 -16.75 -5.75  19.00
3  17.75  0.25 -19.00 -16.00 -4.75  19.75
4  17.75  0.25 -18.75 -14.75 -3.75  21.00
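One caveat with the reshape approach is that the number of rows must be an exact multiple of the window size, otherwise reshape raises an error. A minimal sketch of one way to handle that, assuming it is acceptable to drop the incomplete trailing window (the frame below is made up for illustration), also shows that min and max work the same way as mean:

```python
import numpy as np
import pandas as pd

# Illustrative frame with 10 rows, which is not a multiple of 4.
df = pd.DataFrame(np.arange(30).reshape(10, 3))
window = 4

# Keep only the rows that fill complete windows (assumption: the remainder is dropped).
n_full = (len(df) // window) * window
blocks = df.to_numpy()[:n_full].reshape(-1, window, df.shape[1])  # (windows, rows, cols)

# Reduce over axis=1, the rows inside each window.
means = pd.DataFrame(blocks.mean(axis=1), columns=df.columns)
mins = pd.DataFrame(blocks.min(axis=1), columns=df.columns)
maxs = pd.DataFrame(blocks.max(axis=1), columns=df.columns)
print(means)
```

If the trailing partial window should be kept instead, the groupby approach handles it without any trimming, since the last group simply has fewer rows.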
It looks like you may just want to group by every four rows. Perhaps df.groupby(np.arange(df.shape[0]) // 4)?
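As a sanity check on the grouping idea, the question's "rolling window of size 4 with a stride of 4" can also be written literally with rolling, keeping only every fourth result. This is a sketch on made-up data, intended only to show that the two formulations agree for the mean:

```python
import numpy as np
import pandas as pd

# Illustrative frame: 8 rows, 3 columns (made-up values).
df = pd.DataFrame(np.arange(24, dtype=float).reshape(8, 3))
window = 4

# Non-overlapping windows via a grouping key.
grouped = df.groupby(np.arange(len(df)) // window).mean()

# rolling(4) produces a value for every row; keep the last row of each block of 4.
rolled = (
    df.rolling(window)
    .mean()
    .iloc[window - 1::window]
    .reset_index(drop=True)
)

print(np.allclose(grouped.to_numpy(), rolled.to_numpy()))
```

The rolling version computes a window for every row and then discards most of them, which is why grouping is the more direct choice when the windows do not overlap.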