Python DataFrame: computing a metric between specific index elements/positions

python pandas

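I have a DataFrame of length N with certain indices/positions ni at arbitrary distances from each other. I now want to compute a metric between two consecutive index elements ni and ni+1.

For example: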

import numpy as np
import pandas as pd


df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['id'] = ['W', 'W', 'W', 'Z', 'Z', 'Y', 'Y', 'Y', 'Z', 'Z']

print(df)

          A         B         C         D id
0  0.347501 -1.152416  1.441144 -0.144545  W
1  0.775828 -1.176764  0.203049 -0.305332  W
2  1.036246 -0.467927  0.088138 -0.438207  W
3 -0.737092 -0.231706  0.268403  0.464026  Z
4 -1.857346 -1.420284 -0.515517 -0.231774  Z
5 -0.970731  0.217890  0.193814 -0.078838  Y
6 -0.318314 -0.244348  0.162103  1.204386  Y
7  0.340199  1.074977  1.201068 -0.431473  Y
8  0.202050  0.790434  0.643458 -0.068620  Z
9 -0.882865  0.687325 -0.008771 -0.066912  Z

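Now let's assume I have n1=0, n2=4, n3=5, n4=9 and want to compute the arithmetic mean of columns A and B between them, e.g. mean(n1, n2), mean(n2, n3), mean(n3, n4), mean(n4, ).
The expected output would be a DataFrame with 4 rows (the means) and two columns (A and B).
Any hints are welcome.
Thanks in advance.
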
Use loc for slicing:
In [11]: n1=0; n2=4; n3=5; n4=9

In [12]: df.loc[n1:n2, "A"]
Out[12]:
0    0.347501
1    0.775828
2    1.036246
3   -0.737092
4   -1.857346
Name: A, dtype: float64

In [13]: df.loc[n3:n4, "B"]
Out[13]:
5    0.217890
6   -0.244348
7    1.074977
8    0.790434
9    0.687325
Name: B, dtype: float64

In [14]: df.loc[n1:n2, "A"].mean()
Out[14]: -0.086972599999999956

In [15]: df.loc[n3:n4, "B"].mean()
Out[15]: 0.50525560000000003


Are you looking for pd.concat?
l = [n1,n2,n3,n4]
newl = list(zip(l,l[1:]))
# [(0, 4), (4, 5), (5, 9)]
pd.concat([df.loc[i[0]:i[1],['A','B']].mean() for i in newl])

Output:
A   -0.044437
B    0.295627
A   -0.884344
B   -0.005827
A    0.451703
B    0.077761
dtype: float64
          A         B
0 -0.044437  0.295627
1 -0.884344 -0.005827
2  0.451703  0.077761
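
The Series output above stacks A and B for each interval, while the last table has one row per interval. One way to get that row-per-interval layout, sketched here under the assumption that the frame has the default integer index, is to concatenate the per-interval means as columns and then transpose. The names `bounds`, `pairs` and `means` and the seeded random data are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=list("ABCD"))
df["id"] = ["W", "W", "W", "Z", "Z", "Y", "Y", "Y", "Z", "Z"]

bounds = [0, 4, 5, 9]                    # n1, n2, n3, n4
pairs = list(zip(bounds, bounds[1:]))    # [(0, 4), (4, 5), (5, 9)]

# .loc slices by label and includes both end points of each pair.
means = pd.concat(
    [df.loc[start:stop, ["A", "B"]].mean() for start, stop in pairs],
    axis=1,   # one column per interval ...
).T           # ... transposed to one row per interval
print(means)
```

Each row of `means` corresponds to one consecutive pair in `bounds`, so the same code works for a list of any length.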
Use .iloc
n1=0
n2=4
n3=5
n4=9

df
Out[22]: 
          A         B         C         D id
0 -0.238283  0.109911  0.351710  0.048457  W
1 -0.325829  0.017999 -0.965771 -0.860846  W
2 -1.095183 -0.448895  1.690735  0.140668  W
3 -0.016087  1.025236  1.634730  0.755837  Z
4 -1.394894  0.343395 -0.522272  0.308791  Z
5  0.308004 -2.243848  0.359605 -0.806157  Y
6 -0.149900  0.305214 -2.250844  0.385339  Y
7 -0.562943 -0.651464  1.241993 -0.963086  Y
8 -0.465702  1.429940 -0.146888  0.436931  Z
9 -0.766442  0.899470  0.210917 -0.751582  Z

df.iloc[n1:n2]
Out[23]: 
          A         B         C         D id
0 -0.238283  0.109911  0.351710  0.048457  W
1 -0.325829  0.017999 -0.965771 -0.860846  W
2 -1.095183 -0.448895  1.690735  0.140668  W
3 -0.016087  1.025236  1.634730  0.755837  Z


#The Mean for each Column within your index range  
df.iloc[n1:n2].mean()
Out[24]: 
A   -0.418846
B    0.176063
C    0.677851
D    0.021029
dtype: float64

#The Mean for each Row within your index range
df.iloc[n1:n2].mean(axis=1)
Out[25]: 
0    0.067949
1   -0.533612
2    0.071831
3    0.849929
dtype: float64

#To get the mean for a specific Column
df["A"].iloc[n1:n2].mean()
Out[31]: -0.4188455553382261

I hope the above answers your question.
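
Note that df.iloc[n1:n2] stops before position n2, whereas df.loc[n1:n2] includes label n2. That is why the iloc slice above has four rows while the loc slices in the other answer have five. A minimal sketch of the difference, assuming the default integer index (the toy column values are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]})
n1, n2 = 0, 4

by_label = df.loc[n1:n2, "A"]                # labels 0..4, end point included
by_position = df["A"].iloc[n1:n2]            # positions 0..3, end point excluded
by_position_incl = df["A"].iloc[n1:n2 + 1]   # add 1 to include position n2

print(len(by_label), len(by_position), len(by_position_incl))  # 5 4 5
print(by_label.mean(), by_position.mean())                     # 30.0 25.0
```

If row n2 should count towards the mean, either use loc or add 1 to the upper bound of the iloc slice.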
Comments:
What does the expected output look like?
I have revised my answer!
Ohk, what is n5 now?
@Bharath pd.cut, it's your show time.
@Wen I don't know whether I can use pd.cut here. Maybe concat with a list comprehension.
I don't think the +1 is necessary.
Thanks. But if my index vector has 10.000 elements (n1…n1000), isn't it possible to pass it in one operation somehow? E.g. groupby or something similar?
@Bharath It is if you want to include row n2/n4 in the calculation.
@cordkaldemyer Is n a numpy array? How did you get n1, n2, n3 ... n1000?
I think the +1 may be necessary for iloc rather than loc. Because why select 5 for n2=4?
It only takes the mean of ['A', 'B']; I don't think this is any different from what @Andy posted. Using loc is better than iloc.
@Bharath Thanks for your comment. Could you explain why loc is better than iloc? I'm new to pandas.
Thank you very much. This solution is very short, general, and works for a passed index list of any length! Thanks for all the other answers too.
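
On the question of handling a long vector of positions in one operation: one possible approach, not taken from the answers above, is to use cumulative sums, so that every interval sum becomes the difference of two prefix sums. The sketch below assumes sorted integer positions on a default integer index, both end points included (as with loc), and no NaN values. `interval_means` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def interval_means(df, bounds, cols):
    """Mean of df[cols] over rows bounds[i]..bounds[i+1], both inclusive."""
    bounds = np.asarray(bounds)
    values = df[cols].to_numpy(dtype=float)
    # prefix[k] holds the column sums of the first k rows (prefix[0] is zero).
    prefix = np.vstack([np.zeros(values.shape[1]), values.cumsum(axis=0)])
    start, stop = bounds[:-1], bounds[1:]
    sums = prefix[stop + 1] - prefix[start]   # sum of rows start..stop
    counts = (stop - start + 1)[:, None]      # number of rows per interval
    return pd.DataFrame(sums / counts, columns=cols)

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=list("ABCD"))
result = interval_means(df, [0, 4, 5, 9], ["A", "B"])
print(result)
```

Because consecutive intervals share their boundary row here, a plain groupby on non-overlapping bins would not give the same numbers. The prefix-sum form keeps the overlap and avoids a Python-level loop over the intervals.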