Python: iterate over a DataFrame column, check for condition values stored in an array, and collect the values into a list


python arrays pandas dataframe


After getting some help on the forum, I managed to do what I wanted, and now I need to move on to the next stage. (A detailed explanation follows.)

I have a DataFrame:

In [3]: df
Out[3]: 
   index  Num_Albums  Num_authors
0      0          10            4
1      1           1            5
2      2           4            4
3      3           7         1000
4      4           1           44
5      5           3            8

I add a column holding the cumulative sum of another column:

In [4]: df['cumsum'] = df['Num_Albums'].cumsum()

In [5]: df
Out[5]: 
   index  Num_Albums  Num_authors  cumsum
0      0          10            4      10
1      1           1            5      11
2      2           4            4      15
3      3           7         1000      22
4      4           1           44      23
5      5           3            8      26

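Then I apply a condition to the cumsum column and extract the corresponding values of the rows that satisfy the condition within a given tolerance: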

In [18]: tol = 2

In [19]: cond = df.where((df['cumsum']>=15-tol)&(df['cumsum']<=15+tol)).dropna()

In [20]: cond
Out[20]: 
   index  Num_Albums  Num_authors  cumsum
2    2.0         4.0          4.0    15.0
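
A side note on the output above: where replaces every non-matching row with NaN before dropna removes it, which is why the surviving row is shown as floats. The sketch below is only an illustration, assuming the same data as the question (the redundant index column is left out, and the names mask, via_where and via_mask are made up for this example); it compares where with plain boolean indexing, which keeps the integer dtypes.

```python
import pandas as pd

# Same data as in the question, without the redundant 'index' column
df = pd.DataFrame({
    "Num_Albums": [10, 1, 4, 7, 1, 3],
    "Num_authors": [4, 5, 4, 1000, 44, 8],
})
df["cumsum"] = df["Num_Albums"].cumsum()

tol = 2
target = 15  # the condition value used in the question
mask = (df["cumsum"] >= target - tol) & (df["cumsum"] <= target + tol)

# where() fills non-matching rows with NaN, so integer columns become floats
via_where = df.where(mask).dropna()

# Boolean indexing selects the matching rows directly and keeps the dtypes
via_mask = df[mask]

print(via_where)
print(via_mask)
```

Both variants select the same row; only the dtypes of the result differ.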

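So, for the DataFrame above, I would get one Num_Albums value per condition (for tol=0).

I would like a solution that lets me keep using the where function, if possible.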

So the output is not always a single number, right? If the output is exactly one value, you can write this code:

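tol = 0
# condition values to look for
c = [5, 15, 25]
value = []

for i in c:
    # rows whose 'a' value lies within tol of the condition
    matches = df.where((df['a'] >= i - tol) & (df['a'] <= i + tol)).dropna()['a']
    if len(matches) > 0:
        value = value + [matches.values[0]]  # keep the first match
    else:
        value = value + [[]]  # no match: store an empty list
print(value)

If the output can contain more than one value, collect every match instead:

tol = 5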
c = [5,15,25]
value = []

for i in c:
    getdatas = df.where((df['a'] >= i - tol) & (df['a'] <= i + tol)).dropna()['a'].values
    value.append([x for x in getdatas])
print(value)

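A quick approach is to use NumPy broadcasting, as an extension of an answer in the same linked post, even though the question actually asked for an answer based on df.where.

Broadcasting removes the need to loop over every element of the array, and it is also very efficient.

The only addition to that post is the use of np.argmax to get the index of the first True instance along each column (traversing in the ↓ direction).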
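
The code for this step is not shown here, so what follows is only a minimal sketch of the idea. The names slices, num_albums, num_albums_cumsum and conditions appear elsewhere in this thread, but the condition values [10, 15, 23] are an assumption inferred from the outputs shown, and hits is a made-up helper name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Num_Albums": [10, 1, 4, 7, 1, 3]})

conditions = np.array([10, 15, 23])  # assumed values, inferred from the outputs
tol = 0

num_albums = df["Num_Albums"].values
num_albums_cumsum = df["Num_Albums"].cumsum().values

# A (6, 1) column against a (3,) row broadcasts to a (6, 3) boolean matrix:
# one row per DataFrame row, one column per condition.
hits = np.abs(num_albums_cumsum[:, None] - conditions) <= tol

# On booleans, argmax returns the index of the first True in each column
slices = np.argmax(hits, axis=0)

print(slices)
print(num_albums[slices])
```

With these assumed conditions, every column of hits contains a True, so each entry of slices points at a real match.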
The retrieved slices:
slices
Out[692]:
array([0, 2, 4], dtype=int64)

The corresponding array produced:
num_albums[slices]
Out[693]:
array([10,  4,  1], dtype=int64)


If you still prefer to use DF.where, here is another solution using a list comprehension:
[df.where((df['cumsum'] >= cond - tol) & (df['cumsum'] <= cond + tol), -1)['Num_Albums']
   .max() for cond in conditions]
Out[695]:
[10, 4, 1]
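
To make the list comprehension above runnable on its own, here is a hedged sketch with the setup spelled out. The condition values are assumptions: [10, 15, 23] is inferred from the output shown, and 16 is a hypothetical extra value added to show what happens when nothing matches.

```python
import pandas as pd

df = pd.DataFrame({"Num_Albums": [10, 1, 4, 7, 1, 3]})
df["cumsum"] = df["Num_Albums"].cumsum()

conditions = [10, 15, 23, 16]  # assumed values; 16 matches no row
tol = 0

# where(mask, -1) keeps matching rows and overwrites all other rows with -1,
# so max() picks the matching Num_Albums value, or -1 if nothing matched.
result = [
    df.where((df["cumsum"] >= cond - tol) & (df["cumsum"] <= cond + tol), -1)["Num_Albums"].max()
    for cond in conditions
]
print(result)
```

Two things follow from the design: an unmatched condition yields the fill value -1, and if several rows match one condition (for example with a larger tol), max() returns the largest matching Num_Albums rather than the first one.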

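I keep getting this result: IndexError: index 0 is out of bounds for axis 0 with size 0

@Amaz The first option or the second? The first option would raise an IndexError because it needs .values[0], which has to be validated beforehand; let me edit it for you.

I actually prefer the first option. I am not sure the use of None is clear. Applying your suggestion, slices takes the value 0 wherever a condition is not met, so when I call num_albums[slices] I get the first value (at index 0) for every position where the condition is not satisfied. How can I make slices NaN when the condition is not met?

None here means np.newaxis. Simply put, it reshapes the array by inserting an extra dimension, which lets us query the array across that many dimensions (a 2D array here). num_albums_cumsum.reshape(-1, 1) would work for the same purpose.

No, num_albums[slices] gives the values where the condition is met. If you want NaN to appear for the False condition, then I suggest you consider np.where here instead. But I don't see the point, since you only want to grab them in a list/array.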
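
The np.where suggestion in the comments can be sketched roughly as follows. This is an illustration under assumptions, not the commenter's code: the condition 16 is a hypothetical value chosen because it matches no row, and hits, found and result are made-up names.

```python
import numpy as np

num_albums = np.array([10, 1, 4, 7, 1, 3])
num_albums_cumsum = num_albums.cumsum()
conditions = np.array([10, 16, 23])  # 16 is hypothetical and matches no row
tol = 0

# Indexing with None (np.newaxis) builds the same column vector as reshape(-1, 1)
assert np.array_equal(num_albums_cumsum[:, None], num_albums_cumsum.reshape(-1, 1))

hits = np.abs(num_albums_cumsum[:, None] - conditions) <= tol

found = hits.any(axis=0)          # True where a condition matched at least one row
slices = np.argmax(hits, axis=0)  # argmax gives 0 for all-False columns

# Keep the looked-up value only where a match exists, otherwise NaN
result = np.where(found, num_albums[slices], np.nan)
print(result)
```

Because NaN is a float, result is a float array; an integer index array such as slices cannot hold NaN itself, which is why the mask is applied to the looked-up values instead.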