What is the correct syntax for logical (boolean) indexing on a pandas DataFrame in Python?

python pandas indexing

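I want to use logical indexing to modify values in a pandas DataFrame (version 0.15.2), as described in this post. I keep getting the following warning: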

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item_labels[indexer[info_axis]]] = value

Here is an example to demonstrate:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A':[9,10]*6,
                   'B':range(23,35),
                   'C':range(-6,6)})

print(df)
     A   B  C
0    9  23 -6
1   10  24 -5
2    9  25 -4
3   10  26 -3
4    9  27 -2
5   10  28 -1
6    9  29  0
7   10  30  1
8    9  31  2
9   10  32  3
10   9  33  4
11  10  34  5

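What is the correct way to use logical indexing to change values? Say I want to subtract 10 from all values in column B that are greater than 30; why is the following not preferred? I realize it is a chained assignment, which is discouraged. In the code I am actually using, it does what I intended (it does not make a copy but actually edits the original DataFrame), yet it still displays the warning: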

df['B-type'] = 'B'                  # create column with dummy values
df['B-type'][df['B'] > 30] = 'BI'   # populate the column with real values for BI type
df['B-type'][df['B'] <= 30] = 'BII' # populate the column with real values for BII type
print(df)
     A   B  C B-type
0    9  23 -6    BII
1   10  24 -5    BII
2    9  25 -4    BII
3   10  26 -3    BII
4    9  27 -2    BII
5   10  28 -1    BII
6    9  29  0    BII
7   10  30  1    BII
8    9  31  2     BI
9   10  32  3     BI
10   9  33  4     BI
11  10  34  5     BI

One way is to use .loc, as shown below:
df.loc[df['B'] > 30,'B'] = df.loc[df['B'] > 30,'B'] - 10

Demo:
In [9]: df = pd.DataFrame({'A':[9,10]*6,
   ...:                    'B':range(23,35),
   ...:                    'C':range(-6,6)})

In [10]: df
Out[10]:
     A   B  C
0    9  23 -6
1   10  24 -5
2    9  25 -4
3   10  26 -3
4    9  27 -2
5   10  28 -1
6    9  29  0
7   10  30  1
8    9  31  2
9   10  32  3
10   9  33  4
11  10  34  5

In [11]: df.loc[df['B'] > 30,'B'] = df.loc[df['B'] > 30,'B'] - 10

In [12]: df
Out[12]:
     A   B  C
0    9  23 -6
1   10  24 -5
2    9  25 -4
3   10  26 -3
4    9  27 -2
5   10  28 -1
6    9  29  0
7   10  30  1
8    9  21  2
9   10  22  3
10   9  23  4
11  10  24  5


Alternatively, as noted in the comments, you can also use the augmented assignment version of the above:
df.loc[df['B'] > 30,'B'] -= 10
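
The same single-indexer pattern can be applied to the B-type column from the question. The following is only a sketch under the assumption that the goal is the labelling shown there; the variable name mask is introduced here for illustration and does not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({'A': [9, 10] * 6,
                   'B': range(23, 35),
                   'C': range(-6, 6)})

# Build the boolean mask once and reuse it ("mask" is an illustrative name).
mask = df['B'] > 30

df['B-type'] = 'B'               # create the column with dummy values
df.loc[mask, 'B-type'] = 'BI'    # rows where B > 30, set in one indexing step
df.loc[~mask, 'B-type'] = 'BII'  # remaining rows (B <= 30)
```

Each assignment passes the row selector and the column label to .loc together, so there is no intermediate object for pandas to warn about.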

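This kind of access is called chained assignment and should be avoided, as described in the documentation linked in the warning. The reason it does not work as expected is that a copy of the DataFrame is updated rather than a view, which means the original DataFrame is left unmodified.
Consider this chained assignment: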
df[df['B'] > 30]['B'] = -999

It is equivalent to the following:
df_something = df[df['B'] > 30]
df_something['B'] = -999

>>> print(df)
     A   B  C
0    9  23 -6
1   10  24 -5
2    9  25 -4
3   10  26 -3
4    9  27 -2
5   10  28 -1
6    9  29  0
7   10  30  1
8    9  31  2
9   10  32  3
10   9  33  4
11  10  34  5

>>> print(df_something)
     A    B  C
8    9 -999  2
9   10 -999  3
10   9 -999  4
11  10 -999  5

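As you can see, a copy was indeed created and updated, which is what the warning is about. The correct way to perform this kind of assignment is to avoid chaining, i.e. to use a single operation with an appropriate indexer: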
df.loc[df['B'] > 30, 'B'] = -999

Note that df.loc[df['B'] > 30, 'B'] = -999 is different from df.loc[df['B'] > 30]['B'] = -999, which is also a chained assignment.
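
The difference between the two .loc forms can be checked with a short sketch. It is an illustration rather than part of the original answer: the names mask, chained and single are made up here, and the warning filter is only there to keep the demo output quiet.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'A': [9, 10] * 6,
                   'B': range(23, 35),
                   'C': range(-6, 6)})
mask = df['B'] > 30

# Chained form: .loc[mask] returns a new object, and ['B'] = ... writes into it.
chained = df.copy()
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the chained-assignment warning for this demo
    chained.loc[mask]['B'] = -999

# Single-indexer form: row selector and column label go into one .loc call.
single = df.copy()
single.loc[mask, 'B'] = -999

chained_changed = bool((chained['B'] == -999).any())  # False: original left untouched
single_changed = int((single['B'] == -999).sum())     # 4: rows with B > 30 were updated
```

Selecting rows with a boolean mask produces a copy, so the chained form never reaches the original DataFrame, while the single call does.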
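I noticed that the warning does not appear consistently. As pointed out, df[df['B'] > 30]['B'] = ... clearly makes a copy. However, I checked the code where I had this problem, and there the chained assignment is in the reverse order, df['B'][df['B'] > 30] = ...; it still produces the warning, but it does not make a copy and does produce the expected result. I have just updated the question to reflect this.
Is there any reason not to simplify the expression with df.loc[df['B'] > 30, 'B'] -= 10?
I can't think of one.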