Python Pandas: getting an if statement / .loc to return the index for that row


python if-statement pandas dataframe


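I have a DataFrame with two columns, and I am adding a third.

I want the third column to depend on the value of the second: it should return either a fixed answer or the index label of that row.

An example of the data: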

print (df)
            Amount      Percentage
Country      
Belgium      20           .0952
France       50           .2380
Germany      60           .2857
UK           80           .3809

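Now I want my new third column to say "Other" when the percentage is below 25%, and the country name when the percentage is above 25%. This is what I wrote: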

df['Country'] = 'Other'
df.loc[df['Percentage'] > 0.25, 'Country'] = df.index

Unfortunately, my output does not give the matching index label; it just gives the index labels in order:

 print (df)
            Amount      Percentage      Country
Country      
Belgium      20           .0952         Other
France       50           .2380         Other
Germany      60           .2857         Belgium
UK           80           .3809         France

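Obviously, I want to see Germany and UK next to Germany and UK. How can I get it to give me the index label from the same row as the number that exceeds the threshold?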

You can try numpy.where:
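A minimal sketch of the numpy.where approach, assuming the sample frame from the question (rebuilt by hand here so the block runs on its own). np.where picks the index label where the condition holds and 'Other' everywhere else:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (an assumption for illustration).
df = pd.DataFrame(
    {
        "Amount": [20, 50, 60, 80],
        "Percentage": [0.0952, 0.2380, 0.2857, 0.3809],
    },
    index=pd.Index(["Belgium", "France", "Germany", "UK"], name="Country"),
)

# Element-wise choice: index label where Percentage > 0.25, else "Other".
df["Country"] = np.where(df["Percentage"] > 0.25, df.index, "Other")

print(df)
```

Because np.where works position by position over arrays of the same length, each row gets its own label and no separate default assignment is needed.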

Or create a Series from the index:
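A minimal sketch of the Series-based variant, again assuming the sample frame from the question. Index.to_series() returns a Series whose index and values are both the country labels, so the assignment through .loc can be matched up by label:

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption for illustration).
df = pd.DataFrame(
    {
        "Amount": [20, 50, 60, 80],
        "Percentage": [0.0952, 0.2380, 0.2857, 0.3809],
    },
    index=pd.Index(["Belgium", "France", "Germany", "UK"], name="Country"),
)

# Default value for every row first.
df["Country"] = "Other"

# Assign a Series (not the bare Index) so the values are aligned by label.
df.loc[df["Percentage"] > 0.25, "Country"] = df.index.to_series()

print(df)
```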

To use the approach you were trying to implement, do the following:

df['Country'] = 'Other'
df.loc[df['Percentage'] > 0.25, 'Country'] = df.loc[df['Percentage'] > 0.25].index

>>> df
         Amount  Percentage  Country
Country                             
Belgium      20      0.0952    Other
France       50      0.2380    Other
Germany      60      0.2857  Germany
UK           80      0.3809       UK

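Since the filter is the same on both sides, on large datasets it is usually better to store it as a mask so that the comparison is made only once: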

mask = df['Percentage'] > 0.25
df.loc[mask, 'Country'] = df.loc[mask].index

# Delete the mask once finished with it to save memory if needed.
del mask

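This works, but it returns NaN for all results below the threshold. Where in your code do I specify that those should be called "Other"?

In the first line. First I set the whole column to Other, and then use loc to set the rows above the threshold to their Country index label.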

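The numpy.where method gave the result I was looking for. Can you tell me why that method works and my original one doesn't? They both specify df.index as the output.

That is a hard question. At first I was surprised by df.loc[df['Percentage']>0.25, 'Country'] = df.index. If you convert the index to a Series, it works, as in my second solution. Maybe it is a bug.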
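One plausible explanation, offered here as an assumption rather than something stated in the thread: when the right-hand side of a .loc assignment is a Series, pandas aligns it to the selected rows by index label, whereas a bare Index is treated like a plain array and handled positionally (recent pandas versions may reject the length mismatch outright instead of filling in order). The sketch below illustrates the label alignment by assigning a deliberately reversed Series; the sample frame and the name `reversed_labels` are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption for illustration).
df = pd.DataFrame(
    {
        "Amount": [20, 50, 60, 80],
        "Percentage": [0.0952, 0.2380, 0.2857, 0.3809],
    },
    index=pd.Index(["Belgium", "France", "Germany", "UK"], name="Country"),
)

mask = df["Percentage"] > 0.25

# A Series of country labels in reversed row order: UK, Germany, France, Belgium.
reversed_labels = df.index.to_series().iloc[::-1]

df["Country"] = "Other"

# The Series is matched to the masked rows by label, not by position,
# so the reversed order does not matter.
df.loc[mask, "Country"] = reversed_labels

print(df)
```

If the values were taken by position, the Germany row would receive "UK"; with label alignment each selected row gets its own name.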
