Python: error when slicing a DataFrame by missing column names


python pandas dataframe


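I have a DataFrame with a MultiIndex on the rows and several columns. I want to slice it by a list of column names, but sometimes a given column name is not in the DataFrame. Pandas emits a warning telling me to use .reindex instead of .loc, but .reindex gives me strange results. To clarify, let's load the DataFrame: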

import pandas as pd
d2 = pd.read_csv('https://docs.google.com/uc?id=1Ufx6pvnSC6zQdTAj05ObmV027fA4-Mr3&export=download', index_col=[0,1])
d2.head(3)

The result is:

..............................................
:          :      : ind475 : ind476 : ind456 :
:..........:......:........:........:........:
: Country  : Year :        :        :        :
: Argentin : 1966 :   6.15 :   7.77 : NaN    :
:          : 1967 :   8.33 :   9.81 : NaN    :
:          : 1968 :   9.19 :   10.2 : NaN    :
:..........:......:........:........:........:

If we slice using columns that exist, there is no problem:

indicators_list = ['ind475', 'ind456']
idx = pd.IndexSlice
d3 = d2.loc[idx[:,:], idx[indicators_list]]
d3.dropna(axis=0, how='all').dropna(axis=1, how='all').shape

Out>> (10006, 2)

However, if we slice with one or more missing columns, pandas raises a warning, although the slice still works:

indicators_list = ['ind475', 'ind179']
d4 = d2.loc[idx[:,:], idx[indicators_list]]
d4.dropna(axis=0, how='all').dropna(axis=1, how='all').shape

Out>> (2672, 1), followed by a warning in red:

FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self._getitem_nested_tuple(tup)

I tried using reindex as the warning suggests, but the result is empty:

indicators_list = ['ind475', 'ind179']
d5 = d2.reindex(columns=[indicators_list])
d5.dropna(axis=0, how='all').dropna(axis=1, how='all').shape

Out>> (0, 0)

How can I slice without any warning or error and still get the correct shape?
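For reference, here is a minimal sketch of the reindex route on a small made-up frame. The data and the names `df` and `wanted` are illustrative assumptions, not the dataset from the question. It passes the list of names directly to `columns=`, whereas the snippet above wraps the list in an extra pair of brackets. That difference may be what produces the empty result, though nothing in this thread confirms it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for d2: row MultiIndex (Country, Year).
index = pd.MultiIndex.from_tuples(
    [("Argentina", 1966), ("Argentina", 1967), ("Brazil", 1966)],
    names=["Country", "Year"],
)
df = pd.DataFrame(
    {"ind475": [6.15, 8.33, np.nan], "ind476": [7.77, 9.81, 1.0]},
    index=index,
)

wanted = ["ind475", "ind179"]  # 'ind179' is not a column of df

# Flat list of labels: a missing label becomes an all-NaN column.
sliced = df.reindex(columns=wanted)

# Drop rows and columns that are entirely NaN, as in the question.
cleaned = sliced.dropna(axis=0, how="all").dropna(axis=1, how="all")
print(cleaned.shape)
```

With this toy data the missing `ind179` column is created as all-NaN by reindex and then removed again by the second dropna, so only `ind475` survives.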

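I believe you need to filter the column names with isin (and then drop the all-NaN columns if necessary):

indicators_list = ['ind475', 'ind179']
print (df2.loc[:, df2.columns.isin(indicators_list)])

Or:

print (df2[df2.columns[df2.columns.isin(indicators_list)]])

If the columns are a MultiIndex, use:

print (df2.loc[:, df2.columns.get_level_values(0).isin(indicators_list)])

Ah, I came here with the same thing. You mean if the MultiIndex is also on the columns, right? Because my example has a MultiIndex on the rows.

Yes, for a MultiIndex in the columns you need df2.loc[:, df2.columns.get_level_values(0).isin(indicators_list)]

When I run your code with .reindex in Python 3.5.2, it seems to work fine. The shape of d5.dropna(…) is (2672, 1), as expected…

@Snowbunting, I am running it on Python 3.6 with all the latest libraries via anaconda.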
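The isin approach from the answer can be tried on a small made-up frame. The sketch below is only an illustration under assumptions: the data, the second-level labels `low`/`high`, and the variable names are invented for the example and do not come from the question's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with flat column labels.
df = pd.DataFrame({"ind475": [6.15, np.nan], "ind476": [7.77, 9.81]})
wanted = ["ind475", "ind179"]  # 'ind179' does not exist in df

# Boolean mask over the columns; missing names simply never match.
mask = df.columns.isin(wanted)
flat_result = df.loc[:, mask]

# Same idea when the columns are a MultiIndex: test the first level only.
mi_cols = pd.MultiIndex.from_product([["ind475", "ind476"], ["low", "high"]])
df_mi = pd.DataFrame(np.arange(8.0).reshape(2, 4), columns=mi_cols)
mi_mask = df_mi.columns.get_level_values(0).isin(wanted)
mi_result = df_mi.loc[:, mi_mask]

print(list(flat_result.columns))
print(list(mi_result.columns))
```

Unlike reindex, the mask only ever selects columns that exist, so no all-NaN placeholder column is created for a missing name and no warning about missing labels is involved.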