Python pd.DataFrame problem: index 13 gives an error?


As you can see below, my proteinID data frame has 4292 members. When I try to print them, I get an error at index 13, and I don't understand why.

Any idea what is going on?

print proteinID.shape
print X_final.shape
for i,prot in enumerate(X_final):
    print i
    print prot
    print proteinID[i]

This gives me:

(4292L,)
(4292L, 4L)
0
[ 0.01070217  0.86624627  0.30031799  1.0022054 ]
Q9BV57
1
[ 0.14132098  0.5899623  -0.08037944  0.04028686]
Q04446
2
[ 0.14768145  0.37698604 -0.08798323 -0.71181829]
P61604
3
[ 0.23194252 -0.17301326 -0.20914528  0.27447231]
Q15029
4
[ 0.13608163  0.41788998  0.06103427 -0.1557695 ]
Q9NRX4
5
[ 0.11981057  0.62419406  0.085566    0.43029529]
P31946
6
[ 0.14734698  0.53942167  0.1647835   0.20525244]
P62258
7
[ 0.13301821  0.25249911  0.32216093  0.46965642]
Q04917
8
[ 0.30891193  0.35936887  0.14029331  0.22116058]
P61981
9
[ 0.15670011 -0.0317209   0.48168144  0.58226224]
P31947;REV__Q13315
10
[ 0.059664    0.52769527  0.09302036  0.28445371]
P27348
11
[ 0.22201161  0.703846    0.19846719  0.53470435]
P63104
12
[ 0.53312759  0.48972197 -0.15224852  0.16086491]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-54-45a793f9a457> in <module>()
      4     print i
      5     print prot
----> 6     print proteinID[i]

C:\Anaconda\lib\site-packages\pandas\core\series.pyc in __getitem__(self, key)
    507     def __getitem__(self, key):
    508         try:
--> 509             result = self.index.get_value(self, key)
    510 
    511             if not np.isscalar(result):

C:\Anaconda\lib\site-packages\pandas\core\index.pyc in get_value(self, series, key)
   1415 
   1416         try:
-> 1417             return self._engine.get_value(s, k)
   1418         except KeyError as e1:
   1419             if len(self) > 0 and self.inferred_type in ['integer','boolean']:

pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3109)()

pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:2840)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3700)()

pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:7229)()

pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:7167)()

KeyError: 12L

I noticed that after removing the NaN values with the following commands:

# Instead of imputing, we remove rows with NaN values
valid_mask = [np.all(~np.isnan(x)) for x in data.values]
print data[valid_mask].shape
X_imputed = data[valid_mask].values
proteinID = proteinID[valid_mask]

the index is preserved. So in this case, the missing index label used to belong to a row with NaN values.
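The behaviour described above can be reproduced on a small scale. The following is a minimal sketch, not the asker's actual data: it assumes Python 3 with a recent pandas (the session in the question is Python 2), and the four-row `data` frame and `proteinID` values are made up for illustration. It shows that boolean filtering keeps the original index labels, that `proteinID[i]` on an integer index is a label lookup (hence the `KeyError` for a dropped label), and three possible ways to pair the filtered IDs with the rows of the array by position.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's `data` and `proteinID`.
data = pd.DataFrame({"a": [0.1, np.nan, 0.3, 0.4],
                     "b": [1.0, 2.0, np.nan, 4.0]})
proteinID = pd.Series(["Q9BV57", "Q04446", "P61604", "Q15029"])

# True for rows that contain no NaN at all.
valid_mask = data.notna().all(axis=1).to_numpy()

X_final = data[valid_mask].to_numpy()  # plain array: positions 0..n-1
kept = proteinID[valid_mask]           # Series: keeps the ORIGINAL labels

print(kept.index.tolist())  # [0, 3] -> labels 1 and 2 were dropped

# kept[1] would raise KeyError: with an integer index, [] looks up the
# label 1, which no longer exists after filtering.

# Option 1: positional access with .iloc
by_position = [kept.iloc[i] for i in range(len(X_final))]

# Option 2: renumber the index 0..n-1 after filtering
kept_reset = kept.reset_index(drop=True)

# Option 3: iterate both in lockstep and avoid indexing entirely
pairs = list(zip(kept, X_final))
```

Which option fits best depends on whether the original labels are still needed later: `.iloc` and `zip` leave them intact, while `reset_index(drop=True)` discards them.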

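The error is clear: you don't have a column 12. Please post the original input data and code that reproduces the error.

Thanks for your response, but we are talking about rows here, not columns. I'll update the post with more information!

No, proteinID[i] tries to access a column, and there is only one column, so...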
