Python 2.7: index out of bounds error when using a list as an index


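I have two files: one has a single column (call it pred) with no header, and the other has two columns, ID and IsClick (with a header). My goal is to use the ID column as an index into pred: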

import pandas as pd
import numpy as np

def LinesInFile(path):
    with open(path) as f:
        for linecount, line in enumerate(f):
            pass
    f.close()
    print 'Found ' + str(linecount) + ' lines' 
    return linecount

path ='/Users/mas/Documents/workspace/Avito/input/'                          # path to testing file
submission = path + 'submission1234.csv' 

lines = LinesInFile(submission)
lines = LinesInFile(path + 'sampleSubmission.csv')


sample = pd.read_csv(path + 'sampleSubmission.csv')
preds = np.array(pd.read_csv(submission, header = None))
index = sample.ID.values - 1
print index
print len(index)
sample['IsClick'] = preds[index]
sample.to_csv('submission.csv', index=False)
The output is:

Found 7816360 lines
Found 7816361 lines
[       0        4        5 ..., 15961507 15961508 15961511]
7816361
Traceback (most recent call last):
  File "/Users/mas/Documents/workspace/Avito/July3b.py", line 23, in <module>
    sample['IsClick'] = preds[index]
IndexError: index 7816362 is out of bounds for axis 0 with size 7816361

Something seems to be wrong, because my file has 7816361 lines counting the header, while my list has one extra element (the len of the list is 7816361).
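A side note on those counts: LinesInFile returns the last value of the enumerate counter, which starts at 0, so it reports one fewer than the real number of lines. Read that way, submission1234.csv has 7816361 lines and sampleSubmission.csv has 7816362 (a header plus 7816361 data rows), which matches both len(index) and the size 7816361 in the error message. The row counts agree, so the problem is not an extra element. A minimal sketch of a corrected counter (the name count_lines is hypothetical, not from the question):

```python
def count_lines(path):
    """Return the number of lines in the file at `path`."""
    # Add one per line instead of keeping the last enumerate() index,
    # which starts at 0 and therefore undercounts by one. The with block
    # closes the file on exit, so no explicit f.close() is needed.
    with open(path) as f:
        return sum(1 for _ in f)
```

Used on the same files, count_lines(submission) should report one more than LinesInFile did.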

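I don't have your csv files to recreate the problem, but it appears to be caused by how you build the index:

index = sample.ID.values - 1

This takes each sample ID and subtracts 1. Those values are not positions in pred, which has only 7816361 rows (valid positions 0 through 7816360, as the error message shows). Based on your printed output, each of the last 3 items in your index array (15961507, 15961508, 15961511) is out of bounds because it is greater than 7816360. I suspect the error is reporting the first ID - 1 value that is out of bounds.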
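To see the failure in miniature, here is a sketch with toy data (the arrays and the helper name out_of_bounds_positions are hypothetical). NumPy treats every value of ID - 1 as a row position, so any value at or beyond the number of rows raises IndexError, and a value of -1 (from an ID of 0) would not raise at all but silently select the last row:

```python
import numpy as np

def out_of_bounds_positions(ids, n_rows):
    """Return the ID - 1 values that are not usable row positions for an
    array with n_rows rows (usable positions are 0 .. n_rows - 1)."""
    index = np.asarray(ids) - 1
    # Values >= n_rows raise IndexError; negative values do not raise but
    # silently count back from the end, which is just as wrong here.
    return index[(index < 0) | (index >= n_rows)]

# Toy data (hypothetical): 4 prediction rows, but IDs that are not 1..4.
preds = np.array([[0.12], [0.87], [0.05], [0.40]])
ids = np.array([1, 5, 6, 12])

print(out_of_bounds_positions(ids, len(preds)).tolist())  # [4, 5, 11]
try:
    preds[ids - 1]
except IndexError as exc:
    print("IndexError: %s" % exc)
```

Running a check like this on the real sample.ID.values before indexing would show how many IDs have no matching row in pred.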

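Assuming you just want to join the files by row number, you can do the following. Reading only the ID column from sampleSubmission.csv keeps the result from ending up with two IsClick columns:

sample = pd.concat((pd.read_csv(path + 'sampleSubmission.csv', usecols=['ID']),
                    pd.read_csv(submission, header=None, names=['IsClick'])), axis=1)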
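A minimal sketch of that row-wise join on small in-memory stand-ins for the two files (the CSV contents are hypothetical). pd.concat with axis=1 pairs rows by index label, and both frames get a default 0..n-1 index, so row k of one file lines up with row k of the other:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for sampleSubmission.csv and the pred file.
SAMPLE_CSV = u"ID,IsClick\n1,0\n5,0\n6,0\n"
PRED_CSV = u"0.12\n0.87\n0.05\n"

ids = pd.read_csv(io.StringIO(SAMPLE_CSV), usecols=['ID'])
preds = pd.read_csv(io.StringIO(PRED_CSV), header=None, names=['IsClick'])

# If the lengths differed, concat would pad the shorter side with NaN
# instead of failing, so check the row counts first.
assert len(ids) == len(preds), "files have different numbers of rows"
sample = pd.concat((ids, preds), axis=1)
print(sample)
```

With the real files, replace the io.StringIO(...) arguments with path + 'sampleSubmission.csv' and submission.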
Otherwise, you will need to do a join or merge on the two DataFrames.
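If each prediction were keyed by ID rather than by row position, a merge on ID would be the safer join. A minimal sketch, assuming a predictions table that has its own ID column (the question's pred file does not, so the data here is hypothetical):

```python
import pandas as pd

# Hypothetical data: unlike the question's pred file, these predictions
# carry their own ID column, so they can be matched by key, not position.
sample = pd.DataFrame({'ID': [1, 5, 6], 'IsClick': [0, 0, 0]})
preds = pd.DataFrame({'ID': [6, 1, 5], 'IsClick': [0.05, 0.12, 0.87]})

# how='left' keeps every sample row in its original order; validate raises
# MergeError if an ID appears more than once on either side.
merged = sample[['ID']].merge(preds, on='ID', how='left', validate='one_to_one')
print(merged)
```

Any sample ID without a prediction would show up as NaN in IsClick, which is easier to spot than an IndexError. The result could then be written out with merged.to_csv('submission.csv', index=False), as in the question.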