Python 2.7: out-of-bounds error when using a list as an index
I have two files: one is a single column (called pred) with no header; the other has two columns, ID and IsClick, with a header. My goal is to use the ID column as an index into pred.
import pandas as pd
import numpy as np

def LinesInFile(path):
    # enumerate() starts at 0, so linecount ends up as (number of lines - 1)
    with open(path) as f:
        for linecount, line in enumerate(f):
            pass
    f.close()  # redundant: the with block has already closed the file
    print 'Found ' + str(linecount) + ' lines'
    return linecount

path = '/Users/mas/Documents/workspace/Avito/input/'  # path to testing file
submission = path + 'submission1234.csv'
lines = LinesInFile(submission)
lines = LinesInFile(path + 'sampleSubmission.csv')
sample = pd.read_csv(path + 'sampleSubmission.csv')
preds = np.array(pd.read_csv(submission, header=None))
index = sample.ID.values - 1
print index
print len(index)
sample['IsClick'] = preds[index]  # this is the line that raises the IndexError
sample.to_csv('submission.csv', index=False)
The output is:
Found 7816360 lines
Found 7816361 lines
[ 0 4 5 ..., 15961507 15961508 15961511]
7816361
Traceback (most recent call last):
  File "/Users/mas/Documents/workspace/Avito/July3b.py", line 23, in <module>
    sample['IsClick'] = preds[index]
IndexError: index 7816362 is out of bounds for axis 0 with size 7816361
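Something seems to be wrong: my file has 7816361 lines counting the header, while my list has one extra element (the length of the list is 7816361).

I don't have your csv files to recreate the problem, but it appears to be caused by how you build index:
index = sample.ID.values - 1
This takes each sample ID and subtracts 1. The results are not valid positions in preds, which according to the traceback has only 7816361 rows (valid positions 0 to 7816360). Each of the last 3 items in the index array (based on your printout) is out of range, since they are all greater than 7816360. I suspect the error is showing you the first ID - 1 value that falls out of bounds.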
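The bounds problem can be reproduced on toy data. The sketch below is a minimal illustration, not the original files: the arrays `preds` and `ids` are made-up stand-ins, and `out_of_bounds` is a hypothetical helper name. It lists the positions that would fail before any indexing is attempted.

```python
import numpy as np

def out_of_bounds(index, size):
    """Return the entries of index that are not valid positions for an array of length size."""
    index = np.asarray(index)
    return index[(index < 0) | (index >= size)]

# Made-up stand-ins: 5 predictions, but IDs that are sparse and run past 5.
preds = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
ids = np.array([1, 5, 6, 9])

index = ids - 1  # same transformation as in the question
bad = out_of_bounds(index, len(preds))
print(bad)  # the positions that preds[index] would reject

try:
    preds[index]
except IndexError as exc:
    # NumPy refuses the whole lookup as soon as one position is invalid.
    print(exc)
```

If `bad` is non-empty, the IDs are not simple row numbers into the prediction array, and subtracting 1 will not turn them into valid positions.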
Assuming you just want to join the files by row number, you can do the following:
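# usecols=['ID'] drops the existing IsClick column so it is not duplicated after the concat
sample = pd.concat((pd.read_csv(path + 'sampleSubmission.csv', usecols=['ID']),
                    pd.read_csv(submission, header=None).rename(columns={0: 'IsClick'})),
                   axis=1)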
Otherwise, you will need to perform a join or merge on the two dataframes.
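If the predictions can be tied to IDs rather than to row positions, a merge is one way to do such a join. This is only a sketch under an assumption the question does not confirm: that a prediction table with its own `ID` column is available. The frames `sample` and `preds` below are made-up examples.

```python
import pandas as pd

# Made-up stand-in for sampleSubmission.csv: sparse IDs, placeholder IsClick.
sample = pd.DataFrame({'ID': [1, 5, 6, 9], 'IsClick': [0, 0, 0, 0]})

# Assumption: predictions keyed by ID (the file in the question has no such column).
preds = pd.DataFrame({'ID': [1, 5, 6, 7], 'IsClick': [0.1, 0.5, 0.6, 0.7]})

# Keep every ID from sample, in sample's order; IDs without a prediction get NaN.
merged = sample[['ID']].merge(preds, on='ID', how='left')
print(merged)
```

Using `how='left'` keeps unmatched IDs visible as NaN instead of silently dropping them, which makes a mismatch between the two files easy to spot.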