pandas: reading a CSV with chunksize


Error message:

However, if I change chunksize from 10000 to 100000 (chunksize=100000), it runs without any problem.

Why do I get an error when I set chunksize=10000?

import pandas as pd

# calculate CTR
count_all = 0
count_4 = 0
for df in pd.read_csv(open("%s/tianchi_fresh_comp_train_user.csv" % root_path, 'r'),
                      chunksize=10000):
    try:
        count_user = df['behavior_type'].value_counts()
        # count_user[k] is a label lookup: it raises KeyError if this chunk
        # has no row with behavior_type k. The StopIteration handler below
        # does not catch that error.
        count_all += count_user[1] + count_user[2] + count_user[3] + count_user[4]
        count_4 += count_user[4]
    except StopIteration:
        print("Iteration is stopped.")

# CTR
print(count_all)
print(count_4)

I modified the code and it now works: with chunksize=10000 there is no longer a problem.

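The problem here is that when you read in chunks of 10000 rows, some chunks will not contain behavior type 4.

Yes, you're right. But how do I fix this? Should I check whether each chunk contains 1, 2, 3 or 4?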
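A minimal sketch of the failure mode described in the comment, using a tiny made-up CSV held in memory (the data and `chunksize=2` are illustrative assumptions, not the Tianchi file). The second chunk has no row with `behavior_type` 4, so the label lookup `count_user[4]` raises `KeyError`, while `Series.get(4, 0)` falls back to 0.

```python
import io

import pandas as pd

# Hypothetical sample data: only the first two rows have behavior_type 4.
csv_text = "user_id,behavior_type\n1,4\n2,4\n3,1\n4,2\n"

missing = []      # indices of chunks in which type 4 is absent
safe_counts = []  # per-chunk count of type 4 obtained with .get()
for n, df in enumerate(pd.read_csv(io.StringIO(csv_text), chunksize=2)):
    count_user = df["behavior_type"].value_counts()
    try:
        count_user[4]  # label lookup: raises KeyError if 4 is not in the index
    except KeyError:
        missing.append(n)
    safe_counts.append(int(count_user.get(4, 0)))  # default 0 when absent

print(missing)      # [1]
print(safe_counts)  # [2, 0]
```

With a larger chunksize every chunk may happen to contain all four types, which would explain why chunksize=100000 appeared to work.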
import pandas as pd

count_all = 0
count_4 = 0
for df in pd.read_csv("%s/tianchi_fresh_comp_train_user.csv" % root_path,
                      chunksize=10000):
    # No try/except needed: the for loop itself handles the end of the chunk iterator.
    count_user = df['behavior_type'].value_counts()
    # A chunk may lack some behavior types; .get() treats a missing type as 0.
    for i in range(1, 5):
        count_all += count_user.get(i, 0)
    count_4 += count_user.get(4, 0)
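Instead of patching each chunk's counts, another option is to accumulate them: add every chunk's `value_counts()` into a running total with `Series.add(..., fill_value=0)`, so a type missing from a chunk simply contributes 0, and read the totals once at the end. This is a sketch under assumptions: the helper name `count_behaviors` is hypothetical, an in-memory string stands in for the CSV, `usecols` limits parsing to the `behavior_type` column, and behavior types are taken to be the integers 1 to 4 as in the question's code.

```python
import io

import pandas as pd


def count_behaviors(source, chunksize=10000):
    """Hypothetical helper: total counts of behavior_type 1-4 over all chunks."""
    # Start from explicit zeros so types 1-4 always exist in the result.
    total = pd.Series(0, index=[1, 2, 3, 4])
    for df in pd.read_csv(source, usecols=["behavior_type"], chunksize=chunksize):
        # fill_value=0 treats a label missing on either side as 0
        total = total.add(df["behavior_type"].value_counts(), fill_value=0)
    # Keep only types 1-4 and restore an integer dtype
    return total.reindex([1, 2, 3, 4], fill_value=0).astype("int64")


# Hypothetical sample data standing in for tianchi_fresh_comp_train_user.csv
csv_text = "user_id,behavior_type\n1,4\n2,4\n3,1\n4,2\n5,1\n"
counts = count_behaviors(io.StringIO(csv_text), chunksize=2)
count_all = int(counts.sum())
count_4 = int(counts[4])
print(count_all, count_4)  # 5 2
```

Because the totals are combined before any type is looked up, the result no longer depends on how the rows happen to be split into chunks.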