Python: Why does pandas skip the first set of chunks when iterating over a CSV in my code?

python csv pandas

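I have a very large CSV file that I read iteratively using pandas' chunking feature. The problem: with chunksize=2 it skips the first 2 rows, and the first chunk I receive is rows 3-4.

Basically, if I read the CSV with nrows=4 I get the first 4 rows, whereas chunking the same file with chunksize=2 gives me rows 3 and 4 first, and then 5 and 6.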

# 1. Read with nrows
# Read the first 4 rows of the CSV file and merge the date and time columns to be used as the index
import pandas as pd

reader = pd.read_csv('filename.csv', delimiter=',', parse_dates={"Datetime": [1, 2]}, index_col=[0], nrows=4)

print(reader)

01/01/2016 - 09:30 - A - 100
01/01/2016 - 13:30 - A - 110
01/01/2016 - 15:30 - A - 120
02/01/2016 - 10:30 - A - 115

# 2. Iterate over the CSV file in chunks
# Iterate over the CSV file in chunks and merge the date and time columns to be used as the index
reader = pd.read_csv('filename.csv', delimiter=',', parse_dates={"Datetime": [1, 2]}, index_col=[0], chunksize=2)

for chunk in reader:

    # create a dataframe from chunks
    df = reader.get_chunk()
    print(df)

01/01/2016 - 15:30 - A - 120
02/01/2016 - 10:30 - A - 115

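Increasing chunksize to 10 skips the first 10 rows.

Is there a way to fix this? I have already found a workaround that works, but I would like to understand where I went wrong.

Any input is welcome.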

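Don't call get_chunk. You already have the chunk because you are iterating over the reader, i.e. chunk is your DataFrame. Each step of the for loop reads one chunk, and calling reader.get_chunk() inside the loop reads the chunk after it, which is why the first rows never show up. Call print(chunk) in the loop and you will see the expected output.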
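A minimal sketch of the difference, assuming a small in-memory CSV in place of the real file (the sample data, the column names id and value, and both helper function names are made up for illustration). It contrasts using the loop variable directly with calling get_chunk() inside the loop, where the loop variable holds rows 1-2 while get_chunk() returns rows 3-4.

```python
import io

import pandas as pd

# Hypothetical eight-row sample standing in for the large CSV file.
CSV_TEXT = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(1, 9))


def ids_from_loop_only(chunksize=2):
    """Use each chunk yielded by the loop directly (the suggested fix)."""
    reader = pd.read_csv(io.StringIO(CSV_TEXT), chunksize=chunksize)
    return [chunk["id"].tolist() for chunk in reader]


def ids_from_loop_plus_get_chunk(chunksize=2):
    """Reproduce the problem: get_chunk() inside the loop reads a further chunk."""
    reader = pd.read_csv(io.StringIO(CSV_TEXT), chunksize=chunksize)
    seen = []
    for chunk in reader:            # the loop has already read one chunk here
        df = reader.get_chunk()     # this reads the chunk that follows it
        seen.append(df["id"].tolist())
    return seen


fixed = ids_from_loop_only()
buggy = ids_from_loop_plus_get_chunk()
print(fixed)
print(buggy)
```

With the sample above, the first list covers every row, while the second only contains the chunks that get_chunk() happened to return.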

As @MaxU pointed out in the comments, if you want chunks of different sizes, you need to use get_chunk: reader.get_chunk(500), reader.get_chunk(100), and so on.
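As a rough illustration of reading chunks of different sizes, the sketch below opens a reader with iterator=True and asks get_chunk() for an explicit number of rows each time. The ten-row sample and its column names are assumptions for the example, not the asker's data.

```python
import io

import pandas as pd

# Hypothetical ten-row sample; the column names are made up for illustration.
CSV_TEXT = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(1, 11))

# iterator=True returns a reader object without fixing a chunk size up front.
reader = pd.read_csv(io.StringIO(CSV_TEXT), iterator=True)

first = reader.get_chunk(3)    # the next 3 rows
second = reader.get_chunk(5)   # the 5 rows after those
third = reader.get_chunk(2)    # the 2 rows after those
reader.close()

print(len(first), len(second), len(third))
```

Each call continues where the previous one stopped, so no for loop over the reader is needed in this pattern.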

Comments:

print(chunk) does print the first two rows. Thank you very much for the quick help, very useful. So get_chunk basically already gives me the next chunk. Sorry for the newbie question, I did not get this from the documentation. Would you like to post this as an answer so that I can mark it as correct and close the question?

@David, see - it might be helpful

@MaxU Thanks, that makes the purpose of get_chunk very clear.

If you want to read chunks of different sizes, use get_chunk(): reader.get_chunk(100) ... reader.get_chunk(500) ... reader.get_chunk(30)

@MaxU: Thanks, that makes more sense. Updated the answer.