Problem reading a large file with Python pandas read_csv

python pandas numpy csv tensorflow

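I am trying to build a project and created a CSV file with 13,347 rows and 2,500 columns, but when I read the file with pandas, only roughly the first 6,600 rows are read, so my model cannot be built correctly. Please tell me why this happens and how I can fix it. I have attached part of my code and the output.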

**Code:**
import numpy as np
import pandas as pd

# Load the CSV; column 0 is used as the target, columns 1-2500 as pixel values
data = pd.read_csv("train_foo.csv", low_memory=False)
dataset = np.array(data)
print(dataset.shape)
np.random.shuffle(dataset)  # shuffles the rows in place
x = dataset[:, 1:2501]
y = dataset[:, 0]
# Split the data into training and testing sets and normalize the values
x_train = x[0:12000, :]     # 12000 samples in the training set
x_train = x_train / 255.    # convert the pixel values from 0-255 to 0-1
x_test = x[12000:13345, :]  # 1345 samples in the testing set
x_test = x_test / 255.
y = y.reshape(y.shape[0], 1)
y_train = y[0:12000, :]
y_train = y_train.T
y_test = y[12000:13345, :]
y_test = y_test.T
print("no. of training examples:" + str(x_train.shape[0]))
print("no. of test examples:" + str(x_test.shape[0]))
print("x_train shape:" + str(x_train.shape))
print("x_test shape:" + str(x_test.shape))
print("y_train shape:" + str(y_train.shape))
print("y_test shape: " + str(y_test.shape))

**Output:**
no. of training examples:6672
no. of test examples:0
x_train shape:(6672, 2500)
x_test shape:(0, 2500)
y_train shape:(1, 6672)
y_test shape: (1, 0)

You can try reading the CSV in chunks, as shown in the documentation:

reader = pd.read_csv('tmp.sv', sep='|', chunksize=4)
for chunk in reader:
    print(chunk)
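
To check whether chunked reading actually covers the whole file, one option is to total the rows across all chunks and compare that number with the expected row count. The following is a minimal sketch under assumptions: the helper name `count_rows_in_chunks`, the chunk size, and the small synthetic CSV are illustrative and do not come from the original answer or the asker's data.

```python
import io

import pandas as pd


def count_rows_in_chunks(source, chunksize=1000, **read_csv_kwargs):
    """Return the total number of data rows seen when reading chunk by chunk."""
    total = 0
    for chunk in pd.read_csv(source, chunksize=chunksize, **read_csv_kwargs):
        total += len(chunk)
    return total


# Synthetic stand-in for the real file (hypothetical data, 2500 rows)
csv_text = "label,p0,p1\n" + "".join(
    f"{i % 10},{i},{i + 1}\n" for i in range(2500)
)

print(count_rows_in_chunks(io.StringIO(csv_text), chunksize=400))
```

If the total for the real file is still far below the expected number of rows, chunking is unlikely to be the issue, and the file contents themselves are worth inspecting.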

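I ran the snippet above on my system and it works fine. Please check that the variables are not modified in other parts of your code.

Yes, the code does not show any error. It only reads the first 6,675 rows of the CSV file; the remaining rows are not read.

On my system the row count went up to 12,000. Please try running this snippet on your system and let me know the result. I created a similar DataFrame with random data.

Thanks, but I have rechecked it and the variables do not change. I also updated pandas, but it still does not read the CSV file completely.

I even tried reading in chunks, but the file is still not read completely. Could there be a problem in the CSV file itself?

I also used this option, but it still only reads the first 6,675 rows.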
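
One way to test whether the CSV file itself is the problem is to compare the number of raw lines in the file with the number of rows pandas parses. A large mismatch can point to something in the data, for example a stray quote character that makes the parser treat several lines as one field. The sketch below is only a diagnostic under assumptions: the helper name `compare_counts` and the synthetic file are hypothetical, and it assumes one header line and no intentionally multi-line fields. Passing `quoting=csv.QUOTE_NONE` to `read_csv` disables quote handling, which can help confirm or rule out quoting as the cause.

```python
import csv
import os
import tempfile

import pandas as pd


def compare_counts(path, **read_csv_kwargs):
    """Return (raw_data_lines, parsed_rows) for a CSV file with one header line."""
    with open(path, "rb") as f:
        # Count non-empty physical lines, independent of any CSV parsing
        raw_lines = sum(1 for line in f if line.strip())
    parsed_rows = len(pd.read_csv(path, **read_csv_kwargs))
    return raw_lines - 1, parsed_rows


# Synthetic stand-in for the real file (hypothetical data, 500 rows)
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.csv")
with open(demo_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["label", "p0", "p1"])
    for i in range(500):
        writer.writerow([i % 10, i, i + 1])

print(compare_counts(demo_path))
print(compare_counts(demo_path, quoting=csv.QUOTE_NONE))
```

On a clean file both numbers agree. If they differ for the real file, or if they only agree when quote handling is disabled, inspecting the lines around the last correctly parsed row is a reasonable next step.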