Python pandas: CSV header and data row size mismatch

Is it possible to instruct pandas to ignore columns whose position lies beyond the header size?

import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123\n")

df = pandas.read_csv('test.csv')
print(df)
The result is:

              datetime    A
0  2018-10-09 18:00:07  123
However, loading a CSV file that contains more data columns than are defined in the header:

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

df = pandas.read_csv('test.csv')
print(df)
returns:

                        datetime     A
2018-10-09 18:00:07 123      ABC   XYZ
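Pandas shifts the header to the rightmost columns of the data.

I need different behavior. I would like the data columns that extend beyond the header to be ignored.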


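Note: I cannot enumerate the columns, because this is a generic use case. For reasons independent of my code, there is sometimes more data than the header defines, and that is expected. I want to ignore the extra data.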

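Because the data row has more columns than the header actually defines, pandas assumes that the first two (data) columns are a (multi-)index.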
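The index inference described above can be checked by inspecting the resulting frame. This is a minimal sketch that assumes the same two-line content as the question's second test.csv, fed through io.StringIO instead of a file; the name csv_text is only illustrative.

```python
import io

import pandas

# Same content as the question's second test.csv:
# 2 header fields, but 4 fields in the data row.
csv_text = "datetime,A\n2018-10-09 18:00:07, 123, ABC, XYZ\n"

df = pandas.read_csv(io.StringIO(csv_text))

print(df.index.nlevels)   # number of index levels pandas inferred
print(list(df.columns))   # header names, now attached to the last two fields
```

If the inference works as described, the two surplus leading fields end up in the index and the header names label the last two fields, which matches the output shown in the question.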

Use the usecols parameter of read_csv to specify which data columns to read:

import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

df = pandas.read_csv('test.csv', usecols=[0,1]) 
print(df)
yields

              datetime    A
0  2018-10-09 18:00:07  123
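Since the question states that the columns cannot be enumerated in advance, the list passed to usecols could be derived from the header row itself. The following is only a sketch under that assumption: header_width is a hypothetical helper, not part of pandas, and it uses the standard csv module so that a quoted header field containing a comma is still counted as one field.

```python
import csv


def header_width(path):
    """Return the number of fields in the first (header) row of a CSV file."""
    with open(path, newline="") as csv_file:
        # next() with a default avoids StopIteration on an empty file.
        header = next(csv.reader(csv_file), [])
    return len(header)


# Build a sample file whose data row is wider than its header.
with open("test.csv", mode="w") as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

n_cols = header_width("test.csv")
print(n_cols)  # prints 2
```

The result can then be passed on as usecols=range(n_cols) in the read_csv call above, in place of the hard-coded [0,1].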

To make the question complete, the following code:

import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A, B, C\n")
    csv_file.write("2018-10-09 18:00:07, 123\n")

# Count the fields in the header row and in the first data row.
with open("test.csv") as csv_file:
    for i, line in enumerate(csv_file):
        if i == 0:
            headerCount = line.count(",") + 1
        elif i == 1:
            dataCount = line.count(",") + 1
            if headerCount != dataCount:
                print("Warning: Header and data size mismatch. Columns beyond header size will be removed.")
        else:
            break

# Read only as many columns as the first data row has.
df = pandas.read_csv('test.csv', usecols=range(dataCount))
print(df)
gives the correct object:

Warning: Header and data size mismatch. Columns beyond header size will be removed.
              datetime    A
0  2018-10-09 18:00:07  123

Now the code that shows the answer to the question:

import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

# Count the fields in the header row and in the first data row.
with open("test.csv") as csv_file:
    for i, line in enumerate(csv_file):
        if i == 0:
            headerCount = line.count(",") + 1
        elif i == 1:
            dataCount = line.count(",") + 1
        else:
            break

if headerCount < dataCount:
    print("Warning: Header and data size mismatch. Columns beyond header size will be removed.")

# Read only the columns covered by the header.
df = pandas.read_csv('test.csv', usecols=range(headerCount))
print(df)

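prints:

Warning: Header and data size mismatch. Columns beyond header size will be removed.
              datetime    A
0  2018-10-09 18:00:07  123

Ignore rows, or ignore columns?

It requires more code, but your suggestion can do this. Thanks.

@RyszardStyczynski Given your example code, this should be enough. I assume you mean that your actual code and data need more work. But if this code does not work for the code in your question, let me know: maybe I overlooked something or something is wrong.

You are right. I had just implemented exactly the opposite case: when the header is longer than the data. That is unnecessary, because pandas supports it. Thanks again.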