Python pandas: CSV header and data row size mismatch
Is it possible to instruct pandas to ignore columns at positions beyond the header size?
import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123\n")

df = pandas.read_csv('test.csv')
print(df)
The result is:
              datetime    A
0  2018-10-09 18:00:07  123
However, loading a CSV file whose data rows have more columns than the header defines:
import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

df = pandas.read_csv('test.csv')
print(df)
returns:
datetime A
2018-10-09 18:00:07 123 ABC XYZ
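Pandas shifts the header over to the rightmost columns of the data.
I need different behavior: I want the data beyond the header to be ignored.
Note: I cannot enumerate the columns, because this is a generic use case. For reasons outside my code's control, there is sometimes more data than the header describes, and that is expected. I want to ignore the extra data.
There seem to be too many columns compared to the actual header, so pandas assumes the first two (data) columns are a (Multi)Index.
Use the usecols parameter of read_csv to specify which data columns to read: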
import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

# Read only the first two columns; the extra data fields are ignored
df = pandas.read_csv('test.csv', usecols=[0, 1])
print(df)
yields
              datetime    A
0  2018-10-09 18:00:07  123
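Since the question says the columns cannot be enumerated in advance, the hard-coded usecols=[0, 1] can instead be derived from the header. A minimal sketch, assuming the first line of the file is the header; it uses the standard csv module to split that line (so a quoted comma inside a column name is not miscounted), and read_header_columns is a hypothetical helper name, not part of pandas:

```python
import csv

import pandas


def read_header_columns(path):
    """Read a CSV, keeping only as many columns as the header names.

    Hypothetical helper: data fields beyond the header width are dropped
    through usecols, so pandas does not infer an implicit (Multi)Index.
    """
    with open(path, newline="") as csv_file:
        # Assumption: the first row of the file is the header.
        header = next(csv.reader(csv_file))
    return pandas.read_csv(path, usecols=list(range(len(header))))


if __name__ == "__main__":
    with open("test.csv", mode="w") as csv_file:
        csv_file.write("datetime,A\n")
        csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")
    print(read_header_columns("test.csv"))
```

Because usecols has exactly as many entries as the header has names, pandas treats every header name as a regular column instead of shifting the header to the right.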
For completeness, here is code for the opposite case, where the header is longer than the data row:
import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A, B, C\n")
    csv_file.write("2018-10-09 18:00:07, 123\n")

with open("test.csv") as csv_file:
    for i, line in enumerate(csv_file):
        if i == 0:
            # Number of fields in the header
            headerCount = line.count(",") + 1
        elif i == 1:
            # Number of fields in the first data row
            dataCount = line.count(",") + 1
            if headerCount != dataCount:
                print("Warning: Header and data size mismatch. Columns beyond header size will be removed.")
        else:
            break

# Keep only as many columns as the first data row has
df = pandas.read_csv('test.csv', usecols=range(dataCount))
print(df)
gives the correct DataFrame:
Warning: Header and data size mismatch. Columns beyond header size will be removed.
              datetime    A
0  2018-10-09 18:00:07  123
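As the asker notes in the comments, this opposite case (header longer than the data rows) needs no special handling: read_csv fills the missing trailing fields with NaN. A minimal check, writing to a hypothetical short_rows.csv without spaces after the commas so the column names come out clean (with the original "datetime,A, B, C" header the names would keep their leading spaces, such as " B", unless skipinitialspace=True is passed):

```python
import pandas

with open("short_rows.csv", mode="w") as csv_file:
    csv_file.write("datetime,A,B,C\n")
    csv_file.write("2018-10-09 18:00:07,123\n")

df = pandas.read_csv("short_rows.csv")
# B and C exist as columns; the fields missing from the data row become NaN.
print(df)
print(df[["B", "C"]].isna().all().all())
```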
And now the code that answers the question:
import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

with open("test.csv") as csv_file:
    for i, line in enumerate(csv_file):
        if i == 0:
            # Number of fields in the header
            headerCount = line.count(",") + 1
        elif i == 1:
            # Number of fields in the first data row
            dataCount = line.count(",") + 1
        else:
            break

if headerCount < dataCount:
    print("Warning: Header and data size mismatch. Columns beyond header size will be removed.")

# Read only as many columns as the header defines
df = pandas.read_csv('test.csv', usecols=range(headerCount))
print(df)
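which gives:
Warning: Header and data size mismatch. Columns beyond header size will be removed.
              datetime    A
0  2018-10-09 18:00:07  123
Ignore rows, or ignore columns?
It takes more code, but your suggestion does the job. Thanks.
@RyszardStyczynski Given your example code, this should be enough. I suppose you mean that your actual code and data need more work. But if this code does not work for the code in your question, please let me know: maybe I overlooked something or something is wrong.
You are right. I had just implemented exactly the opposite case: when the header is longer than the data. That is unnecessary, since pandas supports it. Thanks again.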
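The comma counting in the code above only inspects the first data row, and it would also count commas inside quoted fields. A sketch of a more defensive variant, assuming the whole file can be scanned once with the standard csv module before pandas reads it; read_with_width_check is a hypothetical name, and warnings.warn stands in for the print call:

```python
import csv
import warnings

import pandas


def read_with_width_check(path):
    """Read a CSV, dropping data fields beyond the header width.

    Hypothetical helper: warns if any data row is wider than the header.
    """
    with open(path, newline="") as csv_file:
        reader = csv.reader(csv_file)
        header = next(reader)
        # Rows are split by csv.reader, so quoted commas are not counted.
        widest = max((len(row) for row in reader), default=0)
    if widest > len(header):
        warnings.warn(
            "Header and data size mismatch. "
            "Columns beyond header size will be removed."
        )
    # usecols limited to the header width also covers wide rows further down.
    return pandas.read_csv(path, usecols=list(range(len(header))))
```

For example, a file whose header is name,A and whose rows are "Smith, J",1 and "Doe, A",2,extra,more triggers one warning and is read as two columns, name and A.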