Python: How do I read a CSV file until the header is found?


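I want to skip the first 20 rows because the data I need starts at row 21. I have already tried "skiprows", but the number of rows before the header changes from file to file, so I would like something flexible enough to work for any file. How can I do that?

My first idea was to increment a variable to work out how many rows to skip: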

skip = 0
if 'X error' not in pd.read_csv(nF):
    skip += 1
But it raises "Error tokenizing data. C error: Expected 1 fields in line 13, saw 10".

CSV:

    <INFO>
{
InspectionResultFileType:1.01-FULL-ENG
InspectMode:2
Unit:0
ReviseBalance:1
JudgeItem:448
TeachingMethod:4
ReviseMode:0
ReviseScalingX:1.000013
ReviseScalingY:0.999969
}
Insp ON/OFF,T code,Design D,X error -,X error +,Y error -,Y error +,D error -,D error +,DD error
1,T1,0.151,-0.06000,0.06000,-0.06000,0.06000,-0.06000,0.06000,0.06000
1,T2,0.151,-0.06000,0.06000,-0.06000,0.06000,-0.06000,0.06000,0.06000
1,T3,0.152,-0.06000,0.06000,-0.06000,0.06000,-0.06000,0.06000,0.06000
1,T4,0.152,-0.06000,0.06000,-0.06000,0.06000,-0.06000,0.06000,0.06000
1,T5,0.251,-0.06000,0.06000,-0.06000,0.06000,-0.06000,0.06000,0.06000
1,T6,0.251,-0.06000,0.06000,-0.06000,0.06000,-0.06000,0.06000,0.06000
1,T7,2.000,-0.06000,0.06000,-0.06000,0.06000,-0.06000,0.06000,0.06000
NO.,T code,H. NO.,Jud,Design X,Design Y,Design D,Measu. X,Measu. Y,Measu. D,X error,Y error,D error,DD,TimeStamp

I found this solution elsewhere.

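Do something like this to get a list of rows from the file:

arr = []
with open('xyz.csv') as f:
    for line in f:
        # Drop the trailing newline, then split the line on commas.
        x = line.strip('\n').split(',')
        # Keep only lines with more than one field; this drops the <INFO> block.
        if len(x) > 1:
            arr.append(x)
print(arr)
The result is as follows: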

[['Insp ON/OFF', 'T code', 'Design D', 'X error -', 'X error +', 'Y error -', 'Y error +', 'D error -', 'D error +', 'DD error'], ['1', 'T1', '0.151', '-0.06000', '0.06000', '-0.06000', '0.06000', '-0.06000', '0.06000', '0.06000'], ['1', 'T2', '0.151', '-0.06000', '0.06000', '-0.06000', '0.06000', '-0.06000', '0.06000', '0.06000'], ['1', 'T3', '0.152', '-0.06000', '0.06000', '-0.06000', '0.06000', '-0.06000', '0.06000', '0.06000'], ['1', 'T4', '0.152', '-0.06000', '0.06000', '-0.06000', '0.06000', '-0.06000', '0.06000', '0.06000'], ['1', 'T5', '0.251', '-0.06000', '0.06000', '-0.06000', '0.06000', '-0.06000', '0.06000', '0.06000'], ['1', 'T6', '0.251', '-0.06000', '0.06000', '-0.06000', '0.06000', '-0.06000', '0.06000', '0.06000'], ['1', 'T7', '2.000', '-0.06000', '0.06000', '-0.06000', '0.06000', '-0.06000', '0.06000', '0.06000'], ['NO.', 'T code', 'H. NO.', 'Jud', 'Design X', 'Design Y', 'Design D', 'Measu. X', 'Measu. Y', 'Measu. D', 'X error', 'Y error', 'D error', 'DD', 'TimeStamp']]
It looks like the last row has 15 columns, which is inconsistent with how the rest of the data is stored in the file.
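One way to confirm this kind of inconsistency is to count the fields on each line. The following is a minimal sketch rather than part of the original answer: `sample` is a shortened stand-in for the file contents, and `field_counts` is a hypothetical helper name.

```python
from collections import Counter

# Shortened stand-in for the file contents (assumption, for illustration only).
sample = """<INFO>
{
InspectMode:2
}
Insp ON/OFF,T code,Design D
1,T1,0.151
1,T2,0.151
NO.,T code,H. NO.,Jud,Design X
"""


def field_counts(lines):
    """Map 'number of comma-separated fields' to 'number of lines with that many'."""
    return Counter(len(line.rstrip("\n").split(",")) for line in lines)


counts = field_counts(sample.splitlines())
print(counts)
```

A width that appears only once, next to a dominant width, usually points at a stray header or footer line like the last row here.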

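If you want to convert this data into a DataFrame, you can write a few extra lines of code:

import pandas as pd
# arr[0] is the header row; arr[1:-1] drops the header and the inconsistent last row.
df = pd.DataFrame(data=arr[1:-1], columns=arr[0])
print(df)
I am taking row 1 through the second-to-last row as the data, and using the first row as the column headings.

The output of this is as follows: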

  Insp ON/OFF T code Design D X error -  ... Y error + D error - D error + DD error
0           1     T1    0.151  -0.06000  ...   0.06000  -0.06000   0.06000  0.06000
1           1     T2    0.151  -0.06000  ...   0.06000  -0.06000   0.06000  0.06000
2           1     T3    0.152  -0.06000  ...   0.06000  -0.06000   0.06000  0.06000
3           1     T4    0.152  -0.06000  ...   0.06000  -0.06000   0.06000  0.06000
4           1     T5    0.251  -0.06000  ...   0.06000  -0.06000   0.06000  0.06000
5           1     T6    0.251  -0.06000  ...   0.06000  -0.06000   0.06000  0.06000
6           1     T7    2.000  -0.06000  ...   0.06000  -0.06000   0.06000  0.06000

[7 rows x 10 columns]
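If you would rather keep using `pd.read_csv`, another option is to locate the header line first and compute the skip count from it. This is only a sketch under assumptions: `find_header_row` is a hypothetical helper, `sample` is a shortened stand-in for the file, and it assumes the header is the first comma-separated line containing the text "X error".

```python
import io

# Shortened stand-in for the file contents (assumption, for illustration only).
sample = """<INFO>
{
InspectMode:2
}
Insp ON/OFF,T code,Design D,X error -,X error +
1,T1,0.151,-0.06000,0.06000
1,T2,0.151,-0.06000,0.06000
"""


def find_header_row(lines, marker="X error"):
    """Return the 0-based index of the first comma-separated line containing marker."""
    for i, line in enumerate(lines):
        if marker in line and "," in line:
            return i
    return None  # no header found


skip = find_header_row(io.StringIO(sample))
print(skip)
```

The resulting `skip` can then be passed as `skiprows=skip` to `pd.read_csv`, so the header line becomes the first line pandas sees. Because the real file ends with a row of a different width, you would probably also need to limit the read with `nrows`, or pass `on_bad_lines="skip"` if your pandas version supports it (1.3 or later).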

You can use the Python csv module and read each line until you reach one that has the expected number of fields.
Can you share the data as text rather than as a screenshot?
@JoeFerndz I have updated the question.
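The csv-module suggestion in the first comment could look roughly like the sketch below. It is not the commenter's code: `read_table` and its `expected_fields` parameter are hypothetical names, `sample` is a shortened stand-in for the file, and the function assumes the table ends at the first row whose width differs from the header.

```python
import csv
import io

# Shortened stand-in for the file contents (assumption, for illustration only).
sample = """<INFO>
{
InspectMode:2
}
Insp ON/OFF,T code,Design D
1,T1,0.151
1,T2,0.151
NO.,T code,H. NO.,Jud,Design X
"""


def read_table(f, expected_fields):
    """Skip rows until one has expected_fields columns and treat it as the header,
    then collect the following rows while they keep the same column count."""
    header = None
    rows = []
    for row in csv.reader(f):
        if header is None:
            if len(row) == expected_fields:
                header = row
            continue
        if len(row) != expected_fields:
            break  # a row of a different width ends the table
        rows.append(row)
    return header, rows


header, rows = read_table(io.StringIO(sample), expected_fields=3)
print(header)
print(rows)
```

For the real file you would open it with `open(path, newline="")` and pass `expected_fields=10`; the returned `header` and `rows` can be handed to `pd.DataFrame(rows, columns=header)`.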