Python pandas: Error tokenizing data when using glob.glob

python pandas

I am using the following code to concatenate several files (candidate master files) that I downloaded; they can also be found here:

https://github.com/108michael/ms_thesis/blob/master/cn06.txt
https://github.com/108michael/ms_thesis/blob/master/cn08.txt
https://github.com/108michael/ms_thesis/blob/master/cn10.txt
https://github.com/108michael/ms_thesis/blob/master/cn12.txt
https://github.com/108michael/ms_thesis/blob/master/cn14.txt

import numpy as np
import pandas as pd
import glob


names = ['feccandid', 'candname', 'party', 'date', 'state', 'chamber',
         'district', 'incumb.challeng', 'cand_status',
         '1', '2', '3', '4', '5', '6']
usecols = ['feccandid', 'party', 'date', 'state', 'chamber']
files = glob.glob('/home/jayaramdas/anaconda3/Thesis/FEC/cn_data/cn**.txt')

df = pd.concat(pd.read_csv(f, header=None, names=names, usecols=usecols)
               for f in files)

I get the following error:

CParserError: Error tokenizing data. C error: Expected 2 fields in line 58, saw 4

Does anyone have a clue about what is going on here?

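The default delimiter for pd.read_csv is a comma (,). Because all the candidates' names are listed in the format Last, First, pandas sees two columns: everything before the comma and everything after it. In one of the files there are additional commas, which leads pandas to assume there are more columns. That is what causes the parser error.

To use | as the delimiter instead of a comma, just change your code to pass the keyword delimiter="|" or sep="|". According to the pandas documentation, delimiter is an alias for sep.

New code: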

names = ['feccandid', 'candname', 'party', 'date', 'state', 'chamber',
         'district', 'incumb.challeng', 'cand_status',
         '1', '2', '3', '4', '5', '6']
usecols = ['feccandid', 'party', 'date', 'state', 'chamber']
files = glob.glob('/home/jayaramdas/anaconda3/Thesis/FEC/cn_data/cn**.txt')

# delimiter="|" splits on the pipe, so commas inside names stay in one field
df = pd.concat(pd.read_csv(f, header=None, delimiter="|", names=names,
                           usecols=usecols) for f in files)
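
To see the effect without the real files, here is a minimal sketch that reproduces the same kind of error on a small in-memory sample. The three rows are made up for illustration (they are not real FEC records) and only three of the columns are used; the point is just that a name with an extra comma breaks the default comma parsing, while sep="|" and delimiter="|" give the same result.

```python
import io

import pandas as pd

# Made-up rows in the same pipe-delimited "Last, First" style (not real FEC data).
raw = (
    "X0000001|DOE, JANE|DEM\n"
    "X0000002|ROE, RICHARD|REP\n"
    "X0000003|SMITH, JOHN, JR|IND\n"
)
names = ["feccandid", "candname", "party"]

# Default sep=",": the first line splits into 2 fields, the third into 3,
# so the parser raises an "Error tokenizing data" error.
try:
    pd.read_csv(io.StringIO(raw), header=None)
    comma_error = None
except pd.errors.ParserError as exc:
    comma_error = str(exc)

# With the pipe as separator, commas inside a name are ordinary characters.
df_sep = pd.read_csv(io.StringIO(raw), header=None, sep="|", names=names)
df_delim = pd.read_csv(io.StringIO(raw), header=None, delimiter="|", names=names)

print(comma_error)
print(df_sep)
print(df_sep.equals(df_delim))
```

On older pandas versions the exception is reported as CParserError, as in the question; recent versions expose it as pd.errors.ParserError.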

When you use read_csv on a single file, is the dataframe as expected? You may need to pass delimiter="|" to the read_csv function.

I had just tried using only one file with sep='|'; then, after your comment, I tried delimiter='|' and it worked fine. I ran the whole operation again and the problem is solved! Thanks for the tip!

Glad it worked! I added it as an answer in case anyone else has the same problem.
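
The suggestion above to try read_csv on one file at a time can be sketched as a small loop. This is only an illustration: find_unparseable is a hypothetical helper name, not part of pandas, and the glob pattern is the one from the question, so adjust it to your own directory.

```python
import glob

import pandas as pd


def find_unparseable(paths, **read_csv_kwargs):
    """Try read_csv on each path separately and collect the failures.

    Returns a dict mapping each path that raised a ParserError to its message.
    (Hypothetical helper for illustration; not a pandas function.)
    """
    failures = {}
    for path in paths:
        try:
            pd.read_csv(path, **read_csv_kwargs)
        except pd.errors.ParserError as exc:
            failures[path] = str(exc)
    return failures


# Path pattern taken from the question; the list is empty if nothing matches.
paths = sorted(glob.glob('/home/jayaramdas/anaconda3/Thesis/FEC/cn_data/cn**.txt'))

print(find_unparseable(paths, header=None))           # default comma delimiter
print(find_unparseable(paths, header=None, sep="|"))  # pipe delimiter
```

Reading the files one by one like this shows which file triggers the error before everything is passed to pd.concat.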