
Python: How do I read a CSV file while handling embedded newlines and similar cases?


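I wrote the code below to stream a CSV file from S3 line by line. However, one field in one of the rows of the CSV file contains a newline (an Enter). pandas can read the file when I download it locally, but this code raises an error: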

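```python
import codecs
from io import StringIO
import pandas as pd

# Method body: self.response is presumably a boto3 get_object() response
for row in codecs.getreader(self.encoding)(self.response[u'Body']).readlines():
    row_string = StringIO(row)
    print("Row read from the data is: ")
    print(row_string.getvalue())
    df = pd.read_csv(row_string, sep=",")  # one DataFrame per physical line
```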
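Ignore the "line 0" in the error message. As my code shows, I read a single line and build a DataFrame from it, so every parse starts at line 0.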

The error is shown below. The full traceback is at the end of this post.

```
[2018-11-12 14:11:45,586] {models.py:1595} ERROR - Error tokenizing data. C error: EOF inside string starting at line 0
```
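The error comes from splitting the input into physical lines before parsing it. The sketch below uses a small made-up sample, which is an assumption and not the asker's data. In the sample, the first record has a newline inside a quoted field. `StringIO.readlines()` stands in for `codecs.getreader(...)(...).readlines()`, because both split the text at newline characters. Parsing each line on its own leaves the fragment `1,"line one` with an unclosed quote. The C parser reports that as "EOF inside string starting at line 0".

```python
from io import StringIO

import pandas as pd
from pandas.errors import ParserError

# Hypothetical sample: the "note" field of the first record contains a newline
data = 'id,note\n1,"line one\nline two"\n2,plain\n'

# Splitting into physical lines cuts the quoted field in two
lines = StringIO(data).readlines()
print(lines)  # ['id,note\n', '1,"line one\n', 'line two"\n', '2,plain\n']

# Parse each line separately, as the loop in the question does
errors = []
for line in lines:
    try:
        pd.read_csv(StringIO(line), sep=",")
    except ParserError as exc:
        errors.append((line, str(exc)))

for line, msg in errors:
    print(repr(line), "->", msg)
```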

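A quick Google search for your error turned up this suggestion:

The solution is to use the parameter engine='python' in the read_csv function call. The pandas CSV parser can use two different "engines" to parse CSV files: Python or C (the default).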

engine : {'c', 'python'}, optional

Parser engine to use. The C engine is faster, while the python engine is currently more feature-complete.
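The sketch below shows how the `engine` parameter is passed. It uses a made-up two-record sample, which is an assumption. The complete text goes to `read_csv` in a single call, rather than one physical line at a time. With input like this, both engines treat the quoted field containing a newline as a single value and produce the same DataFrame. That suggests the engine matters less than whether the parser sees the whole record.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: the "note" field of the first record contains a newline
data = 'id,note\n1,"line one\nline two"\n2,plain\n'

# Parse the complete text once with each engine
df_c = pd.read_csv(StringIO(data), engine="c")
df_py = pd.read_csv(StringIO(data), engine="python")

print(df_c.shape)                 # (2, 2)
print(repr(df_c.loc[0, "note"]))  # 'line one\nline two'
```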


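No. My code has a for loop that uses readlines(), and the row it produces is incomplete. The statement `print(row_string.getvalue())` confirmed this. So if the `row_string` variable itself is already broken, what can tweaking pandas (as you suggest) do?!

Interesting. Judging from your code, each line seems to be a CSV in its own right, since you create a `df` for every line. If that is not the case, what is wrong with reading the whole data as CSV into one cohesive `df`?

My goal is to stream the CSV file from S3. Here, I stream one line and build a DataFrame from it. Can you suggest a better way? Or, if I use your approach, how would I implement it?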
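One way to keep streaming without cutting records apart is to let pandas read the raw stream itself and return it in chunks, using the `chunksize` parameter of `read_csv`. This is a minimal sketch. The helper name `iter_csv_chunks` is hypothetical. It assumes the S3 `Body` behaves like a binary file-like object with a `read()` method, which is not tested here. An in-memory `BytesIO` with made-up data stands in for it below.

```python
import io

import pandas as pd


def iter_csv_chunks(body, chunksize=1000, encoding="utf-8"):
    """Yield DataFrames of at most `chunksize` rows from a binary file-like object.

    Hypothetical helper. `body` only needs a read() method; for S3 it would
    presumably be the 'Body' of a boto3 get_object() response (assumption).
    pandas tokenizes the raw stream itself, so a quoted field that spans
    several physical lines still ends up in a single row.
    """
    for chunk in pd.read_csv(body, chunksize=chunksize, encoding=encoding):
        yield chunk


# Usage with an in-memory stand-in for the S3 body (hypothetical data)
sample = io.BytesIO(b'id,note\n1,"line one\nline two"\n2,plain\n')
chunks = list(iter_csv_chunks(sample, chunksize=1))
print(len(chunks))  # 2
```

In the question's setting, `self.encoding` would be passed as `encoding`. A larger `chunksize` trades memory for fewer, bigger DataFrames.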
Full traceback:

```
[2018-11-12 14:11:45,586] {models.py:1595} ERROR - Error tokenizing data. C error: EOF inside string starting at line 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/airflow/models.py", line 1493, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.5/dist-packages/airflow/operators/python_operator.py", line 89, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.5/dist-packages/airflow/operators/python_operator.py", line 94, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/pallet-0.0.0-py3.5.egg/pallet/tasks/versionator.py", line 228, in driver_de_versionator
    a.index_patch()
  File "/usr/local/lib/python3.5/dist-packages/pallet-0.0.0-py3.5.egg/pallet/tasks/versionator.py", line 202, in index_patch
    DB.process(self.form_candidate_version, self.destination_of_kch_file_to_be_downloaded)
  File "/usr/local/lib/python3.5/dist-packages/pallet-0.0.0-py3.5.egg/pallet/tasks/versionator.py", line 144, in form_candidate_version
    df = pd.read_csv(row_string, sep=",")
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 539, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 737, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at line 0
```
Excerpt of the `pandas.read_csv` signature:

```python
pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None,
                header='infer', names=None,
                index_col=None, usecols=None, squeeze=False,
                ..., engine=None, ...)
```