
Python csv: handling commas inside a column


I am processing a CSV file that contains text data from novels:

book_id, title, content
1, book title 1, All Passion Spent is written in three parts, primarily from the view of an intimate observer. 
2, Book Title 2,  In particular Mr FitzGeorge, a forgotten acquaintance from India who has ever since been in love with her, introduces himself and they form a quiet but playful and understanding friendship. It cost 3,4234 to travel. 
The text in the content column contains commas, so when I try to use pandas.read_csv I unfortunately get:
pandas.errors.ParserError: Error tokenizing data. C error:
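To see why the tokenizer fails, it helps to count how many fields a plain comma split produces on each line. pandas expects every data row to have the same number of fields as the header, and unquoted commas in content break that. A small illustration, with the rows from the question hard-coded (the exact wording of the pandas error may vary between versions):

```python
# Header and data rows from the question, hard-coded for illustration
header = "book_id, title, content"
rows = [
    "1, book title 1, All Passion Spent is written in three parts, "
    "primarily from the view of an intimate observer.",
    "2, Book Title 2,  In particular Mr FitzGeorge, a forgotten acquaintance "
    "from India who has ever since been in love with her, introduces himself "
    "and they form a quiet but playful and understanding friendship. "
    "It cost 3,4234 to travel.",
]

# Treat every comma as a separator, the way an unquoted CSV is tokenized
field_counts = [len(line.split(",")) for line in [header, *rows]]
print(field_counts)  # [3, 4, 6]
```

The header has 3 fields but the data rows have 4 and 6, which is the mismatch the parser complains about.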

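There are some existing solutions to this problem, but none of them worked. Reading it as a regular file and then passing it to a DataFrame also failed.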

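You can try reading the file line by line, splitting each line with
str.split(",", 2)
(at most two splits, so only the first two commas act as separators and everything after them stays in content), and then converting the result to a DataFrame.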

Ex:

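import pandas as pd

filename = "books.csv"  # hypothetical path to the CSV file shown in the question

with open(filename, "r") as infile:
    # Header row: split on every comma (the column names contain none)
    header = infile.readline().strip().split(",")
    # Data rows: split on the first two commas only (maxsplit=2),
    # so commas inside the content column stay in the last field
    content = [i.strip().split(",", 2) for i in infile.readlines()]

df = pd.DataFrame(content, columns=header)
print(df)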

Output:

  book_id          title                                            content
0       1   book title 1   All Passion Spent is written in three parts, ...
1       2   Book Title 2    In particular Mr FitzGeorge, a forgotten acq...
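The same idea extends to any number of columns (for example the 20 mentioned in the comments) by setting maxsplit to one less than the number of header columns. This is a minimal sketch, assuming that only the last column can contain commas; the helper name read_free_text_last_column is hypothetical, not a pandas API. It also trims the spaces around each field and iterates over the file lazily instead of calling readlines():

```python
import io

import pandas as pd


def read_free_text_last_column(lines):
    """Hypothetical helper (not a pandas API): parse comma-separated lines
    where only the LAST column may contain unquoted commas."""
    it = iter(lines)
    # Column names: split the header on every comma and trim spaces
    header = [name.strip() for name in next(it).split(",")]
    # One split fewer than there are columns: 2 for 3 columns, 19 for 20
    maxsplit = len(header) - 1
    rows = []
    for line in it:  # iterate lazily instead of building a readlines() list
        if not line.strip():
            continue  # skip blank lines
        rows.append([field.strip() for field in line.split(",", maxsplit)])
    return pd.DataFrame(rows, columns=header)


# Usage with the rows from the question; an open file object works the same way
sample = io.StringIO(
    "book_id, title, content\n"
    "1, book title 1, All Passion Spent is written in three parts, "
    "primarily from the view of an intimate observer.\n"
    "2, Book Title 2,  In particular Mr FitzGeorge, a forgotten acquaintance "
    "from India who has ever since been in love with her, introduces himself "
    "and they form a quiet but playful and understanding friendship. "
    "It cost 3,4234 to travel.\n"
)
df = read_free_text_last_column(sample)
print(df.shape)  # (2, 3)
```

All values come back as strings; pd.to_numeric can convert columns such as book_id afterwards if needed. If a column other than the last can contain commas, a fixed maxsplit is no longer enough.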

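Are there commas in id or title? The error occurs because there is an extra comma in the content column. Could you replace the first two commas on each line with a random delimiter such as @ (line.replace(',', '@', 2)) and change the default separator of the CSV parser (pandas.read_csv(filename, sep='@'))? If the title can contain commas, you would need a regex substitution that matches the title.

@chrisz The title can contain the delimiter.

@Rakesh Basically the index does not match: there are more columns than in the header.

I like this, except that you could write content = [i.strip().split(",", 2) for i in infile] to reduce the memory used by the intermediate list.

@tdelaney Thanks. @Rakesh This is just an example; I actually have more columns (20). How would the split work in that case?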
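The @-delimiter suggestion from the comments can be sketched as follows. This assumes that "@" never occurs anywhere in the file, that the title contains no commas, and that the text contains no double quotes (which pandas would treat as quoting); the helper name read_with_sentinel and the SENTINEL constant are hypothetical. For 20 columns, n_leading would be 19.

```python
import io

import pandas as pd

SENTINEL = "@"  # assumption: this character never occurs anywhere in the file


def read_with_sentinel(lines, n_leading=2):
    """Hypothetical helper: replace the first `n_leading` commas on each line
    with SENTINEL, then let pandas parse the text with sep=SENTINEL."""
    # str.replace(old, new, count) replaces at most `count` occurrences
    converted = "".join(line.replace(",", SENTINEL, n_leading) for line in lines)
    # skipinitialspace drops the spaces that followed each original comma
    return pd.read_csv(io.StringIO(converted), sep=SENTINEL, skipinitialspace=True)


sample_lines = [
    "book_id, title, content\n",
    "1, book title 1, All Passion Spent is written in three parts, "
    "primarily from the view of an intimate observer.\n",
    "2, Book Title 2,  In particular Mr FitzGeorge, a forgotten acquaintance "
    "from India who has ever since been in love with her, introduces himself "
    "and they form a quiet but playful and understanding friendship. "
    "It cost 3,4234 to travel.\n",
]
df = read_with_sentinel(sample_lines)
print(df)
```

Compared with splitting in Python, this lets read_csv handle type inference (book_id becomes an integer column), at the cost of relying on a character that must never appear in the data.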