Python CSV: handling commas inside a column
I am working with a CSV file that contains text from novels:
book_id, title, content
1, book title 1, All Passion Spent is written in three parts, primarily from the view of an intimate observer.
2, Book Title 2, In particular Mr FitzGeorge, a forgotten acquaintance from India who has ever since been in love with her, introduces himself and they form a quiet but playful and understanding friendship. It cost 3,4234 to travel.
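The text in the content column contains commas, and unfortunately, when you try to use pandas.read_csv, you get pandas.errors.ParserError: Error tokenizing data. C error:
There are some suggested solutions to this problem, but none of them worked for me. Reading the file as a regular file and then passing it to a DataFrame also failed.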
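To make the failure concrete, here is a minimal reproduction sketch. It assumes the sample rows shown above are the whole file and feeds them to pandas through an in-memory buffer (io.StringIO) instead of a real file; the variable names are only illustrative.

```python
import io

import pandas as pd

# Sample rows copied from the question; the content column holds unquoted commas.
raw = (
    "book_id, title, content\n"
    "1, book title 1, All Passion Spent is written in three parts, "
    "primarily from the view of an intimate observer.\n"
    "2, Book Title 2, In particular Mr FitzGeorge, a forgotten acquaintance "
    "from India who has ever since been in love with her, introduces himself "
    "and they form a quiet but playful and understanding friendship. "
    "It cost 3,4234 to travel.\n"
)

try:
    pd.read_csv(io.StringIO(raw))
    error = None
except pd.errors.ParserError as exc:
    # The rows do not all split into the same number of comma-separated fields.
    error = exc

print(type(error).__name__, error)
```

The parser has no way to know that the extra commas belong to the text, because the content field is not quoted.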
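You can try reading the file yourself, splitting each line with str.split(",", 2) so that only the first two commas act as separators, and then converting the result to a DataFrame.
Example: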
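import pandas as pd

# `filename` is the path to the CSV file.
with open(filename, "r") as infile:
    # The header contains no embedded commas, so a plain split is safe.
    header = [name.strip() for name in infile.readline().split(",")]
    # Split each row on the first two commas only; everything after them stays in `content`.
    content = [[field.strip() for field in i.split(",", 2)] for i in infile.readlines()]
df = pd.DataFrame(content, columns=header)
print(df)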
Output:
  book_id         title                                           content
0       1  book title 1  All Passion Spent is written in three parts, ...
1       2  Book Title 2   In particular Mr FitzGeorge, a forgotten acq...
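An alternative along the same lines, offered only as a sketch: replace the first two commas on each line with a delimiter that never occurs in the data, then let pandas.read_csv do the parsing. The choice of "@" and the in-memory sample are assumptions for illustration; this only works if book_id and title never contain commas and "@" never appears in the text.

```python
import io

import pandas as pd

# Sample rows copied from the question.
raw = (
    "book_id, title, content\n"
    "1, book title 1, All Passion Spent is written in three parts, "
    "primarily from the view of an intimate observer.\n"
    "2, Book Title 2, In particular Mr FitzGeorge, a forgotten acquaintance "
    "from India who has ever since been in love with her, introduces himself "
    "and they form a quiet but playful and understanding friendship. "
    "It cost 3,4234 to travel.\n"
)

SEP = "@"  # assumption: this character never occurs anywhere in the data

# Turn only the first two commas of every line into the new delimiter.
converted = "\n".join(line.replace(",", SEP, 2) for line in raw.splitlines())

# skipinitialspace drops the blank that follows each delimiter in this file.
df = pd.read_csv(io.StringIO(converted), sep=SEP, skipinitialspace=True)
print(df)
```

Unlike the manual split, this route lets pandas infer column types, so book_id comes back as integers rather than strings.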
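Are there commas in id or title? The error occurs because a row contains an extra comma. Could you replace the first two commas on each line with an arbitrary delimiter such as @, using line.replace(',', '@', 2), and then change the parser's default delimiter with pandas.read_csv(filename, sep='@')? If the title can contain commas, you would need a regex replacement that matches the title.
@chrisz The title can contain the delimiter.
@Rakesh Basically the index does not match: the rows have more columns than the header.
I like it, except that you could write content = [i.strip().split(",", 2) for i in infile] to reduce the memory used by the intermediate list of lines.
@tdelaney Thanks.
@Rakesh This is just an example; I have more columns (20). How would the split work in that case?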
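For the many-column case asked about above, one possible sketch is to derive the split limit from the header instead of hard-coding 2. This assumes that only the last column can contain commas; the helper name read_trailing_text_csv is hypothetical, and the question's three-column sample is reused here as input.

```python
import io

import pandas as pd


def read_trailing_text_csv(infile, sep=","):
    """Parse delimited lines where only the LAST column may contain `sep`."""
    header = [name.strip() for name in infile.readline().split(sep)]
    # One split fewer than the number of columns: 2 for three columns,
    # 19 for twenty columns, and so on.
    max_splits = len(header) - 1
    rows = [
        [field.strip() for field in line.split(sep, max_splits)]
        for line in infile
        if line.strip()  # ignore blank lines
    ]
    return pd.DataFrame(rows, columns=header)


# Sample rows copied from the question, wrapped in an in-memory file.
sample = io.StringIO(
    "book_id, title, content\n"
    "1, book title 1, All Passion Spent is written in three parts, "
    "primarily from the view of an intimate observer.\n"
    "2, Book Title 2, In particular Mr FitzGeorge, a forgotten acquaintance "
    "from India who has ever since been in love with her, introduces himself "
    "and they form a quiet but playful and understanding friendship. "
    "It cost 3,4234 to travel.\n"
)

df = read_trailing_text_csv(sample)
print(df)
```

All values come back as strings here, so numeric columns would still need converting, for example with pd.to_numeric. If commas can also appear in columns other than the last one, this approach does not apply and the file needs proper quoting at the source.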