Reading a text file with variable spacing in Python
I have the following data in a text file that I want to load into Python:
    pclass  survived  name
0        1         1  Allen, Miss. Elisabeth Walton
1        1         1  Allison, Master. Hudson Trevor
2        1         0  Allison, Miss. Helen Loraine
3        1         0  Allison, Mr. Hudson Joshua Creighton
4        1         0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
5        1         1  Anderson, Mr. Harry
6        1         1  Andrews, Miss. Kornelia Theodosia
7        1         0  Andrews, Mr. Thomas Jr
8        1         1  Appleton, Mrs. Edward Dale (Charlotte Lamson)
9        1         0  Artagaveytia, Mr. Ramon
10       1         0  Astor, Col. John Jacob
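Because the spacing between columns is not constant, and the last field (name) itself contains spaces, I am having trouble parsing it. I tried the following: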
pd.read_csv("test.csv",sep = "\s+", header=0, index_col=0)
But it raises an error:
CParserError: Error tokenizing data. C error: Expected 7 fields in line 5, saw 8
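The separator "\s+" matches one or more whitespace characters, so it also splits the last column (the name) at every space inside it. Instead, use a regular expression that requires two or more whitespace characters (a regex separator like this needs engine='python'):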
pd.read_csv("test.csv", sep=r"\s{2,}", header=0, index_col=0, engine='python')
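To see why the two patterns behave differently, here is a small sketch that applies both to a single row with Python's re module. The sample line and its exact spacing are assumptions for illustration; the point is only that the name survives when single spaces are not treated as separators.

```python
import re

# One row of the sample data; columns are assumed to be separated
# by two or more spaces, while the name contains single spaces only.
line = "4        1         0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)"

# Splitting on any run of whitespace also breaks the name apart.
fields_any = re.split(r"\s+", line)

# Splitting only on runs of two or more whitespace characters keeps it whole.
fields_two_plus = re.split(r"\s{2,}", line)

print(len(fields_any))        # 11
print(fields_two_plus)        # ['4', '1', '0', 'Allison, Mrs. Hudson J C (Bessie Waldo Daniels)']
```

The first split yields a different number of fields for each row depending on how many words the name has, which is the kind of mismatch the tokenizing error complains about.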
Complete working example:
from io import StringIO
import pandas as pd
# Sample data: columns are separated by two or more spaces,
# while the names contain only single spaces.
txt = """    pclass  survived  name
0        1         1  Allen, Miss. Elisabeth Walton
1        1         1  Allison, Master. Hudson Trevor
2        1         0  Allison, Miss. Helen Loraine
3        1         0  Allison, Mr. Hudson Joshua Creighton
4        1         0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
5        1         1  Anderson, Mr. Harry
6        1         1  Andrews, Miss. Kornelia Theodosia
7        1         0  Andrews, Mr. Thomas Jr
8        1         1  Appleton, Mrs. Edward Dale (Charlotte Lamson)
9        1         0  Artagaveytia, Mr. Ramon
10       1         0  Astor, Col. John Jacob
"""
# A raw string keeps the backslash in the regex intact.
pd.read_csv(StringIO(txt), sep=r"\s{2,}", header=0, index_col=0, engine='python')
    pclass  survived  name
0        1         1  Allen, Miss. Elisabeth Walton
1        1         1  Allison, Master. Hudson Trevor
2        1         0  Allison, Miss. Helen Loraine
3        1         0  Allison, Mr. Hudson Joshua Creighton
4        1         0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
5        1         1  Anderson, Mr. Harry
6        1         1  Andrews, Miss. Kornelia Theodosia
7        1         0  Andrews, Mr. Thomas Jr
8        1         1  Appleton, Mrs. Edward Dale (Charlotte Lamson)
9        1         0  Artagaveytia, Mr. Ramon
10       1         0  Astor, Col. John Jacob
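Note that the two-or-more-spaces separator only helps when every column boundary really has at least two spaces. If some boundaries may be a single space, one possible fallback (not part of the answer above, just a sketch) is to split each line on whitespace at most three times, so that everything after the third field stays together as the name. The function name parse_rows and the sample text are hypothetical.

```python
def parse_rows(text):
    """Parse lines of the form 'index pclass survived name...' into dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()  # ['pclass', 'survived', 'name']
    rows = []
    for ln in lines[1:]:
        # Split on whitespace at most 3 times; the remainder is the full name.
        idx, pclass, survived, name = ln.split(None, 3)
        rows.append({
            "index": int(idx),
            header[0]: int(pclass),
            header[1]: int(survived),
            header[2]: name.strip(),
        })
    return rows


# Assumed sample with a mix of single and multiple spaces between columns.
sample = "\n".join([
    "  pclass survived name",
    "0 1 1 Allen, Miss. Elisabeth Walton",
    "10   1    0   Astor, Col. John Jacob",
])

rows = parse_rows(sample)
print(rows[1]["name"])  # Astor, Col. John Jacob
```

The resulting list of dicts can be handed to pd.DataFrame(rows).set_index("index") if a DataFrame is needed. This relies on the first three fields never containing spaces, which holds for the data shown here.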
You can use pd.read_fwf (fixed-width format) for this:
Code:
df = pd.read_fwf(StringIO(data), header=0, index_col=0)
Test code:
from io import StringIO
import pandas as pd
# Sample data in fixed-width columns. The header is the first line
# of the string, so header=0.
data = """\
    pclass  survived  name
0        1         1  Allen, Miss. Elisabeth Walton
1        1         1  Allison, Master. Hudson Trevor
2        1         0  Allison, Miss. Helen Loraine
3        1         0  Allison, Mr. Hudson Joshua Creighton
4        1         0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
5        1         1  Anderson, Mr. Harry
6        1         1  Andrews, Miss. Kornelia Theodosia
7        1         0  Andrews, Mr. Thomas Jr
8        1         1  Appleton, Mrs. Edward Dale (Charlotte Lamson)
9        1         0  Artagaveytia, Mr. Ramon
10       1         0  Astor, Col. John Jacob"""
# Column boundaries are inferred from the layout of the rows.
df = pd.read_fwf(StringIO(data), header=0, index_col=0)
print(df)
Results:
    pclass  survived  name
0        1         1  Allen, Miss. Elisabeth Walton
1        1         1  Allison, Master. Hudson Trevor
2        1         0  Allison, Miss. Helen Loraine
3        1         0  Allison, Mr. Hudson Joshua Creighton
4        1         0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
5        1         1  Anderson, Mr. Harry
6        1         1  Andrews, Miss. Kornelia Theodosia
7        1         0  Andrews, Mr. Thomas Jr
8        1         1  Appleton, Mrs. Edward Dale (Charlotte Lamson)
9        1         0  Artagaveytia, Mr. Ramon
10       1         0  Astor, Col. John Jacob
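read_fwf infers the column boundaries from the data by default. If that inference ever guesses wrong, the boundaries can be passed explicitly through the colspecs parameter. The sketch below assumes the column positions of the sample layout used here; the specific offsets are an assumption and would need adjusting for a real file.

```python
from io import StringIO

import pandas as pd

# Sample rows laid out in fixed-width columns (the spacing is assumed).
data = "\n".join([
    "    pclass  survived  name",
    "0        1         1  Allen, Miss. Elisabeth Walton",
    "1        1         1  Allison, Master. Hudson Trevor",
    "10       1         0  Astor, Col. John Jacob",
])

# Half-open (start, end) character ranges, one per column.
# None as the end means "up to the end of the line".
colspecs = [(0, 2), (2, 10), (10, 20), (20, None)]

df = pd.read_fwf(StringIO(data), colspecs=colspecs, header=0, index_col=0)
print(df)
```

Because the last range is open-ended, the name column takes whatever remains of each line, so spaces inside the name do not matter.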