Reading space-delimited data in Python
I have a text file with 6 space-delimited fields, like this:
702377236289228800 2016-02-24 09:19:17 +03 <Aadil_Siddiqui> #HECRanking Rs71 Bil bdget alloctd 2 HEC is not in gud hands. v can imagne dat on which criteria #HEC is sending studnts abroad on Scholrshp
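You can read all the data into a single column by using a separator that does not occur in the text (such as |). Then build the new columns with str.split, passing the n parameter and no separator, because whitespace is the default: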
import pandas as pd

data = pd.read_csv("twitter_file_path.txt", sep="|", names=['data'])
print(data)
data
0 702377236289228800 2016-02-24 09:19:17 +03 <Aa...
data = data['data'].str.split(n=5, expand=True)
data.columns = ["seq", "date", "Hour", "GMT","userID","text"]
print (data)
seq date Hour GMT userID \
0 702377236289228800 2016-02-24 09:19:17 +03 <Aadil_Siddiqui>
text
0 #HECRanking Rs71 Bil bdget alloctd 2 HEC is no...
But the character | might be present in some lines. I really don't know, because I have 32k profiles in different languages, so I guess any character could appear.
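If no character can be ruled out as a separator, one possible workaround is to skip read_csv and split each line yourself. The sketch below is only an illustration, not part of the original answer: the helper name read_tweets, the padding of short lines with None, and the sample line with an added | are all assumptions. It relies on Python's built-in str.split with maxsplit=5, which splits on whitespace at most five times, so the sixth field keeps the rest of the line whatever characters it contains.

```python
import io

import pandas as pd

COLUMNS = ["seq", "date", "Hour", "GMT", "userID", "text"]


def read_tweets(lines):
    """Build a DataFrame from an iterable of lines (hypothetical helper).

    Each line is split on whitespace at most 5 times, so everything
    after the fifth field stays together in the "text" column.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=5)
        # Assumption: lines with fewer than 6 fields are padded with None.
        parts += [None] * (len(COLUMNS) - len(parts))
        rows.append(parts)
    return pd.DataFrame(rows, columns=COLUMNS)


# Made-up sample line containing a '|' to show that it does not matter.
sample = io.StringIO(
    "702377236289228800 2016-02-24 09:19:17 +03 <Aadil_Siddiqui> "
    "#HECRanking Rs71 | Bil bdget\n"
)
df = read_tweets(sample)
print(df.loc[0, "text"])
```

For a real file, an open file object can be passed instead of the StringIO, for example with open("twitter_file_path.txt", encoding="utf-8") as f: df = read_tweets(f); the utf-8 encoding is an assumption about the file. All columns come back as strings, so seq and the date fields would still need converting if numeric or datetime types are wanted.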