Reading space-delimited data in Python (tags: python, database, pandas, dataframe, text)

I have a text file with 6 space-separated fields, like this:

702377236289228800 2016-02-24 09:19:17 +03 <Aadil_Siddiqui> #HECRanking Rs71 Bil bdget alloctd 2 HEC is not in gud hands. v can imagne dat on which criteria #HEC is sending studnts abroad on Scholrshp

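You can read all the data into a single column by using a separator that does not occur in the text (such as |). Then, to create the new columns, call str.split with the n parameter and no separator, because whitespace is the default: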

import pandas as pd

# Read every line into one column; "|" is assumed not to occur in the text
data = pd.read_csv("twitter_file_path.txt", sep="|", names=['data'])
print (data)
                                                data
0  702377236289228800 2016-02-24 09:19:17 +03 <Aa...

# Split on whitespace at most 5 times, so the 6th column keeps the whole tweet
data = data['data'].str.split(n=5, expand=True)
data.columns = ["seq", "date", "Hour", "GMT","userID","text"]
print (data)
                  seq        date      Hour  GMT            userID  \
0  702377236289228800  2016-02-24  09:19:17  +03  <Aadil_Siddiqui>   

                                                text  
0  #HECRanking Rs71 Bil bdget alloctd 2 HEC is no...  
But the | character may occur in some lines. I really can't tell, because I have 32k profiles in different languages, so I assume any character could show up.
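
If | (or any other candidate separator) might occur in the tweets, one option is to skip read_csv and split each line yourself, so no separator character has to be chosen at all. The following is only a sketch under assumptions: the helper name read_space_delimited is hypothetical, the first five fields are assumed never to contain whitespace, and the in-memory sample (with a | added to the tweet text) stands in for the real file.

```python
import io

import pandas as pd

COLUMNS = ["seq", "date", "Hour", "GMT", "userID", "text"]


def read_space_delimited(source, n_fields=6):
    """Hypothetical helper: split each line on whitespace at most n_fields - 1 times."""
    rows = []
    for line in source:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # sep=None splits on runs of whitespace; the last piece keeps the rest of the line
        parts = line.split(None, n_fields - 1)
        # Pad short lines (e.g. a record with no tweet text) so every row has n_fields items
        parts += [None] * (n_fields - len(parts))
        rows.append(parts)
    return pd.DataFrame(rows, columns=COLUMNS)


# Stand-in for the real file; note the "|" inside the tweet text
sample = io.StringIO(
    "702377236289228800 2016-02-24 09:19:17 +03 <Aadil_Siddiqui> "
    "#HECRanking Rs71 | Bil bdget alloctd\n"
)
data = read_space_delimited(sample)
print(data)
```

For the real data you would pass an open file object instead of the sample, for example one opened with open("twitter_file_path.txt", encoding="utf-8"); the encoding is a guess and may need adjusting for your files. All columns come back as strings, so seq and date would still need converting if you want numeric or datetime types.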