Python: save each row of a DataFrame to a txt file
So, I open a dataset from an HDF5 file like this:
import pandas as pd
import numpy as np
data1 = pd.read_hdf('sport.hdf5', usecols=['category','title','images','link','date','desc'])
It gives me output like this:
category title images \
0 raket Kevin/Marcus Langsung Fokus ke Kejuaraan Dunia... NaN
1 f1 Vettel Menangi GP Inggris yang Penuh Drama NaN
2 others Semangat 'Semakin di Depan' Warnai Kejuaraan M... NaN
5 sepakbola Roberto Martinez Mengejar Status Elite NaN
6 sepakbola Nyaris Separuh Gol Piala Dunia 2018 Lahir dari... NaN
link \
0 https://sport.detik.com/raket/d-4104834/kevinm...
1 https://sport.detik.com/f1/d-4104788/vettel-me...
2 https://sport.detik.com/sport-lain/d-4105193/s...
5 https://sport.detik.com/sepakbola/berita/d-410...
6 https://sport.detik.com/sepakbola/berita/d-410...
date \
0 Senin 09 Juli 2018, 00:31 WIB
1 Minggu 08 Juli 2018, 22:35 WIB
2 Senin 09 Juli 2018, 11:15 WIB
5 Senin 09 Juli 2018, 12:35 WIB
6 Senin 09 Juli 2018, 12:51 WIB
desc
0 - Setelah , Kevin Sanjaya/Marcus Gideon suda...
1 - Driver Ferrari keluar sebagai pemenang Gr...
2 - Kejuaraan Dunia Motocross Grand Prix (MXGP)...
5 - bisa jadi mulai kerap diperbinc...
6 - Berakhirnya perempatfinal Piala D...
Now I need to save each row's desc to its own txt file, named after that row's title. I used the following code:
np.savetxt(data1['title']+'.txt', data1['desc'], fmt='%s')
But it fails with:
Traceback (most recent call last):
File "index.py", line 23, in <module>
np.savetxt(data1['title']+'.txt', data1['desc'], fmt='%s')
File "/home/adminsvr/tf-py3/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1187, in savetxt
if fname.endswith('.gz'):
File "/home/adminsvr/tf-py3/lib/python3.5/site-packages/pandas/core/generic.py", line 3614, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'endswith'
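The traceback arises because data1['title'] + '.txt' is a whole pandas Series of names, while np.savetxt expects a single file name. As the traceback shows, it calls fname.endswith('.gz') on that argument, so it can only write one file per call. The minimal sketch below keeps np.savetxt and calls it once per row with a plain string path. It assumes a small hypothetical stand-in for data1 (the real data comes from sport.hdf5) and a temporary output folder:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for data1; the real frame is read from sport.hdf5
data1 = pd.DataFrame({
    "title": ["Vettel Menangi GP Inggris yang Penuh Drama",
              "Roberto Martinez Mengejar Status Elite"],
    "desc": ["- Driver Ferrari keluar sebagai pemenang",
             "- bisa jadi mulai kerap diperbinc"],
})

# The expression from the question builds a Series of names, not one path
names = data1["title"] + ".txt"
print(type(names).__name__)  # Series

out_dir = tempfile.mkdtemp()  # assumption: a throwaway folder for the demo
# np.savetxt writes one file per call, so call it once per row with a str path
for title, desc in zip(data1["title"], data1["desc"]):
    path = os.path.join(out_dir, title + ".txt")
    # [desc] makes a one-element array; fmt="%s" writes it as a single line
    np.savetxt(path, [desc], fmt="%s", encoding="utf-8")
```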
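Any solutions or ideas?
After several hours of work, here is what solved the problem. First, iterate over the rows of the data1 DataFrame with iterrows(), which yields an (index, row) pair for each row, so the loop unpacks both idx and row. To create one file per row, build the file path from the result/ directory and row['title'], so the name changes with each row. The result/ directory does not exist yet, so create it with os.makedirs. Finally, write row['desc'] into the txt file. Here it is: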
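import os

os.makedirs("result", exist_ok=True)  # create the output folder once, before the loop

for idx, row in data1.iterrows():
    # a "/" in a title (e.g. "Kevin/Marcus ...") would be treated as a subfolder, so replace it
    title = str(row['title']).replace('/', '_')
    filename = os.path.join("result", title + ".txt")
    # the with block closes the file on exit, so no explicit f.close() is needed
    with open(filename, "w", encoding="utf-8") as f:
        f.write(str(row['desc']))  # str() also copes with NaN (a float) in desc
    print(idx)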
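The same loop can also be written without iterrows(). The sketch below is an alternative, not the original answer's code. It zips the title and desc columns, creates the output folder once with pathlib, and passes each title through safe_filename, a hypothetical helper that replaces path separators and characters Windows forbids in file names. The sample DataFrame and the temporary folder are assumptions for illustration.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd


def safe_filename(title: str) -> str:
    """Hypothetical helper: replace path separators and characters that are
    invalid in Windows file names with '_'."""
    return re.sub(r'[\\/:*?"<>|]', "_", title).strip()


def save_rows_to_txt(df: pd.DataFrame, out_dir: str) -> list:
    """Write each row's 'desc' to <out_dir>/<title>.txt and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the folder once, not per row
    paths = []
    # zip over two columns avoids building a Series per row, as iterrows() does
    for title, desc in zip(df["title"], df["desc"]):
        path = out / (safe_filename(str(title)) + ".txt")
        # missing descriptions (NaN/None) are written as empty files
        path.write_text("" if pd.isna(desc) else str(desc), encoding="utf-8")
        paths.append(path)
    return paths


# Usage with a hypothetical sample shaped like data1
sample = pd.DataFrame({
    "title": ["Kevin/Marcus Langsung Fokus", "Vettel Menangi GP Inggris",
              "Tanpa Deskripsi"],
    "desc": ["- Setelah , Kevin Sanjaya", "- Driver Ferrari", float("nan")],
})
written = save_rows_to_txt(sample, tempfile.mkdtemp())
```

Rows with identical titles map to the same file, so later rows overwrite earlier ones. Adding the row index to the name avoids that if it matters.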
Please post the output of print(data1.head()). Edit: added the output of data1.head().