Python: split a dataframe and save each part as a txt file
I have a dataframe like this:
Histogram DN Npts Total Percent Acc Pct
Band 1 -0.054741 1 1 0.0250 0.0250
Bin=0.00233 -0.052404 0 1 0.0000 0.0250
-0.050067 0 1 0.0000 0.0250
-0.047730 0 1 0.0000 0.0250
-0.045393 0 1 0.0000 0.0250
-0.043056 0 1 0.0000 0.0250
-0.040719 0 1 0.0000 0.0250
Histogram DN Npts Total Percent Acc Pct
Band 2 0.000000 346 346 9.5186 9.5186
Bin=0.00203 0.002038 0 346 0.0000 9.5186
0.004076 0 346 0.0000 9.5186
0.006114 0 346 0.0000 9.5186
0.008152 0 346 0.0000 9.5186
0.010189 0 346 0.0000 9.5186
0.012227 0 346 0.0000 9.5186
I want to split it every time the word Histogram appears (in this example, every 8 rows). I can split it like this:
np.array_split(df,8)
But I would prefer a way to split on the keyword. Then I want to save each split to its own text file. Is there a way to do this?
df.head().to_json()
returns:
{"Histogram ":{"0":"Band 1 ","1":"Bin=0.00233","2":" ","3":" ","4":" "}," DN":{"0":"-0.054741","1":"-0.052404","2":"-0.050067","3":"-0.047730","4":"-0.045393"}," Npts":{"0":" 1","1":" 0","2":" 0","3":" 0","4":" 0"}," Total":{"0":" 1","1":" 1","2":" 1","3":" 1","4":" 1"}," Percent":{"0":" 0.0250","1":" 0.0000","2":" 0.0000","3":" 0.0000","4":" 0.0000"}," Acc Pct":{"0":" 0.0250","1":" 0.0250","2":" 0.0250","3":" 0.0250","4":" 0.0250"}}
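The JSON output above shows where the KeyError comes from: the column names carry stray spaces ("Histogram ", " DN"), so df["Histogram"] finds nothing. A minimal sketch of stripping them, assuming the frame is rebuilt from two of the columns in that JSON (the variable names raw and df are only for illustration):

```python
import json

import pandas as pd

# Two columns copied from the df.head().to_json() output above; note the
# stray spaces in the column names ("Histogram " and " DN").
raw = ('{"Histogram ":{"0":"Band 1 ","1":"Bin=0.00233","2":" ","3":" ","4":" "},'
       '" DN":{"0":"-0.054741","1":"-0.052404","2":"-0.050067","3":"-0.047730","4":"-0.045393"}}')
df = pd.DataFrame(json.loads(raw))

print(list(df.columns))  # ['Histogram ', ' DN']  (so df["Histogram"] raises KeyError)

# Strip surrounding whitespace from every column name
df.columns = df.columns.str.strip()
print(list(df.columns))  # ['Histogram', 'DN']

# The cell values are padded too; stripping them keeps later string tests predictable
df["Histogram"] = df["Histogram"].str.strip()
```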
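First, you should normalize the column names: at the moment they contain leading/trailing spaces (which explains the KeyError you saw earlier). To group by band, I would use cumsum: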
In [14]: df1 # similar to your example
Out[14]:
DN Npts Total Acc Pct Percent Histogram
0 -0.054741 1 1 0.025 0.025 Band 1
1 -0.052404 0 1 0.025 0.000 Bin=0.00233
2 -0.050067 0 1 0.025 0.000
3 -0.047730 0 1 0.025 0.000
4 -0.045393 0 1 0.025 0.000
5 -0.054741 1 1 0.025 0.025 Band 2
6 -0.052404 0 1 0.025 0.000 Bin=0.00233
7 -0.050067 0 1 0.025 0.000
8 -0.047730 0 1 0.025 0.000
9 -0.045393 0 1 0.025 0.000
In [15]: df1["Histogram"].str.startswith("Band").cumsum()
Out[15]:
0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 2
8 2
9 2
Name: Histogram, dtype: int64
You can use this in a groupby (which is how you want to split it):
Now you can extract/clean each group at your leisure:
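Spelled out, the groupby could look like the minimal sketch below. It assumes the key is exactly the cumsum series from In [15]; the sample frame (a subset of the df1 columns shown above) and the name band_id are assumptions, while g matches the name used in the session that follows. The na=False argument is an addition: it makes blank (NaN) cells count as "not a Band row" instead of propagating NaN into the cumsum.

```python
import pandas as pd

# Small frame mirroring df1 above (two bands, five rows each, subset of columns)
df1 = pd.DataFrame({
    "DN": [-0.054741, -0.052404, -0.050067, -0.047730, -0.045393] * 2,
    "Npts": [1, 0, 0, 0, 0] * 2,
    "Total": [1] * 10,
    "Histogram": ["Band 1", "Bin=0.00233", "", "", "",
                  "Band 2", "Bin=0.00233", "", "", ""],
})

# Every row whose Histogram cell starts with "Band" opens a new block;
# the running count of such rows is the block number: 1,1,1,1,1,2,2,2,2,2
band_id = df1["Histogram"].str.startswith("Band", na=False).cumsum()

# One group per histogram block
g = df1.groupby(band_id)

print(len(g))  # 2
print(g.get_group(1))
```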
In [21]: g.get_group(1)
Out[21]:
DN Npts Total Acc Pct Percent Histogram
0 -0.054741 1 1 0.025 0.025 Band 1
1 -0.052404 0 1 0.025 0.000 Bin=0.00233
2 -0.050067 0 1 0.025 0.000
3 -0.047730 0 1 0.025 0.000
4 -0.045393 0 1 0.025 0.000
In [22]: [x for _, x in g]
Out[22]:
[ DN Npts Total Acc Pct Percent Histogram
0 -0.054741 1 1 0.025 0.025 Band 1
1 -0.052404 0 1 0.025 0.000 Bin=0.00233
2 -0.050067 0 1 0.025 0.000
3 -0.047730 0 1 0.025 0.000
4 -0.045393 0 1 0.025 0.000 ,
DN Npts Total Acc Pct Percent Histogram
5 -0.054741 1 1 0.025 0.025 Band 2
6 -0.052404 0 1 0.025 0.000 Bin=0.00233
7 -0.050067 0 1 0.025 0.000
8 -0.047730 0 1 0.025 0.000
9 -0.045393 0 1 0.025 0.000 ]
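The groups above still need to be written out to cover the second half of the question. One way, sketched here under the assumption that tab-separated output is acceptable, is to loop over the groupby and call DataFrame.to_csv on each block; the Band_<n>.txt file names are a hypothetical naming scheme and the files land in the current directory.

```python
import pandas as pd

# Sample data shaped like df1 above (stripped column names, two bands)
df1 = pd.DataFrame({
    "DN": [-0.054741, -0.052404, -0.050067, -0.047730, -0.045393] * 2,
    "Npts": [1, 0, 0, 0, 0] * 2,
    "Total": [1] * 10,
    "Histogram": ["Band 1", "Bin=0.00233", "", "", "",
                  "Band 2", "Bin=0.00233", "", "", ""],
})

band_id = df1["Histogram"].str.startswith("Band", na=False).cumsum()

written = []
for band, block in df1.groupby(band_id):
    # Band_1.txt, Band_2.txt, ...: a hypothetical naming scheme
    filename = "Band_{}.txt".format(band)
    # Tab-separated, column names on the first line, no index column
    block.to_csv(filename, sep="\t", index=False)
    written.append(filename)

print(written)  # ['Band_1.txt', 'Band_2.txt']
```

Swapping to_csv for block.to_string() written via open(...).write() would give a fixed-width layout closer to the original file, at the cost of being harder to read back in.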
This filters the dataframe's source text file and creates a new txt file for each histogram:
count = 1  # used in the naming of the new txt files
txtFile = "his.txt"  # histogram text file
splitTxt = " Histogram DN Npts Total Percent Acc Pct"  # string used to split the file into sections/blocks

with open(txtFile, "r") as myResults:
    blocks = myResults.read()

for contents in blocks.split(splitTxt)[1:]:
    lines = contents.split('\n')
    with open('Results_{}.txt'.format(count), 'w') as op:
        op.write(splitTxt)
        # lines[0] is the (normally empty) rest of the header line, followed by 7 data rows;
        # slicing avoids an IndexError if a block is shorter than 8 lines
        for line in lines[:8]:
            op.write('{}\n'.format(line))
    count = count + 1
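Do you have this data as text? If so, this is easy.
The data originally came from a text file.
Be careful: if you keep deleting and reposting, you will automatically be banned from asking questions.
Sorry, I'm trying different ways to do this because I couldn't find one that works. I thought saving everything to text and then splitting it into bins might work, so this is actually a bit different.
Could you read all the lines into memory, loop over them and use something like "if 'Histogram' in line:"?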
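The last comment's idea (read all lines into memory, start a new block whenever a line contains "Histogram") can be sketched as follows. Unlike the string-split code above, it does not assume exactly 8 lines per block. The function name split_histograms, its parameters, the Results_<n>.txt naming and the demo file his_sample.txt are all hypothetical.

```python
from pathlib import Path


def split_histograms(txt_file, keyword="Histogram", prefix="Results_"):
    """Write one file per block; a block starts at every line containing `keyword`.

    Lines before the first keyword line are ignored. Returns the paths written.
    """
    with open(txt_file) as src:
        lines = src.readlines()  # read everything into memory, as suggested

    blocks = []
    for line in lines:
        if keyword in line:
            blocks.append([])  # a header line opens a new block
        if blocks:
            blocks[-1].append(line)

    written = []
    for n, block in enumerate(blocks, start=1):
        out = Path("{}{}.txt".format(prefix, n))
        out.write_text("".join(block))
        written.append(out)
    return written


# Demo input: the two histograms from the question (file name is an assumption)
sample = """\
 Histogram DN Npts Total Percent Acc Pct
Band 1 -0.054741 1 1 0.0250 0.0250
Bin=0.00233 -0.052404 0 1 0.0000 0.0250
-0.050067 0 1 0.0000 0.0250
-0.047730 0 1 0.0000 0.0250
-0.045393 0 1 0.0000 0.0250
-0.043056 0 1 0.0000 0.0250
-0.040719 0 1 0.0000 0.0250
 Histogram DN Npts Total Percent Acc Pct
Band 2 0.000000 346 346 9.5186 9.5186
Bin=0.00203 0.002038 0 346 0.0000 9.5186
0.004076 0 346 0.0000 9.5186
0.006114 0 346 0.0000 9.5186
0.008152 0 346 0.0000 9.5186
0.010189 0 346 0.0000 9.5186
0.012227 0 346 0.0000 9.5186
"""
Path("his_sample.txt").write_text(sample)

files = split_histograms("his_sample.txt")
print([f.name for f in files])  # ['Results_1.txt', 'Results_2.txt']
```

Because each header line is copied into its block, every output file starts with its own " Histogram DN ..." line.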