Python: split a DataFrame and save the pieces as txt files


I have a DataFrame like this:

  Histogram           DN     Npts    Total   Percent   Acc Pct
  Band 1       -0.054741        1        1    0.0250    0.0250
  Bin=0.00233  -0.052404        0        1    0.0000    0.0250
               -0.050067        0        1    0.0000    0.0250
               -0.047730        0        1    0.0000    0.0250
               -0.045393        0        1    0.0000    0.0250
               -0.043056        0        1    0.0000    0.0250
               -0.040719        0        1    0.0000    0.0250
  Histogram           DN     Npts    Total   Percent   Acc Pct
  Band 2        0.000000      346      346    9.5186    9.5186
  Bin=0.00203   0.002038        0      346    0.0000    9.5186
                0.004076        0      346    0.0000    9.5186
                0.006114        0      346    0.0000    9.5186
                0.008152        0      346    0.0000    9.5186
                0.010189        0      346    0.0000    9.5186
                0.012227        0      346    0.0000    9.5186

I want to split it wherever the word Histogram appears (in this example, every 8 rows). I can split it like this:

np.array_split(df,8)

But I would prefer a way to split on the keyword itself. I then want to save each piece to its own text file. Is there a way to do this?

df.head().to_json()

{"Histogram  ":{"0":"Band 1     ","1":"Bin=0.00233","2":"           ","3":"           ","4":"           "},"       DN":{"0":"-0.054741","1":"-0.052404","2":"-0.050067","3":"-0.047730","4":"-0.045393"},"   Npts":{"0":"      1","1":"      0","2":"      0","3":"      0","4":"      0"},"  Total":{"0":"      1","1":"      1","2":"      1","3":"      1","4":"      1"}," Percent":{"0":"  0.0250","1":"  0.0000","2":"  0.0000","3":"  0.0000","4":"  0.0000"}," Acc Pct":{"0":"  0.0250","1":"  0.0250","2":"  0.0250","3":"  0.0250","4":"  0.0250"}}

First, you should normalize the column names; at the moment they contain whitespace (which explains the KeyError you saw earlier):
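The snippet for this step is not shown here. A minimal sketch, assuming the padded labels seen in the `to_json()` output above, is to strip the whitespace with `str.strip`. The two-row `df` below is made up for illustration only.

```python
import pandas as pd

# Hypothetical frame whose labels and cells carry padding, as in the to_json() dump
df = pd.DataFrame({
    "Histogram  ": ["Band 1     ", "Bin=0.00233"],
    "       DN": ["-0.054741", "-0.052404"],
})

# Strip leading/trailing whitespace from every column label
df.columns = df.columns.str.strip()

# The string cells are padded too, so strip the Histogram column as well
df["Histogram"] = df["Histogram"].str.strip()

print(list(df.columns))
```

After this, `df["Histogram"]` and `df["DN"]` can be looked up by their plain names.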

To group by band, I would use cumsum:

In [14]: df1  # similar to your example
Out[14]:
         DN  Npts  Total  Acc Pct  Percent    Histogram
0 -0.054741     1      1    0.025    0.025  Band 1
1 -0.052404     0      1    0.025    0.000  Bin=0.00233
2 -0.050067     0      1    0.025    0.000
3 -0.047730     0      1    0.025    0.000
4 -0.045393     0      1    0.025    0.000
5 -0.054741     1      1    0.025    0.025  Band 2
6 -0.052404     0      1    0.025    0.000  Bin=0.00233
7 -0.050067     0      1    0.025    0.000
8 -0.047730     0      1    0.025    0.000
9 -0.045393     0      1    0.025    0.000

In [15]: df1["Histogram"].str.startswith("Band").cumsum()
Out[15]:
0    1
1    1
2    1
3    1
4    1
5    2
6    2
7    2
8    2
9    2
Name: Histogram, dtype: int64

You can use this in a groupby (which is how you want to split):

Now you can extract or clean the groups at your leisure:
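The `groupby` call itself is not shown here. A sketch that would be consistent with the `g.get_group(1)` session, assuming `g` is the name bound to the result and using a small stand-in for `df1` with illustrative values:

```python
import pandas as pd

# Small stand-in for df1 (values are illustrative)
df1 = pd.DataFrame({
    "DN": [-0.054741, -0.052404, -0.050067, 0.000000, 0.002038, 0.004076],
    "Histogram": ["Band 1", "Bin=0.00233", "", "Band 2", "Bin=0.00203", ""],
})

# True on each row that starts a new band; the running sum labels the blocks 1, 2, ...
block_id = df1["Histogram"].str.startswith("Band").cumsum()

# Group the rows by that block label
g = df1.groupby(block_id)

print(g.ngroups)
```

Grouping on the running sum means the split follows the keyword rather than a fixed row count, so blocks of different lengths are handled the same way.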

In [21]: g.get_group(1)
Out[21]:
         DN  Npts  Total  Acc Pct  Percent    Histogram
0 -0.054741     1      1    0.025    0.025  Band 1
1 -0.052404     0      1    0.025    0.000  Bin=0.00233
2 -0.050067     0      1    0.025    0.000
3 -0.047730     0      1    0.025    0.000
4 -0.045393     0      1    0.025    0.000

In [22]: [x for _, x in g]
Out[22]:
[         DN  Npts  Total  Acc Pct  Percent    Histogram
 0 -0.054741     1      1    0.025    0.025  Band 1
 1 -0.052404     0      1    0.025    0.000  Bin=0.00233
 2 -0.050067     0      1    0.025    0.000
 3 -0.047730     0      1    0.025    0.000
 4 -0.045393     0      1    0.025    0.000             ,
          DN  Npts  Total  Acc Pct  Percent    Histogram
 5 -0.054741     1      1    0.025    0.025  Band 2
 6 -0.052404     0      1    0.025    0.000  Bin=0.00233
 7 -0.050067     0      1    0.025    0.000
 8 -0.047730     0      1    0.025    0.000
 9 -0.045393     0      1    0.025    0.000             ]
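To finish the task from the question, each group could then be written to its own text file. The sketch below is one way to do it, not part of the original answer: the `band_N.txt` naming, the tab separator, and the temporary output directory are all assumptions chosen for the example.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for df1 (values are illustrative)
df1 = pd.DataFrame({
    "DN": [-0.054741, -0.052404, -0.050067, 0.000000, 0.002038, 0.004076],
    "Histogram": ["Band 1", "Bin=0.00233", "", "Band 2", "Bin=0.00203", ""],
})
g = df1.groupby(df1["Histogram"].str.startswith("Band").cumsum())

out_dir = tempfile.mkdtemp()  # scratch directory; use a real path in practice
paths = []
for n, (_, part) in enumerate(g, start=1):
    # Hypothetical naming scheme: band_1.txt, band_2.txt, ...
    path = os.path.join(out_dir, "band_{}.txt".format(n))
    part.to_csv(path, sep="\t", index=False)  # one tab-separated file per group
    paths.append(path)

print(paths)
```

`to_csv` is used here because it writes plain text; a fixed-width layout like the original file would need `to_string` or custom formatting instead.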

This reads the DataFrame text file and creates a new txt file for each histogram:

txtFile = "his.txt"
# histogram text file

splitTxt = " Histogram           DN     Npts    Total   Percent   Acc Pct"
# header line used to split the file into sections/blocks

with open(txtFile, "r") as myResults:
    blocks = myResults.read()

# the first piece is whatever precedes the first header, so skip it;
# count is used in the naming of the new txt files
for count, contents in enumerate(blocks.split(splitTxt)[1:], start=1):
    lines = contents.split('\n')
    with open('Results_{}.txt'.format(count), 'w') as op:
        op.write(splitTxt)
        # lines[0] is the remainder of the header line, so this ends the
        # header and writes up to 7 data rows without running past the end
        for line in lines[:8]:
            op.write('{}\n'.format(line))

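Do you have this data as text? If so, it's easy.

The data originally came from a text file.

Be careful: if you keep deleting and reposting, you will automatically be banned from asking questions.

Sorry, I was trying to find a different way to do this because I couldn't find another one. I thought saving everything to text and then binning might work, so this is actually a bit different.

Could you read all the lines into memory, loop over them, and use something like "if 'Histogram' in line:"?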
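The line-by-line idea from the last comment could look roughly like this. It is only a sketch: the helper name `split_on_keyword` and the shortened sample lines are invented for illustration.

```python
def split_on_keyword(lines, keyword="Histogram"):
    """Group lines into blocks, opening a new block at each line containing keyword."""
    blocks = []
    for line in lines:
        if keyword in line:
            blocks.append([line])       # a header line opens a new block
        elif blocks:
            blocks[-1].append(line)     # lines before the first header are dropped
    return blocks


# Shortened sample lines, made up for the example
sample = [
    "  Histogram           DN     Npts",
    "  Band 1       -0.054741        1",
    "  Bin=0.00233  -0.052404        0",
    "  Histogram           DN     Npts",
    "  Band 2        0.000000      346",
]
blocks = split_on_keyword(sample)
print(len(blocks))
```

Each inner list can then be written out with `"\n".join(block)` to its own file, and the approach does not depend on every block having the same number of rows.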
