Python: remove strings from a text file, keeping the floats


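I want to delete every line in a text file that contains strings or is empty. The file looks like the sample below. As you can see, the header repeats itself throughout the file, and the number of data lines in each block varies. I need to import the data into numpy as an array. At first the file used commas as the decimal separator, but at least I was able to change that.

I tried this, but it does not work at all: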

from types import StringType

z = open('D:\Desktop\cycle 1-20 20-50 kPa (dot).dat', 'r')
for line in z.readlines():
    for x in z:
        if type(z.readline(x)) is StringType:
            print line


z.close()

Sample data:

bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

Why not use numpy.loadtxt? It has a very nice interface for cases like this.

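Also, since you have a header (it should be a real header, found only at the top of the file), you can split the file into multiple files. You can do this in Python or with the UNIX command csplit. Here is how to do it and what you get: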

oz123@:~/tmp> csplit -k data.txt   '/^bla/' '{*}'
0
787
786
oz123@:~/tmp> ls xx*
xx00  xx01  xx02
oz123@:~/tmp> ls xx00
xx00
oz123@:~/tmp> cat xx00
oz123@:~/tmp> cat xx01
bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

oz123@:~/tmp> cat xx02
bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462
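
The same split can be done in Python without leaving one file per block on disk. The following is only a minimal sketch: the function name `split_blocks` and the shortened sample text are made up for illustration, and it assumes every block starts with a line matching `^bla`, as in the csplit call above.

```python
import re


def split_blocks(lines, marker=r"^bla"):
    """Group lines into blocks; a new block starts at each line matching marker."""
    pattern = re.compile(marker)
    blocks = []
    current = []
    for line in lines:
        # A marker line closes the block collected so far (if any)
        if pattern.search(line) and current:
            blocks.append(current)
            current = []
        current.append(line)
    # Do not lose the last block
    if current:
        blocks.append(current)
    return blocks


# Shortened, hypothetical sample in the same layout as the data above
sample = """bla bla

Points :    4223
1   60.102783   0.020013755
1   60.107666   0.020006953

bla bla

Points :    4223
1   60.112549   0.02000189
""".splitlines()

blocks = split_blocks(sample)
print(len(blocks))
```

Unlike csplit, this sketch does not produce an empty first piece (the `xx00` file above), because a block is only stored once it contains at least one line.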

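Python initially reads everything in a file as strings unless you cast the values yourself, so your approach will not work.

Your best bet is probably to use a regular expression to filter out the lines that contain non-data characters: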

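import re

# Matches any character that is not a digit, minus sign, dot, or whitespace
non_numeric = re.compile(r"[^-0-9.\s]")

with open("datafile") as f:
    for line in f:
        # Skip every line that has a non-number/space character in it
        if non_numeric.search(line):
            continue
        # Skip empty lines
        if not line.strip():
            continue
        # Keep the rest (drop the trailing newline so print does not double it)
        print(line.rstrip("\n"))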
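
To get from the filtered lines to numbers, each surviving line still has to be split and converted. The sketch below is one possible way to do that, not part of the original answer: the function name `numeric_rows` and the `decimal_comma` switch are hypothetical, and it assumes the data columns are separated by whitespace and never use scientific notation (a value such as `1e-5` would be rejected by the regular expression).

```python
import re

# Same idea as the filter above: anything that is not a digit, minus sign,
# dot, or whitespace marks the line as a header/text line
NON_NUMERIC = re.compile(r"[^-0-9.\s]")


def numeric_rows(lines, decimal_comma=False):
    """Return the data lines as lists of floats, skipping text and empty lines."""
    rows = []
    for line in lines:
        if decimal_comma:
            # Assumption: commas only ever appear as decimal separators
            line = line.replace(",", ".")
        if not line.strip() or NON_NUMERIC.search(line):
            continue
        # float() raises ValueError on oddities such as a lone "-"
        rows.append([float(field) for field in line.split()])
    return rows


# Shortened, hypothetical sample in the same layout as the question's data
sample = [
    "bla bla",
    "",
    "Points :    4223",
    "name1    name1    name1",
    "1   60.102783   0.020013755 89.109558",
    "1   60.107666   0.020006953 89.025749",
]
rows = numeric_rows(sample)
print(rows)
```

If every row has the same number of columns, passing the result to `numpy.array(rows)` should give the 2-D array the question asks for.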

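if line[0].isdigit(): which()

Can you give an example? My reading of the docs does not show a way to handle headers scattered throughout the file.

@StevenRumbalski, I think it assumes the header is actually at the top, not somewhere in the middle of the file.

@Oz123 Unfortunately, that is not the case in the OP's question.

@Chris, the OP probably got all the data files from some instrument. The instrument, and I am guessing here, spits out a single file at a time. For some reason the OP stacked them into one file. Splitting them into multiple files and reading those should not be a problem...

@Oz123, the instrument automatically appends the data of the different cycles during the test. I did not stack them. I am building a GUI to analyze the data, so I do not want the user to have to import multiple files. It may take more time, but it will be easier for the user.

Wow, thanks a lot! All I had to do was change if re.search("[^-0-9.\s]"): to if re.search("[^-0-9.\s]", line): and keep on rocking.

@Starter2 If it answers your question, could you mark it as the answer? ;)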