Generating Ruby hashes from different sections of a file, each with its own logic


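I have a large file in which the "sections" are delimited by ***. I have to create a hash for each section and write a new file in which each section appears in a new format (I can write multiple files for the sections). Each section needs its own logic to be converted into a hash (splitting on "" or "\n" or "=" or /some pattern/).
I'm looking for a way to identify the sections and apply the appropriate logic to turn each one into a hash. I can write the individual pieces of logic, but should they be separate methods or classes that are called based on a pattern?

Because the file is large, I'm trying to read, process, and write it line by line. I've seen approaches that glob all the lines between sections, but I'm not especially interested in that kind of solution. I'm a bit confused about how to get the lines between sections and apply the different bits of logic line by line where appropriate.

Any direction is appreciated. Thanks.

Here is some of the input file: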

  *** Summary ***

 Job Name = test   Date created: Mon Jan 14 15:48:33 2013


 *** Analysis Information 
 Steady State is ON
 Turbulent Incompressible Flow is ON 
 Static Temperature Equation is ON
 Mixed Convection is ON

 *** Field Variable Results Summary For Iteration 300


  Var      Mean          at      Max          at      Min
 Vx Vel  +5.71519e+002 1320103 +3.02718e+004 1319857 -2.66582e+004 mm/s
 Vy Vel  +3.40035e+002 158922 +2.79257e+004 1319731 -1.42855e+004 mm/s
 Vz Vel  -7.17959e+002 1318038 +1.62986e+004 1319053 -2.21582e+004 mm/s
 Press  -2.05980e+001  50858 +5.19412e+003  50905 -1.44865e+003 N/m^2
 Temp  +4.60000e+001  10965 +4.60000e+001 315867 +4.60000e+001 C
 TurbK  +1.19616e+006 1319490 +1.44421e+008  10966 +1.81700e-008 mm^2/s^2
 TurbD  +1.71412e+009 1319490 +2.88554e+011 233065 +5.37798e-004 mm^2/s^3
 Scal1  +0.00000e+000  10965 +0.00000e+000 315867 +0.00000e+000  
 PTotl  -5.91285e+000  50858 +5.19412e+003  50905 -1.44865e+003 N/m^2
 EVisc  +2.52037e-004 1320370 +1.14488e-002   2229 +0.00000e+000 g/mm-s
 ECond  +1.05355e-002 1352833 +5.88890e-002   2229 +0.00000e+000 W/mm-K
 Dens  +2.34793e-004  58024 +3.43080e-003 315867 +1.20473e-006 g/mm^3
 Visc  +1.62605e-005  10965 +1.81700e-005   2229 +0.00000e+000 g/mm-s
 Cond  +2.50840e-002   2229 +2.04000e-001 315867 +2.56300e-005 W/mm-K
 SpecH  +1.01202e+000  38432 +1.81000e+000  10249 +1.00500e-003 J/g-K
 Emiss  +8.94911e-001  10965 +1.00000e+000   2229 +0.00000e+000  
 Transmiss  +0.00000e+000      0 +0.00000e+000      0 +0.00000e+000  
 WRough  +0.00000e+000  10965 +0.00000e+000 315867 +0.00000e+000 mm
 SeeBeck  +0.00000e+000      0 +0.00000e+000      0 +0.00000e+000 V/K
 GenT  +1.11977e+003 223286 +1.18027e+005 584515 +3.19558e-013 1/s


*** Openings ***


         *** Outlet 1 ***

  Surface ID = 2329

  Node near Minimum X,Y,Z of opening = 11761

  Minimum X,Y,Z of opening = 369.964000, 11.275438, -98.433898

     Mass Flow Out            = -1.55703 g/s
  Volume Flow Out            = -1.29242e+006 mm^3/s
     Reynolds Number         = 1303.45
     Outlet Bulk Pressure    = -0 N/m^2
     Outlet Bulk Temperature = 46 C
     Outlet Mach Number      = 0.00734951


         *** Outlet 2 ***

  Surface ID = 2332

  Node near Minimum X,Y,Z of opening = 11125

  Minimum X,Y,Z of opening = 369.964000, 73.727289, -114.615876

     Mass Flow Out            = -20.4612 g/s
  Volume Flow Out            = -1.6984e+007 mm^3/s
     Reynolds Number         = 11182.5
     Outlet Bulk Pressure    = -0 N/m^2
     Outlet Bulk Temperature = 46 C
     Outlet Mach Number      = 0.0079087


         *** Outlet 3 ***

  Surface ID = 2335

  Node near Minimum X,Y,Z of opening = 10924

  Minimum X,Y,Z of opening = 369.964000, 164.751344, 40.640056

     Mass Flow Out            = -32.8714 g/s
  Volume Flow Out            = -2.72852e+007 mm^3/s
     Reynolds Number         = 17965
     Outlet Bulk Pressure    = -0 N/m^2
     Outlet Bulk Temperature = 46 C
     Outlet Mach Number      = 0.00750077

*** Fluid Energy Balance Information:
  MdotIn x Cp x (TOut - TIn) =       663.69 Watts
 (Numerical) Energy Out  -  Energy In =      0.36447 Watts
     Heat Transfer from Wall To Fluid =       761.35 Watts
Heat Transfer Due to Sources In Fluid =            0 Watts

*** Solid Energy Balance Information:
 Heat Transfer from Exterior To Solid =            0 Watts
Heat Transfer Due to Sources In Solid =          761 Watts
    Heat Transfer From Fluid To Solid =      -761.31 Watts

*** Sum of Fluid Forces on Walls ***
   ShearX, PressX =        68651         78199   microNewtons
   ShearY, PressY =        39030   6.9349e+006   microNewtons
   ShearZ, PressZ =       -19749  -4.1017e+006   microNewtons

*** Data for internal fans 

Fan Part Id = 16 Fan Name = fname1
Operating Pressure Rise = 0.46945 Inches of Water
Operating FlowRate = 36.0109 CFM

Fan Part Id = 94 Fan Name = fname2
Operating Pressure Rise = 0.309645 Inches of Water
Operating FlowRate = 2.33407 CFM

Fan Part Id = 95 Fan Name = fname3
Operating Pressure Rise = 0.267133 Inches of Water
Operating FlowRate = 8.78264 CFM


*** Analysis Statistics: 

 Input:              461 seconds 
 Analysis:         12686 seconds 
 Output:             179 seconds 
 Total:            13326 seconds 

Here is what I have so far:

sum_file = File.new('sum_file.sum', 'r')
sum_file_hashed = File.new('sum_file_hashed', 'w')

inSection = false #flag when in or out of a section?

    while (line = sum_file.gets ) #while reading lines
        case line
        when /\*{3}/ #Found Sections by ***
            inSection = true #in a section
            l = line.gsub('*', '').strip
            sum_file_hashed.puts('Found a section: ' + l ) #write section name
        end

        ### I'm not sure how to introduce specific logic when in a certain type of section ###

    end

sum_file.close
sum_file_hashed.close

I'm now trying something along these lines:

while /found section/
  if /match pattern a/
    call parsera
  if /match pattern b/
    call parserb
end

parsera 
while =! /a section/
do stuff
return?

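You can add all the lines of a section to an array. When the next section starts, or when you reach the end of the file, call the section-specific method with that array as its argument. That way you don't have to deal with the section logic while you are detecting sections.

Edit: Untested, but this should give you an idea: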

def parse_section_name(line)
  line.delete('*').strip # remove the *'s, then whitespace from both sides
end

def call_section_logic(name, lines)
  case name
  when ...
  ...
  else # unknown name
  end
end

section_name = nil # no section seen yet
section_lines = []
lines = file.readlines + ['***'] # sentinel, so the last section is handled too
lines.each do |line|
  if line =~ /^\s*\*{3}/ # detect section
    # hand over the finished section (nothing to hand over before the first marker)
    call_section_logic(section_name, section_lines) if section_name
    section_name = parse_section_name(line)
    section_lines = [] # fresh array, so the one just handed over is not emptied
  else
    section_lines << line # add line to array
  end
end
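
To make the dispatch idea concrete, here is a minimal sketch of what the elided `when` branches could look like. The parser methods (`parse_key_value`, `parse_flags`), the `parse_sections` wrapper, and the choice of which section names map to which parser are all assumptions made for illustration; they are not part of the answer above. It reads line by line from any IO object, so it should also suit the line-by-line requirement in the question.

```ruby
require 'stringio'

# Hypothetical parser for "Key = value" lines, e.g. "Surface ID = 2329".
def parse_key_value(lines)
  lines.each_with_object({}) do |line, h|
    key, value = line.split('=', 2)
    h[key.strip] = value.strip if value
  end
end

# Hypothetical parser for "Something is ON" lines, e.g. "Steady State is ON".
def parse_flags(lines)
  lines.each_with_object({}) do |line, h|
    m = line.match(/\A\s*(.+?)\s+is\s+(ON|OFF)\s*\z/)
    h[m[1]] = (m[2] == 'ON') if m
  end
end

# Pick a parser from the section name (the name-to-parser mapping is assumed).
def call_section_logic(name, lines)
  case name
  when /\AAnalysis Information/ then parse_flags(lines)
  when /\AOutlet \d+/           then parse_key_value(lines)
  else lines.map(&:strip).reject(&:empty?) # unknown name: keep the raw lines
  end
end

# Collect the lines of one section, then hand them to its parser.
def parse_sections(io)
  result = {}
  name = nil
  lines = []
  io.each_line do |line|
    if line =~ /^\s*\*{3}/
      result[name] = call_section_logic(name, lines) if name
      name = line.delete('*').strip
      lines = []
    else
      lines << line
    end
  end
  result[name] = call_section_logic(name, lines) if name # last section
  result
end

SAMPLE = <<~TEXT
  *** Analysis Information
  Steady State is ON
  Mixed Convection is ON

  *** Outlet 1 ***

  Surface ID = 2329
  Reynolds Number         = 1303.45
TEXT

p parse_sections(StringIO.new(SAMPLE))
```

Plain methods are probably enough while each parser is a few lines long; classes start to pay off once a section needs state or several helper steps.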
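Ruby's Enumerable includes slice_before, which is very useful for this kind of task: it breaks the file into chunks based on some marker.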

require 'pp'

blocks = DATA.readlines.map(&:strip).reject{ |l| l == '' }.slice_before(/\A\*{3}/)
pp blocks.to_a

__END__
  *** Summary ***

 Job Name = test   Date created: Mon Jan 14 15:48:33 2013


 *** Analysis Information 
 Steady State is ON
 Turbulent Incompressible Flow is ON 
 Static Temperature Equation is ON
 Mixed Convection is ON

 *** Field Variable Results Summary For Iteration 300


  Var      Mean          at      Max          at      Min
 Vx Vel  +5.71519e+002 1320103 +3.02718e+004 1319857 -2.66582e+004 mm/s
 Vy Vel  +3.40035e+002 158922 +2.79257e+004 1319731 -1.42855e+004 mm/s
 Vz Vel  -7.17959e+002 1318038 +1.62986e+004 1319053 -2.21582e+004 mm/s
 Press  -2.05980e+001  50858 +5.19412e+003  50905 -1.44865e+003 N/m^2
 Temp  +4.60000e+001  10965 +4.60000e+001 315867 +4.60000e+001 C
 TurbK  +1.19616e+006 1319490 +1.44421e+008  10966 +1.81700e-008 mm^2/s^2
 TurbD  +1.71412e+009 1319490 +2.88554e+011 233065 +5.37798e-004 mm^2/s^3
 Scal1  +0.00000e+000  10965 +0.00000e+000 315867 +0.00000e+000  
 PTotl  -5.91285e+000  50858 +5.19412e+003  50905 -1.44865e+003 N/m^2
 EVisc  +2.52037e-004 1320370 +1.14488e-002   2229 +0.00000e+000 g/mm-s
 ECond  +1.05355e-002 1352833 +5.88890e-002   2229 +0.00000e+000 W/mm-K
 Dens  +2.34793e-004  58024 +3.43080e-003 315867 +1.20473e-006 g/mm^3
 Visc  +1.62605e-005  10965 +1.81700e-005   2229 +0.00000e+000 g/mm-s
 Cond  +2.50840e-002   2229 +2.04000e-001 315867 +2.56300e-005 W/mm-K
 SpecH  +1.01202e+000  38432 +1.81000e+000  10249 +1.00500e-003 J/g-K
 Emiss  +8.94911e-001  10965 +1.00000e+000   2229 +0.00000e+000  
 Transmiss  +0.00000e+000      0 +0.00000e+000      0 +0.00000e+000  
 WRough  +0.00000e+000  10965 +0.00000e+000 315867 +0.00000e+000 mm
 SeeBeck  +0.00000e+000      0 +0.00000e+000      0 +0.00000e+000 V/K
 GenT  +1.11977e+003 223286 +1.18027e+005 584515 +3.19558e-013 1/s
I shortened the data because the sample was too long.

Running the code outputs:

[["*** Summary ***",
  "Job Name = test   Date created: Mon Jan 14 15:48:33 2013"],
 ["*** Analysis Information",
  "Steady State is ON",
  "Turbulent Incompressible Flow is ON",
  "Static Temperature Equation is ON",
  "Mixed Convection is ON"],
 ["*** Field Variable Results Summary For Iteration 300",
  "Var      Mean          at      Max          at      Min",
  "Vx Vel  +5.71519e+002 1320103 +3.02718e+004 1319857 -2.66582e+004 mm/s",
  "Vy Vel  +3.40035e+002 158922 +2.79257e+004 1319731 -1.42855e+004 mm/s",
  "Vz Vel  -7.17959e+002 1318038 +1.62986e+004 1319053 -2.21582e+004 mm/s",
  "Press  -2.05980e+001  50858 +5.19412e+003  50905 -1.44865e+003 N/m^2",
  "Temp  +4.60000e+001  10965 +4.60000e+001 315867 +4.60000e+001 C",
  "TurbK  +1.19616e+006 1319490 +1.44421e+008  10966 +1.81700e-008 mm^2/s^2",
  "TurbD  +1.71412e+009 1319490 +2.88554e+011 233065 +5.37798e-004 mm^2/s^3",
  "Scal1  +0.00000e+000  10965 +0.00000e+000 315867 +0.00000e+000",
  "PTotl  -5.91285e+000  50858 +5.19412e+003  50905 -1.44865e+003 N/m^2",
  "EVisc  +2.52037e-004 1320370 +1.14488e-002   2229 +0.00000e+000 g/mm-s",
  "ECond  +1.05355e-002 1352833 +5.88890e-002   2229 +0.00000e+000 W/mm-K",
  "Dens  +2.34793e-004  58024 +3.43080e-003 315867 +1.20473e-006 g/mm^3",
  "Visc  +1.62605e-005  10965 +1.81700e-005   2229 +0.00000e+000 g/mm-s",
  "Cond  +2.50840e-002   2229 +2.04000e-001 315867 +2.56300e-005 W/mm-K",
  "SpecH  +1.01202e+000  38432 +1.81000e+000  10249 +1.00500e-003 J/g-K",
  "Emiss  +8.94911e-001  10965 +1.00000e+000   2229 +0.00000e+000",
  "Transmiss  +0.00000e+000      0 +0.00000e+000      0 +0.00000e+000",
  "WRough  +0.00000e+000  10965 +0.00000e+000 315867 +0.00000e+000 mm",
  "SeeBeck  +0.00000e+000      0 +0.00000e+000      0 +0.00000e+000 V/K",
  "GenT  +1.11977e+003 223286 +1.18027e+005 584515 +3.19558e-013 1/s"]]

The idea is that, when your parsing code finishes, hash will contain the parsed data as a hash, or a hash of arrays, ready for you to write out. I'd recommend looking at YAML, since it makes it quick to serialize the data to a file.
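
As a rough illustration of that last step, the sketch below turns the chunks produced by slice_before into a hash keyed by section title and dumps it as YAML. The names `ROW`, `parse_table`, and `blocks_to_hash` are hypothetical, and reading the two "at" columns as the node numbers belonging to Max and Min is an assumption drawn from the table header, not something the source states.

```ruby
require 'yaml'

# One data row of the "Field Variable Results" table.
# ASSUMPTION: the two "at" columns are node numbers for Max and Min.
ROW = /\A(?<var>.+?)\s+
       (?<mean>[+-]\S+)\s+(?<max_at>\d+)\s+
       (?<max>[+-]\S+)\s+(?<min_at>\d+)\s+
       (?<min>[+-]\S+)\s*(?<unit>\S*)\z/x

# Turn table rows into { "Vx Vel" => { "mean" => ..., ... } }.
# Rows that do not match (such as the header row) are skipped.
def parse_table(rows)
  rows.each_with_object({}) do |row, h|
    next unless (m = ROW.match(row))
    h[m[:var]] = {
      'mean' => m[:mean].to_f,
      'max' => m[:max].to_f, 'max_at' => m[:max_at].to_i,
      'min' => m[:min].to_f, 'min_at' => m[:min_at].to_i,
      'unit' => m[:unit]
    }
  end
end

# Each block is [header, *body]; sections without a parser keep their lines.
def blocks_to_hash(blocks)
  blocks.each_with_object({}) do |(header, *body), hash|
    title = header.delete('*').strip
    hash[title] = title.start_with?('Field Variable Results') ? parse_table(body) : body
  end
end

SAMPLE = <<~TEXT
  *** Summary ***

  Job Name = test   Date created: Mon Jan 14 15:48:33 2013

  *** Field Variable Results Summary For Iteration 300

  Var      Mean          at      Max          at      Min
  Vx Vel  +5.71519e+002 1320103 +3.02718e+004 1319857 -2.66582e+004 mm/s
  Scal1  +0.00000e+000  10965 +0.00000e+000 315867 +0.00000e+000
TEXT

blocks = SAMPLE.lines.map(&:strip).reject(&:empty?).slice_before(/\A\*{3}/)
hash = blocks_to_hash(blocks)
puts hash.to_yaml # or File.write('sum_file.yml', hash.to_yaml)
```

The output file name in the last comment is only a placeholder.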


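I'm not going to add more, because the question sounds a lot like homework, and parsing the lines isn't that hard. Breaking the file into chunks is the bigger task, and that part has been done for you.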


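Comments:

Thanks for the guidance... I can add the lines of a section to an array, but how do I tell it to stop adding to the array? Push lines until ***?

Added an example. I use File#readlines to get all the lines instead of using a loop. It should work your way as well. When you reach the end of the file, you just call call_section_logic once more.

Very helpful. I need to read up on readlines vs. each do vs. while with gets. Thank you.

This is beautiful! You've introduced some new methods (map(&:strip), reject, slice_before) that I'll read up on. This seems memory intensive (not line by line); hopefully that won't cause any problems. Thanks a lot.

In your question you say "large" about the file size, but you don't define what that means. "Memory intensive" is no longer the worry it used to be, because most servers have plenty of RAM. Even the development VM I work on has 8GB available, so pulling in a multi-GB file is not as worrying as it once was. That said, your task can be done without aggregating everything in a hash, but it would take more code, similar to how a SAX XML parser works.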
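
For the streaming variant mentioned in the last comment, a minimal sketch might look like the following. It relies on `File.foreach` returning an Enumerator when called without a block, and on `slice_before` pulling lines from it on demand, so only the current section should be held in memory at any time. The method name `each_section` and the demo file contents are assumptions for illustration.

```ruby
require 'tempfile'

# Yields [title, body_lines] for one section at a time instead of
# collecting the whole file first.
def each_section(path)
  return enum_for(:each_section, path) unless block_given?

  sections = File.foreach(path).slice_before { |line| line.lstrip.start_with?('***') }
  sections.each do |chunk|
    header, *body = chunk.map(&:strip).reject(&:empty?)
    next unless header&.start_with?('***') # skip any text before the first marker
    yield header.delete('*').strip, body
  end
end

# Demo with a throwaway file; a real run would pass the path of the .sum file.
Tempfile.create('sum_file') do |f|
  f.write("*** Summary ***\n\n Job Name = test\n\n*** Analysis Statistics: \n\n Input:  461 seconds \n")
  f.flush
  each_section(f.path) do |title, body|
    # Parse and write each section here, then let it be garbage collected.
    puts "#{title} -> #{body.inspect}"
  end
end
```

Inside the block you could call a section-specific parser and append its result to the output file straight away, which keeps memory use roughly proportional to the largest single section.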