Ruby: Hash assignment/parse from text


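I'd like to create a hash from text output that looks like the following (the whitespace between words is tabs):

I want to create a hash with the following assigned to it:
gcollector = Hash.new

gcollector = { "UniqueID" => uniqueXXX,
"Datum" => bar_XXXX.X,
"FullUniqueID" => fulluniqueXXX,
"IssueGroup" => Broken
}

The uniqueXXX field always matches between the BLOCK and its associated LABELs.

I have a few questions:
1 - How do I assign only these fields to the hash?
2 - How do I split off the text before the hyphen (in LABEL ... Issue) and assign it to IssueGroup?
3 - How do I do this reliably when the LABEL lines come in a different order?

.. same question when there are multiple DAY lines, or none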

I'd do it this way:

records     = [] # init an array to hold everything
gcollectors = {} # init the hash holding info for one record

# loop over the file
File.readlines('text.txt').each do |l|

  # split the line into columns
  columns = l.chomp.split("\t")

  # if the first column is...
  case columns[0]
  when 'GCOLLECTOR'
    # we don't care about the columns, but instead use this record to tell us to
    # store the hash and reinitialize it.
    if (gcollectors.any?)
      records << gcollectors
      gcollectors = {}
    end
  when 'BLOCK'
    gcollectors['UniqueID']     = columns[1]
    gcollectors['FullUniqueID'] = columns[3]
  when 'LABEL'
    # a LABEL record could have two different values we care about so figure out
    # which it is.
    case columns[3]
    when 'Datum'
      gcollectors['Datum'] = columns[4]
    when 'Issue'
      gcollectors['IssueGroup'] = columns[4].split('-').first.strip
    end
  end

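end

# the last record has no following GCOLLECTOR line to flush it, so store it here
records << gcollectors if gcollectors.any?

require 'ap' # provided by the awesome_print gem
ap records
# >> [
# >>     [0] {
# >>             "UniqueID" => "unique111",
# >>         "FullUniqueID" => "fullunique111",
# >>           "IssueGroup" => "Broken",
# >>                "Datum" => "bar_1666.9"
# >>     },
# >>     [1] {
# >>             "UniqueID" => "unique222",
# >>         "FullUniqueID" => "fullunique111",
# >>           "IssueGroup" => "Broken",
# >>                "Datum" => "bar_9921.2"
# >>     },
# >>     [2] {
# >>             "UniqueID" => "unique333",
# >>         "FullUniqueID" => "fullunique111",
# >>           "IssueGroup" => "Broken",
# >>                "Datum" => "bar_7766.2"
# >>     }
# >> ]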
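
For anyone who wants to try the approach without a text.txt on disk, here is a minimal self-contained sketch of the same loop. The sample input is made up: its column layout is only an assumption inferred from the column indexes used above, and `parse_records` is a hypothetical helper name, not something from the answer.

```ruby
# Hypothetical sample input. The column layout is an assumption inferred from
# the column indexes used in the answer, not the asker's real data.
SAMPLE = [
  "GCOLLECTOR\tgc1",
  "BLOCK\tunique111\tblk\tfullunique111",
  "LABEL\tunique111\tlbl\tIssue\tBroken - needs repair",
  "DAY\tunique111\tMon",
  "LABEL\tunique111\tlbl\tDatum\tbar_1666.9",
  "GCOLLECTOR\tgc2",
  "BLOCK\tunique222\tblk\tfullunique222",
  "LABEL\tunique222\tlbl\tDatum\tbar_9921.2",
  "LABEL\tunique222\tlbl\tIssue\tBroken-again"
].join("\n")

# parse_records is a hypothetical helper name used for illustration only.
def parse_records(text)
  records = []
  current = {}

  text.each_line do |line|
    columns = line.chomp.split("\t")

    case columns[0]
    when 'GCOLLECTOR'
      # a new record starts: store the previous one, if any
      records << current if current.any?
      current = {}
    when 'BLOCK'
      current['UniqueID']     = columns[1]
      current['FullUniqueID'] = columns[3]
    when 'LABEL'
      case columns[3]
      when 'Datum'
        current['Datum'] = columns[4]
      when 'Issue'
        # keep only the text before the first hyphen
        current['IssueGroup'] = columns[4].split('-').first.strip
      end
    end
    # any other line type (e.g. DAY) falls through and is ignored
  end

  # store the final record, which no GCOLLECTOR line follows
  records << current if current.any?
  records
end

records = parse_records(SAMPLE)
records.each { |r| p r }
```

Because each LABEL line is dispatched on its own fourth column, the order of the Datum and Issue lines does not matter, and DAY lines can appear any number of times or not at all.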
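I've needed this kind of capability many times over the years. Unfortunately, not all incoming data is symmetrical or in a standard, constant format, so we have to find ways to determine where the blocks that make up a record begin or end.

Exactly. I could have figured it out if every record came in the same order, but I like your solution better. Do you have a quick way to show each unique value of a field (i.e. IssueGroup) along with its count? Thanks again for your help.

If you want to track progress or keep a running count, add code to the
when 'GCOLLECTOR'
block, before
gcollectors = {}
...or, for a total after the fact, just look at how many elements are in the records array, i.e.
records.size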
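
The per-value count asked about above (how many records fall into each IssueGroup) is not quite what records.size gives. One possible sketch follows, using made-up records in the same shape the parser produces; `issue_counts` is a name chosen here for illustration.

```ruby
# Made-up records in the same shape the parser produces.
records = [
  { 'UniqueID' => 'unique111', 'IssueGroup' => 'Broken' },
  { 'UniqueID' => 'unique222', 'IssueGroup' => 'Broken' },
  { 'UniqueID' => 'unique333', 'IssueGroup' => 'Missing' },
  { 'UniqueID' => 'unique444' } # no Issue label was seen for this record
]

# Count how many records fall into each IssueGroup.
# Hash.new(0) makes every unseen key start at zero.
issue_counts = records.each_with_object(Hash.new(0)) do |record, counts|
  counts[record['IssueGroup']] += 1
end

issue_counts.each { |group, count| puts "#{group.inspect}: #{count}" }
```

Records without an Issue label are counted under the nil key. On Ruby 2.7 or later, `records.map { |r| r['IssueGroup'] }.tally` should give the same counts in one line.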
text = File.read('text.txt') # read the whole file into a single string
gcollectors = text.scan(/^GCOLLECTOR.+\n(?:(?:BLOCK|DAY|LABEL).+\n?)+/).map { |collector|
    /^BLOCK\t(?<uniqueid>\S+)\t\S+\t(?<fulluniqueid>\S+).+/ =~ collector
    /^LABEL\t\S+\t\S+\tDatum\t(?<datum>.+)/ =~ collector
    /^LABEL\t\S+\t\S+\tIssue\t(?<issue>\S+)/ =~ collector
    Hash[
        "UniqueID",uniqueid,
        "Datum",datum,
        "FullUniqueID",fulluniqueid,
        "IssueGroup",issue
    ]
}

gcollectors.each { |i| p i }
# >> {"UniqueID"=>"unique111", "Datum"=>"bar_1666.9", "FullUniqueID"=>"fullunique111", "IssueGroup"=>"Broken"}
# >> {"UniqueID"=>"unique222", "Datum"=>"bar_9921.2", "FullUniqueID"=>"fullunique111", "IssueGroup"=>"Broken"}
# >> {"UniqueID"=>"unique333", "Datum"=>"bar_7766.2", "FullUniqueID"=>"fullunique111", "IssueGroup"=>"Broken"}
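
Two details of the regexp approach may be worth checking against real data: what happens when a label is missing, and what happens when the hyphen has no space before it. The sketch below uses a made-up record; the `[^-\t\n]+` character class is a variation suggested here, not the pattern from the answer above.

```ruby
# Hypothetical record with no Datum label and an unspaced hyphen.
collector = "GCOLLECTOR\tgc1\n" \
            "BLOCK\tunique111\tblk\tfullunique111\n" \
            "LABEL\tunique111\tlbl\tIssue\tBroken-needs repair\n"

# Named captures become local variables only when the regexp literal is on
# the left-hand side of =~; when the match fails they are assigned nil.
/^LABEL\t\S+\t\S+\tDatum\t(?<datum>.+)/ =~ collector

# [^-\t\n]+ stops at the first hyphen, so no separate split is needed.
/^LABEL\t\S+\t\S+\tIssue\t(?<issue>[^-\t\n]+)/ =~ collector

# &. guards against a missing Issue label (needs Ruby 2.3 or later).
record = { 'Datum' => datum, 'IssueGroup' => issue&.strip }
p record
```

With `\S+` the capture for this input would be "Broken-needs", so which pattern fits depends on whether the real data always has whitespace before the hyphen.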