Ruby: Hash assignment / parsing from text
I'd like to create a hash from text output in which the words are separated by tabs. Specifically, I want to assign the following items to the hash:
gcollector = {
  "UniqueID"     => "uniqueXXX",
  "Datum"        => "bar_XXXX.X",
  "FullUniqueID" => "fulluniqueXXX",
  "IssueGroup"   => "Broken"
}
The uniqueXXX field always matches the BLOCK and its associated LABELs.
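I have a few questions:
1- How do I assign only these fields to the hash?
2- How do I split off the text before the hyphen (in the LABEL … Issue line) and assign it to IssueGroup?
3- How can I do this reliably when the LABEL lines come in a different order?
… and the same question for when there are several DAY lines, or none.

I'd do it like this: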
records = [] # init an array to hold everything
gcollectors = {} # init the hash holding info for one record
# loop over the file
File.readlines('text.txt').each do |l|
# split the line into columns
columns = l.chomp.split("\t")
# if the first column is...
case columns[0]
when 'GCOLLECTOR'
# we don't care about the columns, but instead use this record to tell us to
# store the hash and reinitialize it.
if (gcollectors.any?)
records << gcollectors
gcollectors = {}
end
when 'BLOCK'
gcollectors['UniqueID'] = columns[1]
gcollectors['FullUniqueID'] = columns[3]
when 'LABEL'
# a LABEL record could have two different values we care about so figure out
# which it is.
case columns[3]
when 'Datum'
gcollectors['Datum'] = columns[4]
when 'Issue'
gcollectors['IssueGroup'] = columns[4].split('-').first.strip
end
end
end
require 'ap'
ap records
# >> [
# >> [0] {
# >> "UniqueID" => "unique111",
# >> "FullUniqueID" => "fullunique111",
# >> "IssueGroup" => "Broken",
# >> "Datum" => "bar_1666.9"
# >> },
# >> [1] {
# >> "UniqueID" => "unique222",
# >> "FullUniqueID" => "fullunique111",
# >> "IssueGroup" => "Broken",
# >> "Datum" => "bar_9921.2"
# >> }
# >> ]
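Be careful with this loop. A record is only pushed onto records when the next GCOLLECTOR line is read. As a result, the record still being built when the file ends is never stored. That is probably why the output above stops at unique222, while the regex-based answer further down also prints unique333.

Below is a minimal sketch of the same loop that also stores the final record after the loop ends. The method name parse_gcollectors and the sample rows are hypothetical. Only the column positions are taken from the answer.

```ruby
# Sketch of the line-by-line parser with the last record flushed.
# parse_gcollectors and the sample rows are hypothetical; the column
# positions (BLOCK: 1 and 3, LABEL: 3 and 4) come from the answer.
def parse_gcollectors(lines)
  records = []     # every finished record
  gcollectors = {} # the record currently being built

  lines.each do |l|
    columns = l.chomp.split("\t")
    case columns[0]
    when 'GCOLLECTOR'
      # a new record begins: store the previous one, if there is one
      records << gcollectors if gcollectors.any?
      gcollectors = {}
    when 'BLOCK'
      gcollectors['UniqueID'] = columns[1]
      gcollectors['FullUniqueID'] = columns[3]
    when 'LABEL'
      case columns[3]
      when 'Datum'
        gcollectors['Datum'] = columns[4]
      when 'Issue'
        # keep only the text before the first hyphen
        gcollectors['IssueGroup'] = columns[4].split('-').first.strip
      end
    end
  end

  # no GCOLLECTOR line follows the last record, so store it here
  records << gcollectors if gcollectors.any?
  records
end

sample = [
  ['GCOLLECTOR', 'g1'],
  ['BLOCK', 'unique111', 'x', 'fullunique111'],
  ['LABEL', 'unique111', 'x', 'Issue', 'Broken - cable cut'],
  ['LABEL', 'unique111', 'x', 'Datum', 'bar_1666.9'],
  ['GCOLLECTOR', 'g2'],
  ['BLOCK', 'unique222', 'x', 'fullunique111'],
  ['DAY', 'Mon'],
  ['LABEL', 'unique222', 'x', 'Datum', 'bar_9921.2'],
  ['LABEL', 'unique222', 'x', 'Issue', 'Broken - loose']
].map { |row| row.join("\t") + "\n" }

records = parse_gcollectors(sample)
records.each { |r| p r }
```

Against the real file this would be called as parse_gcollectors(File.readlines('text.txt')).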
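Over the years I've needed this capability many times. Unfortunately, not all incoming data is symmetric or in a standard, constant format, so we have to find ways to determine where the block of lines that makes up a record starts or ends. If every record were in the same order I could have figured it out, but I prefer your solution. Do you have a quick way to show each unique value of a record field (i.e. IssueGroup) followed by its count? Thanks again for your help.

If you want to track progress or a running count, add code in the when 'GCOLLECTOR' block before gcollectors = {}. Or, for an after-the-fact count, just look at how many elements the records array has, i.e. records.size.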
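The follow-up question asks for a count of each distinct IssueGroup value. One option is sketched below, assuming records is the array of hashes built by the loop above. It maps each record to its IssueGroup and counts with Enumerable#tally, which requires Ruby 2.7 or later. The group_by/transform_values line does the same on older versions. The sample records and the Missing group are made up for illustration.

```ruby
# Hypothetical records in the same shape the answers produce
records = [
  { 'UniqueID' => 'unique111', 'IssueGroup' => 'Broken' },
  { 'UniqueID' => 'unique222', 'IssueGroup' => 'Broken' },
  { 'UniqueID' => 'unique444', 'IssueGroup' => 'Missing' }
]

# count how many records fall into each distinct IssueGroup (Ruby 2.7+)
counts = records.map { |r| r['IssueGroup'] }.tally

# equivalent for Ruby versions without Enumerable#tally
counts_legacy = records.group_by { |r| r['IssueGroup'] }
                       .transform_values(&:size)

counts.each { |group, n| puts "#{group}\t#{n}" }
puts "total\t#{records.size}"
```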
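text = File.read('text.txt')

# each match is a GCOLLECTOR line plus the BLOCK/DAY/LABEL lines that follow it
gcollectors = text.scan(/^GCOLLECTOR.+\n(?:(?:BLOCK|DAY|LABEL).+\n?)+/).map { |collector|
# a regex literal with named groups on the left of =~ assigns local variables
# with those names (set to nil when the pattern does not match)
/^BLOCK\t(?<uniqueid>\S+)\t\S+\t(?<fulluniqueid>\S+)/ =~ collector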
/^LABEL\t\S+\t\S+\tDatum\t(?<datum>.+)/ =~ collector
/^LABEL\t\S+\t\S+\tIssue\t(?<issue>\S+)/ =~ collector
Hash[
"UniqueID",uniqueid,
"Datum",datum,
"FullUniqueID",fulluniqueid,
"IssueGroup",issue
]
}
gcollectors.each{|i|p i}
# >> {"UniqueID"=>"unique111", "Datum"=>"bar_1666.9", "FullUniqueID"=>"fullunique111", "IssueGroup"=>"Broken"}
# >> {"UniqueID"=>"unique222", "Datum"=>"bar_9921.2", "FullUniqueID"=>"fullunique111", "IssueGroup"=>"Broken"}
# >> {"UniqueID"=>"unique333", "Datum"=>"bar_7766.2", "FullUniqueID"=>"fullunique111", "IssueGroup"=>"Broken"}
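Question 3 asks about LABEL lines in any order and zero or several DAY lines. The regex answer above copes with both, because each field is searched for independently within a GCOLLECTOR chunk.

A similar idea without the large regex is to group the split rows with Enumerable#slice_before, which starts a new chunk at every GCOLLECTOR row, and then look up the BLOCK and LABEL rows inside each chunk. This is a swapped-in alternative, not code from either answer. The method name parse_chunks and the sample rows are hypothetical, and the column positions follow the answers.

```ruby
# Alternative sketch using Enumerable#slice_before instead of one big regex.
# parse_chunks and the sample rows are hypothetical; column positions
# (BLOCK: 1 and 3, LABEL: 3 and 4) follow the answers above.
def parse_chunks(lines)
  rows = lines.map { |line| line.chomp.split("\t") }
  # one chunk per GCOLLECTOR row plus the rows after it;
  # any rows before the first GCOLLECTOR are dropped by the select
  chunks = rows.slice_before { |cols| cols[0] == 'GCOLLECTOR' }
               .select { |chunk| chunk.first[0] == 'GCOLLECTOR' }

  chunks.map do |chunk|
    block = chunk.find { |cols| cols[0] == 'BLOCK' } || []
    # LABEL rows keyed by their 4th column ('Datum', 'Issue', ...)
    labels = chunk.select { |cols| cols[0] == 'LABEL' }
                  .map { |cols| [cols[3], cols[4]] }
                  .to_h
    issue = labels['Issue']
    {
      'UniqueID'     => block[1],
      'Datum'        => labels['Datum'],
      'FullUniqueID' => block[3],
      # keep only the text before the first hyphen (nil if no Issue label)
      'IssueGroup'   => issue && issue.split('-').first.strip
    }
  end
end

sample = [
  ['GCOLLECTOR', 'g1'],
  ['BLOCK', 'unique111', 'x', 'fullunique111'],
  ['LABEL', 'unique111', 'x', 'Datum', 'bar_1666.9'],
  ['LABEL', 'unique111', 'x', 'Issue', 'Broken - cable cut'],
  ['GCOLLECTOR', 'g2'],
  ['BLOCK', 'unique333', 'x', 'fullunique111'],
  ['DAY', 'Mon'],
  ['DAY', 'Tue'],
  ['LABEL', 'unique333', 'x', 'Issue', 'Broken - loose'],
  ['LABEL', 'unique333', 'x', 'Datum', 'bar_7766.2'],
  ['GCOLLECTOR', 'g3'],
  ['BLOCK', 'unique555', 'x', 'fullunique555']
].map { |row| row.join("\t") + "\n" }

collectors = parse_chunks(sample)
collectors.each { |h| p h }
```

A missing BLOCK or LABEL row simply yields nil for that field. If a chunk contains the same LABEL name twice, to_h keeps the last one.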