Ruby: splitting a complex file into a hash
I am running a command-line program called Primer 3. It takes an input file and returns data to standard output. I am trying to write a Ruby script that accepts that input and puts the entries into a hash. The results returned are shown below. I would like to split the data on the "=" sign so that the hash looks like this:
{:SEQUENCE_ID => "example", :SEQUENCE_TEMPLATE => "GTAGTCAGTAGACNAT..etc", :SEQUENCE_TARGET => "37,21" etc }
I would also like to lowercase the keys, i.e.:
{:sequence_id => "example", :sequence_template => "GTAGTCAGTAGACNAT..etc", :sequence_target => "37,21" etc }
Here is my current script:
#!/usr/bin/ruby
puts 'Primer 3 hash'
primer3 = {}
while line = gets do
  name, height = line.split(/\=/)
  primer3[name] = height.to_i
end
puts primer3
It returns:
Primer 3 hash
{"SEQUENCE_ID"=>0, "SEQUENCE_TEMPLATE"=>0, "SEQUENCE_TARGET"=>37, "PRIMER_TASK"=>0, "PRIMER_PICK_LEFT_PRIMER"=>1, "PRIMER_PICK_INTERNAL_OLIGO"=>1, "PRIMER_PICK_RIGHT_PRIMER"=>1, "PRIMER_OPT_SIZE"=>18, "PRIMER_MIN_SIZE"=>15, "PRIMER_MAX_SIZE"=>21, "PRIMER_MAX_NS_ACCEPTED"=>1, "PRIMER_PRODUCT_SIZE_RANGE"=>75, "P3_FILE_FLAG"=>1, "SEQUENCE_INTERNAL_EXCLUDED_REGION"=>37, "PRIMER_EXPLAIN_FLAG"=>1, "PRIMER_THERMODYNAMIC_PARAMETERS_PATH"=>0, "PRIMER_LEFT_EXPLAIN"=>0, "PRIMER_RIGHT_EXPLAIN"=>0, "PRIMER_INTERNAL_EXPLAIN"=>0, "PRIMER_PAIR_EXPLAIN"=>0, "PRIMER_LEFT_NUM_RETURNED"=>0, "PRIMER_RIGHT_NUM_RETURNED"=>0, "PRIMER_INTERNAL_NUM_RETURNED"=>0, "PRIMER_PAIR_NUM_RETURNED"=>0, ""=>0}
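The zeros in that output come from the call to to_i: String#to_i reads only the leading digits of a string and returns 0 when there are none, so "example" becomes 0 and "37,21" becomes 37. A minimal sketch of that behaviour, using a few values taken from the data in this question:

```ruby
# String#to_i parses leading digits only and never raises on bad input.
samples = ["example\n", "37,21\n", "75-100\n", "18\n"]

converted = samples.map { |value| value.to_i }
p converted   # => [0, 37, 75, 18]
```

Keeping the values as strings avoids this loss of information.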
Data source
SEQUENCE_ID=example
SEQUENCE_TEMPLATE=GTAGTCAGTAGACNATGACNACTGACGATGCAGACNACACACACACACACAGCACACAGGTATTAGTGGGCCATTCGATCCCGACCCAAATCGATAGCTACGATGACG
SEQUENCE_TARGET=37,21
PRIMER_TASK=pick_detection_primers
PRIMER_PICK_LEFT_PRIMER=1
PRIMER_PICK_INTERNAL_OLIGO=1
PRIMER_PICK_RIGHT_PRIMER=1
PRIMER_OPT_SIZE=18
PRIMER_MIN_SIZE=15
PRIMER_MAX_SIZE=21
PRIMER_MAX_NS_ACCEPTED=1
PRIMER_PRODUCT_SIZE_RANGE=75-100
P3_FILE_FLAG=1
SEQUENCE_INTERNAL_EXCLUDED_REGION=37,21
PRIMER_EXPLAIN_FLAG=1
PRIMER_THERMODYNAMIC_PARAMETERS_PATH=/usr/local/Cellar/primer3/2.3.4/bin/primer3_config/
PRIMER_LEFT_EXPLAIN=considered 65, too many Ns 17, low tm 48, ok 0
PRIMER_RIGHT_EXPLAIN=considered 228, low tm 159, high tm 12, high hairpin stability 22, ok 35
PRIMER_INTERNAL_EXPLAIN=considered 0, ok 0
PRIMER_PAIR_EXPLAIN=considered 0, ok 0
PRIMER_LEFT_NUM_RETURNED=0
PRIMER_RIGHT_NUM_RETURNED=0
PRIMER_INTERNAL_NUM_RETURNED=0
PRIMER_PAIR_NUM_RETURNED=0
=
$primer3_core
OK, I have (almost) got it. The only problem is that a \n is appended to the end of every value.
puts 'Primer 3 hash'
primer3 = {}
while line = gets do
  key, value = line.split(/\=/)
  puts key
  puts value
  primer3[key.downcase] = value
end
puts primer3
{"sequence_id"=>"example\n", "sequence_template"=>"GTAGTCAGTAGACNATGACNACTGACGATGCAGACNACACACACACACACAGCACACAGGTATTAGTGGGCCATTCGATCCCGACCCAAATCGATAGCTACGATGACG\n", "sequence_target"=>"37,21\n", "primer_task"=>"pick_detection_primers\n", "primer_pick_left_primer"=>"1\n", "primer_pick_internal_oligo"=>"1\n", "primer_pick_right_primer"=>"1\n", "primer_opt_size"=>"18\n", "primer_min_size"=>"15\n", "primer_max_size"=>"21\n", "primer_max_ns_accepted"=>"1\n", "primer_product_size_range"=>"75-100\n", "p3_file_flag"=>"1\n", "sequence_internal_excluded_region"=>"37,21\n", "primer_explain_flag"=>"1\n", "primer_thermodynamic_parameters_path"=>"/usr/local/Cellar/primer3/2.3.4/bin/primer3_config/\n", "primer_left_explain"=>"considered 65, too many Ns 17, low tm 48, ok 0\n", "primer_right_explain"=>"considered 228, low tm 159, high tm 12, high hairpin stability 22, ok 35\n", "primer_internal_explain"=>"considered 0, ok 0\n", "primer_pair_explain"=>"considered 0, ok 0\n", "primer_left_num_returned"=>"0\n", "primer_right_num_returned"=>"0\n", "primer_internal_num_returned"=>"0\n", "primer_pair_num_returned"=>"0\n", ""=>"\n"}
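#!/usr/bin/ruby
puts 'Primer 3 hash'
primer3 = {}
while line = gets do
  # Split on the first '=' only, so a value may itself contain '='
  key, value = line.split(/=/, 2)
  # Lowercase symbol key; chomp strips the trailing newline from the value
  primer3[key.downcase.to_sym] = value.chomp
end
puts primer3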
For fun, here are two purely functional solutions. Both assume that you have already pulled the data out of the file, e.g.
my_data = ARGF.read # read the file passed on the command line
This one feels a bit gross, but it is a (long) one-liner :)
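One way such a one-liner could be written is sketched here with map.with_index. That choice is an assumption, and the sample my_data string is made up for illustration:

```ruby
# Hypothetical sample input standing in for ARGF.read
my_data = "SEQUENCE_ID=example\nSEQUENCE_TARGET=37,21\n"

# Split each line once on '=', then lowercase and symbolize only the first field (the key)
hash = Hash[ my_data.lines.map{ |line| line.chomp.split('=',2).map.with_index{ |s,i| i==0 ? s.downcase.to_sym : s } } ]
p hash
```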
This one is two lines, but feels cleaner than using map with with_index:
keys,values = my_data.lines.map{ |line| line.chomp.split('=',2) }.transpose
hash = Hash[ keys.map(&:downcase).map(&:to_sym).zip(values) ]
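Both of these answers are probably less efficient than the answer you have already accepted, and they certainly use more memory; iterating over the lines and gradually mutating the hash is the best approach. These non-mutating variations are just a mental exercise.
Your final answer should use ARGF, so that the data can be supplied either as a filename on the command line or via STDIN. I would write it like this: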
#!/usr/bin/ruby
module Primer3
  def self.parse( file )
    {}.tap do |primer3|
      # Process one line at a time, without reading it all into memory first
      file.each_line do |line|
        key, value = line.chomp.split('=', 2)
        primer3[key.downcase.to_sym] = value
      end
    end
  end
end
Primer3.parse( ARGF ) if __FILE__==$0
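Because parse only calls each_line on its argument, it can be exercised without a real file. A rough sketch, assuming the module is repeated inline here (it could equally be loaded with require) and using StringIO with made-up sample data:

```ruby
require 'stringio'

# Same module as above, repeated so this sketch runs on its own
module Primer3
  def self.parse(file)
    {}.tap do |primer3|
      file.each_line do |line|
        key, value = line.chomp.split('=', 2)
        primer3[key.downcase.to_sym] = value
      end
    end
  end
end

# Hypothetical sample data, including the bare "=" terminator line
sample = StringIO.new("SEQUENCE_ID=example\nSEQUENCE_TARGET=37,21\n=\n")
result = Primer3.parse(sample)
p result
```

Note that the terminating "=" line ends up as an empty-symbol key with an empty string value; callers that do not want it can delete that key afterwards.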
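This way you can either call the file from the command line (with or without STDIN), or require the file and use the module function it defines from other code.
Add .chomp to the value to remove the newline.
@SeanGeneva I did not downvote, but answers that do not exactly match what the question asks for often get downvoted. For example, this answer uses gets instead of a file, and uses strings rather than symbols as keys.
Yes, this is perfect! Thank you. One question: what does the 2 do here: line.split(/=/,2)
@SeanGeneva It limits the number of fields returned by split, in case the value part contains an =.
Wow, awesome. Cheers.
+1 for a complete, well-structured question that includes what you have already tried.
Wonderful answers. Thank you very much!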
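To illustrate the limit argument: when a value itself contains "=", a plain split cuts it into extra fields and the two-variable assignment silently drops them, while a limit of 2 keeps everything after the first "=" together. A small sketch (the sample line is hypothetical, not taken from Primer 3 output):

```ruby
line = "PRIMER_NOTE=a=b=c\n"   # hypothetical line whose value contains '='

# Without a limit: four fields, only the first two are assigned
key, value = line.chomp.split(/=/)
# With a limit of 2: at most two fields, the second keeps the remaining '=' signs
limited_key, limited_value = line.chomp.split(/=/, 2)

p value          # => "a"
p limited_value  # => "a=b=c"
```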