Ruby on Rails: Rails and Prawn: formatting with regular expressions
I am trying to grab HTML tags with a regular expression and format them using Prawn's methods. So far I have this:
def format(pdf, string)
  regex_string = "\n|" +
                 "<b>(.*)<\/b>|" +
                 "<h3>(.*)<\/h3>|" +
                 "<strong>(.*)<\/strong>|" +
                 "[^<\n]+"
  regex = Regexp.new(regex_string, Regexp::MULTILINE)
  tokens = string.gsub("\n", " ").gsub(/<br\s*\/?>/, "\n").scan(regex)
  tokens.each do |breaker|
    case breaker
    when "<b>(.*)<\/b>", "<strong>(.*)<\/strong>"
      pdf.text breaker.to_s, :size => 10
    when "<h3>(.*)<\/h3>"
      pdf.text breaker.to_s, :size => 16
    else
      pdf.text breaker.to_s, :size => 10, :leading => 1
    end
  end
end
I have not tested this myself, but it may be running on empty array elements. Try the following:
tokens = string.gsub("\n", " ").gsub(/<br\s*\/?>/, "\n").scan(regex).compact
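If that alone does not help, it may be worth checking what `scan` actually returns. When the pattern contains capture groups, `String#scan` yields one array of captures per match, so the ungrouped `[^<\n]+` alternative produces only `nil` entries, and a `case` that compares each token with the pattern strings never matches. The sketch below is one possible way around this, not a tested fix for the original code: it captures the tag name and the plain text in groups and turns each match into a `[kind, text]` pair. The names `TOKEN_REGEX`, `tokenize` and `render_tokens` are hypothetical helpers introduced for illustration, and the Prawn options are the ones used in the question.

```ruby
# Hypothetical pattern: group 1 = tag name, group 2 = tag contents,
# group 3 = a line break, group 4 = a run of plain text.
TOKEN_REGEX = /<(b|strong|h3)>(.*?)<\/\1>|(\n)|([^<\n]+)/m

# Hypothetical helper: splits the HTML fragment into [kind, text] pairs.
def tokenize(string)
  # Same normalisation as in the question: collapse newlines, turn <br> into "\n".
  normalized = string.gsub("\n", " ").gsub(/<br\s*\/?>/, "\n")
  normalized.scan(TOKEN_REGEX).map do |tag, inner, newline, plain|
    if tag
      [tag.to_sym, inner]
    elsif newline
      [:newline, newline]
    else
      [:text, plain]
    end
  end
end

# Hypothetical helper: `pdf` is assumed to respond to `text` the way a
# Prawn::Document does; sizes and leading are taken from the question.
def render_tokens(pdf, tokens)
  tokens.each do |kind, text|
    case kind
    when :h3
      pdf.text text, size: 16
    when :b, :strong
      pdf.text text, size: 10
    else
      pdf.text text, size: 10, leading: 1
    end
  end
end

tokens = tokenize("Hello <b>World</b><br/><h3>Title</h3>")
# => [[:text, "Hello "], [:b, "World"], [:newline, "\n"], [:h3, "Title"]]
```

Inside `format`, this would be used as `render_tokens(pdf, tokenize(string))`. Branching on a symbol rather than on the matched string keeps the `case` independent of the regex source, but this approach still handles only flat, non-nested tags.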
For general HTML parsing, Nokogiri is probably the best choice (for Rails):