Unable to split data correctly with Ruby regex and Rubular

I'm trying to organize and break down the content of emails pulled with Net::POP3. In my code, when I use

p mail.pop

I get the raw text of the message.

So far I've been using Rubular, but since I'm still learning how to use regex, gsub, and split properly, my results vary. My code is below:

  p mail.pop.scan(/Summary: (.+) Name:/)
  p mail.pop.scan(/Name: (.+) Category:/)
  p mail.pop.scan(/Category: (.+) Email:/) 
  p mail.pop.scan(/Email: (.+) Journal News:/)     
  p mail.pop.scan(/Journal News: (.+) Deadline:/)       
  p mail.pop.scan(/Deadline: (.+) Questions:/)    
  p mail.pop.scan(/Questions:(.+) Requirements:/) 
  p mail.pop.scan(/Requirements:(.+) Back to Top/)  
But all I get are empty arrays:

[]
[]
[]
[]
[]
[]
[]
[]
How can I do this better? Thanks in advance.

Oh my! What a mess.

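Of course there are many ways to tackle this, but I expect they all involve multiple steps and a fair amount of trial and error. All I can do is show you how I went about it.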

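Many small steps are a good thing, for several reasons. First, they break the problem into manageable tasks whose solutions can be tested individually. Second, the parsing rules may change in the future. With many steps, you may only need to change and/or add one or two operations; with few steps and complex regexes, you may be better off starting from scratch, especially if someone else wrote the code.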

Suppose text is a variable that holds the string.

First, I don't like all those newlines, because they complicate the regexes, so the first thing I do is remove them:

s1 = text.gsub(/\n/, '')
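A minimal sketch of why this helps, on a hypothetical two-line sample rather than the real email: without /m, . stops at a newline, so once the newlines are gone a single-line pattern can reach across what used to be a line break. One caveat worth checking against the real data: if a line break was the only separator between two words, deleting it glues them together, and replacing it with a space avoids that.

```ruby
# Hypothetical sample; the real message text is not reproduced in the question.
text = "Summary: Working with Vars\non Social Influence platform\n"

# Without /m, '.' does not match "\n", so the capture stops at the first line break.
first_line_only = text[/Summary: (.+)/, 1]
p first_line_only   # => "Working with Vars"

# Deleting the newlines lets '.' run across the old line break...
s1 = text.gsub(/\n/, '')
glued = s1[/Summary: (.+)/, 1]
p glued             # => "Working with Varson Social Influence platform"

# ...but "Vars" and "on" are now glued; replacing "\n" with a space keeps them apart.
spaced = text.gsub(/\n/, ' ')[/Summary: (.+)/, 1].strip
p spaced            # => "Working with Vars on Social Influence platform"
```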
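Next, there are many "20\r" strings, which could get in the way. We can't simply delete every digit, because we may want to keep other text that contains numbers, so we remove only runs of digits followed by a carriage return (which also catches "7941\r"):

s2 = s1.gsub(/\d+\r/, '')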
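To see why this pattern strips only the unwanted numbers, here is a sketch on a made-up fragment (the "20\r" and "7941\r" tokens come from the text above; the surrounding words are hypothetical). The digits must be followed directly by a carriage return, so numbers such as 10:00 or 15 survive.

```ruby
# Made-up fragment, after the newlines have already been removed.
s1 = "Deadline: 10:00 PM EST20\r - 15 February7941\r"

# Delete runs of digits only when a carriage return follows them directly.
s2 = s1.gsub(/\d+\r/, '')
p s2   # => "Deadline: 10:00 PM EST - 15 February"
```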

Now let's look at the fields we want, along with the text immediately before and after each one:

puts s2.scan(/.{4}(?:\w+\s+)*\w+:.{15}/)
  # <> Summary: Working with V
  #=>> Name: Megumi Lindon 
  #=>> Category: Social Psychol
  #=>> Email: information@ex
  #<mailto:information@exa
  #=>> Journal News: Saving Grace 
  #=>> Deadline: 10:00 PM EST -
  #=>> Query:=>>=>> Lorem ip
  #=>> Requirements:=>>=>> Psycholo
  # <x-msg://30/#top> Back
  #<x-msg://30/#SocialPsy
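That scan is a diagnostic rather than a parser: .{4} and .{15} grab a few characters of context on each side, while (?:\w+\s+)*\w+: matches a label of one or more words ending in a colon. A reduced sketch on a hypothetical fragment shaped like the output above:

```ruby
# Hypothetical fragment modeled on the output shown above.
s2 = "=>> Name: Megumi Lindon =>> Category: Social Psychology"

# 4 chars of left context, a (possibly multi-word) label with its colon,
# then 15 chars of right context.
peek = s2.scan(/.{4}(?:\w+\s+)*\w+:.{15}/)
p peek   # => ["=>> Name: Megumi Lindon ", "=>> Category: Social Psychol"]
```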

In the regex for s3, (?

There may be better ways, but for starters, you can add /m to the scan like this:
str.scan(/Summary:(.+)Name:/m)
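This is the direct fix for the scan calls in the question: without /m, . does not match a newline, so (.+) cannot cross a line break and each scan returns []. A sketch on a hypothetical stand-in for the message text (the real content isn't shown). It also switches to a lazy (.+?) so the capture stops at the first following label, and it reads the message once into a variable, since each mail.pop call fetches the message from the server again.

```ruby
# Hypothetical stand-in for the string returned by mail.pop (the real message
# is not shown). Fetch it once and reuse it instead of calling mail.pop per field.
body = "Summary: Working with Vars\n" \
       "on Social Influence platform\n" \
       "> Name: Megumi Lindon\n" \
       "> Category: Social Psychology\n"

# Without /m, '.' does not match "\n", so the capture cannot cross the line break.
without_m = body.scan(/Summary: (.+) Name:/)
p without_m  # => []

# With /m, '.' also matches "\n", so the capture spans both lines.
with_m = body.scan(/Summary: (.+) Name:/m)
p with_m     # => [["Working with Vars\non Social Influence platform\n>"]]

# A lazy (.+?) stops at the first "Name:" after "Summary:"; the gsub then turns
# newlines, spaces and '>' quote markers into single spaces.
summary = body[/Summary:(.+?)Name:/m, 1].gsub(/[\s>]+/, ' ').strip
p summary    # => "Working with Vars on Social Influence platform"
```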
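Hey, thanks for taking the time to write such a detailed answer. It really helped me understand regex better. Although I didn't use your solution (I was able to make the content less messy), your guide let me work out my own. Thanks again. =)

I'm glad I could help. Helping someone learn something new is always more satisfying than just solving the specific problem they asked about.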
s1 = text.gsub(/\n/, '')
s2 = s1.gsub(/\d+\r/, '') 
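s3 = s2.gsub(/(?<=\w):=/, ": ")                  # "Query:=" -> "Query: " (only after a word character)
s4 = s3.gsub(/>\s+(?=(?:\w+\s+)*\w+: )/, " :")   # "=>> Name: " -> "=> :Name: ", marking each label with " :"
s5 = s4.gsub(/[^a-zA-Z0-9 :;.?!-()\[\]{}]/, "")  # keep only listed chars; "!-(" is a range, so a literal "-" is removed too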
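The s3 and s4 steps rely on lookaround: (?<=\w) is a lookbehind (the colon must follow a word character) and (?=...) is a lookahead (the > must be followed by a label), so neither condition is consumed or replaced. A sketch on a hypothetical fragment shaped like the quoted lines above:

```ruby
# Hypothetical fragment shaped like the quoted email lines.
s2 = "=>> Name: Megumi Lindon =>> Query:=>>=>> Lorem ipsum"

# Lookbehind: turn ":=" into ": " only when a word character precedes it.
s3 = s2.gsub(/(?<=\w):=/, ": ")
# Lookahead: replace "> " with " :" only when a "Label: " follows.
s4 = s3.gsub(/>\s+(?=(?:\w+\s+)*\w+: )/, " :")
# Drop the leftover quote characters (and anything else outside the class).
s5 = s4.gsub(/[^a-zA-Z0-9 :;.?!-()\[\]{}]/, "")

p s3   # => "=>> Name: Megumi Lindon =>> Query: >>=>> Lorem ipsum"
p s4   # => "=> :Name: Megumi Lindon => :Query: >>=>> Lorem ipsum"
p s5   # => " :Name: Megumi Lindon  :Query:  Lorem ipsum"
```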
a1 = s5.split(/((?<= :)(?:\w+\s+)*\w+:\s+)/)
  # => ["11)  :", "Summary: ", "Working with Vars on Social Influence platform :",
  #     "Name: ", "Megumi Lindon  :",
  #     "Category: ", "Social Psychology :",
  #     "Email: ", "informationexample.com mailto:informationexample.com :",
  #     "Journal News: ", "Saving Grace  :",
  #     "Deadline: ", "10:00 PM EST  15 February :",
  #     "Query:  ", "Lorem ipsum ...laborum. :",
  #     "Requirements:  ", "Psychologists; anyone...psychology...Top xmsg:30#top...Psychology"] 
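The split works because the whole pattern is wrapped in a capturing group: String#split normally discards the separators, but any captured text is returned as its own element, so each label stays next to its value. A sketch using a hypothetical string in the same shape as s5:

```ruby
# Hypothetical string in the same shape as s5.
s5 = " :Name: Megumi Lindon  :Query:  Lorem ipsum"

# Without a capture group, split throws the labels away...
without_labels = s5.split(/(?<= :)(?:\w+\s+)*\w+:\s+/)
# ...with the whole pattern captured, each label comes back as its own element.
with_labels = s5.split(/((?<= :)(?:\w+\s+)*\w+:\s+)/)

p without_labels  # => [" :", "Megumi Lindon  :", "Lorem ipsum"]
p with_labels     # => [" :", "Name: ", "Megumi Lindon  :", "Query:  ", "Lorem ipsum"]
```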
a2 = a1.map { |s| s.chomp(':') }
a2[0] = a2.shift + a2.first
  #=> "11)  Summary: "
a3 = a2.each_slice(2).to_a
  #=> [["11)  Summary: ", "Working with Vars on Social Influence platform "],
  #    ["Name: ", "Megumi Lindon  "],
  #    ["Category: ", "Social Psychology "],
  #    ["Email: ", "informationexample.com mailto:informationexample.com "],
  #    ["Journal News: ", "Saving Grace  "],
  #    ["Deadline: ", "10:00 PM EST  15 February "],
  #    ["Query:  ", "Lorem...est laborum. "],
  #    ["Requirements:  ", "Psychologists;...psychology. Please...xmsg:30#SocialPsychology"]] 
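each_slice(2) walks the flat array two elements at a time, pairing each label with the value that follows it. If a Hash keyed by label is more convenient than an array, one possible extra step (not part of the answer above) is to pull the label name out of each key. A sketch on a shortened, hypothetical version of a2:

```ruby
# Shortened, hypothetical version of a2 (labels and values alternate).
a2 = ["11)  Summary: ", "Working with Vars ", "Name: ", "Megumi Lindon  ",
      "Journal News: ", "Saving Grace  "]

# Pair each label with the value after it.
pairs = a2.each_slice(2).to_a
p pairs[1]   # => ["Name: ", "Megumi Lindon  "]

# Optional: a Hash keyed by the label text before the colon, values trimmed.
fields = pairs.map { |label, value| [label[/(\w[\w ]*):/, 1], value.strip] }.to_h
p fields["Summary"]       # => "Working with Vars"
p fields["Journal News"]  # => "Saving Grace"
```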

idx = a3.index { |n,_| n =~ /Email: / }
  #=> 3 
a3[idx][1] = a3[idx][1][/.*?\s/] if idx
  #=> "informationexample.com " 
a4 = a3.map { |b| b.join(' ').split.join(' ') }
  #=> ["11) Summary: Working with Vars on Social Influence platform",
  #    "Name: Megumi Lindon",
  #    "Category: Social Psychology",
  #    "Email: informationexample.com",
  #    "Journal News: Saving Grace",
  #    "Deadline: 10:00 PM EST 15 February",
  #    "Query: Lorem...laborum.",
  #    "Requirements: Psychologists...psychology. Please...well. Thank...Psychology"] 
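b.join(' ').split.join(' ') is a compact way to normalize whitespace: split with no argument splits on runs of whitespace and ignores leading and trailing whitespace, so joining again leaves exactly one space between words. A sketch on one pair from a3, with squeeze/strip as a rough alternative that only collapses spaces (not tabs or newlines):

```ruby
pair = ["Deadline: ", "10:00 PM EST  15 February "]

joined = pair.join(' ')
p joined      # => "Deadline:  10:00 PM EST  15 February "

# split (no argument) splits on runs of whitespace and drops the ends.
normalized = joined.split.join(' ')
p normalized  # => "Deadline: 10:00 PM EST 15 February"

# Alternative for spaces only: collapse runs of ' ' and trim both ends.
squeezed = joined.squeeze(' ').strip
p squeezed    # => "Deadline: 10:00 PM EST 15 February"
```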
idx = a4.index { |n,_| n =~ /Requirements: / }
  #=> 7
a4[idx] = a4[idx][/.*?[.!?]/] if idx
  # => "Requirements: Psychologists; anyone with good knowsledge with sociology and psychology."
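Both truncation steps use String#[] with a regexp, which returns the first match, or nil when there is none; the lazy .*? stops at the first whitespace character or the first sentence-ending character. Because a missing match yields nil (which would then overwrite the field), a fallback can keep the original text. A sketch with made-up values; the no_period case is hypothetical:

```ruby
email = "informationexample.com mailto:informationexample.com "
first_word = email[/.*?\s/]
p first_word      # => "informationexample.com "

req = "Requirements: Psychologists; anyone with good knowledge. Please reply."
first_sentence = req[/.*?[.!?]/]
p first_sentence  # => "Requirements: Psychologists; anyone with good knowledge."

# With no '.', '!' or '?' the lookup returns nil; '||' keeps the original text.
no_period = "Requirements: Psychologists"
missing = no_period[/.*?[.!?]/]
kept = missing || no_period
p missing         # => nil
p kept            # => "Requirements: Psychologists"
```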
def parse_it(text)
  a1 = text.gsub(/\n/, '')
           .gsub(/\d+\r/, '') 
           .gsub(/(?<=\w):=/, ": ")
           .gsub(/>\s+(?=(?:\w+\s+)*\w+: )/, " :")
           .gsub(/[^a-zA-Z0-9 :;.?!-()\[\]{}]/, "")
           .split(/((?<= :)(?:\w+\s+)*\w+:\s+)/)
           .map { |s| s.chomp(':') }

  a1[0] = a1.shift + a1.first

  a2 = a1.each_slice(2).to_a
  idx = a2.index { |n,_| n =~ /Email: / }
  a2[idx][1] = a2[idx][1][/.*?\s/] if idx

  a3 = a2.map { |b| b.join(' ').split.join(' ') }    
  idx = a3.index { |n,_| n =~ /Requirements: / }
  a3[idx] = a3[idx][/.*?[.!?]/] if idx

  a3
end
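For comparison, when the labels are known in advance, the question's original idea of scanning between consecutive labels also works once /m and a lazy quantifier are used. A sketch, assuming the labels always appear in the given order; parse_by_labels, the label list and the sample text are hypothetical. Unlike parse_it, it returns a Hash and skips the answer's extra cleanup (e-mail truncation, first sentence of Requirements).

```ruby
# Hypothetical helper: extract the text between each label and the next one.
# Assumes the labels appear in exactly this order in the message.
def parse_by_labels(text, labels)
  fields = {}
  labels.each_with_index do |label, i|
    next_label = labels[i + 1]
    # Stop at the next label, or at the end of the text for the last one.
    stop = next_label ? Regexp.escape("#{next_label}:") : '\z'
    value = text[/#{Regexp.escape(label)}:(.*?)#{stop}/m, 1]
    # Collapse line breaks, spaces and '>' quote markers into single spaces.
    fields[label] = value && value.gsub(/[\s>]+/, ' ').strip
  end
  fields
end

sample = "Summary: Working with Vars\non Social Influence platform\n" \
         "> Name: Megumi Lindon\n" \
         "> Journal News: Saving Grace\n"

result = parse_by_labels(sample, ["Summary", "Name", "Journal News"])
p result["Name"]          # => "Megumi Lindon"
p result["Journal News"]  # => "Saving Grace"
```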