How can I accomplish this complex search and replace in Ruby?


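I have a large text file. In it, I want to replace every occurrence of the word "pizza" with "spinach", "Pizza" with "Spinach", and "pizzing" with "spinning", unless the word appears anywhere inside curly braces. So {pizza}, {giant.pizza}, and {hot pizza-coven} should remain untouched.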


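The best solution I have come up with so far is to iterate over the file line by line, use a regex to pick out everything before or after a { or }, and then run a regex on each of those strings. That gets complicated and unwieldy quickly, so I'm wondering whether there is a cleaner solution to this problem.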

This can be done in a few steps. I would iterate through the file line by line and pass each line to this method:

def spinachize(line)
  # list of words to swap
  swaps = {
    'pizza' => 'spinach',
    'Pizza' => 'Spinach',
    'pizzing' => 'spinning'
  }

  # random placeholder for bracketed text
  placeholder = 'fdjfafdlskdsfajkldfas'

  # save all instances of bracketed text
  bracketed_text = line.scan(/\{.*?\}/)

  # swap bracketed text for the placeholder, working on a copy so the
  # caller's string is not modified (and frozen strings are accepted)
  result = line.gsub(/\{.*?\}/, placeholder)

  # replace all swaps
  swaps.each do |original_text, new_text|
    result.gsub!(original_text, new_text)
  end

  # re-insert bracketed text
  result.gsub(placeholder) { bracketed_text.shift }
end
The comments above explain what each step does. Here are a couple of examples:

spinachize "Pizza is good, but more pizza is better"
 => "Spinach is good, but more spinach is better"

spinachize "Leave bracketed instances of {pizza} or {this.pizza} alone"
 => "Leave bracketed instances of {pizza} or {this.pizza} alone"
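As you can see, you can specify the items to swap in the hash, or modify the method to pull the list from a database or a flat file. The placeholder just needs to be something unique that would not naturally occur in the source file.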

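The process is: remove the bracketed text from the original line and remember it for later, swap all the text that needs swapping, then put the bracketed text back. It's not a one-liner, but it works well, is readable, and is easy to update.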
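Since the question is about a large file, here is a minimal sketch of how a per-line method could be driven over it. The helper name rewrite_file and the temporary files are hypothetical, and the block passed in is only a stand-in that ignores braces; a method such as spinachize would be substituted for it. File.foreach yields one line at a time, so the whole file does not have to be held in memory.

```ruby
require 'tempfile'

# Hypothetical helper: stream src_path line by line, pass each line to the
# given block, and write whatever the block returns to dest_path.
def rewrite_file(src_path, dest_path)
  File.open(dest_path, 'w') do |out|
    File.foreach(src_path) do |line|
      out.write(yield(line))
    end
  end
end

# Demo input written to a temporary file (made-up content).
src = Tempfile.new('pizza_in')
src.write("Pizza night\nmore pizza\n")
src.close

dest = Tempfile.new('pizza_out')
dest.close

# The block is a stand-in for the real per-line method.
rewrite_file(src.path, dest.path) { |line| line.gsub('pizza', 'spinach') }
puts File.read(dest.path)
```

Writing to a separate destination file, rather than editing in place, leaves the original intact if something goes wrong partway through.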


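The last line of the method may need some clarification. Not many people know that gsub can take a block instead of a second argument. The block then determines what is put in place of the matched text. In this case, each time the block is called, I remove the first item from the saved list of bracketed text and use it.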
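To make the block form of gsub concrete, here is a small standalone sketch, separate from the method above. The variable names (saved, template) and the XX placeholder are made up for illustration. The block runs once per match and its return value becomes the replacement, so shift hands the saved items back in their original order.

```ruby
# Hypothetical data for illustration only.
saved    = ['{pizza}', '{this.pizza}']
template = 'Leave XX or XX alone'

# The block is called once per match; its return value replaces the match.
restored = template.gsub('XX') { saved.shift }

puts restored
```

After the call, saved is empty, because each match consumed one element.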

str = "Pizza {pizza} with spinach is not pizzing."
swaps = {'{pizza}'  =>'{pizza}',
         '{Pizza}'  =>'{Pizza}',
         '{pizzing}'=> '{pizzing}',
         'pizza'    => 'spinach',
         'Pizza'    => 'Spinach',
         'pizzing'  => 'spinning'}
regex = Regexp.union(swaps.keys)
p str.gsub(regex, swaps) # => "Spinach {pizza} with spinach is not spinning."
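The hash above protects only the three bracketed strings it lists, so something like {giant.pizza} from the question would still be rewritten. One possible variation, offered as an assumption rather than as part of the original answer, puts a generic bracket pattern first in the alternation and uses the block form of gsub so that any bracketed span is returned unchanged. The names SWAPS, PATTERN, and swap_outside_braces are hypothetical.

```ruby
SWAPS = {
  'pizza'   => 'spinach',
  'Pizza'   => 'Spinach',
  'pizzing' => 'spinning'
}.freeze

# The bracket pattern comes first in the alternation, so a bracketed span is
# consumed whole before any swap word inside it gets a chance to match.
PATTERN = Regexp.union(/\{.*?\}/, *SWAPS.keys)

def swap_outside_braces(str)
  # Bracketed matches are not keys in SWAPS, so fetch falls back to the
  # matched text itself and the span is left as it was.
  str.gsub(PATTERN) { |match| SWAPS.fetch(match, match) }
end

puts swap_outside_braces('Pizza {giant.pizza} is not pizzing.')
```

This does everything in a single pass and needs no placeholder string, at the cost of a slightly denser regex.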

I would call the following method for each line of the file.

Code

def doit(line)
  replace = {'pizza'=>'spinach', 'Pizza'=>'Spinach', 'pizzing'=>'spinning'}
  r = /\{.*?\}/
  # the -1 limit keeps trailing empty strings, so a line that ends with a
  # bracketed span still yields one more piece than there are spans
  arr = line.split(r, -1).map { |str|
    str.gsub(/\b(?:pizza|Pizza|pizzing)\b/, replace) }
  line.scan(r).each_with_object(arr.shift) { |str,res|
    res << str << arr.shift }
end
Explanation

line = "Pizza Primastrada's {pizza} is the best {pizzing} pizza in town."
replace = {'pizza'=>'spinach', 'Pizza'=>'Spinach', 'pizzing'=>'spinning'}
r = /\{.*?\}/
a = line.split(r, -1)
  #=> ["Pizza Primastrada's ", " is the best ", " pizza in town."]
b = a.map { |str| str.gsub(/\b(?:pizza|Pizza|pizzing)\b/, replace) }
  #=> ["Spinach Primastrada's ", " is the best ", " spinach in town."]
keepers = line.scan(r)
  #=> ["{pizza}", "{pizzing}"]
keepers.each_with_object(b.shift) { |str,res| res << str << b.shift }
  #=> "Spinach Primastrada's {pizza} is the best {pizzing} spinach in town."
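You mentioned the string

{words,salad,#{1,2,3} pizza|}

in a comment. If this is part of a string enclosed in single quotes, it is not a problem. If it is enclosed in double quotes, however, the # starts an interpolation and raises a syntax error. Again, there is no problem if the pound character is escaped (\#).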
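A short sketch of the quoting point, using only the string from the comment; the variable names single and escaped are made up. It also prints what the bracket pattern used in the methods above captures from this string, which is worth checking because the pattern is lazy and the braces here are nested.

```ruby
# Single quotes: no interpolation, so "#{" is just ordinary characters.
single = '{words,salad,#{1,2,3} pizza|}'

# Double quotes: escaping the pound sign prevents interpolation.
escaped = "{words,salad,\#{1,2,3} pizza|}"

# Both literals produce the same string.
puts single == escaped

# The lazy pattern stops at the first closing brace it finds.
puts single.scan(/\{.*?\}/).inspect
```

The quoting issue only arises for string literals in Ruby source; text read from a file is never interpolated. Note, though, that the pattern stops at the first }, so for nested braces like these the trailing " pizza|}" falls outside the captured span.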
