How do I accomplish this complex search-and-replace operation in Ruby?
I have a large text file. In this file, I want to replace every mention of the word "pizza" with "spinach", "Pizza" with "Spinach", and "pizzing" with "spinning", unless those words appear anywhere inside curly braces. So {pizza}, {giant.pizza} and {hot pizza-coven} should remain unchanged.
The best solution I have come up with so far is to iterate over the file line by line, use a regex to detect everything before or after a { or }, and then run a regex on each of those strings. But that gets really complicated and unwieldy, and I am wondering whether there is a proper solution to this problem.
This can be done in a few steps. I would iterate through the file line by line and pass each line to this method:
    def spinachize(line)
      # list of words to swap
      swaps = {
        'pizza'   => 'spinach',
        'Pizza'   => 'Spinach',
        'pizzing' => 'spinning'
      }
      # random placeholder for bracketed text
      placeholder = 'fdjfafdlskdsfajkldfas'
      # save all instances of bracketed text
      bracketed_text = line.scan(/\{.*?\}/)
      # swap bracketed text for the placeholder; work on a copy so the
      # caller's string (possibly a frozen literal) is not modified
      result = line.gsub(/\{.*?\}/, placeholder)
      # replace all swaps
      swaps.each do |original_text, new_text|
        result.gsub!(original_text, new_text)
      end
      # re-insert bracketed text
      result.gsub(placeholder) { bracketed_text.shift }
    end
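The comments in the code explain each step. Here are a couple of examples:
    spinachize "Pizza is good, but more pizza is better"
    => "Spinach is good, but more spinach is better"
    spinachize "Leave bracketed instances of {pizza} or {this.pizza} alone"
    => "Leave bracketed instances of {pizza} or {this.pizza} alone"
As you can see, you can specify the items you want swapped, or modify the method to pull the list from a database or flat file. The placeholder just needs to be something unique that would not naturally occur in the source file.
The process is this: remove the bracketed text from the original line and remember it for later. Swap all the text that needs swapping, then add the bracketed text back in. It is not a one-liner, but it works well, is readable, and is easy to update.
The last line of the method might need some clarification. Not many people know that the gsub method can take a block instead of a second parameter. The block then determines what is put in place of the original text. In this case, each time the block is called, I remove the first item from the saved list of bracketed text and use it.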
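
To make the block form of gsub concrete, here is a small standalone sketch. The variable names (saved, template, restored, shouted) and the XX marker are illustrative only and do not come from the answer above.

```ruby
# Each time gsub finds a match, it calls the block and uses the
# block's return value as the replacement for that match.
saved = ['{pizza}', '{this.pizza}']
template = 'Leave XX or XX alone'

restored = template.gsub('XX') { saved.shift }
puts restored
# prints: Leave {pizza} or {this.pizza} alone

# The block also receives the matched text as its argument.
shouted = 'more pizza, less pizzing'.gsub(/pizz\w+/) { |match| match.upcase }
puts shouted
# prints: more PIZZA, less PIZZING
```

Because the block runs once per match, from left to right, shifting from the saved array puts the bracketed pieces back in their original order.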
    # Alternative: give gsub a hash. Bracketed tokens are listed as keys that
    # map to themselves, so they are matched as a unit and left unchanged.
    str = "Pizza {pizza} with spinach is not pizzing."
    swaps = {'{pizza}'   => '{pizza}',
             '{Pizza}'   => '{Pizza}',
             '{pizzing}' => '{pizzing}',
             'pizza'     => 'spinach',
             'Pizza'     => 'Spinach',
             'pizzing'   => 'spinning'}
    regex = Regexp.union(swaps.keys)
    p str.gsub(regex, swaps) # => "Spinach {pizza} with spinach is not spinning."
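
The hash-based snippet above only protects bracketed tokens that are listed explicitly, so {giant.pizza} would still be changed. One possible way to generalize it, sketched here under my own assumptions, is to let the regex match either any whole {...} group or one of the swap words, and return bracketed matches unchanged from a block. The method name swap_outside_braces is hypothetical, and the \b word boundaries are my choice: they mean "pizzas" is left alone, which may or may not be what you want.

```ruby
def swap_outside_braces(line)
  swaps = {
    'pizza'   => 'spinach',
    'Pizza'   => 'Spinach',
    'pizzing' => 'spinning'
  }
  # Either a whole {...} group (tried first) or one of the swap words.
  pattern = /\{.*?\}|\b(?:#{Regexp.union(swaps.keys).source})\b/
  # Bracketed matches are not keys of the hash, so fetch returns them as-is.
  line.gsub(pattern) { |match| swaps.fetch(match, match) }
end

puts swap_outside_braces('Pizza {pizza} with spinach is not pizzing.')
# prints: Spinach {pizza} with spinach is not spinning.
puts swap_outside_braces('Keep {giant.pizza} and {hot pizza-coven}, swap pizza')
# prints: Keep {giant.pizza} and {hot pizza-coven}, swap spinach
```

Since gsub scans from left to right and consumes each {...} group as a single match, the words inside it are never examined on their own.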
I would call the following method for each line of the file:
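Code
    def doit(line)
      replace = {'pizza'=>'spinach', 'Pizza'=>'Spinach', 'pizzing'=>'spinning'}
      r = /\{.*?\}/
      # the limit of -1 keeps a trailing empty piece when the line ends with
      # a bracketed group, so arr.shift never returns nil in the loop below
      arr = line.split(r, -1).map { |str|
        str.gsub(/\b(?:pizza|Pizza|pizzing)\b/, replace) }
      line.scan(r).each_with_object(arr.shift) { |str,res|
        res << str << arr.shift }
    end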
Explanation
    line = "Pizza Primastrada's {pizza} is the best {pizzing} pizza in town."
    replace = {'pizza'=>'spinach', 'Pizza'=>'Spinach', 'pizzing'=>'spinning'}
    r = /\{.*?\}/
    # split on the bracketed groups; -1 keeps any trailing empty piece
    a = line.split(r, -1)
      #=> ["Pizza Primastrada's ", " is the best ", " pizza in town."]
    b = a.map { |str| str.gsub(/\b(?:pizza|Pizza|pizzing)\b/, replace) }
      #=> ["Spinach Primastrada's ", " is the best ", " spinach in town."]
    keepers = line.scan(r)
      #=> ["{pizza}", "{pizzing}"]
    keepers.each_with_object(b.shift) { |str,res| res << str << b.shift }
      #=> "Spinach Primastrada's {pizza} is the best {pizzing} spinach in town."
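
Since the question is about a large file, here is a rough sketch of the surrounding line-by-line loop. The names swap_line and convert_file are hypothetical; swap_line is a compact stand-in for whichever per-line method you prefer, and the temporary files exist only so the demo does not touch real data.

```ruby
require 'tempfile'

# Hypothetical stand-in for any of the per-line methods discussed above.
def swap_line(line)
  swaps = { 'pizza' => 'spinach', 'Pizza' => 'Spinach', 'pizzing' => 'spinning' }
  line.gsub(/\{.*?\}|\b(?:pizza|Pizza|pizzing)\b/) { |m| swaps.fetch(m, m) }
end

# Reads input_path one line at a time, so the whole file is never held
# in memory, and writes the converted lines to output_path.
def convert_file(input_path, output_path)
  File.open(output_path, 'w') do |out|
    File.foreach(input_path) { |line| out.write(swap_line(line)) }
  end
end

# Demo with temporary files so that nothing real is overwritten.
input = Tempfile.new('menu_in')
input.write("Pizza night!\nKeep {giant.pizza} as is.\nStop pizzing around.\n")
input.close
output = Tempfile.new('menu_out')
output.close

convert_file(input.path, output.path)
result = File.read(output.path)
puts result
```

Writing to a separate output file, rather than editing in place, keeps the original intact if something goes wrong partway through.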
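In a comment you mentioned the string
    {words,salad,#{1,2,3} pizza|}
If this is part of a string enclosed in single quotes, it is not a problem. If it is enclosed in double quotes, however, the # will raise a syntax error, because #{1,2,3} is then treated as string interpolation. Again, if the pound character is escaped (\#), there is no problem.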
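
One further caveat with a string like this: the non-greedy pattern /\{.*?\}/ stops at the first closing brace, so it would protect only {words,salad,#{1,2,3} and leave the following " pizza|}" open to replacement. If nested braces can occur in your file, a recursive pattern is one possible way to handle them. The sketch below is my own assumption rather than part of the answers above; the method name swap_outside_nested_braces and the group name braced are hypothetical.

```ruby
def swap_outside_nested_braces(line)
  swaps = { 'pizza' => 'spinach', 'Pizza' => 'Spinach', 'pizzing' => 'spinning' }
  # "braced" matches an opening brace, then any mix of non-brace characters
  # and nested "braced" groups (via the \g<braced> subexpression call),
  # then the closing brace.
  pattern = /(?<braced>\{(?:[^{}]|\g<braced>)*\})|\b(?:pizza|Pizza|pizzing)\b/
  # Bracketed matches are not keys of the hash, so fetch returns them as-is.
  line.gsub(pattern) { |match| swaps.fetch(match, match) }
end

# Single quotes keep #{1,2,3} literal instead of interpolating it.
text = 'pizza {words,salad,#{1,2,3} pizza|} pizza'
puts swap_outside_nested_braces(text)
# prints: spinach {words,salad,#{1,2,3} pizza|} spinach
```

An unbalanced opening brace simply fails to match the braced alternative, so text after it is treated as ordinary text.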