How to remove duplicate values or words from multiple files in Ruby


I have multiple files in a directory, and they contain duplicate records. The same value can appear several times.

sample.txt

#My first file created on Tuesday
[active]
row = org\xcfte, org\hrxen
speed = org\hrxen, org\kiopm7, org\0987hy
col = org\lop09k, org\0987hy

sample_2.txt

#My second file created on Wednesday
[active]
row = org\xcfte, org\0okilh
speed = org\hrxen, org\0okilh
col =  org\0987hy

and so on, for multiple files.

Expected output

sample:
org\xcfte
org\hrxen
org\kiopm7
org\0987hy
org\lop09k

sample_2:
org\xcfte
org\0okilh
org\hrxen
org\0987hy

In the first file, org\hrxen and org\0987hy should each appear only once; in the second file, org\0okilh should appear only once.

I tried

lines = File.read('/path/to/file')
lines.split("\n").uniq.join("\n")

and

File.readlines("*.txt").uniq

but the duplicate entries are not removed.

I tried the last one,

File.readlines("file.txt").uniq

and it works perfectly for a file like this:

file.txt

org\xcfte
org\hrxen
org\hrxen
org\kiopm7
org\0987hy
org\0987hy
org\lop09k

The real work here is parsing the file correctly, because running

File.readlines("file.txt").uniq

on a file like this:

[active]
row = org\xcfte, org\0okilh
speed = org\hrxen, org\0okilh
col =  org\0987hy

...will never print your expected output.
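To make the point above concrete, here is a minimal sketch, assuming the second sample file's contents are held in a string (the constant SAMPLE and the regular expression are illustrative, not part of the original answer). It shows that uniq compares whole lines, so it finds nothing to remove, while the repeated value only becomes visible once the lines are broken into individual values.

```ruby
# Contents of the second sample file from the question.
# A single-quoted heredoc keeps the backslashes literal.
SAMPLE = <<~'TEXT'
  #My second file created on Wednesday
  [active]
  row = org\xcfte, org\0okilh
  speed = org\hrxen, org\0okilh
  col =  org\0987hy
TEXT

lines = SAMPLE.lines
# No two lines are identical as a whole, so uniq drops nothing.
puts lines.uniq.length == lines.length   # prints "true"

# Assumption: a value is "org", a backslash, then word characters.
tokens = SAMPLE.scan(/org\\\w+/)
puts tokens.length        # prints 5
puts tokens.uniq.length   # prints 4 (org\0okilh occurs twice)
```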

Edit: here is a solution that works for me:

array = []

file = File.read('file')
# Splitting on whitespace yields one piece per word, e.g. "org\hrxen,"
file.split(' ').each do |line|
    # Remove the comma, then push the piece only if it contains the
    # substring "org" and is not already inside the array
    formatted = line.gsub(',', '')
    array.push(formatted) unless array.include?(formatted) or !line.include?("org")
end
puts array

Output for sample 1:

org\xcfte
org\hrxen
org\kiopm7
org\0987hy
org\lop09k
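The question asks about every file in a directory, and File.readlines takes a single path rather than a wildcard such as "*.txt". As a rough sketch of one way to extend the idea, the code below uses Dir.glob to list the files and String#scan plus uniq to collect the values. The helper names unique_values and report, and the assumption that a value is "org", a backslash, then word characters, are illustrative and not from the original answer.

```ruby
# Hypothetical helper: returns the unique org\... values of one file,
# in order of first appearance.
def unique_values(path)
  # Assumption: a value is "org", a backslash, then word characters.
  File.read(path).scan(/org\\\w+/).uniq
end

# Hypothetical helper: prints one labelled block per .txt file in dir.
def report(dir)
  Dir.glob(File.join(dir, '*.txt')).sort.each do |path|
    puts "#{File.basename(path, '.txt')}:"
    puts unique_values(path)
    puts
  end
end

# Usage (the path is a placeholder, as in the question):
# report('/path/to/dir')
```

Using scan instead of split means the header comment and the "row =" style keys are skipped without a separate include? check, at the cost of hard-coding what a value looks like.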

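The problem is that all of my files are in this format. That is where I am stuck: I don't know how to get rid of the duplicate strings.

Simple and elegant, thank you very much!! Could you please explain the

array.push(line.gsub(',', '')) unless array.include? line.gsub(',', '') or !line.include? "org"

part? That would be very helpful for a beginner like me. Thanks again.

After I split the file, I get multiple lines, for example "org\lop09k". So this part means I only push the line into the array if it contains "org" and is not already in the array. I do that, of course, after removing the comma with gsub.

+1 for the clear explanation. Just wondering why there is a ! before

line.include? "org"

Does it serve a specific purpose? Why not just use it without the !

Yes: because I use unless instead of if, I have to use the opposite expression,

!line.include? "org"

Literally: push into the array unless the line is already in the array or (!) does not contain the substring "org".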
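The unless/if relationship discussed above can be checked with a small sketch. The two helper methods and the sample tokens are made up for illustration; the point is only that "unless A or !B" selects the same items as "if !A and B".

```ruby
# Hypothetical helper: the condition as written in the answer.
def push_with_unless(array, line)
  formatted = line.gsub(',', '')
  array.push(formatted) unless array.include?(formatted) || !line.include?('org')
  array
end

# Hypothetical helper: the same condition rewritten with if.
def push_with_if(array, line)
  formatted = line.gsub(',', '')
  array.push(formatted) if !array.include?(formatted) && line.include?('org')
  array
end

# Illustrative pieces, as produced by splitting a line on whitespace.
tokens = ['row', '=', 'org\xcfte,', 'org\hrxen', 'org\hrxen,']
a = tokens.each_with_object([]) { |t, acc| push_with_unless(acc, t) }
b = tokens.each_with_object([]) { |t, acc| push_with_if(acc, t) }
puts a == b   # prints "true"
```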