Ruby: better way to parse "Description (tag)" to "Description, tag"

ruby

I have a text file with over 1000 lines like the following. They are category descriptions, with a keyword enclosed in parentheses:

Chemicals (chem) 
Electrical (elec)

I need to convert these lines into comma-separated values, like this:

Chemicals, chem
Electrical, elec

I'm using:

lines = line.gsub!('(', ',').gsub!(')', '').split(',')

I'd like to know if there is a better way to do this.
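A note on the attempt above: `gsub!` returns nil when it makes no substitution, so chaining bang methods raises NoMethodError on any line without parentheses, and replacing '(' on its own leaves the space before the comma. A small sketch comparing it with the non-bang `gsub` (the helper names `bang_chain` and `plain_chain` are made up for illustration):

```ruby
# gsub! modifies the receiver in place and returns nil when nothing was replaced,
# so a chain like line.gsub!(...).gsub!(...) raises NoMethodError on such lines.
def bang_chain(line)
  line.dup.gsub!('(', ',').gsub!(')', '')
rescue NoMethodError
  :failed
end

# gsub (no bang) always returns a new string, so chaining is safe.
def plain_chain(line)
  line.gsub(' (', ', ').gsub(')', '')
end

p bang_chain('Chemicals (chem)')  # => "Chemicals ,chem"
p bang_chain('Misc')              # => :failed
p plain_chain('Chemicals (chem)') # => "Chemicals, chem"
p plain_chain('Misc')             # => "Misc"
```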

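For posterity, here is the complete code (based on the answers):

require 'rubygems'
require 'csv'

csvfile = CSV.open('output.csv', 'w')
File.open('c:/categories.txt') do |f|
  f.readlines.each do |line|
    (desc, cat) = line.split('(')
    desc.strip!
    cat.strip!
    csvfile << [desc, cat[0, cat.length - 1]] # drop the trailing ')'
  end
end
csvfile.close

Try this: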
line.sub!(/ \((\w+)\)$/, ', \1')

\1 is replaced with the first capture group of the regexp (here it is always the category keyword), so it basically changes " (chem)" into ", chem".

Here is an example that reads the text file:
lines = []
File.open('categories.txt', 'r') do |file|
  while line = file.gets 
    lines << line.sub(/ \((\w+)\)$/, ', \1')
  end
end
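To see the substitution in isolation, here is a self-contained sketch that runs the same regexp over a few sample lines. It reads from a StringIO instead of categories.txt only so the snippet runs without the file; the sample lines are illustrative. Because the pattern anchors only on the final " (word)", descriptions containing spaces are left intact:

```ruby
require 'stringio'

# Stand-in for File.open('categories.txt'); the contents are illustrative.
input = StringIO.new("Chemicals (chem)\nElectrical (elec)\nDyes and Intermediates (dyes)\n")

lines = []
input.each_line do |line|
  # " (word)" just before the end of the line becomes ", word";
  # $ matches before the trailing newline, so the newline is kept.
  lines << line.sub(/ \((\w+)\)$/, ', \1')
end

puts lines
# Chemicals, chem
# Electrical, elec
# Dyes and Intermediates, dyes
```

A tag containing a space or a hyphen does not match \w+, so such a line would pass through unchanged.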

I don't know anything about Ruby, but it's easy in PHP:
preg_match('~(.+?)\s*\((.+)\)~', 'Chemicals (chem)', $m);

$result = $m[1] . ', ' . $m[2]; // "Chemicals, chem"
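A rough Ruby counterpart of the PHP snippet, as a sketch: `String#match` plays the role of `preg_match`, returning MatchData with the capture groups, or nil when the line does not fit. The method name `to_csv_fields` is hypothetical:

```ruby
# Returns [description, tag], or nil if the line is not "description (tag)".
def to_csv_fields(line)
  m = line.match(/\A(.+?)\s*\((.+)\)\s*\z/)
  m && [m[1], m[2]]
end

p to_csv_fields('Chemicals (chem)')                # => ["Chemicals", "chem"]
p to_csv_fields("Dyes and Intermediates (dyes)\n") # => ["Dyes and Intermediates", "dyes"]
p to_csv_fields('No tag here')                     # => nil
puts to_csv_fields('Chemicals (chem)').join(', ')  # => Chemicals, chem
```

Returning an array rather than a joined string makes it easy to hand the fields straight to a CSV writer.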

As of Ruby 1.9, you can do it with a single method call:
str = "Chemicals (chem)\n"
mapping = { ' (' => ', ',
            ')'  => ''}

str.gsub(/ \(|\)/, mapping)  #=> "Chemicals, chem\n"
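How the hash form works: when `gsub` is given a Hash, each matched substring is looked up as a key and replaced with the corresponding value. A minimal sketch showing that it behaves like the equivalent block form (`via_hash` and `via_block` are illustrative names):

```ruby
str = "Chemicals (chem)\n"
mapping = { ' (' => ', ', ')' => '' }

# Hash form (Ruby 1.9+): each matched substring is used as a key into mapping.
via_hash = str.gsub(/ \(|\)/, mapping)

# Equivalent block form: the block receives the matched substring.
via_block = str.gsub(/ \(|\)/) { |match| mapping[match] }

p via_hash  # => "Chemicals, chem\n"
p via_block # => "Chemicals, chem\n"
```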

A cleaner and more efficient way to do this in Ruby is:
description, tag = line.split(' ', 2) # split(' ', 2) returns a two-element array: everything up to the first
                                      # whitespace and everything after it. Multiple assignment then puts
                                      # each array element into its own local variable
tag = tag[1, (tag.length - 1) - 1]    # extract the inner characters of the string (drop the first and last)
new_line = description << ", " << tag # rejoin the parts into a new string
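As the comments below point out, `split(' ', 2)` breaks on descriptions that contain spaces. A sketch reproducing that, plus an alternative that is not part of the answer above: `String#rpartition(' (')`, which splits at the last " (" instead. It assumes every line ends in " (tag)"; the helper names are hypothetical:

```ruby
# The split-based approach from the answer: splits at the FIRST whitespace,
# so a multi-word description spills into the tag.
def split_version(line)
  description, tag = line.split(' ', 2)
  tag = tag[1, (tag.length - 1) - 1]
  description << ", " << tag
end

# Alternative: rpartition splits at the LAST " (", so spaces inside the
# description are harmless. chomp removes a trailing newline, then the ')'.
def rpartition_version(line)
  description, _sep, tag = line.chomp.rpartition(' (')
  "#{description}, #{tag.chomp(')')}"
end

p split_version('Chemicals (chem)')                   # => "Chemicals, chem"
p split_version('Dyes and Intermediates (dyes)')      # => "Dyes, nd Intermediates (dyes"
p rpartition_version('Dyes and Intermediates (dyes)') # => "Dyes and Intermediates, dyes"
```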

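There's no need to manipulate the string. Just grab the data and write it out to the CSV file.
Assuming the data looks something like this:
Chemicals (chem)
Electrical (elec)
Dyes and Intermediates (dyes)
This should work:
require 'csv'

CSV.open('output.csv', 'w') do |csvfile|
  File.open('categories.txt', 'r') do |file|
    file.each_line do |line|
      fields = line.match(/^(.+)\s\((.+)\)$/) { |m| [m[1], m[2]] }
      csvfile << fields if fields # skip lines that don't match the pattern
    end
  end
end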
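To see what ends up in the CSV without touching the filesystem, here is a sketch that feeds the same regexp into `CSV.generate`, which builds the CSV text in memory instead of writing a file; the sample data is illustrative. Note that CSV separates fields with a bare comma, so the rows come out as "Chemicals,chem" rather than "Chemicals, chem":

```ruby
require 'csv'

# Illustrative input; in the answer above this comes from categories.txt.
data = "Chemicals (chem)\nElectrical (elec)\nDyes and Intermediates (dyes)\n"

csv_text = CSV.generate do |csv|
  data.each_line do |line|
    # The block form of String#match runs only when the line matches;
    # otherwise match returns nil and the row is skipped.
    fields = line.match(/^(.+)\s\((.+)\)$/) { |m| [m[1], m[2]] }
    csv << fields if fields
  end
end

puts csv_text
# Chemicals,chem
# Electrical,elec
# Dyes and Intermediates,dyes
```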

Benchmarks related to the discussion under @hundredwatt's answer:
require 'benchmark'

line = "Chemicals (chem)"

# @hundredwatt
puts Benchmark.measure {
  100000.times do
    description, tag = line.split(' ', 2)
    tag = tag[1, (tag.length - 1) - 1]
    new_line = description << ", " << tag
  end
} # => 0.18

# NeX
puts Benchmark.measure {
  100000.times do
    line.sub!(/ \((\w+)\)$/, ', \1')
  end
} # => 0.08

# steenslag
mapping = { ' (' => ', ',
  ')'  => ''}
puts Benchmark.measure {
  100000.times do
    line.gsub(/ \(|\)/, mapping)
  end
} # => 0.08
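One caveat about the benchmark above: `sub!` modifies `line` in place, so after the first iteration of the NeX loop the string is already "Chemicals, chem" and the remaining calls find nothing to replace; the steenslag loop then also runs on the already-converted string. A sketch of a comparison that never mutates the input (the lambda names are illustrative; timings will vary by machine and Ruby version, so none are quoted here):

```ruby
require 'benchmark'

LINE = "Chemicals (chem)".freeze
N = 100_000
MAPPING = { ' (' => ', ', ')' => '' }.freeze

# None of these mutate LINE, so every iteration does the same work.
split_way = ->(line) do
  description, tag = line.split(' ', 2) # split returns new, unfrozen strings
  description << ", " << tag[1, (tag.length - 1) - 1]
end
sub_way  = ->(line) { line.sub(/ \((\w+)\)$/, ', \1') }
gsub_way = ->(line) { line.gsub(/ \(|\)/, MAPPING) }

Benchmark.bm(10) do |x|
  x.report('split')    { N.times { split_way.(LINE) } }
  x.report('sub')      { N.times { sub_way.(LINE) } }
  x.report('gsub+map') { N.times { gsub_way.(LINE) } }
end
```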

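Why use split at the end? That actually creates an array like [description, keyword].
@NeX By creating an array I can create a csvfile.
@sawa, you're right. Years of explaining things to beginners have given me the habit of saying round brackets, curly brackets, angle brackets and square brackets.
@kinjal So you are not looking for a way to "convert these lines into comma-separated values, like this:". You are looking for a way to create an array like [['Chemicals', 'chem'], ['Electrical', 'elec']].
@NeX: I'm not an expert in Ruby; I'm creating an array so that it's easier to write to the CSV file. See the completed example above.
@hundredwatt, speed matters. I spoke too soon: some entries contain spaces, like "Dyes and Intermediates". I've changed the example to split on '(', strip both items, and remove the last character from the second item. This is faster than using regular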