Nokogiri: alternative to the text method when scraping with CSS selectors?

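I am trying to get Nokogiri to scrape Jeremy Lin's most recent game stats from ESPN's website, but the text method on the CSS result gives me a single string with no spaces between the stats.

scraper.get_last_game_stats.text
returns this string:

"Sat 11/16vsDENW 122-111326-11.5450-2.0004-6.66747113116Wed 11/13@ PHIL 117-1234910-19.5269-15.6005-6.833512005834Sat 11/9vsLACL 94-107263-7.4290-0.0000-0.0001701156"
I want a separator between each stat. Even when I loop over the main object and put a space or dash between iterations, I cannot split apart steals, blocks, points, turnovers, and all the other values: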

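require 'open-uri'
require 'nokogiri'

class PlayerScraper
  attr_accessor :player_data, :name

  def initialize(url)
    # URI.open comes from open-uri; a bare open(url) no longer fetches URLs on Ruby 3.0+.
    @player_data = Nokogiri::HTML(URI.open(url))
  end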

  def get_last_game_stats
    @last_game_stats = @player_data.css('tr[class^="oddrow team-46"]')
  end
end

jlin_url = "http://espn.go.com/nba/player/_/id/4299/jeremy-lin"

scraper = PlayerScraper.new(jlin_url)
scraper.get_last_game_stats.text

Can anyone point me to a better way to do this?

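I think you should read the tr elements, then loop over their HTML content and handle each td separately. Otherwise, using the text method together with Rails HTML tag sanitizing will make a mess of the raw data.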

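The text method merges the text of all the selected nodes. Try something like

scraper.get_last_game_stats.map(&:text)

if you want the tr nodes evaluated separately. When I do this for the URL you point to, I get: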

["Sat 11/16", "vsDEN", "W 122-111", "32", "6-11", ".545", "0-2", ".000", "4-6", ".667", "4", "7", "1", "1", "3", "1", "16", "Wed 11/13", "@ PHI", "L 117-123", "49", "10-19", ".526", "9-15", ".600", "5-6", ".833", "5", "12", "0", "0", "5", "8", "34", "Sat 11/9", "vsLAC", "L 94-107", "26", "3-7", ".429", "0-0", ".000", "0-0", ".000", "1", "7", "0", "1", "1", "5", "6"]

I hope that looks more like what you are looking for.

You are walking through the rows, but not through the cells they contain. To get the cell values in a usable form, you need to do both:

require 'open-uri'
require 'nokogiri'

URL = 'http://espn.go.com/nba/player/_/id/4299/jeremy-lin'
doc = Nokogiri::HTML(URI.open(URL)) # a bare open(URL) no longer fetches URLs on Ruby 3.0+

data = doc.css('tr[class^="oddrow team-46"]').map{ |tr|
  tr.css('td').map(&:text)
}

data
# => [["Sat 11/16",
#      "vsDEN",
#      "W 122-111",
#      "32",
#      "6-11",
#      ".545",
#      "0-2",
#      ".000",
#      "4-6",
#      ".667",
#      "4",
#      "7",
#      "1",
#      "1",
#      "3",
#      "1",
#      "16"],
#     ["Wed 11/13",
#      "@ PHI",
#      "L 117-123",
#      "49",
#      "10-19",
#      ".526",
#      "9-15",
#      ".600",
#      "5-6",
#      ".833",
#      "5",
#      "12",
#      "0",
#      "0",
#      "5",
#      "8",
#      "34"],
#     ["Sat 11/9",
#      "vsLAC",
#      "L 94-107",
#      "26",
#      "3-7",
#      ".429",
#      "0-0",
#      ".000",
#      "0-0",
#      ".000",
#      "1",
#      "7",
#      "0",
#      "1",
#      "1",
#      "5",
#      "6"]]
To look at the data a different way, output it as rows:

data.each do |row|
  puts row.join(', ')
end
# >> Sat 11/16, vsDEN, W 122-111, 32, 6-11, .545, 0-2, .000, 4-6, .667, 4, 7, 1, 1, 3, 1, 16
# >> Wed 11/13, @ PHI, L 117-123, 49, 10-19, .526, 9-15, .600, 5-6, .833, 5, 12, 0, 0, 5, 8, 34
# >> Sat 11/9, vsLAC, L 94-107, 26, 3-7, .429, 0-0, .000, 0-0, .000, 1, 7, 0, 1, 1, 5, 6
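A table is very simple and can be built with two nested loops. To access each cell later, you do the same thing: walk through the rows in a loop, then, inside that loop, walk through the cells. That is all the code I wrote does.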
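Once each row is an array of cell strings, individual stats such as steals, blocks, or points can be picked out by position. The sketch below pairs one row with column names so the values can be looked up by name. The names and their order in HEADERS are an assumption about the game log layout, not something taken from the page, so check them against the table's actual header row before relying on them.

```ruby
# Assumed column names and order; verify against the real table header.
HEADERS = %w[date opp score min fg fg_pct three_pt three_pt_pct
             ft ft_pct reb ast blk stl pf to pts].freeze

# One row of cell text, as produced by tr.css('td').map(&:text).
row = ['Sat 11/16', 'vsDEN', 'W 122-111', '32', '6-11', '.545', '0-2', '.000',
       '4-6', '.667', '4', '7', '1', '1', '3', '1', '16']

# Pair each header with the cell in the same position.
game = HEADERS.zip(row).to_h

puts game['date']
puts game['pts']
```

The values are still strings at this point; convert with to_i or to_f where numbers are needed.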


Thank you so much! Amazing answer. This is exactly what I wanted.