Parsing HTML - Searching links - Is it possible to search the paragraph that contains a link?


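I'm parsing the links on Wikipedia actor pages and trying to find the links to the films they appeared in.

I have a basic method that searches the links and checks each one for the word film. However, many film links don't actually contain that word.

The paragraph that contains the link, though, does contain the word film, for example:

    <p>Dreyfuss's first film part was a small, uncredited role in
    <i><a href="/wiki/The_Graduate" title="The Graduate">The Graduate

    // Paragraph goes on for a long time.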

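Why not try parsing the Filmography section of the Wikipedia article instead? It seems to be fairly standard across the few actors I looked at, and it states whether an entry is a TV series, so you can easily filter those out.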

<tr>
    <td>1966</td>
    <td><i><a href="/wiki/Gidget_(TV_series)" title="Gidget (TV series)">Gidget</a></i></td>
    <td>Durf the Drag</td>
    <td>TV series 1 episode</td>
</tr>
<tr>
    <td>1967</td>
    <td><i><a href="/wiki/Valley_of_the_Dolls_(film)" title="Valley of the Dolls (film)">Valley of the Dolls</a></i></td>
    <td>Assistant stage manager</td>
    <td>Uncredited</td>
</tr>

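It looks like you could pull nodes like these out of the page and keep all of the information, so you can do whatever you need with it. The first node here could be ignored because "TV" appears several times in its child nodes.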

Hope this helps.

-Larry


OK, so I tested some code against your actual request and came up with the following:

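require 'nokogiri'
require 'open-uri'

url = "http://en.wikipedia.org/wiki/Richard_Dreyfuss"
# URI.open comes from open-uri; a bare open(url) no longer fetches URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open(url))
all_links = doc.search("//a[@href]")
all_links.each do |link|
  # Text of the enclosing paragraph (empty when the link is not inside a <p>)
  p_text = link.ancestors("p").text
  link_index = p_text.index(link.text)
  next if link_index.nil?

  # Look at up to 50 characters before the link text
  search_back = link_index > 50 ? link_index - 50 : 0
  puts link['href'] if p_text[search_back..link_index].downcase.include?("film")
end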
Output

#=>/wiki/American_Graffiti
   /wiki/Jaws_(film)
   /wiki/Close_Encounters_of_the_Third_Kind
   /wiki/The_Graduate
   /wiki/The_Apprenticeship_of_Duddy_Kravitz_(film)
   /wiki/Down_And_Out_In_Beverly_Hills
   /wiki/Stakeout_(1987_film)
   /wiki/Stephen_King
   /wiki/The_Body_(novella)
   /wiki/Poseidon_(film)
   #cite_note-27
   /wiki/Jonathan_Tasini
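This seems to satisfy the question as you asked it, but it will obviously need modification to fit your needs.

Edit
Added your request to look back 50 characters within the paragraph. The response is much shorter now, but I'm not sure the results will be as useful as you were hoping. This answers the question, but it doesn't capture exactly what you want. For example, the last 2 links are not film links, but they are within 50 characters of the word film.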
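To see how the window size drives the result, the look-back check can be isolated into a plain-Ruby helper. This is only a sketch: `keyword_before?` and its `keyword:` and `window:` parameters are hypothetical names, and unlike the loop above it stops just before the link text rather than including its first character:

```ruby
# Hypothetical helper: true when `keyword` occurs within `window` characters
# before the first occurrence of `link_text` in `paragraph`.
def keyword_before?(paragraph, link_text, keyword: 'film', window: 50)
  link_index = paragraph.index(link_text)
  return false if link_index.nil?

  # Clamp the start of the window to the beginning of the paragraph.
  start = [link_index - window, 0].max
  paragraph[start...link_index].downcase.include?(keyword)
end

# The sentence from the question, as plain text.
text = "Dreyfuss's first film part was a small, uncredited role in The Graduate"

puts keyword_before?(text, 'The Graduate')              # 50-character window
puts keyword_before?(text, 'The Graduate', window: 10)  # 10-character window
```

With the 50-character window the word film is in range and the check passes; with a 10-character window it is not. A wider window catches more real film links but also more neighbours that merely sit near the word, which is the trade-off described above.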


It is possible to search for text inside tags.

However, I'd do it like this:

require 'nokogiri'
require 'open-uri'

# URI.open comes from open-uri; a bare open(url) no longer fetches URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open('http://en.wikipedia.org/wiki/Richard_Dreyfuss'))

# The table is the element that follows the parent of the node with id "Filmography"
table = doc.at('#Filmography').parent.next_element
films = table.search('tr')[1..-1].map{ |tr|  # [1..-1] skips the header row
  tds = tr.search('td')
  year = tds.shift.text

  movie = tds.shift
  movie_url = movie.at('a')['href']
  movie_title = movie.at('a').text

  role = tds.shift.text

  {
    year: year,
    movie_url: movie_url,
    movie_title: movie_title,
    role: role
  }
}

films 
# => [{:year=>"1966",
#      :movie_url=>"/wiki/Bewitched",
#      :movie_title=>"Bewitched",
#      :role=>"Rodney"},
#     {:year=>"1966",
#      :movie_url=>"/wiki/Gidget_(TV_series)",
#      :movie_title=>"Gidget",
#      :role=>"Durf the Drag"},
#     {:year=>"1967",
#      :movie_url=>"/wiki/Valley_of_the_Dolls_(film)",
#      :movie_title=>"Valley of the Dolls",
#      :role=>"Assistant stage manager"},
#     {:year=>"1967",
#      :movie_url=>"/wiki/The_Graduate",
#      :movie_title=>"The Graduate",
#      :role=>"Boarding House Resident"},
#     {:year=>"1967",
#      :movie_url=>"/wiki/The_Big_Valley",
#      :movie_title=>"The Big Valley",
#      :role=>"Lud Akley"},
#     {:year=>"1968",
#      :movie_url=>"/wiki/The_Young_Runaways",
#      :movie_title=>"The Young Runaways",
#      :role=>"Terry"},
#     {:year=>"1969",
#      :movie_url=>"/wiki/Hello_Down_There",
#      :movie_title=>"Hello Down There",
#      :role=>"Harold Webster"},
#     {:year=>"1970",
#      :movie_url=>"/wiki/The_Mod_Squad",
#      :movie_title=>"The Mod Squad",
#      :role=>"Curtis Bell"},
#     {:year=>"1973",
#      :movie_url=>"/wiki/American_Graffiti",
#      :movie_title=>"American Graffiti",
#      :role=>"Curt Henderson"},
#     {:year=>"1973",
#      :movie_url=>"/wiki/Dillinger_(1973_film)",
#      :movie_title=>"Dillinger",
#      :role=>"Baby Face Nelson"},
#     {:year=>"1974",
#      :movie_url=>"/wiki/The_Apprenticeship_of_Duddy_Kravitz_(film)",
#      :movie_title=>"The Apprenticeship of Duddy Kravitz",
#      :role=>"Duddy"},
#     {:year=>"1974",
#      :movie_url=>"/wiki/The_Second_Coming_of_Suzanne",
#      :movie_title=>"The Second Coming of Suzanne",
#      :role=>"Clavius"},
#     {:year=>"1975",
#      :movie_url=>"/wiki/Inserts_(film)",
#      :movie_title=>"Inserts",
#      :role=>"The Boy Wonder"},
#     {:year=>"1975",
#      :movie_url=>"/wiki/Jaws_(film)",
#      :movie_title=>"Jaws",
#      :role=>"Matt Hooper"},
#     {:year=>"1976",
#      :movie_url=>"/wiki/Victory_at_Entebbe",
#      :movie_title=>"Victory at Entebbe",
#      :role=>"Colonel Yonatan 'Yonni' Netanyahu"},
#     {:year=>"1977",
#      :movie_url=>"/wiki/Close_Encounters_of_the_Third_Kind",
#      :movie_title=>"Close Encounters of the Third Kind",
#      :role=>"Roy Neary"},
#     {:year=>"1977",
#      :movie_url=>"/wiki/The_Goodbye_Girl",
#      :movie_title=>"The Goodbye Girl",
#      :role=>"Elliott Garfield"},
#     {:year=>"1978",
#      :movie_url=>"/wiki/The_Big_Fix",
#      :movie_title=>"The Big Fix",
#      :role=>"Moses Wine"},
#     {:year=>"1980",
#      :movie_url=>"/wiki/The_Competition_(film)",
#      :movie_title=>"The Competition",
#      :role=>"Paul Dietrich"},
#     {:year=>"1981",
#      :movie_url=>"/wiki/Whose_Life_Is_It_Anyway%3F_(1981_film)",
#      :movie_title=>"Whose Life Is It Anyway?",
#      :role=>"Ken Harrison"},
#     {:year=>"1984",
#      :movie_url=>"/wiki/The_Buddy_System_(film)",
#      :movie_title=>"The Buddy System",
#      :role=>"Joe"},
#     {:year=>"1986",
#      :movie_url=>"/wiki/Down_and_Out_in_Beverly_Hills",
#      :movie_title=>"Down and Out in Beverly Hills",
#      :role=>"David 'Dave' Whiteman"},
#     {:year=>"1986",
#      :movie_url=>"/wiki/Stand_by_Me_(film)",
#      :movie_title=>"Stand by Me",
#      :role=>"Narrator/Gordie LaChance (adult)"},
#     {:year=>"1987",
#      :movie_url=>"/wiki/Tin_Men",
#      :movie_title=>"Tin Men",
#      :role=>"Bill 'BB' Babowsky"},
#     {:year=>"1987",
#      :movie_url=>"/wiki/Stakeout_(1987_film)",
#      :movie_title=>"Stakeout",
#      :role=>"Det. Chris Lecce"},
#     {:year=>"1987",
#      :movie_url=>"/wiki/Nuts_(film)",
#      :movie_title=>"Nuts",
#      :role=>"Aaron Levinsky"},
#     {:year=>"1988",
#      :movie_url=>"/wiki/Moon_Over_Parador",
#      :movie_title=>"Moon Over Parador",
#      :role=>"Jack Noah/President Alphonse Simms"},
#     {:year=>"1989",
#      :movie_url=>"/wiki/Let_It_Ride_(film)",
#      :movie_title=>"Let It Ride",
#      :role=>"Jay Trotter"},
#     {:year=>"1989",
#      :movie_url=>"/wiki/Always_(1989_film)",
#      :movie_title=>"Always",
#      :role=>"Pete Sandich"},
#     {:year=>"1990",
#      :movie_url=>"/wiki/Rosencrantz_%26_Guildenstern_Are_Dead_(film)",
#      :movie_title=>"Rosencrantz & Guildenstern Are Dead",
#      :role=>"The Player"},
#     {:year=>"1990",
#      :movie_url=>"/wiki/Postcards_from_the_Edge_(film)",
#      :movie_title=>"Postcards from the Edge",
#      :role=>"Doctor Frankenthal"},
#     {:year=>"1991",
#      :movie_url=>"/wiki/Once_Around",
#      :movie_title=>"Once Around",
#      :role=>"Sam Sharpe"},
#     {:year=>"1991",
#      :movie_url=>"/wiki/Prisoner_of_Honor",
#      :movie_title=>"Prisoner of Honor",
#      :role=>"Col. Picquart"},
#     {:year=>"1991",
#      :movie_url=>"/wiki/What_About_Bob%3F",
#      :movie_title=>"What About Bob?",
#      :role=>"Dr. Leo Marvin"},
#     {:year=>"1993",
#      :movie_url=>"/wiki/Lost_in_Yonkers_(film)",
#      :movie_title=>"Lost in Yonkers",
#      :role=>"Louie Kurnitz"},
#     {:year=>"1993",
#      :movie_url=>"/wiki/Another_Stakeout",
#      :movie_title=>"Another Stakeout",
#      :role=>"Detective Chris Lecce"},
#     {:year=>"1994",
#      :movie_url=>"/wiki/Silent_Fall",
#      :movie_title=>"Silent Fall",
#      :role=>"Dr. Jake Rainer"},
#     {:year=>"1995",
#      :movie_url=>
#       "/w/index.php?title=The_Last_Word_(1995_film)&action=edit&redlink=1",
#      :movie_title=>"The Last Word",
#      :role=>"Larry"},
#     {:year=>"1995",
#      :movie_url=>"/wiki/The_American_President_(film)",
#      :movie_title=>"The American President",
#      :role=>"Senator Bob Rumson"},
#     {:year=>"1995",
#      :movie_url=>"/wiki/Mr._Holland%27s_Opus",
#      :movie_title=>"Mr. Holland's Opus",
#      :role=>"Glenn Holland"},
#     {:year=>"1996",
#      :movie_url=>"/wiki/James_and_the_Giant_Peach_(film)",
#      :movie_title=>"James and the Giant Peach",
#      :role=>"Centipede (voice)"},
#     {:year=>"1996",
#      :movie_url=>"/wiki/Mad_Dog_Time",
#      :movie_title=>"Mad Dog Time",
#      :role=>"Vic"},
#     {:year=>"1997",
#      :movie_url=>"/wiki/Night_Falls_on_Manhattan",
#      :movie_title=>"Night Falls on Manhattan",
#      :role=>"Sam Vigoda"},
#     {:year=>"1997",
#      :movie_url=>"/wiki/Oliver_Twist_(1997_film)",
#      :movie_title=>"Oliver Twist",
#      :role=>"Fagin"},
#     {:year=>"1998",
#      :movie_url=>"/wiki/Krippendorf%27s_Tribe",
#      :movie_title=>"Krippendorf's Tribe",
#      :role=>"Prof. James Krippendorf"},
#     {:year=>"1999",
#      :movie_url=>"/wiki/Lansky_(film)",
#      :movie_title=>"Lansky",
#      :role=>"Meyer Lansky"},
#     {:year=>"2000",
#      :movie_url=>"/wiki/The_Crew_(2000_film)",
#      :movie_title=>"The Crew",
#      :role=>"Bobby Bartellemeo/Narrator"},
#     {:year=>"2000",
#      :movie_url=>"/wiki/Fail_Safe_(2000_TV)",
#      :movie_title=>"Fail Safe",
#      :role=>"President of the United States"},
#     {:year=>"2001",
#      :movie_url=>"/wiki/The_Old_Man_Who_Read_Love_Stories",
#      :movie_title=>"The Old Man Who Read Love Stories",
#      :role=>"Antonio Bolivar"},
#     {:year=>"2001",
#      :movie_url=>"/wiki/Who_Is_Cletis_Tout%3F",
#      :movie_title=>"Who Is Cletis Tout?",
#      :role=>"Micah Donnelly"},
#     {:year=>"2001",
#      :movie_url=>"/wiki/The_Education_of_Max_Bickford",
#      :movie_title=>"The Education of Max Bickford",
#      :role=>"Max Bickford"},
#     {:year=>"2001",
#      :movie_url=>"/wiki/The_Day_Reagan_Was_Shot",
#      :movie_title=>"The Day Reagan Was Shot",
#      :role=>"Alexander Haig"},
#     {:year=>"2003",
#      :movie_url=>"/wiki/Coast_to_Coast_(TV_film)",
#      :movie_title=>"Coast to Coast",
#      :role=>"Barnaby Pierce"},
#     {:year=>"2004",
#      :movie_url=>"/wiki/Silver_City_(2004_film)",
#      :movie_title=>"Silver City",
#      :role=>"Chuck Raven"},
#     {:year=>"2006",
#      :movie_url=>"/wiki/Poseidon_(film)",
#      :movie_title=>"Poseidon",
#      :role=>"Richard Nelson"},
#     {:year=>"2007",
#      :movie_url=>"/wiki/Tin_Man_(TV_miniseries)",
#      :movie_title=>"Tin Man",
#      :role=>"Mystic Man"},
#     {:year=>"2007",
#      :movie_url=>"/wiki/Ocean_of_Fear",
#      :movie_title=>"Ocean of Fear",
#      :role=>"Narrator"},
#     {:year=>"2008",
#      :movie_url=>"/wiki/Signs_of_the_Time_(film)",
#      :movie_title=>"Signs of the Time",
#      :role=>"Narrator"},
#     {:year=>"2008",
#      :movie_url=>"/wiki/W._(film)",
#      :movie_title=>"W.",
#      :role=>"Dick Cheney"},
#     {:year=>"2008",
#      :movie_url=>"/w/index.php?title=America_Betrayed&action=edit&redlink=1",
#      :movie_title=>"America Betrayed",
#      :role=>"Narrator"},
#     {:year=>"2009",
#      :movie_url=>"/wiki/My_Life_in_Ruins",
#      :movie_title=>"My Life in Ruins",
#      :role=>"Irv"},
#     {:year=>"2009",
#      :movie_url=>"/wiki/Leaves_of_Grass_(film)",
#      :movie_title=>"Leaves of Grass",
#      :role=>"Pug Rothbaum"},
#     {:year=>"2009",
#      :movie_url=>"/wiki/The_Lightkeepers",
#      :movie_title=>"The Lightkeepers",
#      :role=>"Seth"},
#     {:year=>"2010",
#      :movie_url=>"/wiki/Piranha_3D",
#      :movie_title=>"Piranha 3D",
#      :role=>"Matthew Boyd"},
#     {:year=>"2010",
#      :movie_url=>"/wiki/Weeds_(TV_series)",
#      :movie_title=>"Weeds",
#      :role=>"Warren Schiff"},
#     {:year=>"2010",
#      :movie_url=>"/wiki/RED_(film)",
#      :movie_title=>"RED",
#      :role=>"Alexander Dunning"},
#     {:year=>"2012",
#      :movie_url=>"/wiki/Coma_(U.S._miniseries)",
#      :movie_title=>"Coma",
#      :role=>"Professor Hillside"},
#     {:year=>"2013",
#      :movie_url=>"/wiki/Very_Good_Girls",
#      :movie_title=>"Very Good Girls",
#      :role=>"Danny, Gerry's father"},
#     {:year=>"2013",
#      :movie_url=>"/wiki/Paranoia_(2013_film)",
#      :movie_title=>"Paranoia",
#      :role=>"Francis Cassidy"}]
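To explain what it's doing:

The "Filmography" table is a good source for the information; it's organized logically, so writing code to walk through it is easy.

doc.at('#Filmography').parent.next_element uses the heading above the table to find it, then backs up and looks for the next tag, which is the table itself.

table.search('tr')[1..-1] finds the rows in the table, skips the first one, then (using map) iterates over the rest.

tds = tr.search('td') finds the cells of the row. From that point on, it's a matter of peeling the NodeSet apart like an array, taking the elements I want. The rest of the code should be pretty obvious. Once the pieces of interest are retrieved, they're bundled into a hash, which map returns as part of an array of hashes.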
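Once the array of hashes exists, the usual next step is to turn the relative hrefs into full URLs. The sketch below assumes that is what you want and uses three entries copied from the output above. `BASE` is a name introduced here, and the filter relies on MediaWiki marking links to pages that do not exist yet with `redlink=1`:

```ruby
require 'uri'

BASE = 'http://en.wikipedia.org'

# Three entries copied from the output above.
films = [
  { year: '1967', movie_url: '/wiki/The_Graduate',
    movie_title: 'The Graduate', role: 'Boarding House Resident' },
  { year: '1975', movie_url: '/wiki/Jaws_(film)',
    movie_title: 'Jaws', role: 'Matt Hooper' },
  { year: '2008', movie_url: '/w/index.php?title=America_Betrayed&action=edit&redlink=1',
    movie_title: 'America Betrayed', role: 'Narrator' }
]

# Drop red links (there is no article behind them to follow), then resolve
# the remaining relative hrefs against BASE.
existing = films.reject { |film| film[:movie_url].include?('redlink=1') }
absolute = existing.map { |film| URI.join(BASE, film[:movie_url]).to_s }

puts absolute
```

Using URI.join rather than string concatenation keeps the result correct whether or not the href starts with a slash.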

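Here's an idea. Give it a fairly wide range for deciding that a link is a film (something like the word film within 100 characters), then scrape the film page to confirm that it is a film. Also, if all you want to do is get from Actor -> Films, you should probably try IMDB.

.parent will return the parent node, in this case the tag that directly encloses the link, so if they are all formatted the same way you could try .parent.parent, which should return the tag one level up from that. If you post the page you are trying to parse, I can test this.

@engineersmnky I posted the link.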