Ruby: how to scrape <li> and its children
I'm trying to scrape <li> tags and the content inside them.
The HTML looks like:
<div class="insurancesAccepted">
<h4>What insurance does he accept?*</h4>
<ul class="noBottomMargin">
<li class="first"><span>Aetna</span></li>
<li>
<a title="See accepted plans" class="insurancePlanToggle arrowUp">AvMed</a>
<ul style="display: block;" class="insurancePlanList">
<li class="last first">Open Access</li>
</ul>
</li>
<li>
<a title="See accepted plans" class="insurancePlanToggle arrowUp">Blue Cross Blue Shield</a>
<ul style="display: block;" class="insurancePlanList">
<li class="last first">Blue Card PPO</li>
</ul>
</li>
<li>
<a title="See accepted plans" class="insurancePlanToggle arrowUp">Cigna</a>
<ul style="display: block;" class="insurancePlanList">
<li class="first">Cigna HMO</li>
<li>Cigna PPO</li>
<li class="last">Great West Healthcare-Cigna PPO</li>
</ul>
</li>
<li class="last">
<a title="See accepted plans" class="insurancePlanToggle arrowUp">Empire Blue Cross Blue Shield</a>
<ul style="display: block;" class="insurancePlanList">
<li class="last first">Empire Blue Cross Blue Shield HMO</li>
</ul>
</li>
</ul>
</div>
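It shows the text of all the <li> elements at once. I want to scrape "AvMed" and "Open Access" together, keeping the relationship between them, so I can insert them into a MySQL table with a reference.

The problem is that doc.css('.insurancesAccepted li') matches all nested list items, not only the direct children. To match only the direct children, use the parent > child CSS combinator. To complete the task, you then need to combine the results of the iteration carefully: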
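require 'nokogiri'

# html holds the markup shown above
doc = Nokogiri::HTML(html)

# "> ul > li" selects only the top-level items, not the nested plan lists
doc.css('div.insurancesAccepted > ul > li').each do |li|
  chapter     = li.css('span').text.strip                   # plain entry, e.g. "Aetna"
  section     = li.css('a').text.strip                      # carrier with plans, e.g. "AvMed"
  subsections = li.css('ul > li').map(&:text).map(&:strip)  # that carrier's nested plans
  puts "#{chapter} ⇒ [ #{section} ⇒ [ #{subsections.join(', ')} ] ]"
  puts '=' * 40
end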
Output:
# Aetna ⇒ [ ⇒ [ ] ]
# ========================================
# ⇒ [ AvMed ⇒ [ Open Access ] ]
# ========================================
# ⇒ [ Blue Cross Blue Shield ⇒ [ Blue Card PPO ] ]
# ========================================
# ⇒ [ Cigna ⇒ [ Cigna HMO, Cigna PPO, Great West Healthcare-Cigna PPO ] ]
# ========================================
# ⇒ [ Empire Blue Cross Blue Shield ⇒ [ Empire Blue Cross Blue Shield HMO ] ]
# ========================================
Hi, it throws the following error: syntax error, unexpected $end, expecting keyword_end p"#{chapter}"⇒ [#{section}⇒ [#{subsections.join(',')}]"
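Only part of the error is quoted, so this is an assumption: the "unexpected $end, expecting keyword_end" wording comes from older Ruby (1.9 era), and a common cause there is the non-ASCII ⇒ character. Ruby 1.9 reads source files as US-ASCII unless the first line carries a magic encoding comment, so the parser chokes on ⇒ and then reports the unexpected $end. Ruby 2.0 and later read source as UTF-8 by default. A minimal sketch of the two usual fixes, using a hypothetical helper named format_row:

```ruby
# encoding: utf-8
# The magic comment above must be the first line of the file (or the second,
# after a shebang). It makes Ruby 1.9 read this file as UTF-8, so the literal
# "⇒" below parses. On Ruby 2.0+ it is harmless but unnecessary.

# Hypothetical helper that formats one scraped entry with the "⇒" arrow.
def format_row(chapter, section, subsections)
  "#{chapter} ⇒ [ #{section} ⇒ [ #{subsections.join(', ')} ] ]"
end

# Alternative that sidesteps the issue: keep the source pure ASCII.
def format_row_ascii(chapter, section, subsections)
  "#{chapter} => [ #{section} => [ #{subsections.join(', ')} ] ]"
end

puts format_row('', 'AvMed', ['Open Access'])
puts format_row_ascii('', 'AvMed', ['Open Access'])
```

It is also worth checking that the quotes in the puts/p line were copied exactly, since a misplaced quote produces the same kind of error.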