Ruby: How to scrape <li> elements and their children (Nokogiri)


I am trying to scrape the <li> tags and the content inside them.

The HTML looks like this:

 <div class="insurancesAccepted">
   <h4>What insurance does he accept?*</h4>
   <ul class="noBottomMargin">
      <li class="first"><span>Aetna</span></li>
      <li>
         <a title="See accepted plans" class="insurancePlanToggle arrowUp">AvMed</a>
         <ul style="display: block;" class="insurancePlanList">
            <li class="last first">Open Access</li>
         </ul>
      </li>
      <li>
         <a title="See accepted plans" class="insurancePlanToggle arrowUp">Blue Cross Blue Shield</a>
         <ul style="display: block;" class="insurancePlanList">
            <li class="last first">Blue Card PPO</li>
         </ul>
      </li>
      <li>
         <a title="See accepted plans" class="insurancePlanToggle arrowUp">Cigna</a>
         <ul style="display: block;" class="insurancePlanList">
            <li class="first">Cigna HMO</li>
            <li>Cigna PPO</li>
            <li class="last">Great West Healthcare-Cigna PPO</li>
         </ul>
      </li>
      <li class="last">
         <a title="See accepted plans" class="insurancePlanToggle arrowUp">Empire Blue Cross Blue Shield</a>
         <ul style="display: block;" class="insurancePlanList">
            <li class="last first">Empire Blue Cross Blue Shield HMO</li>
         </ul>
      </li>
   </ul>
  </div>

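It prints all of the <li> text at once. I want to scrape "AvMed" and "Open Access" together, keeping the relationship between them, so that I can insert them into a MySQL table for reference.
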
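The problem is that doc.css('.insurancesAccepted li') matches all nested list items, not only the direct children. To match only direct children, use the parent > child CSS combinator. To accomplish the task, you need to assemble the results of the iteration carefully: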

require 'nokogiri'

# html holds the markup shown in the question
doc = Nokogiri::HTML(html)
# Iterate over the top-level list items only
doc.css('div.insurancesAccepted > ul > li').each do |li|
  chapter = li.css('span').text.strip
  section = li.css('a').text.strip
  subsections = li.css('ul > li').map(&:text).map(&:strip)
  puts "#{chapter} ⇒ [ #{section} ⇒ [ #{subsections.join(', ')} ] ]"
  puts '=' * 40
end
This results in:

# Aetna ⇒ [ ⇒ [ ] ]
# ========================================
# ⇒ [ AvMed ⇒ [ Open Access ] ]
# ========================================
# ⇒ [ Blue Cross Blue Shield ⇒ [ Blue Card PPO ] ]
# ========================================
# ⇒ [ Cigna ⇒ [ Cigna HMO, Cigna PPO, Great West Healthcare-Cigna PPO ] ]
# ========================================
# ⇒ [ Empire Blue Cross Blue Shield ⇒ [ Empire Blue Cross Blue Shield HMO ] ]
# ========================================

Hi, it's throwing the following error: "syntax error, unexpected $end, expecting keyword_end p "#{chapter} ⇒ [ #{section} ⇒ [ #{subsections.join(', ')} ] ]""