Python Selenium-based scraping code fails with NoSuchElementException


I have Python code that scrapes various pieces of data. For example, it scrapes the "Website" link from the following:

What am I doing wrong?

Update:

<div class="column-space w-col w-col-4">
   <a data-ix="show-popup-on-click" target="_blank" 
      rel="nofollow" href="https://example.com/" 
      class="button full w-button" 
      style="transition: all 0.4s ease 0s;">Website</a>

   <div class="space big"></div>
   <a target="_blank" rel="nofollow" 
      href="https://example.com/storage/b/2/0/2/WhitepaperLive.pdf" 
      class="button-2 w-button">Whitepaper</a>
   <div class="space big"></div>
   <a class="button-2 w-condition-invisible w-button">Program</a>
   <div class="space big w-condition-invisible"></div>
   <div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Token:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">UTC</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Price:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">1 LUC=0,05 USD</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Buy with:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">USD, EUR</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Platform:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">MyPlatform</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix w-condition-invisible">
         <div class="div-block-2">KYC:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">No</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">KYC:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">Yes</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Location:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">Malta</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Can't join:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">USA</div>
         </div>
      </div>
      <div class="space big"></div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Start:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">January 25, 2018</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">End:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">February 5, 2018</div>
         </div>
      </div>
      <div class="space big"></div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Start2:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">February 12, 2018</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">End2:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">March 5, 2018</div>
         </div>
      </div>
      <div>
         <div class="div-block-33">
            <div class="space big"></div>
            <div>
               <a target="_blank" rel="nofollow" 
               class="button green full w-condition-invisible w-button">JOIN WHITELIST NOW »</a>
               <div class="div-block-34">
                  <a target="_blank" rel="nofollow" href="http://we-do-not-have-slack.com" 
                     class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/903_slack-symbol.png" alt="ICO Slack link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://twitter.com/live" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/f4000142b091_twitter%20(1).png" width="16" alt="ICO Twitter link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://t.me/live" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/790001798dfe_telegram.png" alt="ICO Telegram link">
                  </a>
                  <a target="_blank" rel="nofollow" href="http://we-do-not-have-GitHub.com" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/59cf77c1fb0edc0001b4b26a_github-logo.png" alt="ICO GitHun link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://www.facebook.com/Play2Live-504880049864038/" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/59cf77c1fb0edc0001b4b117/59d510290116ac0001964c8e_facebook.png" alt="Facebook link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://talk.org/index.php?topic=2381679.0" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/0011f8c3c_talk.jpg" alt="Talk link">
                  </a>
               </div>
            </div>
         </div>
      </div>
   </div>
</div>

This error occurs when Selenium cannot find an element in the HTML DOM.

My guess is that you set the implicit wait too late, so Selenium tries to get the element before the page has loaded and the element exists in the HTML DOM:

driver.get(link)
driver.implicitly_wait(10)
The documentation sets the implicit wait before fetching any page:

driver = webdriver.PhantomJS()
driver.implicitly_wait(10)
driver.get(link)
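With the implicit wait in place, Selenium keeps retrying the lookup for up to 10 seconds while it searches for the anchor element, instead of failing on the first miss.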

DocLink:

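Also, if none of the elements on the page you are scraping are loaded or created by JavaScript, you do not need Selenium for simple text extraction. You can fetch the page with the standard library's urllib.request and then parse it with BeautifulSoup.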
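
As a rough illustration of the Selenium-free approach, the sketch below pulls the "Website" href out of a fragment of the posted HTML. To stay runnable without third-party packages it swaps in the standard library's html.parser for BeautifulSoup, and it parses an inline string rather than fetching the page with urllib.request. The names LinkCollector and find_href are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (text, href) pairs
        self._in_link = False  # True while inside an open <a>
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # text fragments seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append(("".join(self._text).strip(), self._href))
            self._in_link = False


def find_href(html, link_text):
    """Return the href of the first link whose DOM text equals link_text."""
    parser = LinkCollector()
    parser.feed(html)
    for text, href in parser.links:
        if text == link_text:
            return href
    return None


# Two of the anchors from the HTML posted in the question.
SAMPLE = """
<div class="column-space w-col w-col-4">
   <a data-ix="show-popup-on-click" target="_blank"
      rel="nofollow" href="https://example.com/"
      class="button full w-button">Website</a>
   <a class="button-2 w-condition-invisible w-button">Program</a>
</div>
"""

print(find_href(SAMPLE, "Website"))
```

A static parser like this sees the text exactly as it is written in the markup ("Website"), because no CSS is applied.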

Update:

As Ian said in the comments, where the implicit wait is placed does not matter in this case.

The problem is the locator strategy:

website = driver.find_element_by_link_text('Website').get_attribute('href')
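In this case it cannot find the element, which is a link styled as a button with uppercase letters. The lookup does not appear to match against the link text in the HTML DOM ("Website"); it matches against the text as rendered on the button by the computed CSS style ("WEBSITE").

In my opinion, a different locator strategy, such as a CSS selector or XPath, seems to give more reliable results: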

driver.find_element_by_xpath("//a[contains(text(),'Website')]").get_attribute("href")
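
The CSS selector alternative could look roughly like the sketch below. It relies on the generic find_element(by, value) call, passing "css selector", which is the string value of Selenium's By.CSS_SELECTOR, so the helper itself needs no Selenium import. The helper name href_by_css and the selector are assumptions: the selector is derived only from the HTML posted in the question, where the Website anchor is the only one with a data-ix attribute.

```python
def href_by_css(driver, selector):
    """Return the href of the first element matching a CSS selector.

    "css selector" is the string value behind Selenium's By.CSS_SELECTOR,
    so this helper works with any object exposing find_element(by, value).
    """
    element = driver.find_element("css selector", selector)
    return element.get_attribute("href")


# Assumed from the posted HTML: only the Website anchor carries
# data-ix="show-popup-on-click".
WEBSITE_SELECTOR = "a[data-ix='show-popup-on-click']"

# Usage with a live driver (not executed here):
#   website = href_by_css(driver, WEBSITE_SELECTOR)
#   print(website)
```

Because a selector like this matches on attributes rather than on text, it is unaffected by how CSS changes the rendered casing.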
For more information on these strategies, see:

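There is nothing wrong with the code. When I look at the Website link on the web page, I see its text as "Website", but if I try to find the element by link text using that same text, as shown below, I get a NoSuchElementException.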

website = driver.find_element_by_link_text("Website").get_attribute("href")
print(website)
I tried adding waits and also used partial_link_text, but had no luck.

Then I tried getting all elements with the tag name "a" and printing their text with the code below:

elements = driver.find_elements_by_tag_name("a")
for element in elements:
    print(element.text)
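That is how I learned that the text is not "Website" but "WEBSITE". I do not know why it behaves that way, though.

After changing all the characters of "Website" to uppercase, I was able to locate the element and get its href: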

driver.get("https://topicolist.com/ico/adhive")
website = driver.find_element_by_link_text("WEBSITE").get_attribute("href")
print(website)
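
If the casing of the rendered text is not known in advance, one option is to try a few casings in turn. The sketch below is a hypothetical helper (first_match is not a Selenium function) that takes the lookup callable and the "not found" exception type as parameters, so it can be tried out without a browser.

```python
def first_match(lookup, text, not_found):
    """Call lookup() with several casings of text and return the first hit.

    lookup:    callable that takes a link text and returns an element
    not_found: exception type that lookup raises when nothing matches
    """
    tried = []
    for candidate in (text, text.upper(), text.lower(), text.title()):
        if candidate in tried:
            continue  # skip casings that are identical to an earlier one
        tried.append(candidate)
        try:
            return lookup(candidate)
        except not_found:
            continue  # this casing is not on the page; try the next one
    raise not_found(f"no link found for any of {tried}")


# With Selenium this might be wired up roughly as follows (not executed here):
#   from selenium.common.exceptions import NoSuchElementException
#   link = first_match(driver.find_element_by_link_text, "Website",
#                      NoSuchElementException)
#   print(link.get_attribute("href"))
```

Note that the find_element_by_* helpers used throughout this thread were removed in Selenium 4.3; on newer versions the equivalent lookup is driver.find_element(By.LINK_TEXT, text).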

Hope this solves your problem.

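Where you call implicitly_wait() does not matter, as long as it happens before the find_* call that has to wait.

As Ian said, my guess was wrong, because the position of the implicit wait statement does not matter in this case. It looks like we need more information to help.

I am extracting "Website" from this link. Can you check it?

OK, thanks for posting the link. I did some testing and got it to work. It looks like the cause is your locator strategy. driver.find_element_by_link_text("WEBSITE") seems to work, while "Website" does not. It looks like find_element_by_link_text does not match against the text in the HTML DOM but against the text as rendered by the computed style.

Can you share more of the HTML? Perhaps even the text of a parent element that you can find successfully?

@Ian: Yes, the other find calls work fine.

@Ian: I uploaded the whole HTML code.

What is the nearest ancestor element that you can successfully find with Selenium? What is that element's text value?

I am curious why it is "WEBSITE". Possible duplicate? How did you find that out?

I just captured all the web elements with the tag name a and printed their text.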