Python BeautifulSoup AttributeError


I am trying to get some image URLs from HTML content using Python and BeautifulSoup.

My HTML content:

<div id="photos" class="tab rel-photos multiple-photos">
   <span id="watch-this" class="classified-detail-buttons">
   <span id="c_id_10832265:c_type_202:watch_this">
   <a href="/watchlist/classified/baby-items/10832265/1/" id="watch_this_logged" data-require-auth="favoriteAd" data-tr-event-name="dpv-add-to-favourites">
   <i class="fa fa-fw fa-star-o"></i></a></span>
   </span>
   <span id="thumb1" class=" image">
      <a href="https://images.dubizzle.com/v1/files/eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmbiI6ImYzYWdrZm8xcDBlai1EVUJJWlpMRSIsInciOlt7ImZuIjoiNWpldWk3cWZ6aWU2MS1EVUJJWlpMRSIsInMiOjUwLCJwIjoiY2VudGVyLGNlbnRlciIsImEiOjgwfV19.s1GmifnZr0_Bx4HG8RTR4puYcxN0asqAmnBvSpIExEI/image;p=main"
         id="a-photo-modal-view:263986810"
         rel="photos-modal"
         target="_new"
         onClick="return dbzglobal_event_adapter(this);">
         <div style="background-image:url(https://images.dubizzle.com/v1/files/eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmbiI6ImYzYWdrZm8xcDBlai1EVUJJWlpMRSIsInciOlt7ImZuIjoiNWpldWk3cWZ6aWU2MS1EVUJJWlpMRSIsInMiOjUwLCJwIjoiY2VudGVyLGNlbnRlciIsImEiOjgwfV19.s1GmifnZr0_Bx4HG8RTR4puYcxN0asqAmnBvSpIExEI/image;p=main);"></div>
      </a>
   </span>
   <ul id="thumbs-list">
      <li>
         <span id="thumb2" class="image2">
            <a href="https://images.dubizzle.com/v1/files/eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmbiI6Imtmc3cxMWgzNTB2cTMtRFVCSVpaTEUiLCJ3IjpbeyJmbiI6IjVqZXVpN3FmemllNjEtRFVCSVpaTEUiLCJzIjo1MCwicCI6ImNlbnRlcixjZW50ZXIiLCJhIjo4MH1dfQ.Wo2YqPdWav8shtmyVO2AdisHmLX-ZLDAiskLPAmTSPU/image;p=main" id="a-photo-modal-view:263986811" rel="photos-modal" target="_new" onClick="return dbzglobal_event_adapter(this);" >
               <div style="background-image:url(https://images.dubizzle.com/v1/files/eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmbiI6Imtmc3cxMWgzNTB2cTMtRFVCSVpaTEUiLCJ3IjpbeyJmbiI6IjVqZXVpN3FmemllNjEtRFVCSVpaTEUiLCJzIjo1MCwicCI6ImNlbnRlcixjZW50ZXIiLCJhIjo4MH1dfQ.Wo2YqPdWav8shtmyVO2AdisHmLX-ZLDAiskLPAmTSPU/image;p=thumb_retina);"></div>
            </a>
         </span>
      </li>
      <li id="thumbnails-info">
         4 Photos
      </li>
   </ul>
   <div id="photo-count">
      4 Photos - Click to enlarge
   </div>
</div>

But I get an error:

Traceback (most recent call last):
  File "/Users/evilslab/Documents/Websites/www.futurepoint.dev.cc/dobuyme/SCRAPE/boats.py", line 47, in <module>
    images = soup.find("div", {"id": ["photos"]}).find_all("a")
AttributeError: 'NoneType' object has no attribute 'find_all'


How can I get only the URLs from the href attributes?

Your code works fine for me (assuming your HTML is in HTML_doc).
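As a minimal sketch of that check, the snippet below runs the same lookup as the question against a shortened stand-in for the posted HTML. The example.com URLs are placeholders and not the real image links; HTML_doc is the variable name assumed above.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the HTML in the question (placeholder URLs).
HTML_doc = """
<div id="photos" class="tab rel-photos multiple-photos">
   <span id="thumb1" class=" image">
      <a href="https://example.com/image1;p=main" rel="photos-modal"><div></div></a>
   </span>
   <ul id="thumbs-list">
      <li><span id="thumb2" class="image2">
         <a href="https://example.com/image2;p=main" rel="photos-modal"><div></div></a>
      </span></li>
   </ul>
</div>
"""

soup = BeautifulSoup(HTML_doc, "html.parser")
# Same lookup as in the question: find the div, then collect its <a> tags.
images = soup.find("div", {"id": ["photos"]}).find_all("a")
hrefs = [a["href"] for a in images]
print(hrefs)
```

With the div present, find() returns a tag and find_all() succeeds, so the error must come from the document that was actually downloaded.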

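However, your problem is that the text returned by requests for the URL does not match the HTML sample you gave. Even though you try to supply a random user agent, the server returns: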

<li>You\'re a power user moving through this website with super-human speed.</li>\n                        <li>You\'ve disabled JavaScript in your web browser.</li>\n                        <li>A third-party browser plugin, such as Ghostery or NoScript, is preventing JavaScript from running. Additional information is available in this <a title=\'Third party browser plugins that block javascript\' href=\'http://ds.tl/help-third-party-plugins\' target=\'_blank\'>support article</a>.</li>\n                    </ul>\n                </div>\n                <p class="we-could-be-wrong" >\n                    We could be wrong, and sorry about that! Please complete the CAPTCHA below and we’ll get you back on dubizzle right away.


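Since the CAPTCHA is there to prevent scraping, I would suggest respecting the administrators' wishes and not scraping the site. Perhaps there is an API?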
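The AttributeError itself arises because find() returns None when no element matches, which is what happens when the CAPTCHA page comes back instead of the listing. A minimal sketch of a guard follows; photo_links is a hypothetical helper name, not something from the question.

```python
from bs4 import BeautifulSoup


def photo_links(html):
    """Return hrefs of <a> tags inside div#photos, or [] if the div is absent."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id="photos")
    if container is None:
        # find() returns None when nothing matches, e.g. on a CAPTCHA page.
        return []
    return [a["href"] for a in container.find_all("a", href=True)]


print(photo_links('<div id="photos"><a href="/x">x</a></div>'))
print(photo_links('<p class="we-could-be-wrong">CAPTCHA</p>'))
```

This does not get around the block; it only makes the failure explicit, so the script can report that the expected markup was missing.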

Try this:

for item in soup.find_all('span'):
    try:
        link = item.find_all('a', href=True)[0].attrs.get('href', None)
    except IndexError:
        continue
    else:
        print(link)

Output:

/watchlist/classified/baby-items/10832265/1/
/watchlist/classified/baby-items/10832265/1/
https://images.dubizzle.com/v1/files/eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmbiI6ImYzYWdrZm8xcDBlai1EVUJJWlpMRSIsInciOlt7ImZuIjoiNWpldWk3cWZ6aWU2MS1EVUJJWlpMRSIsInMiOjUwLCJwIjoiY2VudGVyLGNlbnRlciIsImEiOjgwfV19.s1GmifnZr0_Bx4HG8RTR4puYcxN0asqAmnBvSpIExEI/image;p=main
https://images.dubizzle.com/v1/files/eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmbiI6Imtmc3cxMWgzNTB2cTMtRFVCSVpaTEUiLCJ3IjpbeyJmbiI6IjVqZXVpN3FmemllNjEtRFVCSVpaTEUiLCJzIjo1MCwicCI6ImNlbnRlcixjZW50ZXIiLCJhIjo4MH1dfQ.Wo2YqPdWav8shtmyVO2AdisHmLX-ZLDAiskLPAmTSPU/image;p=main
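The output above also contains the relative watchlist link, printed twice because the anchor sits inside two nested spans. If only the image URLs are wanted, one possible variation is to match absolute hrefs and drop repeats. This is a sketch under the assumption that image links are the only absolute links in the fragment; image_urls is a hypothetical helper name.

```python
import re

from bs4 import BeautifulSoup


def image_urls(html):
    """Return unique absolute hrefs, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # A compiled regex as the attribute value keeps only http(s) links.
    for a in soup.find_all("a", href=re.compile(r"^https?://")):
        if a["href"] not in urls:
            urls.append(a["href"])
    return urls


sample = """
<span><span><a href="/watchlist/classified/baby-items/10832265/1/"></a></span></span>
<span><a href="https://example.com/image1;p=main"></a></span>
<span><a href="https://example.com/image1;p=main"></a></span>
"""
print(image_urls(sample))
```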

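page = requests.get(url, headers={'user-agent': user_agent.random}); soup = BeautifulSoup(page.text, 'html.parser'); url = ""

Does that mean there is no way to do this? What do you think about faking the IP?

They blacklisted my IP?

Same error. Traceback (most recent call last): File "/Users/evilslab/Documents/Websites/www.futurepoint.dev.cc/dobuyme/SCRAP/boats.py", line 48, in soup.find("div", {"id": ["photos"]}).find_all("a"): AttributeError: 'NoneType' object has no attribute 'find_all'

I changed my answer, try it. Otherwise please send the URL, because with the HTML in the question I cannot reproduce your error.

url = ""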
