Web scraping a URL from a div in Python (BeautifulSoup)

python html css

This is the div:

<div class="theoplayer-poster" style="z-index: 1; display: inline-block; vertical-align: middle; background-repeat: no-repeat; background-position: 50% 50%; background-size: contain; cursor: pointer; margin: 0px; padding: 0px; position: absolute; top: 0px; right: 0px; bottom: 0px; left: 0px; height: 100%; background-image: url(&quot;//cdn.cnn.com/cnnnext/dam/assets/180424173851-ten-0425-00011501-exlarge-169.jpg&quot;);"></div>

However, this results in

TypeError: 'NoneType' object is not subscriptable. Any help would be greatly appreciated.
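I think your main problem is that the page at the URL you specified does not contain a div with that class name. The code below processes the content of the URL and hopefully explains enough to show how to parse out what you want.
FYI, a quick print of soup gives you all of the text. Copy it to the clipboard, paste it into an editor, and search for the URL you are looking for. Then navigate back from the match to see the div class and so on.
Also, regarding the JS parsing mentioned above: urlopen will not parse JS for you. Only a browser object does that. If your string needs JS to run before it is inserted into the DOM, I suspect you are out of luck.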
from urllib.request import urlopen
from bs4 import BeautifulSoup

# example div
# <div class="js-gigya-sharebar gigya-sharebar" data-description="April 25, 2018" data-image-src="//cdn.cnn.com/cnnnext/dam/assets/180424173851-ten-0425-00011501-super-tease.jpg" data-isshorturl="true" data-link="https://cnn.it/2HVJmx0" data-subtitle="" data-title="CNN 10 - April 25, 2018" data-twitter-account="CNN"></div>


def cnn_get_thumb(cnn_url):
    # Fetch the page and parse the static HTML (no JavaScript is executed).
    page = urlopen(cnn_url)
    soup = BeautifulSoup(page, 'html.parser')
    # The share bar div carries the thumbnail address in data-image-src.
    img = soup.find('div', class_="js-gigya-sharebar")['data-image-src']
    return img


print(cnn_get_thumb("http://cnn.com/2018/04/24/cnn10/ten-content-weds/index.html"))
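
The TypeError in the question comes from indexing the result of soup.find when it returned None. A minimal sketch of guarding against that case follows; the helper name get_div_attr and the inline sample HTML are assumptions for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup


def get_div_attr(html, class_name, attr):
    """Return the attribute of the first div with this class, or None."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_=class_name)
    if div is None:
        # No such div in this HTML; indexing None here is what
        # raises "'NoneType' object is not subscriptable".
        return None
    # .get returns None instead of raising if the attribute is absent.
    return div.get(attr)


# Trimmed copy of the example div from the answer.
sample_html = (
    '<div class="js-gigya-sharebar gigya-sharebar" '
    'data-image-src="//cdn.cnn.com/cnnnext/dam/assets/'
    '180424173851-ten-0425-00011501-super-tease.jpg"></div>'
)

found = get_div_attr(sample_html, "js-gigya-sharebar", "data-image-src")
missing = get_div_attr(sample_html, "theoplayer-poster", "style")
print(found)
print(missing)
```

Returning None lets the caller decide what to do when the div is absent, for example reporting that the class was not in the fetched page.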

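Can you post the URL you are trying to scrape?
It's right there. I'm just using it as a learning example to extend later.
If you print soup, you can see that there is no element with the class theoplayer-poster.
That's strange, it's in the HTML... any ideas?
There is a JavaScript section above the video player, so maybe the video is loaded dynamically with JavaScript. Are you sure the JS has been fully parsed before you scrape?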
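
The check suggested in these comments can be sketched with the standard library alone: first test whether the class name occurs anywhere in the fetched text, and, when a style attribute is available, pull the address out of url(...). The helper names and the static_html stand-in below are hypothetical; the style string is a shortened copy of the one in the question.

```python
import re

# Matches url(...) whether the address is bare, quoted, or wrapped in &quot;.
URL_IN_STYLE = re.compile(r'url\(\s*(?:&quot;|["\'])?(.*?)(?:&quot;|["\'])?\s*\)')


def class_in_raw_html(raw_html, class_name):
    """Rough first check: does the class name occur in the fetched text?"""
    return class_name in raw_html


def background_image_url(style):
    """Return the address inside url(...) in a style string, or None."""
    match = URL_IN_STYLE.search(style)
    return match.group(1) if match else None


# Shortened copy of the style attribute from the question.
style = ('height: 100%; background-image: url(&quot;//cdn.cnn.com/cnnnext/dam/'
         'assets/180424173851-ten-0425-00011501-exlarge-169.jpg&quot;);')

# Stand-in for HTML fetched without running JavaScript (hypothetical content).
static_html = '<div class="js-gigya-sharebar gigya-sharebar"></div>'

print(class_in_raw_html(static_html, "theoplayer-poster"))
print(background_image_url(style))
```

If the class name is missing from the raw text, no HTML parser will find the element, which is consistent with the suspicion that the player div is inserted by JavaScript after the page loads.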