
Web scraping a URL from a div in Python (BeautifulSoup)

Tags: python, html, css, typeerror

This is the div:

<div class="theoplayer-poster" style="z-index: 1; display: inline-block; vertical-align: middle; background-repeat: no-repeat; background-position: 50% 50%; background-size: contain; cursor: pointer; margin: 0px; padding: 0px; position: absolute; top: 0px; right: 0px; bottom: 0px; left: 0px; height: 100%; background-image: url(&quot;//cdn.cnn.com/cnnnext/dam/assets/180424173851-ten-0425-00011501-exlarge-169.jpg&quot;);"></div>

However, this results in
TypeError: 'NoneType' object is not subscriptable. Any help would be greatly appreciated.

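I think your main problem is that the URL you specified does not contain a div with that class name, so find() returns None and indexing it raises the TypeError. The code below processes the contents of the URL, and hopefully it explains enough to show how to parse out what you want.

FYI, a quick print of soup gives you the whole page as text. Copy it to the clipboard, paste it into an editor, and search for the URL you are looking for. Then navigate back from the match to see the enclosing div, its class, and so on.

Also, regarding the JS parsing mentioned in the comments: urlopen will not execute JS for you, only a browser object does that. If your string needs JS to run before it is inserted into the DOM, I suspect you are out of luck with this approach.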
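The "print the soup and search it" check can also be done programmatically. The following is a minimal sketch, not the answerer's method: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and the names ClassCollector, has_class and sample are made up for illustration. It reports whether a given class name appears anywhere in the static HTML, which is all that urlopen ever returns.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in the static HTML."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def has_class(html_text, class_name):
    """Return True if any tag in html_text carries class_name."""
    collector = ClassCollector()
    collector.feed(html_text)
    return class_name in collector.classes


# Hypothetical sample: a shortened version of the share-bar div
sample = '<div class="js-gigya-sharebar gigya-sharebar" data-title="CNN 10"></div>'
print(has_class(sample, "js-gigya-sharebar"))
print(has_class(sample, "theoplayer-poster"))
```

If has_class returns False for the downloaded page, the element is most likely created later by JavaScript, and no selector will find it in the raw response.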

from urllib.request import urlopen  # Python 3; in Python 2 this was "from urllib import urlopen"
from bs4 import BeautifulSoup

# example div
# <div class="js-gigya-sharebar gigya-sharebar" data-description="April 25, 2018" data-image-src="//cdn.cnn.com/cnnnext/dam/assets/180424173851-ten-0425-00011501-super-tease.jpg" data-isshorturl="true" data-link="https://cnn.it/2HVJmx0" data-subtitle="" data-title="CNN 10 - April 25, 2018" data-twitter-account="CNN"></div>


def cnn_get_thumb(cnn_url):
    """Return the share-bar thumbnail URL, or None if the div is missing."""
    page = urlopen(cnn_url)
    soup = BeautifulSoup(page, 'html.parser')
    # find() returns None when nothing matches; indexing None raises the TypeError
    div = soup.find('div', class_="js-gigya-sharebar")
    return div['data-image-src'] if div is not None else None


print(cnn_get_thumb("http://cnn.com/2018/04/24/cnn10/ten-content-weds/index.html"))
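For the div in the question, the image URL is not stored in its own attribute but inside the background-image declaration of the style attribute. Assuming the div has been obtained from a page where it actually exists (for example, one rendered by a browser) and its style string has been read with something like tag.get('style'), a sketch of pulling the URL out could look like this. The function name background_image_url and the choice to prefix protocol-relative URLs with "https:" are assumptions for illustration.

```python
import html
import re

# Matches background-image: url(...) with optional single or double quotes
_URL_IN_STYLE = re.compile(
    r"background-image:\s*url\(\s*['\"]?(.*?)['\"]?\s*\)", re.IGNORECASE
)


def background_image_url(style, scheme="https"):
    """Extract the background-image URL from an inline style string.

    Returns None if style is empty or has no background-image URL.
    """
    if not style:
        return None
    # Decode entities such as &quot; in case the raw attribute text is passed in
    match = _URL_IN_STYLE.search(html.unescape(style))
    if match is None:
        return None
    url = match.group(1)
    # Protocol-relative URL ("//host/path"): prepend the assumed scheme
    if url.startswith("//"):
        url = scheme + ":" + url
    return url


style = (
    "z-index: 1; background-size: contain; "
    "background-image: url(&quot;//cdn.cnn.com/cnnnext/dam/assets/"
    "180424173851-ten-0425-00011501-exlarge-169.jpg&quot;);"
)
print(background_image_url(style))
```

Returning None instead of raising keeps the caller's check explicit, which is the same guard that avoids the 'NoneType' object is not subscriptable error.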

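Comments:

Can you post the URL you want to scrape?

It's right there. I'm just using it as a learning example to build on later.

If you print soup, you can see that there is no element with the class theoplayer-poster.

That's weird, it is in the HTML... any ideas?

There is a JavaScript section above the video player, so maybe the video is loaded dynamically with JavaScript.

Are you sure the JS has been fully executed before you scrape?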