
Web scraping: scraping a web page with Beautiful Soup


I would like to "automatically" extract some information from a web page (such as the "date", the "court", the "street", …), and I want to use Beautiful Soup to do it.

However, I am running into some problems with the following code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# The URL string must not contain a leading space inside the quotes
my_url = 'https://www.licitor.com/annonce/08/45/23/vente-aux-encheres/un-pavillon-a-usage-d-habitation/epinay-sur-seine/seine-saint-denis/084523.html'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soupe = soup(page_html, "html.parser")
page_soupe.findAll("article", {"class": "LegalAd"})
The result is:

[<article class="LegalAd"></article>]
Is there a way to solve this problem?

Here you go, mate.


Hey @moby91, for me your code runs fine and it shows the data inside the article tag. Or try using the `requests` module.

Thank you @BhavyaParikh, but that command does not return everything between the opening tag and the closing tag (as shown in the picture). Could you give more details about the `requests` module?

Try starting from there, or you can find the topic and work it out yourself. Bro, wait, let me do this. Did you get it? Check this, and on line 12 replace a.text with req.text.
import xrzz
import re

url = 'https://www.licitor.com/annonce/08/45/23/vente-aux-encheres/un-pavillon-a-usage-d-habitation/epinay-sur-seine/seine-saint-denis/084523.html'

# Fetch the raw page over TLS, sending a browser-like User-Agent
req = xrzz.http("GET", url=url,
    headers={
        "Host": "www.licitor.com",
        "Connection": "Close",
        "User-Agent": "Mozilla/5.0 (Linux; Android 10; SM-J400F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.66 Mobile Safari/537.36"
    }, tls=True).body()

# The body is bytes, so decode it first; [1] picks the second <h3> on the page
print(re.findall("<h3>(.*?)</h3>", req.decode())[1])
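
Since `xrzz` is a fairly obscure package, here is a minimal sketch of the same idea using only the standard library; the regular expression and the User-Agent header are taken from the snippet above, and the rest is an assumption rather than a drop-in equivalent.

import re
import urllib.request

url = 'https://www.licitor.com/annonce/08/45/23/vente-aux-encheres/un-pavillon-a-usage-d-habitation/epinay-sur-seine/seine-saint-denis/084523.html'

# Send a browser-like User-Agent in case the site serves a stripped-down
# page to unknown clients (an assumption, not confirmed behaviour).
request = urllib.request.Request(url, headers={
    "User-Agent": "Mozilla/5.0 (Linux; Android 10; SM-J400F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.66 Mobile Safari/537.36"
})

with urllib.request.urlopen(request) as response:
    body = response.read().decode("utf-8", errors="replace")

# Same quick-and-dirty extraction as above: print the second <h3>
print(re.findall("<h3>(.*?)</h3>", body)[1])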
import requests
import bs4

url = 'https://www.licitor.com/annonce/08/45/23/vente-aux-encheres/un-pavillon-a-usage-d-habitation/epinay-sur-seine/seine-saint-denis/084523.html'

# Fetch the page with a browser-like User-Agent
req = requests.get(url,
    headers={
        "Host": "www.licitor.com",
        "User-Agent": "Mozilla/5.0 (Linux; Android 10; SM-J400F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.66 Mobile Safari/537.36"
    })

# Parse the response text (req.text, not a.text) and list every legal ad
pg = bs4.BeautifulSoup(req.text, 'lxml')
page_soup = pg.findAll("article", {"class": "LegalAd"})
for i in page_soup:
    print(i.find("h3").text)
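
If you want the individual fields from the question (date, court, street, …), the exact markup inside each article.LegalAd is not shown in this thread, so as a sketch you can dump every text fragment of the ad and pick out the ones you need; treat the loop below as an exploration aid, not the site's confirmed structure.

# Sketch: print each ad's text nodes one per line, so you can see
# which fragments hold the date, the court, the street, and so on.
for ad in pg.findAll("article", {"class": "LegalAd"}):
    for fragment in ad.stripped_strings:  # every non-empty text node
        print(fragment)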