Python: Beautiful Soup returns an empty string when the website has text

Consider this website: https://dlnr.hawaii.gov/dsp/parks/oahu/ahupuaa-o-kahana-state-park/

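I want to scrape the content under the headings on the right. Below is my sample code, which should return the list of contents but returns an empty string instead: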

import requests as req
from bs4 import BeautifulSoup as bs

r = req.get('https://dlnr.hawaii.gov/dsp/parks/oahu/ahupuaa-o-kahana-state-park/').text
soup = bs(r)

par = soup.find('h3', text= 'Facilities')

for sib in par.next_siblings:
    print(sib)
This returns:

<ul class="park_icon">
<div class="clearfix"></div>
</ul>

The website does not show any div element with that class. The list items are not captured either.

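The facilities and the other information in that frame are loaded dynamically by JavaScript, so bs4 does not see them in the source HTML: they are simply not there.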
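To make this concrete, here is a minimal sketch that uses only the standard library. The HTML string is a hypothetical stand-in for what the server sends (modeled on the output shown in the question), and ListItemCollector is an illustrative helper, not part of bs4:

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the static HTML the server sends: the <ul> is an
# empty placeholder that the page's JavaScript fills in later.
STATIC_HTML = """
<h3>Facilities</h3>
<ul class="park_icon">
<div class="clearfix"></div>
</ul>
"""


class ListItemCollector(HTMLParser):
    """Collect the text of every <li> element in a document."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self.items[-1] += data


def list_items(html):
    """Return the stripped text of each <li> found in the given HTML."""
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()
    return [item.strip() for item in parser.items]


# The static markup contains no <li>, so there is nothing for a parser to find.
print(list_items(STATIC_HTML))
# Once the items exist in the markup, the same code finds them.
print(list_items("<ul><li>Boat Ramp</li><li>Restroom</li></ul>"))
```

The first call prints an empty list and the second prints the two items, which suggests the parser is not at fault: the data is simply absent from the downloaded HTML.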

However, you can query the endpoint directly and get all the information you need.

Here's how:

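import json
import re
import time

import requests

# Browser-like headers; the Referer points back to the park site.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/90.0.4430.93 Safari/537.36",
    "Referer": "https://dlnr.hawaii.gov/",
}
# The trailing "_" parameter is the current timestamp.
endpoint = f"https://stateparksadmin.ehawaii.gov/camping/park-site.json?parkId=57853&_={int(time.time())}"
response = requests.get(endpoint, headers=headers).text
# Strip the callback(...); wrapper and parse the JSON inside it.
data = json.loads(re.search(r"callback\((.*)\);", response).group(1))
print("\n".join(f for f in data["park info"]["facilities"]))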
Output:

Boat Ramp
Campsites
Picnic table
Restroom
Showers
Trash Cans
Water Fountain
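The unwrapping step can be tried offline. The sketch below assumes the response has the callback(...); shape that the regular expression above expects; SAMPLE is a hypothetical, shortened payload and unwrap_jsonp is an illustrative helper name:

```python
import json
import re

# Hypothetical, shortened payload in the callback(...); wrapper.
SAMPLE = 'callback({"park info": {"id": 57853, "facilities": ["Boat Ramp", "Restroom"]}});'


def unwrap_jsonp(text):
    """Return the JSON document wrapped in a callback(...); call."""
    # DOTALL lets the payload span several lines.
    match = re.search(r"callback\((.*)\);", text, re.DOTALL)
    if match is None:
        raise ValueError("no callback(...); wrapper found")
    return json.loads(match.group(1))


data = unwrap_jsonp(SAMPLE)
print("\n".join(data["park info"]["facilities"]))
```

Checking for a failed match before calling group(1) avoids an AttributeError if the server ever returns something other than the wrapped payload.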
Here is the entire JSON:

{
  "park info": {
    "name": "Ahupua\u02bba \u02bbO Kahana State Park",
    "id": 57853,
    "island": "Oahu",
    "activities": [
      "Beachgoing",
      "Camping",
      "Dogs on Leash",
      "Fishing",
      "Hiking",
      "Hunting",
      "Sightseeing"
    ],
    "facilities": [
      "Boat Ramp",
      "Campsites",
      "Picnic table",
      "Restroom",
      "Showers",
      "Trash Cans",
      "Water Fountain"
    ],
    "prohibited": [
      "No Motorized Vehicles/ATV's",
      "No Alcoholic Beverages",
      "No Open Fires",
      "No Smoking",
      "No Commercial Activities"
    ],
    "hazards": [],
    "photos": [],
    "location": {
      "latitude": 21.556086,
      "longitude": -157.875579
    },
    "hiking": [
      {
        "name": "Nakoa Trail",
        "id": 17,
        "activities": [
          "Dogs on Leash",
          "Hiking",
          "Hunting",
          "Sightseeing"
        ],
        "facilities": [
          "No Drinking Water"
        ],
        "prohibited": [
          "No Bicycles",
          "No Open Fires",
          "No Littering/Dumping",
          "No Camping",
          "No Smoking"
        ],
        "hazards": [
          "Flash Flood"
        ],
        "photos": [],
        "location": {
          "latitude": 21.551087,
          "longitude": -157.881228
        },
        "has_google_street": false
      },
      {
        "name": "Kapa\u2018ele\u2018ele Trail",
        "id": 18,
        "activities": [
          "Dogs on Leash",
          "Hiking",
          "Sightseeing"
        ],
        "facilities": [
          "No Drinking Water",
          "Restroom",
          "Trash Cans"
        ],
        "prohibited": [
          "No Bicycles",
          "No Open Fires",
          "No Littering/Dumping",
          "No Camping",
          "No Smoking"
        ],
        "hazards": [],
        "photos": [],
        "location": {
          "latitude": 21.554744,
          "longitude": -157.876601
        },
        "has_google_street": false
      }
    ]
  }
}
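Since the response also nests the hiking trails, the same dictionary can be walked one level deeper. A small sketch, assuming the structure shown above; the data literal is a trimmed copy that keeps only the keys used here, and trail_summary is an illustrative helper:

```python
# Trimmed copy of the response above; only the keys used here are kept.
data = {
    "park info": {
        "name": "Ahupua\u02bba \u02bbO Kahana State Park",
        "hiking": [
            {
                "name": "Nakoa Trail",
                "facilities": ["No Drinking Water"],
                "hazards": ["Flash Flood"],
            },
            {
                "name": "Kapa\u2018ele\u2018ele Trail",
                "facilities": ["No Drinking Water", "Restroom", "Trash Cans"],
                "hazards": [],
            },
        ],
    }
}


def trail_summary(payload):
    """Map each trail name to its facilities and hazards."""
    park = payload["park info"]
    return {
        trail["name"]: {
            "facilities": trail.get("facilities", []),
            "hazards": trail.get("hazards", []),
        }
        for trail in park.get("hiking", [])
    }


for name, info in trail_summary(data).items():
    print(name, info["facilities"], info["hazards"])
```

Using .get() with an empty-list default keeps the loop working for parks that have no trails or for trails that omit a key.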

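You already have the answer you need. I thought I would offer some insight into another way you could have worked out what was going on (besides looking at the network traffic).

Let's start with your observation:

The list items are not captured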

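By inspecting each li element, we can see that the html has the form class="parkicon facilities icon01", where 01 is a variable number that identifies the particular icon visible on the page.

A quick search through the associated source files shows that these numbers and their corresponding facility references are listed in https://dlnr.hawaii.gov/dsp/wp-content/themes/hic_state_template_StateParks/js/icon.js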

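var w_fac_icons = {"ADA Accessible":"01","Boat Ramp":"02","Campsites":"03","Food Concession":"04","Lodging":"05","No Drinking Water":"06","Picnic Pavilion":"07","Picnic table":"08","Pier Fishing":"09","Restroom":"10","Showers":"11","Trash Cans":"12","Walking Path":"13","Water Fountain":"14","Gift Shop":"15","Scenic Viewpoint":"16"}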
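To illustrate how such a mapping ties a class attribute to a facility, here is a hedged Python sketch. It copies only a subset of the mapping, using the facility names as they appear in the JSON response; facility_from_class is a hypothetical helper, and the regular expression assumes the two-digit number directly follows the word icon:

```python
import re

# Subset of the w_fac_icons mapping (facility name -> icon number).
W_FAC_ICONS = {
    "Boat Ramp": "02",
    "Campsites": "03",
    "Picnic table": "08",
    "Restroom": "10",
    "Showers": "11",
    "Trash Cans": "12",
    "Water Fountain": "14",
}

# Invert the mapping so an icon number can be looked up directly.
ICON_TO_FACILITY = {number: name for name, number in W_FAC_ICONS.items()}


def facility_from_class(class_attr):
    """Return the facility name for a class string, or None if unknown."""
    match = re.search(r"icon(\d{2})\b", class_attr)
    if match is None:
        return None
    return ICON_TO_FACILITY.get(match.group(1))


print(facility_from_class("parkicon facilities icon02"))
print(facility_from_class("parkicon facilities icon10"))
```

The page itself goes the other way (from facility name to icon number), so this inversion is only a way to read rendered class names back into facility names.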

If you then search the source html for w_fac_icons, you will come across (lines 560-582):

var parkfac = parkinfo.facilities;

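If you then trace back the function parkinfo, you arrive at line 446, where you will find the ajax request that dynamically fetches the json data used to update the web page: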

function parkinfo() {
    var campID = 57853;

    jQuery.ajax( {
        type:'GET',
        url: 'https://stateparksadmin.ehawaii.gov/camping/park-site.json',
        data:"parkId=" + campID,
data can be passed as parameters in the query string with GET.


So this is the request you would look for in the Network tab.
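As a rough Python equivalent of what the ajax call sends, the query string can be assembled with the standard library. The URL and the parkId value come from the snippet above; build_park_url is an illustrative helper name:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://stateparksadmin.ehawaii.gov/camping/park-site.json"


def build_park_url(park_id):
    """Append parkId to the endpoint as a GET query-string parameter."""
    return f"{BASE_URL}?{urlencode({'parkId': park_id})}"


url = build_park_url(57853)
print(url)
# Parse the URL back to confirm the parameter round-trips.
print(parse_qs(urlsplit(url).query))
```

Passing a params dictionary to requests.get would produce the same query string; urlencode is used here only to keep the sketch free of third-party imports.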

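Thanks for the reply! I'm doing this with Scrapy and crawling across multiple pages. Do you know how to do this with Scrapy?

No, sorry, I don't use Scrapy. Your initial attempt in the post above didn't indicate anything about Scrapy. Once your question has been answered, you shouldn't raise any new requests. However, you can always create a new post describing any new problem.

@maverick That's what I thought.

Nice work. +1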