Python crawler can't find an element

Tags: python, beautifulsoup, web-crawler

I'm practicing writing a crawler in Python.

My goal is to find the exam dates.

This is what I have so far:

import urllib2
from bs4 import BeautifulSoup
from urllib2 import urlopen, Request

gre_url = 'https://ereg.ets.org/ereg/public/testcenter/availability/seats?testId=30&testName=GRE+General+Test&location=Taipei+City%2C+Taiwan&latitude=25.0329636&longitude=121.56542680000007&testStartDate=April-01-2017&testEndDate=May-31-2017&currentTestCenterCount=0&sourceTestCenterCount=0&adminCode=&rescheduleFlow=false&isWorkflow=true&oldTestId=30&oldTestTime=&oldTestCenterId=&isUserLoggedIn=true&oldTestTitle=&oldTestCenter=&oldTestType=&oldTestDate=&oldTestTimeInfo=&peviewTestSummaryURL=%2Fresch%2Ftestpreview%2Fpreviewtestsummary&rescheduleURL='
data = urllib2.urlopen(gre_url).read()
soup = BeautifulSoup(data, "html.parser")
print soup.select('div.panel-heading.accordion-heading')  # returns []
However, it doesn't seem to be able to extract the div.panel-heading.accordion-heading elements from the data. How can I fix it?
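One way to narrow down an empty result like this is to check which page the script actually downloaded. The sketch below is a Python 3 illustration using only the standard library; `fetch_with_final_url` is a hypothetical helper name, and it only detects HTTP-level redirects (a redirect done by JavaScript or a meta tag would not show up this way).

```python
from urllib.request import Request, urlopen


def fetch_with_final_url(url):
    """Hypothetical helper: download `url` and report where the request ended up.

    urlopen follows HTTP redirects by default, so geturl() returns the URL
    of the page that was finally served, which may differ from `url`.
    """
    # Some sites answer differently without a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return response.geturl(), response.read()
```

Calling `fetch_with_final_url(gre_url)` and comparing the returned URL with `gre_url` would show whether the server sent the script somewhere else; if the two differ, the HTML being parsed is not the seats page, so the selector has nothing to match.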

Before making the final GET request to check availability, you need to visit the preceding URLs in several steps. Here is something that worked for me:

import json

import requests
from bs4 import BeautifulSoup


start_url = "https://www.ets.org/gre/revised_general/register/centers_dates/"
workflow_url = "https://ereg.ets.org/ereg/public/workflowmanager/schlWorkflow?_p=GRI"
seats_url = "https://ereg.ets.org/ereg/public/testcenter/availability/seats"

with requests.Session() as session:
    # Send a browser-like User-Agent with every request in this session.
    session.headers.update({'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'})

    # Visit the earlier pages first so the session carries their cookies.
    session.get(start_url)
    session.get(workflow_url)

    # Final request: the seat availability search itself.
    response = session.get(seats_url + "?testId=30&testName=GRE+General+Test&location=New+York%2C+NY%2C+United+States&latitude=40.7127837&longitude=-74.00594130000002&testStartDate=March-27-2017&testEndDate=April-30-2017&currentTestCenterCount=0&sourceTestCenterCount=0&adminCode=&rescheduleFlow=false&isWorkflow=true&oldTestId=30&oldTestTime=&oldTestCenterId=&isUserLoggedIn=true&oldTestTitle=&oldTestCenter=&oldTestType=&oldTestDate=&oldTestTimeInfo=&peviewTestSummaryURL=%2Fresch%2Ftestpreview%2Fpreviewtestsummary&rescheduleURL=")

    soup = BeautifulSoup(response.content, "html.parser")
    # The results are stored as JSON in the value attribute of #findSeatResponse.
    result = json.loads(soup.select_one('#findSeatResponse')['value'])
    for date in result['sortedDates']:
        print(date['displayDate'])

Of course, change the last URL to the one you need.

When I try to go to the URL you are getting the data from, it redirects to the home page.

In the URL you are scraping, I see the option isUserLoggedIn=true. If you set it to false (isUserLoggedIn=false), the redirect to the home page should not be triggered, and you should then be able to access the elements you want.

@MD.KhairulBasar Yes, you're right, I can't access it in incognito mode either.

@Scratch'N'Purr It doesn't seem to work, I still can't get the page I want.

Awesome! Thanks, that was a big help!
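Changing the final URL by hand is error-prone because of the percent-encoding. A minimal sketch of building it from plain values with the standard library follows; `build_seats_url` is a hypothetical helper, and the parameter names and fixed values are copied from the URLs in this thread rather than from any documented API, so the site may change or ignore them.

```python
from urllib.parse import urlencode

SEATS_URL = "https://ereg.ets.org/ereg/public/testcenter/availability/seats"


def build_seats_url(location, latitude, longitude, start_date, end_date):
    """Hypothetical helper: rebuild the seats URL from the thread with new search values."""
    # A list of pairs keeps the parameters in the same order as the original URL.
    params = [
        ("testId", "30"),
        ("testName", "GRE General Test"),
        ("location", location),
        ("latitude", latitude),
        ("longitude", longitude),
        ("testStartDate", start_date),
        ("testEndDate", end_date),
        ("currentTestCenterCount", "0"),
        ("sourceTestCenterCount", "0"),
        ("adminCode", ""),
        ("rescheduleFlow", "false"),
        ("isWorkflow", "true"),
        ("oldTestId", "30"),
        ("oldTestTime", ""),
        ("oldTestCenterId", ""),
        ("isUserLoggedIn", "true"),
        ("oldTestTitle", ""),
        ("oldTestCenter", ""),
        ("oldTestType", ""),
        ("oldTestDate", ""),
        ("oldTestTimeInfo", ""),
        # Parameter name spelled as it appears in the original URL.
        ("peviewTestSummaryURL", "/resch/testpreview/previewtestsummary"),
        ("rescheduleURL", ""),
    ]
    # urlencode turns spaces into "+" and escapes "," and "/" as %2C and %2F.
    return SEATS_URL + "?" + urlencode(params)
```

For example, `build_seats_url("Taipei City, Taiwan", "25.0329636", "121.56542680000007", "April-01-2017", "May-31-2017")` should reproduce the URL from the question, which can then be passed to `session.get` as the last request. Passing the coordinates as strings avoids any change in how the digits are formatted.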