Python Scrapy cannot access child div classes
I am using Scrapy to scrape the href links in the table on this web page. I can access the div MVCGridTableHolder_advancesearchawardedprojectsp_, but I cannot access its children, i.e. the div with class "row" and the div with the style attribute. My attempt is shown below. Is this because of the partial view?
HTML code:
<div id="MVCGridContainer_advancesearchawardedprojectsp_" data-key="" class="MVCGridContainer">
  <!--Partial View!-->
  <div class="row"></div>
  <div style="overflow-x:auto;">
    <table name="MVCGridTable_advancesearchawardedprojectsp" class="table table-striped table-bordered iris-grid">
      <thead></thead>
      <tbody>
        <tr>
          <td>
            <a class="grid-link" target="_top" href="https://researchgrant.gov.sg/pages/Awarded-Project-Detail.aspx?AXID=MOH-000080&CompanyCode=moh">INVESTIGATING DIVERSIFIED BIFUNCTIONAL MACROCYCLES BY PHAGE DISPLAY AS A NOVEL TECHNOLOGY PLATFORM</a>
          </td>
        </tr>
      </tbody>
    </table>
  </div>
</div>
Scrapy shell attempt:
In [12]: quote = response.xpath('//div[@id="MVCGridTableHolder_advancesearchawardedprojectsp_"]')
In [13]: quote
Out[13]: [<Selector xpath='//div[@id="MVCGridTableHolder_advancesearchawardedprojectsp_"]' data='<div id="MVCGridTableHolder_advancese...'>]
In [14]: quote = response.xpath('//div[@id="MVCGridTableHolder_advancesearchawardedprojectsp_"]/div[@class="row"]')
In [15]: quote
Out[15]: []
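If you open the browser developer tools while this page loads, you will see that a separate XHR request is sent to load the partial view content. The HTML that Scrapy downloads therefore contains only the empty holder div, which is why the child selectors return nothing. You can simulate that XHR request in your code.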
Example using requests:
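import requests

with requests.Session() as session:
    # Skip TLS certificate verification (insecure; only for quick testing).
    session.verify = False
    # Mark the request as an XHR call, as the browser does.
    session.headers = {
        'X-Requested-With': 'XMLHttpRequest'
    }
    # Query-string values go in params; the grid name goes in the POST body.
    response = session.post("https://researchgrant.gov.sg/eservices/mvcgrid", params={
        'keyword': '',
        'source': 'sharepoint',
        'type': 'project',
        'status': 'open',
        'page': '2',
        '_pp_projectstatus': '',
        '_pp_hiname': 'ab',
        '_pp_piname': 'pua',
        '_pp_source': 'sharepoint',
        '_pp_details': ''},
        data={
            'name': 'advancesearchawardedprojectsp'
        })
    # The body is the HTML of the partial view (the grid table).
    print(response.text)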
In Scrapy, you can use FormRequest to send the same POST request.