Python: facing an issue with "block-content block-block-content1ACE621-3e0d-4b51-848d-aa830cd4a1c5" when web scraping
Tags: python, web-scraping, beautifulsoup

I am writing a Python scraper for this site. I am trying to grab the following CSS:
<div _ngcontent-ail-c10="" class="inner">
  <!--bindings={ "ng-reflect-ng-if": "2019-09-05T14:30:00Z" }-->
  <div _ngcontent-ail-c10="" class="session-date ng-tns-c10-3 ng-star-inserted">
    <span _ngcontent-ail-c10="" class="date-day">Thursday, September 5</span>
    <span _ngcontent-ail-c10="" class="date-time">
      <span _ngcontent-ail-c10="" class="date-time-start">2:30 PM</span>
      <!--bindings={ "ng-reflect-ng-if": "2019-09-05T16:00:00Z" }-->
      <span _ngcontent-ail-c10="" class="date-time-end ng-tns-c10-3 ng-star-inserted"> - 4:00 PM</span>
    </span>
  </div>
  <!--bindings={ "ng-reflect-ng-if": "Sapphire Ballroom C, Level 4" }-->
  <div _ngcontent-ail-c10="" class="session-location ng-tns-c10-3 ng-star-inserted"><strong _ngcontent-ail-c10="" class="ng-tns-c10-3">Location:</strong> Sapphire Ballroom C, Level 4 </div>
  <!--bindings={ "ng-reflect-ng-if": "General Session" }-->
  <div _ngcontent-ail-c10="" class="session-type ng-tns-c10-3 ng-star-inserted"> General Session </div>
  <!--bindings={ "ng-reflect-ng-if": "General Session" }-->
  <div _ngcontent-ail-c10="" class="session-title ng-tns-c10-3 ng-star-inserted">
    <!--bindings={ "ng-reflect-ng-if": "true", "ng-reflect-ng-if-else": "[object Object]" }-->
    <a _ngcontent-ail-c10="" class="ng-tns-c10-3 ng-star-inserted" ng-reflect-router-link="/session,12137" href="/session/12137">Considerations in Value-Based Contracting</a>
    <!---->
  </div>
  <!--bindings={ "ng-reflect-ng-if": "true" }-->
  <div _ngcontent-ail-c10="" class="session-tracks ng-tns-c10-3 ng-star-inserted">
    <strong _ngcontent-ail-c10="" class="ng-tns-c10-3">Track(s): </strong>
    <!--bindings={ "ng-reflect-ng-for-of": "Education Track" }-->
    <span _ngcontent-ail-c10="" class="ng-tns-c10-3 ng-star-inserted">Education Track<!--bindings={ "ng-reflect-ng-if": "false" }--></span>
  </div>
  <!--bindings={ "ng-reflect-ng-if": "true" }-->
  <div _ngcontent-ail-c10="" class="session-chair ng-tns-c10-3 ng-star-inserted">
    <strong _ngcontent-ail-c10="" class="ng-tns-c10-3">Chair(s): </strong>
    <!--bindings={ "ng-reflect-ng-for-of": "Linda D. Bosserman, MD, FACP, " }-->
    <span _ngcontent-ail-c10="" class="ng-tns-c10-3 ng-star-inserted">Linda D. Bosserman, MD, FACP, FASCO | City of Hope<!--bindings={ "ng-reflect-ng-if": "true" }--><span _ngcontent-ail-c10="" class="ng-tns-c10-3 ng-star-inserted">; </span></span>
    <span _ngcontent-ail-c10="" class="ng-tns-c10-3 ng-star-inserted">Barry Russo, MBA | The Center for Cancer and Blood Disorders<!--bindings={ "ng-reflect-ng-if": "false" }--></span>
  </div>
  <!--bindings={ "ng-reflect-ng-if": "1.5" }-->
  <div _ngcontent-ail-c10="" class="session-credit ng-tns-c10-3 ng-star-inserted">
    <strong _ngcontent-ail-c10="" class="ng-tns-c10-3">Attendee CE/MOC Credit: </strong>
    <span _ngcontent-ail-c10="" class="ng-tns-c10-3">1.5</span>
  </div>
  <!--bindings={ "ng-reflect-ng-if": "true" }-->
  <a _ngcontent-ail-c10="" class="get-presentations ng-tns-c10-3 ng-star-inserted" href="#" ng-reflect-klass="get-presentations" ng-reflect-ng-class="[object Object]">View Presentation<!--bindings={ "ng-reflect-ng-if": "true" }--><span _ngcontent-ail-c10="" class="ng-tns-c10-3 ng-star-inserted">s</span></a>
  <!--bindings={}-->
</div>
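But the content of "block-content" is stripped from what I get back from the site.

The page is rendered via JavaScript, so the session data is not in the HTML that the scraper downloads. I was able to find the XHR request that the data is loaded from, and you can call it directly. The response parses into a dictionary, and you can do whatever you want with it: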
import requests

# Query the Solr endpoint behind the page directly; it returns JSON rather than rendered HTML.
r = requests.get(
"https://cdn-solr.asco.org/solr/ml/mlselect?_format=json&wt=json&indent=true&q=*&start=0&rows=30&sort=ISODateString%20asc,%20ISODateStringEnd%20asc,%20SessionType%20asc,%20SessionId%20asc&fq=-SessionType:%22Pre-Annual%20Meeting%20Seminar%22%20AND%20RecordType:sessions%20AND%20Meeting:%222019%20Oncology%20Practice%20Conference%22&fq={!tag=date}ISODateString:[*%20TO%20*]&fq=public_b:true&facet=true&f.Year.facet.sort=index&facet.field={!key=Year}Year&facet.field={!key=subject_thes}subject_thes&facet.field={!key=MediaTypes}MediaTypes&facet.field={!key=fctSessionType}fctSessionType&facet.pivot={!key=MeetingName}fctMeetingName,fctTrack&spellcheck.maxCollationTries=100").json()
print(r)
It does work, but how can I clean up the resulting data? It picks up almost everything on the page.
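One way to trim the result is to keep only the fields you care about from each returned document. The sketch below assumes the endpoint uses Solr's usual JSON layout, where the matching records sit in a list under `response` then `docs`. The field names are taken from the `sort` clause of the request URL; whether each of them is actually present in every document is an assumption, so missing ones come back as `None`. The helper name `extract_sessions` and the `Unwanted` key in the stand-in data are made up for illustration.

```python
def extract_sessions(
    payload,
    fields=("SessionId", "SessionType", "ISODateString", "ISODateStringEnd"),
):
    """Return one small dict per Solr document, keeping only `fields`."""
    # Standard Solr JSON responses nest the records under response -> docs.
    # Fall back to empty containers so an unexpected payload yields [].
    docs = payload.get("response", {}).get("docs", [])
    # doc.get() returns None for any field a document does not carry.
    return [{name: doc.get(name) for name in fields} for doc in docs]


# Small stand-in for the parsed response, so the example runs offline.
# The values come from the HTML snippet in the question; "Unwanted" is a
# hypothetical extra key standing in for everything you want to drop.
sample = {
    "responseHeader": {"status": 0},
    "response": {
        "numFound": 1,
        "start": 0,
        "docs": [
            {
                "SessionId": 12137,
                "SessionType": "General Session",
                "ISODateString": "2019-09-05T14:30:00Z",
                "ISODateStringEnd": "2019-09-05T16:00:00Z",
                "Unwanted": "everything else in the document",
            }
        ],
    },
    "facet_counts": {},
}

sessions = extract_sessions(sample)
for session in sessions:
    print(session)
```

With the real request, `r` is already the parsed dictionary, so `extract_sessions(r)` should work the same way if the layout matches. Solr also has an `fl` query parameter that limits which fields are returned, which may let you trim the response on the server side, assuming this endpoint honors it.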