Python: problem with "block-content block-block-content1ACE621-3e0d-4b51-848d-aa830cd4a1c5" when web scraping

Tags: python, web-scraping, beautifulsoup

I am writing a Python scraper for this page:

I am trying to scrape the following markup:

<div _ngcontent-ail-c10="" class="inner">
  <!--bindings={ "ng-reflect-ng-if": "2019-09-05T14:30:00Z" }-->
  <div _ngcontent-ail-c10="" class="session-date ng-tns-c10-3 ng-star-inserted">
    <span _ngcontent-ail-c10="" class="date-day">Thursday, September 5</span>
    <span _ngcontent-ail-c10="" class="date-time">
      <span _ngcontent-ail-c10="" class="date-time-start">2:30 PM</span>
      <!--bindings={ "ng-reflect-ng-if": "2019-09-05T16:00:00Z" }-->
      <span _ngcontent-ail-c10="" class="date-time-end ng-tns-c10-3 ng-star-inserted"> - 4:00 PM</span>
    </span>
  </div>
  <!--bindings={ "ng-reflect-ng-if": "Sapphire Ballroom C, Level 4" }-->
  <div _ngcontent-ail-c10="" class="session-location ng-tns-c10-3 ng-star-inserted">
    <strong _ngcontent-ail-c10="" class="ng-tns-c10-3">Location:</strong> Sapphire Ballroom C, Level 4
  </div>
  <!--bindings={ "ng-reflect-ng-if": "General Session" }-->
  <div _ngcontent-ail-c10="" class="session-type ng-tns-c10-3 ng-star-inserted"> General Session </div>
  <!--bindings={ "ng-reflect-ng-if": "General Session" }-->
  <div _ngcontent-ail-c10="" class="session-title ng-tns-c10-3 ng-star-inserted">
    <!--bindings={ "ng-reflect-ng-if": "true", "ng-reflect-ng-if-else": "[object Object]" }-->
    <a _ngcontent-ail-c10="" class="ng-tns-c10-3 ng-star-inserted" ng-reflect-router-link="/session,12137" href="/session/12137">Considerations in Value-Based Contracting</a>
    <!---->
  </div>
  <!--bindings={ "ng-reflect-ng-if": "true" }-->
  <div _ngcontent-ail-c10="" class="session-tracks ng-tns-c10-3 ng-star-inserted">
    <strong _ngcontent-ail-c10="" class="ng-tns-c10-3">Track(s): </strong>
    <!--bindings={ "ng-reflect-ng-for-of": "Education Track" }-->
    <span _ngcontent-ail-c10="" class="ng-tns-c10-3 ng-star-inserted">Education Track<!--bindings={ "ng-reflect-ng-if": "false" }--></span>
  </div>
  <!--bindings={ "ng-reflect-ng-if": "true" }-->
  <div _ngcontent-ail-c10="" class="session-chair ng-tns-c10-3 ng-star-inserted">
    <strong _ngcontent-ail-c10="" class="ng-tns-c10-3">Chair(s): </strong>
    <!--bindings={ "ng-reflect-ng-for-of": "Linda D. Bosserman, MD, FACP, " }-->
    <span _ngcontent-ail-c10="" class="ng-tns-c10-3 ng-star-inserted">Linda D. Bosserman, MD, FACP, FASCO | City of Hope<!--bindings={ "ng-reflect-ng-if": "true" }--><span _ngcontent-ail-c10="" class="ng-tns-c10-3 ng-star-inserted">; </span></span>
    <span _ngcontent-ail-c10="" class="ng-tns-c10-3 ng-star-inserted">Barry Russo, MBA | The Center for Cancer and Blood Disorders<!--bindings={ "ng-reflect-ng-if": "false" }--></span>
  </div>
  <!--bindings={ "ng-reflect-ng-if": "1.5" }-->
  <div _ngcontent-ail-c10="" class="session-credit ng-tns-c10-3 ng-star-inserted">
    <strong _ngcontent-ail-c10="" class="ng-tns-c10-3">Attendee CE/MOC Credit: </strong>
    <span _ngcontent-ail-c10="" class="ng-tns-c10-3">1.5</span>
  </div>
  <!--bindings={ "ng-reflect-ng-if": "true" }-->
  <a _ngcontent-ail-c10="" class="get-presentations ng-tns-c10-3 ng-star-inserted" href="#" ng-reflect-klass="get-presentations" ng-reflect-ng-class="[object Object]">View Presentation<!--bindings={ "ng-reflect-ng-if": "true" }--><span _ngcontent-ail-c10="" class="ng-tns-c10-3 ng-star-inserted">s</span></a>
  <!--bindings={}-->
</div>

However, what my scraper actually pulls from the site is the "block-content" content instead.

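The page is rendered via JavaScript, so the session markup is not in the HTML that the server returns. I was able to track down the XHR request that the data is loaded from.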

The response is JSON, so calling .json() on it gives you a dict. From there you can do whatever you want with it:

import requests

# XHR endpoint that the page itself queries (visible in the browser's network tab).
url = "https://cdn-solr.asco.org/solr/ml/mlselect?_format=json&wt=json&indent=true&q=*&start=0&rows=30&sort=ISODateString%20asc,%20ISODateStringEnd%20asc,%20SessionType%20asc,%20SessionId%20asc&fq=-SessionType:%22Pre-Annual%20Meeting%20Seminar%22%20AND%20RecordType:sessions%20AND%20Meeting:%222019%20Oncology%20Practice%20Conference%22&fq={!tag=date}ISODateString:[*%20TO%20*]&fq=public_b:true&facet=true&f.Year.facet.sort=index&facet.field={!key=Year}Year&facet.field={!key=subject_thes}subject_thes&facet.field={!key=MediaTypes}MediaTypes&facet.field={!key=fctSessionType}fctSessionType&facet.pivot={!key=MeetingName}fctMeetingName,fctTrack&spellcheck.maxCollationTries=100"

r = requests.get(url, timeout=30)
r.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error body
data = r.json()       # parsed JSON body as a Python dict

print(data)
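
The request URL is long, so it can help to split its query string into named parameters before changing anything. The following is a minimal sketch using only the standard library. URL here is an abbreviated copy of the request URL that keeps only some of its parameters, and query_params is a hypothetical helper name. start and rows are the standard Solr paging parameters, so they are likely the ones to adjust if you need more than the first 30 records.

```python
from urllib.parse import urlsplit, parse_qs

# Abbreviated copy of the request URL: only some of its parameters are kept here.
URL = (
    "https://cdn-solr.asco.org/solr/ml/mlselect"
    "?_format=json&wt=json&indent=true&q=*&start=0&rows=30"
    "&fq={!tag=date}ISODateString:[*%20TO%20*]&fq=public_b:true"
)


def query_params(url):
    """Return the query string of url as a dict of name -> list of decoded values."""
    return parse_qs(urlsplit(url).query)


params = query_params(URL)
for name, values in params.items():
    # Repeated parameters such as fq end up as several values under one name.
    print(name, values)
```

Nothing is sent over the network here; the sketch only decodes the URL.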

It does work, but how do I clean up the resulting data? It picks up almost everything on the page.
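
One way to trim the result, sketched under assumptions: a Solr JSON response normally keeps its records in a list under response -> docs, so you can keep only the keys you care about from each record. I have not checked the actual payload. The field names below (SessionId, SessionType, ISODateString) are taken from the sort parameters of the request URL and may not all be returned. pick_fields is a hypothetical helper, and sample is made-up data shaped like a Solr response.

```python
def pick_fields(payload, fields):
    """Keep only the chosen fields from each document of a Solr-style response."""
    # Assumption: documents live under payload["response"]["docs"].
    docs = payload.get("response", {}).get("docs", [])
    # Fields missing from a document come back as None instead of raising.
    return [{name: doc.get(name) for name in fields} for doc in docs]


# Made-up sample shaped like a Solr response; the values are illustrative only.
sample = {
    "responseHeader": {"status": 0},
    "response": {
        "numFound": 1,
        "start": 0,
        "docs": [
            {"SessionId": "0000", "SessionType": "Example Type", "Extra": "ignored"},
        ],
    },
}

wanted = ["SessionId", "SessionType", "ISODateString"]
for row in pick_fields(sample, wanted):
    print(row)
```

With the real response you would pass the dict returned by .json() instead of sample, after printing one document to see which keys it actually contains.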