Scraping information with BeautifulSoup / Python


My code goes to a website, extracts URLs, and then visits each of the scraped URLs (this part works fine).

Now, on each of these new pages, I want to get some information (the author's name), but it prints blank.

Here is the code:

from selenium import webdriver
from bs4 import BeautifulSoup
import time
import requests

driver = webdriver.Chrome()
eachLink = []
baseurl = 'https://meetinglibrary.asco.org'

# Step 1: collect the links from the results page (this part works)
for x in range(1, 2):
    driver.get(f'https://meetinglibrary.asco.org/results?meetingView=2020%20ASCO%20Virtual%20Scientific%20Program&page={x}')
    time.sleep(3)
    page_source = driver.page_source
    soup = BeautifulSoup(page_source, 'html.parser')
    productlist = soup.find_all('a', class_='ng-star-inserted')
    for item in productlist:
        for link in item.find_all('a', href=True):
            eachLink.append(baseurl + link['href'])
print(eachLink)

# Step 2: visit each scraped URL and try to extract the author name
infobox = []
for b in eachLink:
    r = requests.get(b)
    time.sleep(1)
    soup1 = BeautifulSoup(r.content, 'html.parser')
    auth = soup1.find('a', class_='asset-metadata-value link ng-star-inserted')
    print(auth)  # prints None / blank
Maybe the time.sleep(1) in the eachLink loop is not long enough and the page is still loading. Instead of a fixed time.sleep, you can use an explicit wait that checks for an expected condition:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Change the path to something that should be present on each link's page
path = "//div[@id='YOURIDHERE']"
button = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located(
        (By.XPATH, path)
    )
)
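
To apply this in the eachLink loop, each page has to be loaded through the Selenium driver rather than requests, since WebDriverWait needs the driver and requests never executes the page's JavaScript. A minimal sketch, assuming one of the class names from the question is enough to locate the author element:

# Sketch: load each detail page with the driver and wait until the metadata renders
for b in eachLink:
    driver.get(b)
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located(
            (By.CLASS_NAME, 'asset-metadata-value')  # assumed class; adjust to the real page
        )
    )
    soup1 = BeautifulSoup(driver.page_source, 'html.parser')
    auth = soup1.find('a', class_='asset-metadata-value link ng-star-inserted')
    print(auth.text if auth else None)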

I think this helps. You don't need the per-link loop (or the waits) to extract the authors:

from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
eachLink = []
authors = []
baseurl = 'https://meetinglibrary.asco.org'

for x in range(1, 2):
    driver.get(f'https://meetinglibrary.asco.org/results?meetingView=2020%20ASCO%20Virtual%20Scientific%20Program&page={x}')
    time.sleep(120)
    page_source = driver.page_source

    soup = BeautifulSoup(page_source, 'html.parser')
    productlist = soup.find_all('a', class_='ng-star-inserted')

    # The author names are already present on the results page
    for auth in soup.find_all('div', {'class': 'record__ellipsis'}):
        authors.append(auth.text)

    for item in productlist:
        for link in item.find_all('a', href=True):
            eachLink.append(baseurl + link['href'])

print(eachLink)
print('\n', authors, '\n')

driver.quit()

So what exactly is the problem? That is clearly stated in the post: it's printing blank... Thanks, but I eventually want to grab more information, so I want to get it from the href pages rather than from the current page.
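
For that goal, the explicit-wait approach from the first answer can be reused on each href page. The sketch below assumes the detail pages expose their metadata under the same class names the question uses, so the selectors may need adjusting:

# Sketch: visit every scraped href page and collect all of its metadata values
# (selectors assumed from the question; WebDriverWait/EC/By as imported above)
details = []
for b in eachLink:
    driver.get(b)
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'asset-metadata-value'))
    )
    soup1 = BeautifulSoup(driver.page_source, 'html.parser')
    values = [a.text for a in soup1.find_all('a', class_='asset-metadata-value link ng-star-inserted')]
    details.append((b, values))
print(details)
driver.quit()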