Python can't get the complete HTML when scraping a site with Beautiful Soup
I am trying to scrape the page but I only get part of the HTML. I want to get the price of the token, which should be inside
<span class="text-success">$0.0000000218121</span>
but I get nothing useful with
scraping.findAll("span")
Try this:
from bs4 import BeautifulSoup
from selenium import webdriver
import time  # new import

def scraping(url):
    browser = webdriver.PhantomJS()
    browser.get(url)
    time.sleep(5)  # 5 seconds
    html = browser.page_source
    return BeautifulSoup(html, 'lxml')

# Get Html
page = scraping("https://poocoin.app/tokens/0x78bc22a215c1ef8a2e41fa1c39cd7bdc09bd5174")
# Extract price as str
price = page.find("span", class_="text-success").getText()
print(price)  # output $3.74
PooCoin limits the PhantomJS view, so it's better to use this instead:
from bs4 import BeautifulSoup
from selenium import webdriver
import time
from pyvirtualdisplay import Display
import telegram_send
from api.constants import *

coin_address = '0x4e8a9d0bf525d78fd9e0c88710099f227f6924cf'

def scraping(url):
    display = Display(visible=0, size=(1200, 1200))
    display.start()
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--disable-extensions')
    chrome_options.add_argument('--profile-directory=Default')
    chrome_options.add_argument("--incognito")
    chrome_options.add_argument("--disable-plugins-discovery")
    print(chrome_driver_dir)
    driver = webdriver.Chrome(executable_path=chrome_driver_dir, options=chrome_options)
    driver.delete_all_cookies()
    driver.get(url)
    time.sleep(5)  # 5 seconds
    html = driver.page_source
    display.stop()
    return BeautifulSoup(html, 'lxml')

# Get Html
page = scraping("https://poocoin.app/tokens/" + coin_address)
# Extract price as str
prices = page.find_all("span", class_="text-success")
# the element position always changes
price = prices[7].getText()
print(price)
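Hard-coding `prices[7]` is brittle, since (as the comment notes) the element position always changes. A more robust sketch filters the matched spans instead, under the assumption that the price is the only `text-success` span whose text starts with `$`; the static `sample` HTML is only a stand-in for the rendered page:

```python
from bs4 import BeautifulSoup

def extract_price(html):
    """Return the first text-success span that looks like a dollar price."""
    soup = BeautifulSoup(html, "html.parser")
    for span in soup.find_all("span", class_="text-success"):
        text = span.get_text(strip=True)
        if text.startswith("$"):
            return text
    return None  # no price span found (page may not have rendered yet)

# Minimal demo with static HTML standing in for driver.page_source
sample = ('<div><span class="text-success">+5%</span>'
          '<span class="text-success">$0.0000000218121</span></div>')
result = extract_price(sample)
print(result)  # $0.0000000218121
```

This way the extraction keeps working even when other green (`text-success`) elements shift the span's position in the page.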
price = page.find("span", class_="text-success")
It returns nothing because the page takes time to load and you are fetching the HTML too soon. Check my updated answer (I just ran it successfully). — Does it still work? I tried it and it gives me AttributeError: 'NoneType' object has no attribute 'getText', even after extending the wait time. — Yes, I just ran it again without problems.
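The AttributeError above happens because `find()` returns `None` while the span is not yet in the DOM, and calling `getText()` on `None` then fails. A minimal guard (the helper name, `attempts`, and `delay` values are illustrative, not from the original answer) retries the fetch instead of crashing; the demo swaps in static HTML for the real `scraping()` call:

```python
import time
from bs4 import BeautifulSoup

def find_price_with_retry(get_page, attempts=3, delay=5):
    """Re-fetch and re-parse until the price span appears, instead of crashing."""
    for _ in range(attempts):
        span = get_page().find("span", class_="text-success")
        if span is not None:  # find() returns None until the page has rendered
            return span.getText()
        time.sleep(delay)
    return None

# Demo: successive static pages stand in for successive scraping() calls;
# the first "fetch" has no price span yet, the second one does.
pages = iter(['<div></div>', '<span class="text-success">$1.23</span>'])
result = find_price_with_retry(
    lambda: BeautifulSoup(next(pages), "html.parser"), attempts=2, delay=0)
print(result)  # $1.23
```

In real use, `get_page` would be the `scraping` function from the answer, e.g. `find_price_with_retry(lambda: scraping(url))`.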