Python: web scraping hidden elements with BeautifulSoup

python web-scraping curl sed

I am trying to scrape the following URL with BeautifulSoup:

I tried to parse this part, which I found with Inspect in the browser:

     <div class="value">
          <div class="marker position" style="left: 89.25%;"></div>
          <div class="text position" style="left: 89.25%;">1.43</div>
     </div>

The result I get is an empty list:

     []

How can I solve this?

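This site uses an internal API to fetch this data. The API call requires some tokens, and those tokens are embedded in JavaScript inside the page. So you first need to scrape the token values with a regular expression, and then use them in the API call.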
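The two steps described above (pull the tokens out of the page's JavaScript, then put them into the API call) can be tried offline first. This is only a minimal sketch: `SAMPLE_PAGE` and its token values are made up for illustration, the helper names `extract_tokens` and `build_api_url` are hypothetical, and the real page may embed the tokens differently.

```python
import re
from urllib.parse import urlencode

# Hypothetical stand-in for the page source; the real page and token values will differ.
SAMPLE_PAGE = """
<script>
  var client = new Xignite( 'ABC123', 'DEF456' );
</script>
"""

# Two quoted, upper-case alphanumeric arguments of an Xignite(...) call.
TOKEN_PATTERN = re.compile(r"Xignite\(\s*'([A-Z0-9]+)',\s*'([A-Z0-9]+)'")

API_URL = ("https://factsetestimates.xignite.com/xFactSetEstimates.json/"
           "GetLatestRecommendationSummaries")


def extract_tokens(page_source):
    """Return (token, token_userid), or None if the pattern is not found."""
    match = TOKEN_PATTERN.search(page_source)
    if match is None:
        return None
    return match.group(1), match.group(2)


def build_api_url(ticker, token, token_userid):
    """Build the API URL with properly encoded query parameters."""
    query = urlencode({
        "IdentifierType": "Symbol",
        "Identifiers": ticker,
        "UpdatedSince": "",
        "_token": token,
        "_token_userid": token_userid,
    })
    return f"{API_URL}?{query}"


tokens = extract_tokens(SAMPLE_PAGE)
if tokens is not None:
    print(build_api_url("aapl", *tokens))
```

Returning `None` when the pattern is missing makes it easy to notice when the site has changed its markup, rather than failing later with an attribute error.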

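Using a bash script with curl, sed and jq:

     title=aapl

     # Fetch the page, join it into one line, and pull the two tokens out of the Xignite(...) call
     IFS=' ' read -r token token_userid < <(curl -s "https://www.investopedia.com/markets/stocks/$title/" | \
          tr -d '\n' | \
          sed -rn "s:.*Xignite\(\s*'([A-Z0-9]+)',\s*'([A-Z0-9]+)'.*:\1 \2:p")

     # Call the API with the tokens and print the recommendation score
     curl -s "https://factsetestimates.xignite.com/xFactSetEstimates.json/GetLatestRecommendationSummaries?IdentifierType=Symbol&Identifiers=$title&UpdatedSince=&_token=$token&_token_userid=$token_userid" | \
          jq -r '.[].RecommendationSummarySet | .[].RecommendationScore'

Using Python:

     import re

     import requests

     ticker = 'aapl'

     # Fetch the stock page that embeds the API tokens in its JavaScript
     r = requests.get('https://www.investopedia.com/markets/stocks/{}/'.format(ticker))

     # The tokens are the two arguments of the Xignite(...) call
     result = re.search(r"Xignite\(\s*'([A-Z0-9]+)',\s*'([A-Z0-9]+)'", r.text)
     if result is None:
         raise SystemExit('Could not find the Xignite tokens in the page')

     token = result.group(1)
     token_userid = result.group(2)

     # Call the API with the tokens
     r = requests.get(
         'https://factsetestimates.xignite.com/xFactSetEstimates.json/GetLatestRecommendationSummaries?IdentifierType=Symbol&Identifiers={}&UpdatedSince=&_token={}&_token_userid={}'
         .format(ticker, token, token_userid)
     )

     print(r.json()[0]['RecommendationSummarySet'][0]['RecommendationScore'])

I can't check right now (I'm on my phone), but are you sure this HTML is actually in the HTML source, rather than pulled in from an API with JavaScript? If it is the latter, you can't get it by scraping the page and will have to use the API directly (which is usually easier, assuming they allow public access).

I just checked on my laptop: searching the page source for "marker position" finds nothing, although, like you, I can see the HTML in the inspector. As I said, it must be generated dynamically on the client side. There is at least one minified JS file with a lot of Ajax calls in it, but unfortunately much of it is obfuscated, so I haven't found the URL for the specific data you are after.

It is not in the HTML source, and I think it is pulled from an API. Can you tell me how to access the API?

I'm afraid I don't know. It's not a site I have ever used, and I know nothing about trading or investing :) I don't think there is a better way than digging into the site's JavaScript code and trying to work out which URL the data comes from, but that is not an easy task.

Yes, it works perfectly. Selenium works too, but your code seems much better.

@KeyvanTajbakhsh I think you should stick with the Selenium solution in this case. The solution above relies on a regular expression, so it could break quickly if the site makes even a small update.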

Using Selenium:

     import bs4 as bs
     from selenium import webdriver
     from selenium.webdriver.chrome.options import Options

     # Selenium 4 style: the page load strategy is set on the options object
     options = Options()
     options.page_load_strategy = 'normal'
     driver = webdriver.Chrome(options=options)
     try:
         driver.get('https://www.investopedia.com/markets/stocks/aapl/#Financials')
         # Grab the DOM after the browser has run the page's JavaScript
         resp = driver.execute_script('return document.documentElement.outerHTML')
     finally:
         # Always close the browser, even if loading the page fails
         driver.quit()

     soup = bs.BeautifulSoup(resp, 'html.parser')
     res = soup.find('div', attrs={'class': 'text position'}).text
     print(res)
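Before choosing between the API approach and Selenium, it can help to repeat the check from the comments: does the target markup appear in the raw page source at all? A minimal sketch using only the standard library follows. The helper name `class_in_source` is hypothetical, and `RAW_SOURCE` is an assumed example of what a server might send before any JavaScript runs; only `RENDERED_DOM` is taken from the snippet in the question.

```python
import re


def class_in_source(html, class_value):
    """Return True if some tag in html has exactly this class attribute value."""
    pattern = r'class\s*=\s*["\']' + re.escape(class_value) + r'["\']'
    return re.search(pattern, html) is not None


# Assumed raw source: the container exists but is filled in later by JavaScript.
RAW_SOURCE = '<div class="value"></div>'

# Markup as seen in the browser inspector, copied from the question.
RENDERED_DOM = (
    '<div class="value">'
    '<div class="marker position" style="left: 89.25%;"></div>'
    '<div class="text position" style="left: 89.25%;">1.43</div>'
    '</div>'
)

print(class_in_source(RAW_SOURCE, "text position"))
print(class_in_source(RENDERED_DOM, "text position"))
```

If the check fails on the text returned by a plain HTTP request but succeeds on the browser's DOM, the element is generated on the client side, which is the situation where BeautifulSoup alone returns an empty list.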