Python web scraping with BeautifulSoup does not show the full content

python html web-scraping


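I am trying to scrape all the text from a web page that sits inside "td" tags with class="calendar__cell calendar__currency". So far my code returns only the first occurrence of this tag and class. How can I make it loop through the source so that it returns every occurrence, one after another? The web page is forexfactory.com.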

from bs4 import BeautifulSoup
import requests

source = requests.get("https://www.forexfactory.com/#detail=108867").text

soup = BeautifulSoup(source, 'lxml')

body = soup.find("body")

article = body.find("table", class_="calendar__table")

actual = article.find("td", class_="calendar__cell calendar__actual actual")

forecast = article.find("td", class_="calendar__cell calendar__forecast forecast").text

currency = article.find("td", class_="calendar__cell calendar__currency currency")

Tcurrency = currency.text
Tactual = actual.text

print(Tcurrency)

You have to use find_all() to get all the elements, and then you can iterate over them with a for-loop.

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.forexfactory.com/#detail=108867")

soup = BeautifulSoup(r.text, 'lxml')

table = soup.find("table", class_="calendar__table")

for row in table.find_all('tr', class_='calendar__row--grey'):

    currency = row.find("td", class_="currency")
    #print(currency.prettify()) # before get text
    currency = currency.get_text(strip=True)

    actual = row.find("td", class_="actual")
    actual = actual.get_text(strip=True)

    forecast = row.find("td", class_="forecast")
    forecast = forecast.get_text(strip=True)

    print(currency, actual, forecast)

Result:

CHF 96.4 94.6
EUR 0.8% 0.9%
GBP 43.7K 41.3K
EUR 1.35|1.3 
USD -63.2B -69.2B
USD 0.0% 0.2%
USD 48.9 48.2
USD 1.2% 1.5%
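
To see the difference between find() and find_all() without depending on the live site, here is a minimal sketch that runs on a small hand-written HTML fragment. The fragment, its values, and the helper name cell_text are made up for illustration; only the class names come from the question. It uses the built-in html.parser so that lxml is not required, and it guards against rows that lack a cell, since find() returns None in that case.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (an assumption, not the real forexfactory.com page).
SAMPLE_HTML = """
<table class="calendar__table">
  <tr class="calendar__row">
    <td class="calendar__cell calendar__currency currency"> CHF </td>
    <td class="calendar__cell calendar__actual actual">96.4</td>
  </tr>
  <tr class="calendar__row">
    <td class="calendar__cell calendar__currency currency"> EUR </td>
  </tr>
</table>
"""


def cell_text(row, css_class):
    """Return the stripped text of the first matching <td>, or "" if the row has none."""
    cell = row.find("td", class_=css_class)
    return cell.get_text(strip=True) if cell is not None else ""


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", class_="calendar__table")

# find() returns only the first match (a single Tag, or None if nothing matches).
first = table.find("td", class_="currency").get_text(strip=True)

# find_all() returns every match, so the rows can be looped over.
rows = [
    (cell_text(row, "currency"), cell_text(row, "actual"))
    for row in table.find_all("tr")
]

print(first)
print(rows)
```

The second row has no "actual" cell, so the helper returns an empty string instead of raising AttributeError on None.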

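By the way: I found that this page uses JavaScript to redirect, and in the browser I see a table with different values. But if I turn off JavaScript in the browser, it shows the same data that I get with the Python code. BeautifulSoup and requests cannot run JavaScript. If you need the data as it appears in the browser, you may need to control a web browser that can run JavaScript.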
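
One possible way to control a browser from Python is Selenium; this is an assumption, since the answer does not name a specific tool. The sketch below is untested against the real site. The function names fetch_rendered_html and parse_currencies are hypothetical, and it assumes Chrome and the selenium package are installed and that the rendered page still contains a table with class calendar__table.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, timeout=10):
    """Load url in headless Chrome and return the HTML after JavaScript has run."""
    # Imported here so parse_currencies() below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the calendar table is present once the scripts have finished.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "table.calendar__table"))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_currencies(html):
    """Return the text of every currency cell found in the calendar table."""
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.select("table.calendar__table td.currency")
    return [cell.get_text(strip=True) for cell in cells]
```

A call such as parse_currencies(fetch_rendered_html("https://www.forexfactory.com/")) would then parse what the browser rendered rather than the raw response that requests receives.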

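Use find_all() to get a list with all the elements, which you can iterate over with a for-loop.

@furas Yes, thank you, I think it works. But do you know why .prettify() does not work with this code? When I try to use prettify I get an error saying the result object has no prettify.

How do you use it? BeautifulSoup.prettify(object)? Do you call prettify() before getting the text from the object? A string has no prettify().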
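
The error described in the comments is consistent with calling prettify() on the list-like result of find_all() rather than on a single element. A minimal sketch on made-up markup (the HTML and the variable names are assumptions): prettify() is available on individual tags and on the soup object, so it has to be called on each element, and before get_text() turns the element into a plain string.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = '<table><tr><td class="currency">USD</td><td class="currency">GBP</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

cells = soup.find_all("td", class_="currency")  # a list-like ResultSet of Tag objects

# Calling prettify() on the whole result set fails with AttributeError.
try:
    cells.prettify()
    list_has_prettify = True
except AttributeError:
    list_has_prettify = False

# Calling it on each Tag works.
pretty = [cell.prettify() for cell in cells]

# After get_text() the values are plain strings, which have no prettify().
texts = [cell.get_text(strip=True) for cell in cells]

print(list_has_prettify)
print(pretty[0])
print(texts)
```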
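Thank you very much for your help. You did not have to write out the function for me, but it is much appreciated. Could you explain a bit more how JavaScript affects this situation? I do not fully understand it. Thanks.

When I turn off JavaScript and use your link, the browser shows the data for Dec 30, but when JavaScript is running in the browser it redirects me to the data for Dec 23. The cause is the #detail=108867 in the URL.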
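
The role of #detail=108867 can be illustrated with the standard library. The part of a URL after # is the fragment. It is handled on the client side and is not part of the HTTP request, so the server returns the same HTML with or without it, and only JavaScript running in a browser can act on it. A small sketch using urllib.parse.urldefrag:

```python
from urllib.parse import urldefrag

url = "https://www.forexfactory.com/#detail=108867"

# Split the URL into the part that is requested from the server and the fragment.
base, fragment = urldefrag(url)

print(base)      # the URL the server actually sees
print(fragment)  # the part only client-side JavaScript can react to
```

This is why requests.get() with this URL receives the default page rather than the view selected by the fragment.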