Python: BeautifulSoup find by attribute

I want to scrape some percentages from a website. So far, my code is as follows:

from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

lista = []  # collected percentage strings

site = 'https://es.investing.com/indices/indices-futures'
# Send a browser-like User-Agent header with the request
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0'}
request = Request(site, headers=headers)
page = urlopen(request)
soup = BeautifulSoup(page, 'html.parser')

cotizacion = soup.find_all('td',{"class": "datatable_cell__3gwri datatable_cell--align-end__Wua8C datatable_cell--" + "down__2CL8n" +" datatable_cell--bold__3e0BR table-browser_col-chg-pct__9p1T3"})
for datos in cotizacion:
    indices = datos.get_text()
    lista.append(indices)
print(lista)

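With this I get a list of percentages. My problem is that the class attribute only matches cells whose percentage is negative, because the class name contains the down part ("down__2CL8n"); when the value goes up, the class name is identical except for that part ("up__2984w"). I want both, positive and negative. So I tried the following: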

soup.find_all('td',{"class": "datatable_cell__3gwri datatable_cell--align-end__Wua8C datatable_cell--" + "down__2CL8n" or "up__2984w" +" datatable_cell--bold__3e0BR table-browser_col-chg-pct__9p1T3"})

But this does not work.

How can I match a string whose middle part varies?
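The attempt above fails for a reason unrelated to BeautifulSoup: Python evaluates + before or, and or returns its first truthy operand, so the whole expression collapses to the "down" string alone. A minimal sketch of that behaviour, using shortened placeholder fragments rather than the full class names from the page:

```python
# Shortened stand-ins for the long class strings in the question (placeholders).
prefix = "datatable_cell--"
suffix = " datatable_cell--bold__3e0BR"

# `+` is evaluated before `or`, so this parses as
# (prefix + "down__2CL8n") or ("up__2984w" + suffix).
attempt = prefix + "down__2CL8n" or "up__2984w" + suffix

# `or` returns the first truthy operand; a non-empty string is truthy,
# so the right-hand side is never used and the suffix is lost as well.
print(attempt)

# Building both candidate strings explicitly gives two separate values to match.
candidates = [prefix + direction + suffix for direction in ("down__2CL8n", "up__2984w")]
print(candidates)
```

So the selector passed to find_all never mentions the "up" class at all, which is why only falling values come back.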

You can do it as follows (assuming the order does not matter):

Edit: as the comments point out, the order does matter, so refer to the answer below:

The desired output sits under the class table-browser_col-chg-pct__9p1T3. To select only the first table, you can use the CSS selector .mb-6 td.table-browser_col-chg-pct__9p1T3:

import requests
from bs4 import BeautifulSoup


URL = "https://es.investing.com/indices/indices-futures"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"
}

soup = BeautifulSoup(requests.get(URL, headers=headers).content, "html.parser")

print([tag.text for tag in soup.select(".mb-6 td.table-browser_col-chg-pct__9p1T3")])

Output:

['+0,12%', '+0,73%', '+1,97%', '+0,95%', '+1,13%', '+0,03%', '-0,15%', '-0,73%', '-0,05%', '+0,22%', '-0,65%', '-0,16%', '-0,37%', '-0,21%', '+0,11%', '-0,41%', '-0,40%', '-0,15%', '-0,38%', '+0,69%', '-0,89%', '-1,13%', '+0,23%', '-0,89%', '-0,75%', '-1,51%', '-0,22%', '+0,43%', '-1,27%', '+0,92%']
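To see why selecting on the stable class keeps rising and falling cells together in document order, here is a small offline sketch. The HTML fragment is invented for illustration and only mimics the class naming seen on the page; the index names and values are made up.

```python
from bs4 import BeautifulSoup

# Invented fragment: each percentage cell carries either the "down" or the
# "up" class, plus the stable column class shared by both.
html = """
<div class="mb-6">
  <table>
    <tr><td>Index A</td><td class="datatable_cell--up__2984w table-browser_col-chg-pct__9p1T3">+0,12%</td></tr>
    <tr><td>Index B</td><td class="datatable_cell--down__2CL8n table-browser_col-chg-pct__9p1T3">-0,15%</td></tr>
    <tr><td>Index C</td><td class="datatable_cell--up__2984w table-browser_col-chg-pct__9p1T3">+0,73%</td></tr>
  </table>
</div>
<div>
  <table>
    <tr><td>Other</td><td class="datatable_cell--up__2984w table-browser_col-chg-pct__9p1T3">+9,99%</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A class selector matches a single class token, whatever other classes the
# tag has. The ".mb-6" ancestor restricts the match to the first table.
cells = soup.select(".mb-6 td.table-browser_col-chg-pct__9p1T3")
percentages = [cell.get_text(strip=True) for cell in cells]
print(percentages)
```

Because the match does not depend on the up/down class, positive and negative values come back interleaved exactly as they appear in the table, and the second table is excluded.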

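I would avoid the dynamic class values and instead identify which column the desired values belong to, then use nth-of-type to slice that column out of the table. To get the table, I would use an attribute = value selector to find the parent element with data-test=price-table, then use a descendant combinator to move to the child table element. The intention is to end up with something that stays robust over time. Of course, this introduces a dependency on the header string ('% Var.').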

import requests
from bs4 import BeautifulSoup

URL = "https://es.investing.com/indices/indices-futures"
headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"}
soup = BeautifulSoup(requests.get(URL, headers=headers).content, "html.parser")
index = [i.text for i in soup.select('[data-test=price-table] table th')].index('% Var.') + 1
print([i.text for i in soup.select(f"[data-test=price-table] table td:nth-of-type({index})")])
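The same column-slicing idea can be tried offline. The sketch below uses an invented table (row names and numbers are placeholders) with the data-test attribute and the '% Var.' header from the answer, and it assumes the body rows contain only td cells, so that the nth-of-type count lines up with the header position.

```python
from bs4 import BeautifulSoup

# Invented table for illustration; only the attribute and header text are
# taken from the answer above.
html = """
<div data-test="price-table">
  <table>
    <thead><tr><th>Nombre</th><th>Último</th><th>% Var.</th></tr></thead>
    <tbody>
      <tr><td>Index A</td><td>100,0</td><td>+0,12%</td></tr>
      <tr><td>Index B</td><td>200,0</td><td>-0,15%</td></tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the position of the wanted header; nth-of-type is 1-based, hence + 1.
header_texts = [th.get_text(strip=True) for th in soup.select("[data-test=price-table] table th")]
column = header_texts.index("% Var.") + 1

# Take the td at that position from every row of the table.
cells = soup.select(f"[data-test=price-table] table td:nth-of-type({column})")
changes = [td.get_text(strip=True) for td in cells]
print(changes)
```

If a row mixed th and td cells, the td count would no longer match the header index, so that assumption is worth checking against the real page.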

You can also use pandas read_html:

import pandas as pd

table = pd.read_html('https://es.investing.com/indices/indices-futures')[0]
print(table['% Var.'])

Why don't you check for both, since using the or operator is not a solution for you?

I don't check for both because the order matters. The same class is applied to different cells, so I get different text, but positive and negative values are interleaved, and I need to respect that order to know which one is which.

Is there a parent element that contains both classes?

I don't think so. There is one that contains all the tables, but other than that I can't see any difference.

Can you share the HTML? Without that, I can't say more.

Yes, the order matters; otherwise I won't know which index name each percentage belongs to.

This is not a bad solution, though not a perfect one, because I get not only the data I want but also more data whose class starts the same way. I can work with it, though. Thank you very much.

Great, exactly what I wanted. Thank you very much.