Selecting only the price values from td elements in Python

python html regex python-3.x beautifulsoup


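I am trying to capture the price values from HTML td tags, but the problem is that other td elements have the same class name.

Here is the code I wrote: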

from builtins import any as b_any
from urllib.parse import urlparse
from urllib.parse import urljoin
from collections import Counter
import urllib.request
import csv
import schedule
import time
import re
from bs4 import BeautifulSoup

url="http://offer.ebay.es/ws/eBayISAPI.dll?ViewBidsLogin&item=122713288532&rt=nc&_trksid=p2047675.l2564"

req = urllib.request.Request(url, headers={'User-agent': 'Mozilla/5.0'})

htmlpage = urllib.request.urlopen(req)

html = htmlpage.read().decode('utf-8')

soup = BeautifulSoup(html,"html.parser")

table = soup.find_all('td',{'class':'onheadNav'})

'''for txt in table:
    nametxt = txt.text
    result = ''.join([i for i in nametxt if not i.isdigit()])
    cleantxt = result.replace('(','')
    print(cleantxt.replace(')',''))

    rank = txt.a.text
    print(rank)'''
price = soup.select('td.contentValueFont')
for pr in price:
    print(pr.text)

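If I slice price inside the for loop, I only get the first price, but I want to get all of the prices.

Edit:

I want to capture all of the prices, but the problem is that three td elements share the same class name: one holds the price, one the quantity, and one the date. When I try to get only the price, my code returns all three td elements. I hope this makes the question clear.

The lazy way: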

soup = BeautifulSoup(html,"html.parser")

table = soup.find_all('table')

trs = table[9].select('tr') # You should select the table first (use your way)

for tr in trs: # loop the tr in the table
    if len(tr.select('td')) > 2: # check length
        print(tr.select('td')[2].text) # select third td
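The row-by-row idea above can be tried offline. The following is a minimal sketch, assuming a small hand-written table in place of the real eBay page (whose markup is not shown in this thread); SAMPLE_HTML, its column order, and the helper name third_cell_texts are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the bid-history table.
# Assumption: the price is the third td of each data row.
SAMPLE_HTML = """
<table>
  <tr><th>Pujador</th><th>Cantidad</th><th>Precio</th><th>Fecha</th></tr>
  <tr>
    <td class="onheadNav">a***b (12)</td>
    <td class="contentValueFont">1</td>
    <td class="contentValueFont">4,90&nbsp;EUR</td>
    <td class="contentValueFont">12-sep-17</td>
  </tr>
  <tr>
    <td class="onheadNav">c***d (7)</td>
    <td class="contentValueFont">2</td>
    <td class="contentValueFont">8,90&nbsp;EUR</td>
    <td class="contentValueFont">13-sep-17</td>
  </tr>
  <tr><td colspan="4">Fin de la lista</td></tr>
</table>
"""


def third_cell_texts(html):
    """Return the text of the third td in every row that has at least three."""
    soup = BeautifulSoup(html, "html.parser")
    result = []
    for tr in soup.find_all("tr"):
        tds = tr.find_all("td")  # look up the cells once per row
        if len(tds) > 2:         # skips the header and footer rows
            # &nbsp; is parsed as '\xa0'; turn it into a plain space
            result.append(tds[2].get_text(strip=True).replace("\xa0", " "))
    return result


prices = third_cell_texts(SAMPLE_HTML)
print(prices)
```

Collecting the values in a list, instead of printing inside the loop, keeps all of the prices available afterwards.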

A short solution:

from bs4 import BeautifulSoup
import requests

url = "http://offer.ebay.es/ws/eBayISAPI.dll?ViewBidsLogin&item=122713288532&rt=nc&_trksid=p2047675.l2564"
html = requests.get(url).content
soup = BeautifulSoup(html, "html.parser")

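# Keep only the cells whose text ends with 'EUR'.
# .string is None when a cell contains nested tags, so check it first.
prices = [price.string.replace('\xa0', ' ')
          for price in soup.select('td.contentValueFont')
          if price.string and price.string.endswith('EUR')]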
print(prices)

Output:

['4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '8,90 EUR', '8,90 EUR', '8,90 EUR', '8,90 EUR', '8,90 EUR', '8,90 EUR', '8,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR', '14,90 EUR', '4,90 EUR', '4,90 EUR', '4,90 EUR']
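If numeric values are needed rather than strings such as '4,90 EUR', a regular expression can both filter and convert the cell texts. This is only a sketch, assuming the prices use a decimal comma and no thousands separator; parse_eur and the sample cells list are illustrative names that do not come from the answer above.

```python
import re
from decimal import Decimal

# Assumption: a price looks like '4,90 EUR' (decimal comma, no thousands separator).
PRICE_RE = re.compile(r"^\s*(\d+(?:,\d+)?)\s*EUR\s*$")


def parse_eur(text):
    """Return the amount as a Decimal, or None if the text is not a EUR price."""
    match = PRICE_RE.match(text.replace("\xa0", " "))
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", "."))


# Hypothetical cell texts: quantity, price, and date share one class name.
cells = ["1", "4,90\xa0EUR", "12-sep-17", "2", "14,90 EUR", "13-sep-17"]

amounts = [amount for amount in map(parse_eur, cells) if amount is not None]
print(amounts)
```

Decimal is used instead of float so that amounts such as 4,90 are stored exactly.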

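What you need to do is find all of the 'tr' tags in the table you want to scrape, then iterate over them and take the text from the specific 'td'.

Something like this: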

table = soup.find_all('table')
for tr in table[9].find_all('tr')[1:-1]:
    price = tr.find_all('td')[2].text.strip()
    print(price)

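After some investigation, we find that the table we want is the 10th table on the page, hence table[9]. Also, since we do not need the first and last 'tr', we use find_all('tr')[1:-1].

I hope this solves your problem.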
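The same "third cell of each row" idea can also be written as a single CSS selector. The sketch below assumes hypothetical markup (SAMPLE_HTML is invented for illustration, not copied from the eBay page) and relies on the :nth-of-type pseudo-class accepted by BeautifulSoup's select().

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page layout may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Pujador</th><th>Cantidad</th><th>Precio</th><th>Fecha</th></tr>
  <tr>
    <td class="onheadNav">a***b (12)</td>
    <td class="contentValueFont">1</td>
    <td class="contentValueFont">4,90&nbsp;EUR</td>
    <td class="contentValueFont">12-sep-17</td>
  </tr>
  <tr>
    <td class="onheadNav">c***d (7)</td>
    <td class="contentValueFont">2</td>
    <td class="contentValueFont">8,90&nbsp;EUR</td>
    <td class="contentValueFont">13-sep-17</td>
  </tr>
  <tr><td colspan="4">Fin de la lista</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# 'td:nth-of-type(3)' matches a td that is the third td among its siblings,
# so rows with fewer than three td cells contribute nothing.
price_cells = soup.select("tr > td:nth-of-type(3)")
prices = [cell.get_text(strip=True).replace("\xa0", " ") for cell in price_cells]
print(prices)
```

On the real page the selector would probably need to be scoped to the right table first, for example by calling select() on table[9] instead of on soup.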

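There is a contradiction in your description: "to get only the price value" versus "I want to get all of the values at the same time". Please update your question.

I want to get all of the prices, but the problem is that three td elements share the same class name: one for the price, one for Cantidad (quantity), and one for the date. When I try to get only the price, my code returns all three td elements. I hope that makes it clear.

I don't know BeautifulSoup very well, but you could try getting the td by position (for example td[1]) rather than by class.

I sliced it the way you suggested, but it only returns the first price and breaks out of the loop. It does not move on to the next iteration.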
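One way to slice without losing the later rows is to apply a stride to the whole result list rather than indexing inside the loop. This is a rough sketch under a loud assumption: the class-matched cells arrive in repeating groups of three in the order quantity, price, date. The thread does not confirm that order, and texts, CELLS_PER_ROW, and PRICE_OFFSET are illustrative names.

```python
# Stand-in for: [pr.text for pr in soup.select('td.contentValueFont')]
# Assumed order within each row: quantity, price, date.
texts = [
    "1", "4,90 EUR", "12-sep-17",
    "2", "8,90 EUR", "13-sep-17",
    "1", "14,90 EUR", "14-sep-17",
]

CELLS_PER_ROW = 3  # assumption: three cells with the same class per row
PRICE_OFFSET = 1   # assumption: the price is the second of those three

# Start at the first price and take every third element after it.
prices = texts[PRICE_OFFSET::CELLS_PER_ROW]
print(prices)
```

If a row is missing one of the three cells, the stride drifts out of step, so selecting by position within each tr is the more robust option.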