Cleaning data from HTML

html python-3.x web-scraping

I am trying to clean part of the data I extracted through web scraping. The HTML that contains the data looks like this:

<li class="price-was">
    $1,699.00
    <span class="price-was-data" style="display: none">1699.00</span>
</li>

I need to clean it because the extracted text comes out like this:

'\r\n       $1,699.00\r\n            1699.00\n'

With the line of code below I have managed to clean it up a little, but I still get the number twice:

PriceBefore = price_products_before[0].text.strip().replace("\r\n","")

I only need 1699 once, without any spaces, \r or \n.
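A likely reason the number shows up twice is that `.text` on the `li` also includes the text of the hidden `span` inside it. As a minimal sketch working only on the raw string quoted in the question (the variable names `raw`, `tokens` and `price_before` are illustrative, and taking the last token assumes the span's number always comes last):

```python
# Raw text as shown in the question: the li's own text plus the hidden span's text.
raw = '\r\n       $1,699.00\r\n            1699.00\n'

# str.split() with no arguments splits on any run of whitespace
# (spaces, \r and \n) and drops empty strings.
tokens = raw.split()

# Assumption: the last token is the plain number from the span.
price_before = tokens[-1]

print(tokens)
print(price_before)
```

This avoids chaining `strip()` and `replace()` calls, but it depends on the order of the text inside the `li`, so reading the `span` directly is usually more robust.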

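from bs4 import BeautifulSoup

html = """
<li class="price-was">
    $1,699.00
    <span class="price-was-data" style="display: none">1699</span>
</li>
"""
soup = BeautifulSoup(html, 'html.parser')
try:
    # next_element of the li is its first text node, i.e. the formatted price
    print(soup.find("li", class_="price-was").next_element.strip())
except AttributeError:
    # find() returned None because no matching li exists
    print("Not Found")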

Output:

$1,699.00
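If the plain number is wanted rather than the formatted `$1,699.00`, one option is to read the hidden `span` directly. The following is only a sketch under the assumption that the page really contains the markup from the question; the CSS selector and the conversion to `Decimal` are illustrative choices, not part of the original answer.

```python
from decimal import Decimal

from bs4 import BeautifulSoup

html = """
<li class="price-was">
    $1,699.00
    <span class="price-was-data" style="display: none">1699.00</span>
</li>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one returns the first matching element, or None when nothing matches,
# so a missing span can be detected without an exception.
span = soup.select_one("li.price-was span.price-was-data")

# get_text(strip=True) removes the surrounding whitespace, including \r and \n.
price_before = Decimal(span.get_text(strip=True)) if span is not None else None

print(price_before)
```

The `style="display: none"` attribute only affects rendering in a browser; the parser still sees the element as long as it is present in the downloaded HTML.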

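Do you mean like this? PriceBefore = price_products_before[0].text.split()

Sorry, if you need 1699.00, get it from the span rather than from the li: price_products_before = product.findAll("span", {"class": "price-was-data"}) PriceBefore = price_products_before[0].text

It doesn't work...
IndexError Traceback (most recent call last)
     40 # Preu-preor
     41 Preu_products_abans = producte.findAll("span", {"class": "price-was-data"})
---> 42 PreusAbans = Preu_products_abans[0].text
     43
IndexError: list index out of range

Could it have something to do with the display?

How can I store it in a variable? I need to export it to a CSV. Can you help me?

You are right! I'm an idiot! Thank you very much!

Got it ;)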
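The `IndexError` in the comments is what indexing `[0]` raises when `findAll` returns an empty list, for example for a product that has no previous price. A hedged sketch of guarding against that and writing the values to CSV follows; `first_text`, `write_prices`, the column names and the sample row are all hypothetical and not taken from the thread.

```python
import csv
import io


def first_text(matches):
    """Return the stripped text of the first match, or None if there is none."""
    # Hypothetical helper: avoids the IndexError from matches[0] on an empty list.
    return matches[0].text.strip() if matches else None


def write_prices(rows, fh):
    """Write (name, price_before) rows as CSV to an open text file handle."""
    writer = csv.writer(fh)
    writer.writerow(["name", "price_before"])
    writer.writerows(rows)


# Stand-in for a findAll result that matched nothing.
price_before = first_text([])
print(price_before)

# In-memory buffer for the demo; for a real file one would use
# open("prices.csv", "w", newline="") instead.
buffer = io.StringIO()
write_prices([("example product", "1699.00")], buffer)
print(buffer.getvalue())
```

Storing `None` for a missing price keeps the row count intact, and the `csv` module writes it as an empty field.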
