Python HTML table parsing: removing trailing characters from each column?


Using various online resources, I put together a Python script that parses HTML tables and converts them to CSV:

import requests
import sys
from bs4 import BeautifulSoup

url = requests.get(sys.argv[1])

html = BeautifulSoup(url.content,'html.parser')

for br in html.find_all("br"):
    br.replace_with(",")

for tr in html.find_all('tr'):
    data = []   
    
    for td in tr.find_all('td'):
        data.append(td.text.strip())
        
    if data:
        print("{}".format('|'.join(data)))

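It works well, but it doesn't account for some inconsistencies in the data I'm parsing. Some fields contain only a br with no data on either side, and some fields end with a br. As a result, some rows have columns that contain only "," or that end with ",". I think what I need to do is remove any trailing "br" or "," on a per-column basis.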

An example of what I'm trying to avoid:

,|dog,|Italy|House|Azure|Chris|117|162|400140|110|160|701|2019-06-27

What I want:

|dog|Italy|House|Azure|Chris|117|162|400140|110|160|701|2019-06-27

I'm not sure what my options are. Does anyone have any suggestions?

You can try the following:

for td in tr.find_all('td'):
    data.append(td.text.strip().rstrip(","))

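str.rstrip accepts an optional argument: a string whose characters are treated as a set of trailing characters to remove. With "," as the argument, a cell that contains only "," becomes an empty string, and a cell that ends with "," loses that comma.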
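To see the effect without fetching a page, here is a minimal sketch that applies the same idea to a few hard-coded cell strings. The helper name clean_cell and the sample values are made up for illustration; they are not part of the original script. It also strips trailing whitespace together with the commas, which is an assumption about what the data might contain.

```python
def clean_cell(text, junk=", \t\r\n"):
    """Strip surrounding whitespace, then any trailing commas or whitespace.

    `junk` is treated by str.rstrip as a set of characters, not a suffix.
    """
    return text.strip().rstrip(junk)


# Hypothetical cell texts, as they might look after <br> was replaced by ","
cells = [",", "dog,", "Italy", "House, ", "a,b,,"]

print("|".join(clean_cell(c) for c in cells))
# prints: |dog|Italy|House|a,b
```

Commas inside a value (as in "a,b") are left alone, because rstrip only works from the right-hand end and stops at the first character that is not in the set.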
