Python table.decompose(): AttributeError: 'str' object has no attribute 'decompose'


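I'm trying to parse an HTML document with BeautifulSoup. I want code that parses the document, finds all tables, and removes those whose ratio of numeric characters to alphanumeric characters is > 15%. I used the code given as an answer to a previous question: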

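But for some reason the table.decompose() line raises an error: AttributeError: 'str' object has no attribute 'decompose'. I'd be grateful for any help. Please note that I'm a beginner, so although I've tried, I don't always understand the more complex solutions.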

Here is the code:

import re

from bs4 import BeautifulSoup

test_file = 'locationoftestfile.html'


# Define a function to remove tables whose ratio of numeric characters to
# alphabetic + numeric characters is > 15%
def remove_table(table):
    table = re.sub('<[^>]*>', ' ', str(table))
    numeric = sum(c.isdigit() for c in table)
    print('numeric: ' + str(numeric))
    alphabetic = sum(c.isalpha() for c in table)
    print('alpha: ' + str(alphabetic))
    try:
        ratio = numeric / float(numeric + alphabetic)
        print('ratio: ' + str(ratio))
    except ZeroDivisionError:
        ratio = 1
    if ratio > 0.15:
        table.decompose()  # raises AttributeError: 'str' object has no attribute 'decompose'


# Define a function to create our Soup object and then extract text
def file_to_text(file):
    with open(file, 'r') as soup_file:
        soup = BeautifulSoup(soup_file, 'html.parser')
    for table in soup.find_all('table'):
        remove_table(table)
    text = soup.get_text()
    return text


file_to_text(test_file)
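The error can be reproduced without the file. The sketch below assumes bs4 is installed and uses a hypothetical one-row HTML string in place of the real document; it runs the same re.sub line as remove_table() and then calls decompose() on the result.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical one-table document, used only to trigger the error
html = "<table><tr><td>2019</td><td>42</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")
print(type(table).__name__)  # Tag

# Same line as in remove_table(): the name 'table' is rebound to a str
table = re.sub('<[^>]*>', ' ', str(table))
print(type(table).__name__)  # str

error_message = None
try:
    table.decompose()  # str has no decompose() method
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'str' object has no attribute 'decompose'
```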

Perhaps I'm being naive, but I don't see how this would remove the table, given this line:

table = re.sub('<[^>]*>', ' ', str(table))

This overwrites the argument "table" with a string. You probably want to use a different name for that variable here, for example:

import re


def remove_table(table):
    # Store the tag-stripped text under a new name so that 'table'
    # still refers to the BeautifulSoup Tag
    table_as_str = re.sub('<[^>]*>', ' ', str(table))
    numeric = sum(c.isdigit() for c in table_as_str)
    print('numeric: ' + str(numeric))
    alphabetic = sum(c.isalpha() for c in table_as_str)
    print('alpha: ' + str(alphabetic))
    try:
        ratio = numeric / float(numeric + alphabetic)
        print('ratio: ' + str(ratio))
    except ZeroDivisionError:
        ratio = 1
    if ratio > 0.15:
        table.decompose()  # works: 'table' is still a Tag
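To check the renamed-variable version end to end, here is a small sketch on a hypothetical inline HTML string (the two tables and their id attributes are made up for illustration). It repeats the fixed logic without the debug prints and lists which tables survive.

```python
import re

from bs4 import BeautifulSoup


def remove_table(table):
    # Same logic as the fixed function, minus the debug prints
    table_as_str = re.sub('<[^>]*>', ' ', str(table))
    numeric = sum(c.isdigit() for c in table_as_str)
    alphabetic = sum(c.isalpha() for c in table_as_str)
    try:
        ratio = numeric / float(numeric + alphabetic)
    except ZeroDivisionError:
        ratio = 1
    if ratio > 0.15:
        table.decompose()


# Hypothetical sample: one mostly-numeric table, one mostly-text table
html = """
<p>Intro</p>
<table id="numbers"><tr><td>2019</td><td>12345</td></tr></table>
<table id="words"><tr><td>Revenue grew strongly this year</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list, so decomposing while looping over it is safe
for table in soup.find_all("table"):
    remove_table(table)

remaining = [t.get("id") for t in soup.find_all("table")]
print(remaining)  # ['words']
```

The numeric table has 9 digits and no letters (ratio 1.0), so it is removed; the text table has no digits (ratio 0.0), so it is kept.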

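This code looks rather "crazy" to me (parsing HTML with regular expressions). Can you share the HTML? Or can you edit the question and add a small sample input and the expected output?

I agree with @AndrejKesely, the table.decompose() error is probably the least of your problems.

You may both be right. This code is a mix of code my lecturer provided (the second def) and code I took from the link. Thankfully, the fix in the answer seems to work, so for now that's good enough for me!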
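Regarding the regex concern: a possible alternative is to let BeautifulSoup extract the text with Tag.get_text() instead of stripping tags with re.sub. This is a swapped-in technique, not the code from the answer, and the helper names numeric_ratio and remove_numeric_tables are hypothetical. It is a minimal sketch assuming the same 0.15 threshold and the same "empty table counts as ratio 1" fallback.

```python
from bs4 import BeautifulSoup


def numeric_ratio(text):
    """Share of digits among all digit and letter characters in text."""
    numeric = sum(c.isdigit() for c in text)
    alphabetic = sum(c.isalpha() for c in text)
    if numeric + alphabetic == 0:
        return 1.0  # same fallback as the ZeroDivisionError branch
    return numeric / (numeric + alphabetic)


def remove_numeric_tables(soup, threshold=0.15):
    """Decompose every <table> whose numeric ratio exceeds threshold."""
    for table in soup.find_all("table"):
        # get_text() lets the parser drop the tags instead of a regex
        if numeric_ratio(table.get_text(" ")) > threshold:
            table.decompose()
    return soup


# Hypothetical sample input
html = """
<table><tr><td>2019</td><td>12345</td></tr></table>
<table><tr><td>Revenue grew strongly this year</td></tr></table>
"""
soup = remove_numeric_tables(BeautifulSoup(html, "html.parser"))
text = soup.get_text(" ", strip=True)
print(text)  # Revenue grew strongly this year
```

For ordinary table markup this should give the same counts as the regex version. One difference: str(table) re-escapes characters such as & to &amp;, so the regex version also counts the letters of entity names, while get_text() works on the decoded text.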
