Python: local variable 'texts' referenced before assignment


python


This code extracts Unicode characters from a text file, but it raises the following error:

# -*- coding: utf-8 -*-
import codecs
import os
#from urllib import urlopen
from bs4 import BeautifulSoup
import re
##import nltk
#def remove_content_li(input_document) :
    #soup = BeautifulSoup(input_document)

def extract_unicode(input):
    _ascii_letters = re.compile(r'[a-zA-Z]', flags=re.UNICODE)
    symbols = re.compile(r'[{} &+( )" =!.?.:.. / |  » © : >< #  «  ,] 1 2 3 4 5 6 7 8 9 _ - + ; [ ]  %', flags=re.UNICODE)
    soup = BeautifulSoup(open(input, 'r'), 'lxml')
    for li in soup.find_all('li'):
        li.decompose()
        texts = soup.findAll(text=True)

    def contains_unicode(text):
        try:
            str(text)
        except:
            return True
        return False

    result = '  '.join((text for text in texts if contains_unicode(texts)))
    result = _ascii_letters.sub(" ", result)
    result = symbols.sub(" ", result)
    ##print(result)
##    result = nltk.clean_html(result)
    result.replace('*', '')

This is the error I get:

  File "e3.py", line 50, in <module>
    extract_unicode((os.path.join(dirname, filename)))
  File "e3.py", line 30, in extract_unicode
    result = '  '.join((text for text in texts if contains_unicode(texts)))
UnboundLocalError: local variable 'texts' referenced before assignment

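The error tells you exactly what the problem is: you are using the variable texts before it has been assigned. texts is only set inside the body of the for loop, so it gets a value only when soup.find_all('li') finds something. Most likely soup.find_all('li') is returning an empty list, the loop body never runs, and texts is never bound.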
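Here is a minimal, stdlib-only reproduction of that situation. A name that is assigned only inside a for loop body is never bound when the iterable is empty. The function last_item_length is hypothetical: it stands in for extract_unicode, and its list argument plays the role of soup.find_all('li').

```python
def last_item_length(items):
    # `length` is only assigned inside the loop body. If `items` is empty,
    # the body never runs and the local name is never bound.
    for item in items:
        length = len(item)
    return length  # raises UnboundLocalError when items is empty


print(last_item_length(["a", "bb"]))  # the last item wins: "bb" has length 2

try:
    last_item_length([])
except UnboundLocalError as exc:
    # The exact message text differs between Python versions,
    # but the exception type is always UnboundLocalError.
    print(type(exc).__name__)
```

The exception is raised by the line that reads the name, not by the loop. That is why the traceback in the question points at the join line rather than at the for statement.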

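I'm not sure what is unclear here. As the error says, you don't set texts before you use it. If soup.find_all('li') returns nothing, texts will be undefined. You are assuming that soup.find_all is finding content, and that assumption appears to be wrong. Declare texts = [] at the start of extract_unicode(input): and see whether you get a different result.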
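The following is a sketch of the corrected logic. It assumes Python 3 and assumes the text nodes are already available as a list of strings. The texts argument stands in for the result of soup.findAll(text=True), so no BeautifulSoup is needed here. In the BeautifulSoup version, the corresponding change is to dedent texts = soup.findAll(text=True) so that it runs once, after the for li in soup.find_all('li') loop. That way texts is bound even when the page has no <li> elements.

The sketch also addresses three other issues in the question's code:

- The generator passes the whole list texts to contains_unicode instead of each text.
- Under Python 3, str(text) never raises for a string, so the try/except check would always return False.
- The return value of result.replace('*', '') is discarded.

The symbols pattern is left out for brevity. The names contains_non_ascii and extract_non_ascii are illustrative and do not come from the original.

```python
import re

# Matches ASCII letters, like the question's `_ascii_letters` pattern
# (str patterns are Unicode-aware by default in Python 3).
_ascii_letters = re.compile(r"[a-zA-Z]")


def contains_non_ascii(text):
    # Hypothetical replacement for contains_unicode: check code points
    # directly instead of relying on str() raising (Python 2 behaviour).
    return any(ord(ch) > 127 for ch in text)


def extract_non_ascii(texts):
    # `texts` is assumed to be a list of strings, e.g. what
    # soup.findAll(text=True) returns once the <li> elements are removed.
    # Test each individual `text`, not the whole list.
    kept = [text for text in texts if contains_non_ascii(text)]
    result = "  ".join(kept)
    result = _ascii_letters.sub(" ", result)
    # str.replace returns a new string, so assign the result back.
    result = result.replace("*", "")
    return result


print(repr(extract_non_ascii(["hello", "héllo*", "abc"])))
```

Because texts is now a parameter rather than a name bound inside a loop, an empty input simply produces an empty string instead of an UnboundLocalError.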