Extracting links from HTML with Python and BeautifulSoup: 'NoneType' object has no attribute 'get'

python python-3.x

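Hi everyone, I'm using Python 3 with BeautifulSoup to try to extract a link. It works most of the time, but sometimes it fails to find the schema.

My code is part of a larger codebase; the relevant line appears in the traceback below.

It has no trouble finding the schema in content like this: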

<ix:references>
    <link:schemaRef xlink:type="simple" xlink:href="https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd" />
</ix:references>

But it cannot find xlink:href in files like this:

<references>
    <schemaRef xlink:href="https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd" xlink:type="simple" xmlns="http://www.xbrl.org/2003/linkbase"/>
</references>

The error I get is:

AttributeError                            Traceback (most recent call last)
<ipython-input-8-da0992ab9ae8> in <module>
     96 
     97         with open(filename,encoding="utf8") as a:
---> 98             x = Parser(a)
     99             r = json.dumps(x.to_table(), indent=4)
    100             jsondata = json.loads(r)

~\OneDrive\Desktop\parser\core.py in __init__(self, f, raise_on_error)
     21         self.errors = []
     22 
---> 23         self._get_schema()
     24 
     25         self._get_contexts()

~\OneDrive\Desktop\parser\core.py in _get_schema(self)
     47         self.schema = self.soup.find(
     48 
---> 49             ['link:schemaRef', 'schemaRef']).get('xlink:href')
     50 
     51         self.namespaces = {}

AttributeError: 'NoneType' object has no attribute 'get'

Any help would be greatly appreciated.

Thanks.

From your traceback, the call

self.soup.find(['link:schemaRef', 'schemaRef'])

returned None. To guard against this, test the result before calling get, i.e.:

data = self.soup.find(['link:schemaRef', 'schemaRef'])
if data is not None:
    self.schema = data.get('xlink:href')
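
The same guard can be wrapped in a small helper. This is only a sketch under assumptions: the name get_schema_href, the SCHEMA_TAGS list, and the example.com URL are made up for illustration and are not part of the asker's parser. The lowercase tag names are included because html.parser lowercases tag names while parsing.

```python
from bs4 import BeautifulSoup

# Hypothetical list of tag names to try. The lowercase variants are
# included because html.parser lowercases tag names while parsing.
SCHEMA_TAGS = ["link:schemaRef", "schemaRef", "link:schemaref", "schemaref"]


def get_schema_href(soup):
    """Return the xlink:href of the first schema reference, or None."""
    tag = soup.find(SCHEMA_TAGS)
    if tag is None:
        # find() returns None when nothing matches; calling .get() on it
        # is what raises the AttributeError from the question.
        return None
    return tag.get("xlink:href")


# Illustrative documents (the URL is a placeholder, not a real schema).
with_ref = BeautifulSoup(
    '<references><schemaRef xlink:href="https://example.com/schema.xsd"/></references>',
    "html.parser",
)
without_ref = BeautifulSoup("<references></references>", "html.parser")

found = get_schema_href(with_ref)
missing = get_schema_href(without_ref)
print(found)
print(missing)
```

Returning None keeps the decision about a missing schema with the caller; raising a descriptive exception instead would be an equally reasonable choice.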

@dspencer, so the following returns the correct schema:

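from bs4 import BeautifulSoup

# Raw string so the backslashes in the Windows path are not read as escapes
# (in a normal string, "\066" is an octal escape and changes the filename)
with open(r"F:\ErrorFolder\06647909.html", "r") as f:
    soup = BeautifulSoup(f, 'html.parser')

# Guard against a missing container before searching inside it
resources = soup.find(['ix:references', 'references'])
if resources is not None:
    # html.parser lowercases tag names, hence the 'schemaref' variant
    for s in resources.find_all(['link:schemaRef', 'schemaRef', 'schemaref']):
        x = s.get('xlink:href')
        print(x)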

So I just needed to change that; it looks like the real problem may have been schemaRef versus schemaref.
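
The schemaRef versus schemaref difference is consistent with how Python's built-in html.parser module, which BeautifulSoup wraps when given the 'html.parser' option, lowercases tag names. The sketch below uses only the standard library to show the names the parser reports for the failing snippet from the question; the class name TagNameCollector is an assumption made for illustration.

```python
from html.parser import HTMLParser


class TagNameCollector(HTMLParser):
    """Record every start tag name exactly as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # `tag` has already been lowercased by HTMLParser at this point.
        self.names.append(tag)


# The snippet from the question in which the schema was not found.
SAMPLE = (
    '<references>'
    '<schemaRef xlink:href="https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd" '
    'xlink:type="simple" xmlns="http://www.xbrl.org/2003/linkbase"/>'
    '</references>'
)

collector = TagNameCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.names)
```

Because the reported name is schemaref, a search for the mixed-case 'schemaRef' has nothing to match against in a tree built with this parser.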

Thanks very much for your help.

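I would rather think it means that your self.soup.find('html') is None.

The error is highlighted at ['link:schemaRef', 'schemaRef']).get('xlink:href'), so I don't think it even gets that far.

In the second example, is there a space before xlink:type?

Yes, I think that is just a formatting mistake here.

The error message indicates that @Błotosmętek is correct. Include the full traceback and we will know for sure.

OK, do you know why it can't find schemaRef? What if I do this:

resources = self.soup.find(['ix:resources', 'resources'])
self.schema = resources.find(['link:schemaRef', 'schemaRef']).get('xlink:href')

Yes, if your soup object already looks for ix:references, it should also try to find references. I suggest posting the relevant part of the code to make the problem clear.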