Python BeautifulSoup does not work on subsequent pages


I can't get the title of subsequent pages. What is the problem?

from bs4 import BeautifulSoup
import urllib.request

# First page
source = urllib.request.urlopen('https://yeniemlak.az/elan/axtar?emlak=1&elan_nov=1&seher=0&metro=0&qiymet=&qiymet2=&mertebe=&mertebe2=&otaq=&otaq2=&sahe_m=&sahe_m2=&sahe_s=&sahe_s2=').read()
soup = BeautifulSoup(source,'lxml')
print(soup.title) # shows title as expected

# Second page
source = urllib.request.urlopen('https://yeniemlak.az/elan/axtar?emlak=1&elan_nov=1&seher=0&metro=0&qiymet=&qiymet2=&mertebe=&mertebe2=&otaq=&otaq2=&sahe_m=&sahe_m2=&sahe_s=&sahe_s2=&page=2').read()
soup = BeautifulSoup(source,'lxml')
print(soup.title) # shows None

I'm not sure why only the second case fails. As mentioned in some other posts, switching to a different parser sometimes helps.
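The "try another parser" idea can be sketched as a small fallback loop. This is only an illustration: the helper name `first_title` and the parser order are assumptions, not something from the original post, and it does not explain why lxml returns None for this particular page.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def first_title(markup, parsers=("lxml", "html.parser")):
    """Return (parser_name, title_text) from the first parser that finds a <title>.

    `first_title` is a hypothetical helper; the parser order is an assumption.
    """
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # The parser library (for example lxml) is not installed; try the next one.
            continue
        if soup.title is not None:
            return name, soup.title.get_text(strip=True)
    # No parser in the list produced a <title> element.
    return None, None
```

Calling `first_title(source)` with the bytes returned by `urlopen(...).read()` would report which parser, if any, managed to find the title.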

I was able to get the second page working with html.parser, although it emits a warning about a decoding error.

from bs4 import BeautifulSoup
import urllib.request

# Second page
source = urllib.request.urlopen('https://yeniemlak.az/elan/axtar?emlak=1&elan_nov=1&seher=0&metro=0&qiymet=&qiymet2=&mertebe=&mertebe2=&otaq=&otaq2=&sahe_m=&sahe_m2=&sahe_s=&sahe_s2=&page=2').read()
soup = BeautifulSoup(source,'html.parser')
print(soup.title) # Now works

Output

Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<title>YENIEMLAK.AZ Satılır Bina ev menzil  </title>
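The warning suggests that some bytes in the response were not valid in the encoding Beautiful Soup settled on, so they were swapped for U+FFFD. The sketch below reproduces that effect with the standard library only. The byte string is made up for illustration and is not taken from the real page, and `decode_lenient` is a hypothetical helper.

```python
def decode_lenient(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes, substituting U+FFFD for anything that cannot be decoded."""
    return raw.decode(encoding, errors="replace")


# Made-up sample data: a valid UTF-8 word with one invalid byte injected.
good = "Satılır".encode("utf-8")
broken = good[:3] + b"\xff" + good[3:]

print(decode_lenient(good))    # decodes cleanly
print(decode_lenient(broken))  # contains one replacement character
```

If the page's real encoding is known, it can be passed to BeautifulSoup through its `from_encoding` argument. Which encoding the site actually uses is not stated here, so that remains something to check.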

The html.parser approach works, and it also works if you switch to requests instead of urllib.
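A minimal sketch of the requests variant, assuming the `requests` package is installed. The helper names `title_from_html` and `fetch_title`, the default parser, and the timeout value are illustrative choices and not from the original answer.

```python
import requests
from bs4 import BeautifulSoup


def title_from_html(html, parser="html.parser"):
    """Return the stripped <title> text, or None if the parser finds no title."""
    soup = BeautifulSoup(html, parser)
    return soup.title.get_text(strip=True) if soup.title else None


def fetch_title(url, parser="html.parser", timeout=10):
    """Download a page with requests and extract its title (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    # response.text is the body decoded to str by requests, not the raw bytes.
    return title_from_html(response.text, parser)
```

Calling `fetch_title(...)` with the `&page=2` URL from the question would be the equivalent of the urllib snippet above. Passing `parser="lxml"` instead makes it easy to compare the two parsers on the same response.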