Python BeautifulSoup does not work on subsequent pages
I can't get the title of the subsequent pages. What is going wrong?
from bs4 import BeautifulSoup
import urllib.request
# First page
source = urllib.request.urlopen('https://yeniemlak.az/elan/axtar?emlak=1&elan_nov=1&seher=0&metro=0&qiymet=&qiymet2=&mertebe=&mertebe2=&otaq=&otaq2=&sahe_m=&sahe_m2=&sahe_s=&sahe_s2=').read()
soup = BeautifulSoup(source,'lxml')
print(soup.title) # shows title as expected
# Second page
source = urllib.request.urlopen('https://yeniemlak.az/elan/axtar?emlak=1&elan_nov=1&seher=0&metro=0&qiymet=&qiymet2=&mertebe=&mertebe2=&otaq=&otaq2=&sahe_m=&sahe_m2=&sahe_s=&sahe_s2=&page=2').read()
soup = BeautifulSoup(source,'lxml')
print(soup.title) # shows None
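I'm not sure why only the second request fails. As mentioned in some other posts, switching to a different parser sometimes helps. I was able to get the second page to work with html.parser, although it emits a warning about a decoding error: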
from bs4 import BeautifulSoup
import urllib.request
# Second page
source = urllib.request.urlopen('https://yeniemlak.az/elan/axtar?emlak=1&elan_nov=1&seher=0&metro=0&qiymet=&qiymet2=&mertebe=&mertebe2=&otaq=&otaq2=&sahe_m=&sahe_m2=&sahe_s=&sahe_s2=&page=2').read()
soup = BeautifulSoup(source,'html.parser')
print(soup.title) # Now works
Output:
Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<title>YENIEMLAK.AZ Satılır Bina ev menzil </title>
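Since the parser choice made the difference here, one option is to try several parsers in order and keep the first one that finds a title. The following is only a minimal sketch of that idea: the helper name first_title and the sample markup are hypothetical, and it does not explain why lxml returned None for this particular page.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def first_title(markup, parsers=("lxml", "html.parser")):
    """Return the <title> text from the first parser that finds one, else None.

    `first_title` is a hypothetical helper, not part of BeautifulSoup.
    """
    for parser in parsers:
        try:
            soup = BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # This parser is not installed; move on to the next one.
            continue
        if soup.title is not None and soup.title.string:
            return soup.title.string.strip()
    return None


# Made-up sample markup, used only to show the call.
sample = b"<html><head><title> Example page </title></head><body></body></html>"
print(first_title(sample))
```

In the question's script, the bytes returned by urllib.request.urlopen(...).read() could be passed in place of sample.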
This works. It also works if you switch to requests instead of urllib.
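A rough sketch of the requests variant is shown below. The function names page_title and fetch_title and the 10-second timeout are assumptions made for illustration. It passes response.text, which requests has already decoded to str, to BeautifulSoup; whether that is what makes the difference for this site has not been verified here.

```python
import requests
from bs4 import BeautifulSoup


def page_title(html, parser="html.parser"):
    """Return the stripped <title> text of an HTML document, or None if absent."""
    soup = BeautifulSoup(html, parser)
    return soup.title.get_text(strip=True) if soup.title else None


def fetch_title(url, timeout=10):
    """Download `url` with requests and return its title (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on HTTP 4xx/5xx instead of parsing an error page
    # response.text is a str that requests decoded using its detected encoding.
    return page_title(response.text)
```

Calling fetch_title with the second-page URL from the question would then replace the urllib-based snippet.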