Python: finding the number of pages on a website (soup.findall unicode problem)


python string unicode


Hi, I'm trying to find the number of pages on a website using Python 2.7 and BeautifulSoup. I tried this code to get the page count from the pagination row:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2

from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request("https://www.sikayetvar.com", None,headers)
resp  = urllib2.urlopen(req)
html = resp.read()
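soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid bs4's "no parser specified" warning
# pages is a list-like ResultSet of <div> tags; page.text is a unicode string
pages = soup.find_all('div', attrs={'class': 'pagination row'})
for page in pages:
    print page.text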

The output is: 1. 2. 3. 4. 5. 6. 7. ... 807

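I only need the number 807, but soup.findall gives me unicode (I checked the type). I thought I should convert it to a string and find the max number, but in that case the (…) causes a problem. Or should I try to get the last element of findall? But that's not a list, it's unicode. I really need help, thanks.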
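A minimal sketch of the "turn the text into numbers and take the largest" idea, written for Python 3 with only the standard library. It assumes, as in the output above, that the pagination labels in page.text are separated by whitespace; last_page_from_text is a hypothetical helper name, and the sample string is modelled on the question's output rather than taken from the live site.

```python
def last_page_from_text(text):
    """Return the largest page number in a pagination label string, or None.

    Assumes the labels are separated by whitespace (e.g. "1 2 3 ... 807").
    Leading/trailing dots are stripped so that "1." counts as 1, and tokens
    that are not plain decimal numbers (such as "...") are skipped.
    """
    numbers = []
    for token in text.split():
        token = token.strip(".")
        if token.isdecimal():
            numbers.append(int(token))
    return max(numbers) if numbers else None


# Sample string modelled on the output shown in the question.
print(last_page_from_text(u"1. 2. 3. 4. 5. 6. 7. ... 807"))  # 807
```

This only works while the labels stay separated; if they are glued together into one token, the function finds no number and returns None.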

I couldn't install urllib, so I'll use the requests library instead. You can install it with pip install requests.

import requests 
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0'}

response = requests.get("https://www.sikayetvar.com/a101", headers = headers)

soup = BeautifulSoup(response.text,'lxml')

# Select all <a> tags inside the div with the "pagination" class
pages = soup.select('div.pagination a')

# The last link is "next page"; the one before it holds the last page number,
# so take the second-to-last item

print(pages[-2].text)
#Output is 807
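Indexing with pages[-2] relies on the "next page" link always being the last one. A slightly more defensive sketch (Python 3; last_page_from_links is a hypothetical helper name) looks at the text of each link separately and keeps only labels that are plain numbers, so their position in the list no longer matters. The sample labels below are an assumption, not scraped from the site.

```python
def last_page_from_links(link_texts):
    """Return the highest numeric label among pagination link texts, or None.

    Labels that are not plain decimal numbers (for example "..." or a
    "next page" arrow) are ignored, so the result does not depend on
    where those links appear in the list.
    """
    numbers = [int(label.strip()) for label in link_texts
               if label.strip().isdecimal()]
    return max(numbers) if numbers else None


# Hypothetical labels; with BeautifulSoup they could come from
# [a.get_text() for a in soup.select('div.pagination a')]
labels = [u"1", u"2", u"3", u"...", u"807", u"Next"]
print(last_page_from_links(labels))  # 807
```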

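What does int(pages[-1].text) give you?

@JonClements Right below the pages line I wrote, I added s = int(pages[-1].text) and print s, and it says: ValueError: invalid literal for int() with base 10: '1234567…807'

@Selçuk, urllib[2] is part of the standard library; you don't need to install it.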
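The ValueError in the comments comes from calling int() on the text of an entire pagination <div>: in the question's code, pages holds <div> tags, so pages[-1].text is all of the link labels joined together. A minimal reproduction with a plain string (Python 3; the value is an assumption modelled on the error message, written with three ASCII dots) also shows why pulling digit runs out of that text cannot recover the page numbers:

```python
import re

# Assumed text of the whole pagination <div>: its link labels run together
# with no separator between them.
div_text = "1234567...807"

try:
    int(div_text)
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: '1234567...807'

# Extracting runs of digits does not help: pages 1 to 7 merge into one run.
print(re.findall(r"[0-9]+", div_text))  # ['1234567', '807']
```

Working on the individual <a> tags, as soup.select('div.pagination a') in the answer does, keeps each page number in its own string.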