Python BeautifulSoup: Why am I getting an Internal Server Error?
I want to scrape the table on a page. I wrote this code:
import urllib
from urllib.request import urlopen
from bs4 import BeautifulSoup
import sys
import requests
import pandas as pd
webpage = 'https://web.iitm.ac.in/bioinfo2/cpad2/peptides/?page=1'
page = urllib.request.urlopen(webpage)
soup = BeautifulSoup(page,'html.parser')
soup_text = soup.get_text()
print(soup)
The output is an error:
Traceback (most recent call last):
File "scrape_cpad.py", line 9, in <module>
page = urllib.request.urlopen(webpage)
File "/Users/kela/anaconda/envs/py3/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/Users/kela/anaconda/envs/py3/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/Users/kela/anaconda/envs/py3/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/Users/kela/anaconda/envs/py3/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/Users/kela/anaconda/envs/py3/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/Users/kela/anaconda/envs/py3/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error
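I tried this on two different computers and networks. I can also see that the server is up, because I can open the page in a browser and view its source.
I also tried changing the URL from https to http, and adding www.
Can someone show me working code that can connect to this page and pull down the table?
P.S. I have seen that there are similar questions, but none of them answers my question.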
soup = BeautifulSoup(page, 'html.parser').context
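It seems the server rejects requests that do not carry a proper User-Agent header.
I tried setting the User-Agent to that of my browser and managed to get it to respond with an HTML page: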
import urllib.request

webpage = 'https://web.iitm.ac.in/bioinfo2/cpad2/peptides/?page=1'
req = urllib.request.Request(webpage)
# spoof the UA header so the request looks like it comes from a browser
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0')
page = urllib.request.urlopen(req)
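To make the same fix reusable, the header can be passed when the request is constructed, and the HTTP error can be caught instead of ending in a traceback. This is only a minimal sketch: the helper names build_request and fetch_html, the DEFAULT_UA constant, and the 30-second timeout are illustrative choices, not part of urllib, and it assumes that any browser-like User-Agent string satisfies the server.

```python
import urllib.error
import urllib.request

# Assumption: any browser-like User-Agent works; this string is the one
# used in the snippet above.
DEFAULT_UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) '
              'Gecko/20100101 Firefox/77.0')


def build_request(url, user_agent=DEFAULT_UA):
    """Return a urllib Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_html(url, user_agent=DEFAULT_UA, timeout=30):
    """Fetch url and return the body as bytes, or None on an HTTP error status."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # err.code holds the status code (500 in the question above)
        print(f'HTTP {err.code} {err.reason} for {url}')
        return None
```

The bytes returned by fetch_html(webpage) can be passed straight to BeautifulSoup(html, 'html.parser'), provided the result is checked for None first.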
Use the requests module to fetch the page.
For example:
import requests
from bs4 import BeautifulSoup

url = 'https://web.iitm.ac.in/bioinfo2/cpad2/peptides/?page=1'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# select the table rows marked with data-toggle="modal" and print one per line
for tr in soup.select('tr[data-toggle="modal"]'):
    print(tr.get_text(strip=True, separator=' '))
    print('-' * 120)
Prints:
P-0001 GYE 3 Amyloid Amyloid-beta precursor protein (APP) P05067 No Org Lett. 2008 Jul 3;10(13):2625-8. 18529009 CPAD
------------------------------------------------------------------------------------------------------------------------
P-0002 KFFE 4 Amyloid J Biol Chem. 2002 Nov 8;277(45):43243-6. 12215440 CPAD
------------------------------------------------------------------------------------------------------------------------
P-0003 KVVE 4 Amyloid J Biol Chem. 2002 Nov 8;277(45):43243-6. 12215440 CPAD
------------------------------------------------------------------------------------------------------------------------
P-0004 NNQQ 4 Amyloid Eukaryotic peptide chain release factor GTP-binding subunit (ERF-3) P05453 Nature. 2007 May 24;447(7143):453-7. 17468747 CPAD
------------------------------------------------------------------------------------------------------------------------
P-0005 VKSE 4 Non-amyloid Microtubule-associated protein tau (PHF-tau) P10636 Proc Natl Acad Sci U S A. 2000 May 9;97(10):5129-34. 10805776 AmyLoad
------------------------------------------------------------------------------------------------------------------------
P-0006 AILSS 5 Amyloid Islet amyloid polypeptide (Amylin) P10997 No Proc Natl Acad Sci U S A. 1990 Jul;87(13):5036-40. 2195544 CPAD
------------------------------------------------------------------------------------------------------------------------
...and so on.
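The printed rows above join all cells with spaces, so an empty cell simply disappears (P-0002 has fewer fields than P-0001) and the columns no longer line up. If the table is needed as data, one option is to keep one entry per cell. The sketch below is hedged: SAMPLE_HTML is a made-up stand-in for the real page, the helper name rows_to_cells is illustrative, and it assumes the cells of each row are td elements.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page; the real markup and columns may differ.
SAMPLE_HTML = """
<table>
  <tr><th>ID</th><th>Peptide</th><th>Length</th><th>Protein</th></tr>
  <tr data-toggle="modal"><td>P-0001</td><td>GYE</td><td>3</td><td>APP</td></tr>
  <tr data-toggle="modal"><td>P-0002</td><td>KFFE</td><td> 4 </td><td></td></tr>
</table>
"""


def rows_to_cells(html):
    """Return one list of cell strings per row matching tr[data-toggle="modal"]."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for tr in soup.select('tr[data-toggle="modal"]'):
        # An empty <td> yields '', so every row keeps the same number of columns
        rows.append([td.get_text(strip=True) for td in tr.find_all('td')])
    return rows


rows = rows_to_cells(SAMPLE_HTML)
print(rows)
```

A list of rows in this shape can be handed to pandas.DataFrame(rows) if a DataFrame is wanted, as the imports in the question suggest.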