Python: data I see in the browser is missing from the requests.get() response
Tags: python, beautifulsoup, python-requests

I am new to Python, data scraping, and automation. I am trying to scrape the website given in URL. When I open the URL in a browser, all the data is displayed, but the response from the requests.get() method does not contain it.
It would be very helpful if someone could tell me what is going wrong.
import requests
import time
from bs4 import BeautifulSoup
URL = "https://fees.uspto.gov/MaintenanceFees/fees/details?applicationNumber=12814074&patentNumber=7871455"
html = requests.get(URL)
time.sleep(4)
pno = response.findAll('div',{"class":"left maintenanceFeeDetails"})
print(pno)
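The data I want to scrape is under "Payment Window Status" (just paste the URL into a browser to see it). I also tried the allow_redirects=True and headers parameters, but still got nothing: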
URL = "https://fees.uspto.gov/MaintenanceFees/fees/details?applicationNumber=12814074&patentNumber=7871455"
headers = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
response = requests.get(URL, headers=headers, allow_redirects=True)
soup = BeautifulSoup(response.text)
print(response.history)
divs = soup.find_all('div', class_='left maintenanceFeeDetails')
print(divs)
It follows the redirects, but I get nothing back:
[<Response [302]>, <Response [302]>, <Response [302]>]
[]
Result (the heading of the table the data should be extracted from)
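As I said in my comment, the data you want is generated dynamically, so it is not in the source you get back. requests also follows redirects for GET requests automatically, so that was never the problem either. You can get the information you need by mimicking the Ajax request with a simple GET request to https://fees.uspto.gov/mntfee-services/v1/maintenancefee/details using the same parameters: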
params = {"patentNumber": "7871455",
"applicationNumber": "12814074"}
api = "https://fees.uspto.gov/mntfee-services/v1/maintenancefee/details"
data = requests.get(api, params=params).json()
That gives you all the information in JSON format:
In [1]: import requests
In [2]: params = {"patentNumber": "7871455",
...: "applicationNumber": "12814074"}
In [3]: api = "https://fees.uspto.gov/mntfee-services/v1/maintenancefee/details"
In [4]: data = requests.get(api, params=params).json()
In [5]: data["infoMessageText"]
Out[5]: [u'No maintenance fees are due at this time. 7.5 year window opens on 01/18/2018.']
In [6]: info = data["model"][0]
In [7]: info.keys()
Out[7]:
[u'patentStatus',
u'feeStatus',
u'geoRegionCode',
u'category',
u'patentNumber',
u'subCategory',
u'streetLineTwo',
u'applicationNumber',
u'applicationStatusDate',
u'abandonmentDate',
u'nationalStageIndicator',
u'window',
u'version',
u'postalCode',
u'nameLineOne',
u'issueDate',
u'maintenanceFeePhases',
u'streetLineOne',
u'filingDate',
u'countryName',
u'phone',
u'correspondenceAddressIndicator',
u'entityTypeName',
u'nameLineTwo',
u'applicationStatus',
u'entityTypeCd',
u'cityName',
u'feeCodes',
u'patentTitle',
u'customerNumber',
u'windowStatus']
In [8]: info["patentStatus"]
Out[8]: u'ACTIVE'
In [9]: info["feeStatus"]
Out[9]: u'Not Due'
In [10]: info
Out[10]:
{u'abandonmentDate': -62135578800000,
u'applicationNumber': u'12814074',
u'applicationStatus': 150,
u'applicationStatusDate': 1293512400000,
u'category': u'UTL',
u'cityName': u'LOS ANGELES',
u'correspondenceAddressIndicator': True,
u'countryName': u'UNITED STATES',
u'customerNumber': u'33417',
u'entityTypeCd': u'S',
u'entityTypeName': u'SMALL',
u'feeCodes': [],
u'feeStatus': u'Not Due',
u'filingDate': 1276228800000,
u'geoRegionCode': u'CA',
u'issueDate': 1295326800000,
u'maintenanceFeePhases': [{u'closeDate': 1421730000000,
u'expiredDate': 1421816400000,
u'feeStatus': u'Paid',
u'openDate': 1390021200000,
u'statementStatus': u'Statement',
u'surchargeDate': 1405742400000,
u'transactionId': u'020314INTMTFEE00001905503725',
u'version': 0,
u'window': u'3.5',
u'windowStatus': u'Closed'},
{u'closeDate': 1547787600000,
u'expiredDate': 1547874000000,
u'feeStatus': u'Not Due',
u'openDate': 1516251600000,
u'statementStatus': None,
u'surchargeDate': 1531972800000,
u'transactionId': None,
u'version': 0,
u'window': u'7.5',
u'windowStatus': u'Not Open'},
{u'closeDate': 1674018000000,
u'expiredDate': 1674104400000,
u'feeStatus': u'Not Due',
u'openDate': 1642482000000,
u'statementStatus': None,
u'surchargeDate': 1658203200000,
u'transactionId': None,
u'version': 0,
u'window': u'11.5',
u'windowStatus': u'Not Open'}],
u'nameLineOne': u'LEWIS, BRISBOIS, BISGAARD & SMITH LLP',
u'nameLineTwo': u'JON E HOKANSON',
u'nationalStageIndicator': u'N',
u'patentNumber': u'7871455',
u'patentStatus': u'ACTIVE',
u'patentTitle': u'JET ENGINE PROTECTION SYSTEM',
u'phone': u'2132501800',
u'postalCode': u'90071',
u'streetLineOne': u'633 WEST 5TH STREET',
u'streetLineTwo': u'SUITE 4000',
u'subCategory': None,
u'version': 0,
u'window': u'7.5',
u'windowStatus': u'Not Open'}
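The date fields in the output above (openDate, closeDate, issueDate and so on) look like milliseconds since the Unix epoch. That is an assumption rather than something the API documents here, but it fits the data: the 7.5 year openDate converts to 2018-01-18, the same day named in infoMessageText. The sketch below converts such values to UTC datetimes. The helper names ms_to_utc and phase_windows are made up for illustration, and the sample dict is trimmed from the output above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime.

    Assumption: the API's date fields are epoch milliseconds.
    """
    # Adding a timedelta also works for negative values, which
    # datetime.fromtimestamp() rejects on some platforms.
    return EPOCH + timedelta(milliseconds=ms)


def phase_windows(info):
    """Return (window, open, close) for each maintenance fee phase."""
    return [
        (phase["window"], ms_to_utc(phase["openDate"]), ms_to_utc(phase["closeDate"]))
        for phase in info["maintenanceFeePhases"]
    ]


# Sample trimmed from the response shown above.
sample = {
    "maintenanceFeePhases": [
        {"window": "7.5", "openDate": 1516251600000, "closeDate": 1547787600000},
    ]
}

for window, opens, closes in phase_windows(sample):
    print(window, opens.isoformat(), closes.isoformat())
# prints: 7.5 2018-01-18T05:00:00+00:00 2019-01-18T05:00:00+00:00
```

The 05:00 UTC time of day corresponds to midnight at UTC-5, so the values are probably local midnights on the server side. The abandonmentDate of -62135578800000 converts to 0001-01-01 under the same reading, which suggests a placeholder for "no date" rather than a real one.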
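You can extract whatever information you need from the model dict.

The URL returns a 302. You have to follow the URL given in the Location header.
What is response supposed to be? Also, the sleep is pointless: requests does not render any dynamic content, and any 302 is handled by requests, so you do get the page source. The real question is what data you want.
Thank you very much. This helped me a lot. +1
When an API is available, it is always more reliable and faster than using BeautifulSoup or Selenium.
Thank you very much, this was really helpful. I will remember to use the API next time.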