Python web scraping with a pop-up window


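I'm new to web scraping and I'm trying to automate retrieving parcel information from a town website. I have over 300 parcels for which I need the book and page numbers.

This is the website:

When you go there, you can click Search, and then I enter an identifier (e.g. 68/20). I have a list of all of them. From there a profile comes up, and I can get the book and page number from it.

This is what I have so far: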

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = "https://newmilfordct.mapgeo.io/datasets/properties?abuttersDistance=100&latlng=41.587864%2C-73.425014"
page = urlopen(url)
html = page.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")
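I can connect to the website, but I don't know how to interact with it.
If anyone could point me in the right direction it would be greatly appreciated, and it would save hours of manual work.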

You can get the data for a given identifier by sending a POST request to the API URL.

Here's how to do it:

import requests

search_url = "https://newmilfordct.mapgeo.io/api/datasets/properties/search?format=json"

identifier = "68/20"

payload = {
    "page": 1,
    "quickSearch": identifier
}

search_results = requests.post(search_url, payload).json()
# print(search_results)

for item in search_results:
    name = item['displayName']
    owner = item['ownerName']
    geometry = item['geometry']
    book = item['lastSaleBook']
    page = item['lastSalePage']
    print(f"Name: {name} | Owner: {owner}")
    print(f"Book/Page: {book}/{page}")
    print(geometry)
    print("-" * 80)
Output:

Name: 17 BUCKINGHAM LN | Owner: ROTELLI LOUIS
Book/Page: 0970/230
{"type":"Polygon","coordinates":[[[-73.4909038060549,41.6425898231357],[-73.4909821900848,41.6425591025291],[-73.4907493168393,41.6419510845828],[-73.4911769908149,41.6420353877],[-73.4915429751214,41.6418889484739],[-73.4915515509607,41.6418998161938],[-73.4919447199921,41.6423992451082],[-73.4920405021311,41.6425204818934],[-73.4919930203487,41.6425307775562],[-73.4919273071398,41.6425305146988],[-73.4917614178846,41.642552550643],[-73.491595684262,41.642581803258],[-73.4910018358319,41.6426901884681],[-73.4910019510053,41.6427258656192],[-73.4909038060549,41.6425898231357]]]}
--------------------------------------------------------------------------------
Name: 15 BUCKINGHAM LN | Owner: NEELANDS DOUGLAS S + SALOME S
Book/Page: 0330/394
{"type":"Polygon","coordinates":[[[-73.4904204439222,41.6413365201908],[-73.4908759926496,41.6411167792846],[-73.4909181970441,41.6410961714263],[-73.4915429751214,41.6418889484739],[-73.4911769908149,41.6420353877],[-73.4907493168393,41.6419510845828],[-73.4909821900848,41.6425591025291],[-73.4909038060549,41.6425898231357],[-73.4904204439222,41.6413365201908]]]}
--------------------------------------------------------------------------------
There's more in the JSON. Just uncomment the line # print(search_results) to see the entire response.
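Since the question mentions a list of 300+ identifiers, the single lookup above could be wrapped in a loop. The following is only a rough sketch: the helper names (search_parcel, collect_rows, rows_to_csv), the timeout, and the delay between requests are my own assumptions, not part of the site's API. It reuses the same endpoint and the displayName, lastSaleBook and lastSalePage keys shown in the output above.

```python
import csv
import io
import time

SEARCH_URL = "https://newmilfordct.mapgeo.io/api/datasets/properties/search?format=json"


def search_parcel(identifier):
    """POST one identifier to the search endpoint and return the parsed JSON list."""
    import requests  # imported lazily so the helpers below can be used without it

    payload = {"page": 1, "quickSearch": identifier}
    response = requests.post(SEARCH_URL, payload, timeout=30)  # timeout is an assumption
    response.raise_for_status()
    return response.json()


def collect_rows(identifiers, search=search_parcel, delay=0.0):
    """Return one (identifier, name, book, page) tuple per search hit."""
    rows = []
    for identifier in identifiers:
        for item in search(identifier):
            rows.append((
                identifier,
                item.get("displayName"),
                item.get("lastSaleBook"),
                item.get("lastSalePage"),
            ))
        time.sleep(delay)  # pause between requests to go easy on the server
    return rows


def rows_to_csv(rows):
    """Render the collected rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["identifier", "name", "book", "page"])
    writer.writerows(rows)
    return out.getvalue()
```

Something like print(rows_to_csv(collect_rows(["68/20"], delay=1.0))) would then run the real lookups. Note that a quick search can return more than one hit per identifier (the example above returned two), so the rows may still need filtering by hand.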

EDIT: A short note about the API

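With the Developer Tools open in your web browser, you can take a sneak peek at what happens when you put an identifier into the search field. Go to the Network tab and select the XHR filter.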


Select the first item and choose Headers. There you'll find the Request URL and the Request Payload.
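To compare your own code against what the Headers tab shows, you can build the request without sending it and look at its parts. This is a small sketch using only the standard library. It assumes a form-encoded body, which is what requests.post(url, payload) sends for a dict; if the browser shows a JSON payload instead, the body would be json.dumps(payload). The function name build_search_request is made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://newmilfordct.mapgeo.io/api/datasets/properties/search?format=json"


def build_search_request(identifier):
    """Build (but do not send) the POST request for one identifier."""
    payload = {"page": 1, "quickSearch": identifier}
    body = urlencode(payload).encode("utf-8")  # form-encoded, an assumption about the body format
    return Request(SEARCH_URL, data=body, method="POST")


req = build_search_request("68/20")
print(req.get_method())  # HTTP verb, to compare with the request method in the browser
print(req.full_url)      # to compare with the Request URL
print(req.data)          # to compare with the Request Payload
```

Nothing is sent over the network here; passing req to urllib.request.urlopen would perform the actual call.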

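What is the desired output? This looks like a job for Selenium, since you need to click and interact with the website.

This works great, thank you so much! If you don't mind me asking, where did you find the API? I looked through all the source code on the website and didn't see anything referencing that link.

Thank you for the information. This is my first time doing this, and I didn't understand what it did until now.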