Python 3.x: How to web-scrape the location text from a given page

I am trying to extract "Katowice, Brynów-Zgrzebnioka, Brynów" from the URL given in the code below:

import bs4
from urllib.request import urlopen as Open
from urllib.request import Request
from bs4 import BeautifulSoup as soup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
my_url = "https://www.otodom.pl/oferta/narozne-2-pokoje-nowa-inwestycja-0-ID43FH9.html"
req = Request(url=my_url, headers=headers) 
html = Open(req).read() 

page_soup = soup(html, "html.parser")

print(page_soup.find("a", {"href":"#map"}).text)    
So far, this is the output I get:

.css-14dmk7z-Le{margin-right:2px;width:15px;height:15px;padding-bottom:2px;color:#ff7200;}.css-1g0gx4e-Le{vertical-align:middle;fill:currentColor;margin-right:2px;width:15px;height:15px;padding-bottom:2px;color:#ff7200;}Katowice, Brynów-Zgrzebnioka, Brynów

I don't know how to proceed from here. Any help would be greatly appreciated.

I'm not sure this solves your problem 100%, but here is my solution:

import bs4
from urllib.request import urlopen as Open
from urllib.request import Request
from bs4 import BeautifulSoup as soup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
my_url = "https://www.otodom.pl/oferta/narozne-2-pokoje-nowa-inwestycja-0-ID43FH9.html"
req = Request(url=my_url, headers=headers) 
html = Open(req).read() 

page_soup = soup(html, "html.parser")
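# Collect every text node in the document, in document order.
texts = page_soup.find_all(string=True)

# Index 91 is specific to this page; it will likely differ on other pages.
print(texts[91])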
Welcome to SO.

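Here is a piece of code (you were already most of the way there) that returns the string you are looking for. Basically, I just split the long string you were already getting on the "}" character and took the last part: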

import bs4
from urllib.request import urlopen as Open
from urllib.request import Request
from bs4 import BeautifulSoup as soup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
my_url = "https://www.otodom.pl/oferta/narozne-2-pokoje-nowa-inwestycja-0-ID43FH9.html"
req = Request(url=my_url, headers=headers) 
html = Open(req).read() 

page_soup = soup(html, "html.parser")

maptag = page_soup.find("a", {"href":"#map"}).text
print(maptag.split("}")[2])

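However, I want to stress that this solution is 1) fragile, because it is very specific to the sample page and may not work on other pages, and 2) not Pythonic. You may want to work with the address element in the page itself to get a better result.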
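One way to work with the element itself, rather than with its flattened text, might look like the sketch below. It assumes the CSS in the output comes from <style> elements nested inside the "#map" link, which the printed output suggests but which has not been verified against the live page. SAMPLE_HTML and extract_location are hypothetical names, and the markup is a simplified stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the listing page. The real markup
# may differ; this only mimics <style> blocks nested inside the link.
SAMPLE_HTML = """
<a href="#map">
  <style>.css-14dmk7z-Le{margin-right:2px;width:15px;}</style>
  <style>.css-1g0gx4e-Le{vertical-align:middle;fill:currentColor;}</style>
  Katowice, Brynów-Zgrzebnioka, Brynów
</a>
"""


def extract_location(html):
    """Return the visible text of the '#map' link, ignoring embedded CSS."""
    page_soup = BeautifulSoup(html, "html.parser")
    map_link = page_soup.find("a", {"href": "#map"})
    if map_link is None:
        return None  # link not found, e.g. the page layout changed

    # Remove every <style>/<script> element inside the link so that
    # only the human-readable text is left.
    for tag in map_link.find_all(["style", "script"]):
        tag.decompose()

    # strip=True trims each text node and skips the empty ones.
    return map_link.get_text(strip=True)


location = extract_location(SAMPLE_HTML)
print(location)
```

With the real page, the html bytes read from Open(req).read() would be passed in place of SAMPLE_HTML. Because the function never counts "}" characters, it should not matter how many CSS rules precede the location, as long as the assumption about the nested <style> elements holds.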

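Thank you very much for your reply. Do you know of a more general solution that would also work for other links, where there may be more or fewer characters in front of the location?

Thank you very much for your reply. I have checked several other pages on the site and they seem to follow a similar layout, so your solution should work for now. Thanks again!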
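For the case of more or fewer characters in front of the location, a small variation on the split approach may be enough: split once from the right, so the result does not depend on how many "}" characters precede the location. This is only a sketch. strip_css_prefix is a hypothetical helper name, the raw string is shortened from the output shown in the question, and it assumes the location text itself never contains "}".

```python
def strip_css_prefix(link_text):
    """Return whatever follows the last '}' in link_text, trimmed."""
    # rsplit with maxsplit=1 splits only at the last '}', so the number
    # of CSS rules in front of the location does not matter. If there is
    # no '}' at all, the whole string is returned.
    return link_text.rsplit("}", 1)[-1].strip()


# Shortened version of the string printed in the question.
raw = (
    ".css-14dmk7z-Le{margin-right:2px;width:15px;}"
    ".css-1g0gx4e-Le{vertical-align:middle;fill:currentColor;}"
    "Katowice, Brynów-Zgrzebnioka, Brynów"
)

cleaned = strip_css_prefix(raw)
print(cleaned)
```

In the code above, this would replace maptag.split("}")[2] with strip_css_prefix(maptag).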