Python - How to web-scrape the location text from a given page


python-3.x web-scraping


I am trying to get "Katowice, Brynów-Zgrzebnioka, Brynów" from the URL given in the code below.

import bs4
from urllib.request import urlopen as Open
from urllib.request import Request
from bs4 import BeautifulSoup as soup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
my_url = "https://www.otodom.pl/oferta/narozne-2-pokoje-nowa-inwestycja-0-ID43FH9.html"
req = Request(url=my_url, headers=headers) 
html = Open(req).read() 

page_soup = soup(html, "html.parser")

print(page_soup.find("a", {"href":"#map"}).text)

So far, this is what I get:

.css-14dmk7z-Le{margin-right:2px;width:15px;height:15px;padding-bottom:2px;color:#ff7200;}.css-1g0gx4e-Le{vertical-align:middle;fill:currentColor;margin-right:2px;width:15px;height:15px;padding-bottom:2px;color:#ff7200;}Katowice, Brynów-Zgrzebnioka, Brynów

I don't know how to go further from here. Any help would be appreciated.

I'm not sure this solves your problem 100%, but here is my solution:

import bs4
from urllib.request import urlopen as Open
from urllib.request import Request
from bs4 import BeautifulSoup as soup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
my_url = "https://www.otodom.pl/oferta/narozne-2-pokoje-nowa-inwestycja-0-ID43FH9.html"
req = Request(url=my_url, headers=headers) 
html = Open(req).read() 

page_soup = soup(html, "html.parser")
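# Collect every text node on the page (find_all/string are the current names for findAll/text).
texts = page_soup.find_all(string=True)

# Index 91 is where the location happened to sit on this page; it will differ on other pages.
print(texts[91])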

Welcome to SO.

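Below is a piece of code that returns the string you are looking for (you were only part of the way there). Basically, I just split the long string you were already getting on the "}" character and kept the remainder.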

import bs4
from urllib.request import urlopen as Open
from urllib.request import Request
from bs4 import BeautifulSoup as soup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
my_url = "https://www.otodom.pl/oferta/narozne-2-pokoje-nowa-inwestycja-0-ID43FH9.html"
req = Request(url=my_url, headers=headers) 
html = Open(req).read() 

page_soup = soup(html, "html.parser")

maptag = page_soup.find("a", {"href":"#map"}).text
print(maptag.split("}")[2])
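The fixed index `[2]` assumes that exactly two CSS rules precede the location. A minimal sketch of a slightly less brittle variant, assuming the location is always whatever follows the last `}` in the string. The helper name `text_after_last_brace` is hypothetical, and the sample string is the output shown in the question:

```python
def text_after_last_brace(raw: str) -> str:
    """Return whatever follows the last '}' in raw, stripped of whitespace.

    Assumption: the visible text comes after all of the inlined CSS rules.
    If raw contains no '}', the whole (stripped) string is returned.
    """
    return raw.rsplit("}", 1)[-1].strip()


# Sample taken from the output shown in the question.
sample = (
    ".css-14dmk7z-Le{margin-right:2px;width:15px;height:15px;"
    "padding-bottom:2px;color:#ff7200;}"
    ".css-1g0gx4e-Le{vertical-align:middle;fill:currentColor;"
    "margin-right:2px;width:15px;height:15px;padding-bottom:2px;"
    "color:#ff7200;}"
    "Katowice, Brynów-Zgrzebnioka, Brynów"
)
print(text_after_last_brace(sample))
```

This no longer depends on how many CSS rules come first, but it would still break if the visible text itself contained a `}`.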

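However, I should stress that this solution is 1) fragile, because it is very specific to the sample page and may not work on other pages, and 2) not Pythonic. You may want to work with the address element in the page itself to get a better result.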
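One way to work with the element itself rather than its flattened text: the CSS in the output most likely comes from `<style>` tags nested inside the link, so they can be dropped before reading the text. A minimal sketch, assuming that structure. The helper `link_text_without_styles` and the inline sample HTML are hypothetical stand-ins, not the real otodom markup:

```python
from bs4 import BeautifulSoup


def link_text_without_styles(html, href="#map"):
    """Return the visible text of the first <a href=...>, ignoring nested CSS/JS.

    Returns None when no such link exists.
    """
    page_soup = BeautifulSoup(html, "html.parser")
    link = page_soup.find("a", {"href": href})
    if link is None:
        return None
    # Remove nested <style>/<script> tags so their contents never reach get_text().
    for tag in link.find_all(["style", "script"]):
        tag.decompose()
    return link.get_text(strip=True)


# Hypothetical markup imitating the structure assumed above.
sample_html = """
<a href="#map">
  <style>.css-14dmk7z-Le{margin-right:2px;color:#ff7200;}</style>
  <svg class="css-14dmk7z-Le"></svg>
  Katowice, Brynów-Zgrzebnioka, Brynów
</a>
"""
print(link_text_without_styles(sample_html))
```

With the real page, the `html` fetched in the code above would be passed in place of `sample_html`. Whether the result is clean depends on the page actually nesting its styles this way.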

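Thank you very much for your reply. Do you know of a more general solution that would also work for other links, where there may be more or fewer characters before the location?

Thank you very much for your reply. I have checked a few other listings on the site and they seem to follow a similar layout, so your solution should work for now. Thanks again!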