Python 3.x: How to web-scrape the location text from a given page
I am trying to get "Katowice, Brynów-Zgrzebnioka, Brynów" from the URL given in the code below.
import bs4
from urllib.request import urlopen as Open
from urllib.request import Request
from bs4 import BeautifulSoup as soup
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
my_url = "https://www.otodom.pl/oferta/narozne-2-pokoje-nowa-inwestycja-0-ID43FH9.html"
req = Request(url=my_url, headers=headers)
html = Open(req).read()
page_soup = soup(html, "html.parser")
print(page_soup.find("a", {"href":"#map"}).text)
So far, this is the output I am able to get:
.css-14dmk7z-Le{margin-right:2px;width:15px;height:15px;padding-bottom:2px;color:#ff7200;}.css-1g0gx4e-Le{vertical-align:middle;fill:currentColor;margin-right:2px;width:15px;height:15px;padding-bottom:2px;color:#ff7200;}Katowice, Brynów-Zgrzebnioka, Brynów
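I don't know how to proceed from here. Any help would be greatly appreciated.

I'm not sure whether this solves your problem 100%, but here is my solution: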
import bs4
from urllib.request import urlopen as Open
from urllib.request import Request
from bs4 import BeautifulSoup as soup
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
my_url = "https://www.otodom.pl/oferta/narozne-2-pokoje-nowa-inwestycja-0-ID43FH9.html"
req = Request(url=my_url, headers=headers)
html = Open(req).read()
page_soup = soup(html, "html.parser")
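# Collect every text node in the document; find_all(string=True) is the
# current spelling of the older findAll(text=True).
texts = page_soup.find_all(string=True)
# Index 91 is where the location happens to sit on this particular page.
print(texts[91])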
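A fixed index such as 91 depends on the exact layout of this one page. As a minimal sketch of a less position-dependent alternative, the code below collects only the text inside the `a` element whose `href` is `#map` and skips anything inside nested `style` elements. It assumes the CSS in the question's output comes from `style` tags nested in that link, which the thread does not confirm. The `sample_html` markup and the names `MapLinkText` and `extract_location` are made up for illustration. It uses the standard library's `html.parser` rather than BeautifulSoup, so it runs without extra packages.

```python
from html.parser import HTMLParser


class MapLinkText(HTMLParser):
    """Collect text inside <a href="#map">, skipping nested <style> content."""

    def __init__(self):
        super().__init__()
        self.in_link = False   # True while inside the target <a> element
        self.in_style = False  # True while inside a <style> element
        self.parts = []        # text fragments found inside the link

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href") == "#map":
            self.in_link = True
        elif tag == "style":
            self.in_style = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "style":
            self.in_style = False

    def handle_data(self, data):
        # Keep text only when it is inside the link and not CSS.
        if self.in_link and not self.in_style:
            self.parts.append(data)


def extract_location(html):
    """Return the visible text of the #map link, without any CSS."""
    parser = MapLinkText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Hypothetical markup modeled on the output shown in the question.
sample_html = (
    '<div><a href="#map">'
    '<style>.css-14dmk7z-Le{margin-right:2px;}</style>'
    '<svg class="css-14dmk7z-Le"></svg>'
    'Katowice, Brynów-Zgrzebnioka, Brynów</a></div>'
)
print(extract_location(sample_html))
```

If the assumption holds, the same idea in BeautifulSoup would be to call `decompose()` on each `style` tag inside the link before reading `.text`.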
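Welcome to SO.

Below is a piece of code (yours, with only the last part changed) that returns the string you are looking for. Basically, I just split the long string you were already getting on the "}" character and took the remaining part: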
import bs4
from urllib.request import urlopen as Open
from urllib.request import Request
from bs4 import BeautifulSoup as soup
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
my_url = "https://www.otodom.pl/oferta/narozne-2-pokoje-nowa-inwestycja-0-ID43FH9.html"
req = Request(url=my_url, headers=headers)
html = Open(req).read()
page_soup = soup(html, "html.parser")
maptag = page_soup.find("a", {"href":"#map"}).text
print(maptag.split("}")[2])
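However, I want to stress that this solution is 1) dangerous, because it is very specific to the example page and may not work on other pages, and 2) not Pythonic. You may want to work with the address element in the page to get a more reliable result.

Thank you very much for your reply. Do you know of a more general solution that would also work for other links, where the location may be preceded by more or fewer characters?

Thank you very much for your reply. I have checked several other listings on the site and they seem to follow a similar layout, so your solution should work for now. Thanks again!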
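On the follow-up question about links with more or fewer characters in front of the location, one option is to stop counting "}" characters and take whatever follows the last one. This is only a sketch. It assumes the unwanted prefix always consists of CSS rules ending in "}" and that the location itself never contains "}". The helper name `text_after_css` and the variant strings are made up for illustration.

```python
def text_after_css(raw):
    """Return the part of raw after the last '}', or all of raw if there is none."""
    return raw.rsplit("}", 1)[-1].strip()


# Modeled on the output shown in the question (CSS shortened for readability).
two_rules = (
    ".css-14dmk7z-Le{margin-right:2px;}"
    ".css-1g0gx4e-Le{vertical-align:middle;}"
    "Katowice, Brynów-Zgrzebnioka, Brynów"
)
# Hypothetical variants: a different number of CSS rules, and no CSS at all.
three_rules = ".a{x:1;}.b{y:2;}.c{z:3;}Katowice, Brynów-Zgrzebnioka, Brynów"
no_rules = "Katowice, Brynów-Zgrzebnioka, Brynów"

for raw in (two_rules, three_rules, no_rules):
    print(text_after_css(raw))
```

With the code above, this would replace `maptag.split("}")[2]` with `text_after_css(maptag)`, so the result no longer depends on how many CSS rules come before the location.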