Python: Trouble getting all the names from a webpage with a messy layout

python python-3.x web-scraping

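I have written a script to parse all the mobile home park names from a webpage. When I run it, I get only a small portion of them. How can I get all the names from that page, where the last name currently listed is 柏威 Mobile Home Park - Alabama?

This is what I have tried so far: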

import requests
from bs4 import BeautifulSoup

url = "replace with above link"

r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")
items = soup.select_one("table tr")
name = '\n'.join([item.get_text(strip=True) for item in items.select("td p strong") if "alabama" in item.text.lower()])
print(name)

The output is as follows:

Roberts Trailer Park - Alabama
Cloverleaf Trailer Park - Alabama
Longview Mobile Home Park - Alabama

The HTML of that page is very poor, so this is ugly, but it works:

import requests
from bs4 import BeautifulSoup

url = "http://www.chattelmortgage.net/Alabama_mobile_home_parks.html"

r = requests.get(url)
soup = BeautifulSoup(r.text, "html")
# Locate the table by the full value of its class attribute
table = soup.find('table', attrs={'class': 'tablebg, tableBorder'})
# Keep the text of each <strong> tag that mentions Alabama
print([item.text.strip() for item in table.find_all("strong") if "alabama" in item.text.lower()])
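
The `find` call above passes the whole `class` attribute as a single string. Below is a small sketch of how that lookup behaves, using made-up inline markup in place of the real page. The sample HTML and the helper name `names_in_table` are assumptions for illustration only, and the real page's structure may differ. Unlike the snippet above, it returns an empty list when no matching table is found instead of raising an error.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page (an assumption).
SAMPLE_HTML = """
<table class="other"><tr><td><strong>Header - Alabama</strong></td></tr></table>
<table class="tablebg, tableBorder">
  <tr><td><strong> Roberts Trailer Park - Alabama </strong></td></tr>
  <tr><td><strong>Directions</strong></td></tr>
</table>
"""


def names_in_table(html, class_value="tablebg, tableBorder"):
    """Return Alabama names found inside the table with the given class value."""
    soup = BeautifulSoup(html, "html.parser")
    # The string is compared against the complete class attribute value.
    table = soup.find("table", attrs={"class": class_value})
    if table is None:
        # No such table: return nothing rather than fail on table.find_all.
        return []
    return [
        tag.get_text(strip=True)
        for tag in table.find_all("strong")
        if "alabama" in tag.get_text().lower()
    ]


if __name__ == "__main__":
    print(names_in_table(SAMPLE_HTML))
```

The guard on `table is None` matters on pages like this one, where a small markup change can make the lookup fail.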

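Try using html.parser instead of lxml. Also, instead of select_one('table tr'), try find_all('strong'). You will also need to strip the extra whitespace and carriage returns. Code written this way returns the expected 491 records.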
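
The code that accompanied this answer is not shown here. A minimal sketch of the approach it describes might look like the following, assuming the names sit in `<strong>` tags and each one contains the word "Alabama", as in the output shown in the question. The function name `extract_names` and the inline sample markup are illustrative and not taken from the original answer.

```python
from bs4 import BeautifulSoup


def extract_names(html):
    """Return the cleaned text of every <strong> tag that mentions Alabama."""
    # html.parser is the parser this answer suggests instead of lxml.
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for tag in soup.find_all("strong"):
        # split() followed by join() collapses spaces, tabs, and line breaks.
        text = " ".join(tag.get_text().split())
        if "alabama" in text.lower():
            names.append(text)
    return names


# Hypothetical sample markup standing in for the real page, including an
# unclosed <p> and a name broken across lines.
SAMPLE_HTML = """
<table><tr><td>
<p><strong>Roberts Trailer Park
     - Alabama</strong></p>
<p><strong>Cloverleaf Trailer Park - Alabama</strong>
<p><strong>Contact us</strong></p>
</td></tr></table>
"""

if __name__ == "__main__":
    print(extract_names(SAMPLE_HTML))
    # For the live page, something along these lines should work
    # (URL taken from the other answer):
    #   import requests
    #   r = requests.get("http://www.chattelmortgage.net/Alabama_mobile_home_parks.html")
    #   print(len(extract_names(r.text)))
```

Searching the whole document with `find_all` avoids depending on how the parser repairs the broken table rows, which is presumably why `select_one("table tr")` returned only a few names.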

Can you give us the URL so we can try it ourselves?
Look above to find the link that is already there.
Sorry, I didn't see it.