Python: How to use Beautiful Soup's find_all to scrape only a list that is part of the body

python web-scraping


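I'm having trouble using Beautiful Soup to scrape this Wikipedia list of Los Angeles neighborhoods. I'm getting all of the content of the body, not just the list of neighborhoods I want. I've seen a lot of material on how to scrape tables, but I'm stuck on how to apply the table logic in this case. This is the code I've been using: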

import requests
import pandas as pd
from bs4 import BeautifulSoup

address = 'Los Angeles, United States'
url = "https://en.wikipedia.org/wiki/List_of_districts_and_neighborhoods_of_Los_Angeles"
source = requests.get(url).text
soup = BeautifulSoup(source, 'lxml')

neighborhoodList = []

# append the data into the list
for row in soup.find_all("div", class_="mw-body")[0].find_all("li"):
    neighborhoodList.append(row.text.replace(', LA', ''))

df_neighborhood = pd.DataFrame({"Neighborhood": neighborhoodList})

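If you look at the page source, the neighborhood entries sit inside divs with the class "div-col", and the links you want carry a "title" attribute.

Also, replacing text while appending does not seem to be necessary.

The code: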

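import requests
from bs4 import BeautifulSoup
import pandas as pd

address = 'Los Angeles, United States'
url = "https://en.wikipedia.org/wiki/List_of_districts_and_neighborhoods_of_Los_Angeles"
source = requests.get(url).text
soup = BeautifulSoup(source, 'lxml')

neighborhoodList = []

# append the data into the list:
# only links inside the "div-col" blocks that have a title attribute
for row in soup.find_all("div", class_="div-col"):
    for item in row.select("a"):
        if item.has_attr('title'):
            neighborhoodList.append(item.text)

df_neighborhood = pd.DataFrame({"Neighborhood": neighborhoodList})

print('First 10 Rows:')
print(df_neighborhood.head(n=10))
print('\nLast 10 Rows:')
print(df_neighborhood.tail(n=10))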

Result:

First 10 Rows:
             Neighborhood
0        Angelino Heights
1                  Arleta
2       Arlington Heights
3           Arts District
4         Atwater Village
5           Baldwin Hills
6  Baldwin Hills/Crenshaw
7         Baldwin Village
8           Baldwin Vista
9        Beachwood Canyon

Last 10 Rows:
           Neighborhood
186    Westwood Village
187     Whitley Heights
188  Wholesale District
189          Wilmington
190     Wilshire Center
191       Wilshire Park
192      Windsor Square
193            Winnetka
194      Woodland Hills
195      Yucca Corridor
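
The selection logic can also be checked offline, without requesting the page. The following is a minimal sketch, not the answer's own code: SAMPLE_HTML is a hand-written fragment that only imitates the layout described above (it is not the real Wikipedia markup), neighborhoods_from_html is a hypothetical helper name, and the CSS selector "div.div-col a[title]" is used as a compact equivalent of the nested loops with has_attr.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written fragment that imitates the page layout;
# it is NOT the real Wikipedia markup.
SAMPLE_HTML = """
<div class="mw-body">
  <ul><li><a href="#A">A</a></li><li>Navigation item</li></ul>
  <div class="div-col">
    <ul>
      <li><a href="/wiki/Arleta" title="Arleta, Los Angeles">Arleta</a></li>
      <li><a href="/wiki/Winnetka" title="Winnetka, Los Angeles">Winnetka</a></li>
      <li><a href="#cite_note-1">[1]</a></li>
    </ul>
  </div>
</div>
"""


def neighborhoods_from_html(html):
    """Return the text of every titled link nested inside a div.div-col block."""
    # html.parser ships with Python, so lxml is not needed for this sketch
    soup = BeautifulSoup(html, "html.parser")
    # "div.div-col a[title]": <a> tags that have a title attribute
    # and are descendants of a <div class="div-col">
    return [a.get_text(strip=True) for a in soup.select("div.div-col a[title]")]


names = neighborhoods_from_html(SAMPLE_HTML)
print(names)  # ['Arleta', 'Winnetka']
```

The link outside the div-col block and the footnote link without a title are both skipped, which is the behavior the question was after.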

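You're on the right track. I see some entries such as A-Z, which you probably don't want in your df. You can use this instead:

for row in soup.find_all("div", class_="mw-body")[0].find_all("a", attrs={"title": re.compile(", Los Angeles")}):

Then, inside the for loop, use:

    neighborhoodList.append(row.text)

Remember to import re before trying this.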
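
To see what the title filter keeps and drops, here is a small sketch run against an inline fragment rather than the live page. The HTML below is hypothetical and hand-written for illustration; the assumption is that the wanted links have titles containing ", Los Angeles", as the filter above implies.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration only; not the real page markup.
SAMPLE_HTML = """
<div class="mw-body">
  <a href="#A" title="Jump to A">A-Z</a>
  <a href="/wiki/Arleta" title="Arleta, Los Angeles">Arleta</a>
  <a href="/wiki/Burbank" title="Burbank, California">Burbank</a>
  <a href="/wiki/Winnetka" title="Winnetka, Los Angeles">Winnetka</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
body = soup.find_all("div", class_="mw-body")[0]

# A compiled pattern passed as an attribute value is matched with a regex
# search, so any title containing ", Los Angeles" qualifies.
pattern = re.compile(", Los Angeles")

neighborhoodList = []
for row in body.find_all("a", attrs={"title": pattern}):
    neighborhoodList.append(row.text)

print(neighborhoodList)  # ['Arleta', 'Winnetka']
```

One consequence of this approach is that any neighborhood link whose title does not contain ", Los Angeles" would be skipped as well, so it is worth comparing the row count against the result shown in the other answer.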