Getting href links and text in a Python loop

Tags: python, beautifulsoup

I need to get information from the Apple store. I have a hashmap, hashmap_genre_link, that maps each genre to a URL ({'Games': ''; …}). For each key, I want to create another hashmap with the iOS app name (text) as the key and the app URL as the value: Games_apps: {'Pokemon Go': ''; …}

Here is my code:
from bs4 import BeautifulSoup
from requests import get
links = []
ios_categories_links = []
hashmap_genre_link = {}
url = "https://itunes.apple.com/US/genre/ios/id36"
response = get(url)
html_soup = BeautifulSoup(response.text, "html.parser")
categories_class = html_soup.find_all('div', class_="grid3-column")
# cat = categories_class.text
href = html_soup.find_all('a', href=True)
for j in href:
    # print(j['href'])
    links.append(j['href'])
#
# Hashmap initialisation: hashmap_genre_link = {"games": "https://link_for_games_page"; etc...}
for i in links:
    if "https://itunes.apple.com/us/genre/ios" in i:
        genre = i.split("/")[5][4:]  # We get the genre, without 'ios-'
        hashmap_genre_link[genre] = i
        ios_categories_links.append(i)
# print(hashmap_genre_link)
for the_key, the_value in hashmap_genre_link.items():
    # print(the_key, 'corresponds to', the_value)
    print("=======================")
    print(the_key)
    response_genre_link = get(the_value)
    html_soup_genre_link = BeautifulSoup(response_genre_link.text, "html.parser")
    genre_popular_apps_class = html_soup_genre_link.find_all('div', class_="grid3-column")
    for x in genre_popular_apps_class:
        print(x['href'])
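The genre key in the code above comes from `i.split("/")[5][4:]`, which relies on the genre slug always being the sixth piece of the URL. A side effect is that a link to the root page (path ending in `ios/id36`) would yield an empty string as the key, since `"ios"[4:]` is `""`. The following is a minimal sketch of a more defensive variant using only the standard library. The helper name `genre_from_url`, the assumed path shape, and the `id0000` in the sample URL are illustrative assumptions, not part of the original code.

```python
from urllib.parse import urlparse


def genre_from_url(url):
    """Return the genre slug without the leading 'ios-', or None if absent."""
    parts = urlparse(url).path.strip("/").split("/")
    # Assumed path shape: <country>/genre/ios-<genre>/<id>
    if len(parts) < 3 or parts[1] != "genre":
        return None
    slug = parts[2]
    return slug[4:] if slug.startswith("ios-") else None


# The id below is a made-up placeholder, used only to show the URL shape.
print(genre_from_url("https://itunes.apple.com/us/genre/ios-games-racing/id0000"))  # games-racing
print(genre_from_url("https://itunes.apple.com/us/genre/ios/id36"))  # None
```

Skipping URLs for which the helper returns `None` keeps the root category page out of the dictionary.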
Here is part of the output:
=======================
games-family
<div class="grid3-column" id="selectedcontent">
<div class="column first">
<ul>
<li><a href="https://itunes.apple.com/us/app/trivia-crack/id651510680?mt=8">Trivia Crack</a> </li>
<li><a href="https://itunes.apple.com/us/app/minion-rush/id596402997?mt=8">Minion Rush</a> </li>
<li><a href="https://itunes.apple.com/us/app/draw-something-classic/id488628250?mt=8">Draw Something Classic</a> </li>
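How can I get the href into the values? (For the text, I know I can use .text.)

You have the right idea: ['href'] is how you get those attribute values. However, you need to isolate the tags first. Your x element is the whole <div>, which contains all the <a> tags that carry an href. So you need an additional x.find_all('a'), then iterate over those tags and print the href attribute of each one.

So I added: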
for x in genre_popular_apps_class:
    alpha = x.find_all('a')
    for beta in alpha:
        print(beta['href'])
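Since the question asks for the app name as well as the URL, the same inner loop can collect both at once. Below is a small sketch that runs on a static HTML sample copied from the output shown in the question, so it needs no network access. The function name `extract_apps` and the constant `SAMPLE_HTML` are hypothetical, and the real page may be structured differently today.

```python
from bs4 import BeautifulSoup

# Static sample based on the output in the question (closing tags added).
SAMPLE_HTML = """
<div class="grid3-column" id="selectedcontent">
  <div class="column first">
    <ul>
      <li><a href="https://itunes.apple.com/us/app/trivia-crack/id651510680?mt=8">Trivia Crack</a> </li>
      <li><a href="https://itunes.apple.com/us/app/minion-rush/id596402997?mt=8">Minion Rush</a> </li>
    </ul>
  </div>
</div>
"""


def extract_apps(html):
    """Return {app name: app URL} for links inside grid3-column divs."""
    soup = BeautifulSoup(html, "html.parser")
    apps = {}
    for div in soup.find_all("div", class_="grid3-column"):
        # href=True skips <a> tags that have no href attribute.
        for a in div.find_all("a", href=True):
            apps[a.get_text(strip=True)] = a["href"]
    return apps


apps = extract_apps(SAMPLE_HTML)
for name, url in apps.items():
    print(name, "->", url)
```

Passing `href=True` to `find_all` avoids a `KeyError` on anchors that lack the attribute, which plain `find_all('a')` followed by `beta['href']` would raise.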
Full code:
from bs4 import BeautifulSoup
from requests import get
links = []
ios_categories_links = []
hashmap_genre_link = {}
url = "https://itunes.apple.com/US/genre/ios/id36"
response = get(url)
html_soup = BeautifulSoup(response.text, "html.parser")
categories_class = html_soup.find_all('div', class_="grid3-column")
# cat = categories_class.text
href = html_soup.find_all('a', href=True)
for j in href:
    # print(j['href'])
    links.append(j['href'])
#
# Hashmap initialisation: hashmap_genre_link = {"games": "https://link_for_games_page"; etc...}
for i in links:
    if "https://itunes.apple.com/us/genre/ios" in i:
        genre = i.split("/")[5][4:]  # We get the genre, without 'ios-'
        hashmap_genre_link[genre] = i
        ios_categories_links.append(i)
# print(hashmap_genre_link)
results_dict = {}
for the_key, the_value in hashmap_genre_link.items():
    # print(the_key, 'corresponds to', the_value)
    print("=======================")
    print(the_key)
    response_genre_link = get(the_value)
    html_soup_genre_link = BeautifulSoup(response_genre_link.text, "html.parser")
    genre_popular_apps_class = html_soup_genre_link.find_all('div', class_="grid3-column")
    for x in genre_popular_apps_class:
        alpha = x.find_all('a')
        links = [beta['href'] for beta in alpha]
        results_dict[the_key] = links
Output:
....
=======================
games-racing
https://itunes.apple.com/us/app/bike-race-free-style-games/id510461758?mt=8
https://itunes.apple.com/us/app/hill-climb-racing/id564540143?mt=8
https://itunes.apple.com/us/app/csr-racing/id469369175?mt=8
https://itunes.apple.com/us/app/real-racing-3/id556164008?mt=8
https://itunes.apple.com/us/app/asphalt-8-airborne/id610391947?mt=8
https://itunes.apple.com/us/app/csr-racing-2/id887947640?mt=8
https://itunes.apple.com/us/app/smashy-road-wanted/id1020119327?mt=8
https://itunes.apple.com/us/app/happy-wheels/id648668184?mt=8
https://itunes.apple.com/us/app/angry-birds-go/id642821482?mt=8
https://itunes.apple.com/us/app/need-for-speed-no-limits/id883393043?mt=8
...
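Comments:

Great, thanks! Do you know how I can iterate to create a dictionary for each genre: popular apps, games, racing, and so on?

Yes. You would initialize an empty dict and then, while iterating, create the key and value in it. I'll add that to the code when I get a chance. So you want something like the genre as the key, and then the list of links as the value?

Yes, exactly what I need.

@userHG Just updated. I removed the print lines, but you can always add them back if you need them.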
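The dictionary discussed in these comments can also hold the app name together with its URL, which is what the question originally asked for. Here is a hedged sketch, assuming each genre page has already been downloaded into a dict of genre to HTML text (for example with `get(url).text` for every URL in `hashmap_genre_link`). The function name `apps_by_genre` and the shortened sample pages are hypothetical. Using `setdefault` means that links from several `grid3-column` divs on one page accumulate instead of the last div overwriting the earlier ones.

```python
from bs4 import BeautifulSoup


def apps_by_genre(genre_pages):
    """Map each genre to {app name: app URL}.

    genre_pages: {genre: HTML text of that genre's page}.
    """
    results = {}
    for genre, html in genre_pages.items():
        soup = BeautifulSoup(html, "html.parser")
        apps = results.setdefault(genre, {})
        for div in soup.find_all("div", class_="grid3-column"):
            for a in div.find_all("a", href=True):
                apps[a.get_text(strip=True)] = a["href"]
    return results


# Hypothetical stand-in for a downloaded page, split over two divs
# to show that entries from both are kept.
pages = {
    "games-family": (
        '<div class="grid3-column"><ul>'
        '<li><a href="https://itunes.apple.com/us/app/trivia-crack/id651510680?mt=8">Trivia Crack</a></li>'
        '</ul></div>'
        '<div class="grid3-column"><ul>'
        '<li><a href="https://itunes.apple.com/us/app/minion-rush/id596402997?mt=8">Minion Rush</a></li>'
        '</ul></div>'
    ),
}

results = apps_by_genre(pages)
for genre, apps in results.items():
    print(genre, len(apps), "apps")
```

Keeping the download step outside the function makes the parsing easy to check on saved HTML before running it against the live site.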