Python: extract links and item names from a website and print these lists


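I'm a beginner in Python programming and am practicing web scraping in Python with the bs4 module. I'm trying to extract some information from the website shown below.

Every list I print comes out empty. Please tell me what I'm doing wrong.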

import requests
from bs4 import BeautifulSoup as bs    

res = requests.get('https://www.flipkart.com/samsung-mobile-store?otracker=nmenu_sub_Electronics_0_Samsung')
soup = bs(res.content, 'lxml')

names = [item['title'] for item in soup.select('._2cLu-1 a')]

links = [item['href'] for item in soup.select('._2cLu-l a')]

ratings = [item.text for item in soup.select('.hGSR34 div')]

print(names)
print(links)
print(ratings)
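You can check offline why these lists come back empty. The sketch below runs the question's selectors against a small hand-written snippet. The markup is an assumption, not a copy of Flipkart's page, whose generated class names may also change over time. It puts the class directly on the <a> element, which is what the answers' select('._2cLu-l') followed by item['title'] implies.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not Flipkart's real HTML): the class
# sits on the <a> itself and the rating text sits directly in .hGSR34.
SAMPLE_HTML = """
<div>
  <a class="_2cLu-l" title="Example phone" href="/example-phone/p/1">Example phone</a>
  <div class="hGSR34">4.4</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Digit "1" instead of the letter "l": no element has this class.
typo = soup.select("._2cLu-1 a")
# Descendant selector: looks for an <a> *inside* a ._2cLu-l element.
descendant = soup.select("._2cLu-l a")
# Compound selector: the <a> element that itself carries the class.
direct = soup.select("a._2cLu-l")

# Ratings: ".hGSR34 div" wants a <div> nested inside .hGSR34.
rating_descendant = soup.select(".hGSR34 div")
rating_direct = soup.select(".hGSR34")

print(typo, descendant)
print([a["title"] for a in direct])
print(rating_descendant, [d.get_text() for d in rating_direct])
```

In this sample, '._2cLu-1 a' fails for two reasons: the digit 1 is not the letter l, and the trailing " a" asks for an <a> nested inside the element rather than the element itself.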

To get separate lists for names, links, and ratings, create each list first and append to it as you go:

import requests
from bs4 import BeautifulSoup as bs

res = requests.get('https://www.flipkart.com/samsung-mobile-store?otracker=nmenu_sub_Electronics_0_Samsung')
soup = bs(res.content, 'html.parser')

namesList = []
linksList = []
ratingsList = []

namesLinks = soup.find_all('a', class_ ='Zhf2z-')    
ratings = soup.find_all('div', class_ ='hGSR34')

for rat in ratings:
    ratingsList.append(rat.text)

for nameLnk in namesLinks:
    namesList.append(nameLnk.get('title', 'No title available'))
    linksList.append(nameLnk.get('href', 'No href available'))

print(namesList)
print(linksList)
print(ratingsList)
Output

['Samsung Galaxy A30 (Black, 64 GB)', 'Samsung Galaxy M20 (Ocean Blue, 32 GB)', 'Samsung Galaxy M10 (Blue, 16 GB)', ... ]

['/samsung-galaxy-a30-black-64-gb/p/itmfec2hqbxcmbzn?pid=MOBFE4CSBDN9XETN&lid=L ...]

['4.4', '4.1', '4.1', '4.6', '4.3', '4.2', '4.3', '4.1', '4.2', '4.2', '4.2', '4.4', ... ]
Edit

I also looked into a way to print the device name, link, and rating together, using zip():
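The zip() code for this step is missing from the answer. Below is a minimal sketch that produces lines in the "Device: ... Link: ... Rating: ..." format of the output. It assumes three parallel lists such as namesList, linksList and ratingsList from the code above. The helper name device_lines is hypothetical, and the links in the demo are placeholders, not real Flipkart URLs.

```python
def device_lines(names, links, ratings):
    """Pair the i-th name, link and rating and format one line per device."""
    # zip() stops at the shortest input: if one product has no rating,
    # the ratings list is shorter, the last device is dropped and every
    # rating after the gap is paired with the wrong device.
    return [
        f"Device: {name} Link: {link} Rating: {rating}"
        for name, link, rating in zip(names, links, ratings)
    ]


if __name__ == "__main__":
    # Names and ratings taken from the output above; links are placeholders.
    namesList = ["Samsung Galaxy A30 (Black, 64 GB)", "Samsung Galaxy M20 (Ocean Blue, 32 GB)"]
    linksList = ["/placeholder-link-1", "/placeholder-link-2"]
    ratingsList = ["4.4", "4.1"]
    for line in device_lines(namesList, linksList, ratingsList):
        print(line)
```

On Python 3.10+, zip(names, links, ratings, strict=True) raises ValueError when the lists differ in length, instead of silently truncating.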

Output

Device: Samsung Galaxy A30 (Black, 64 GB) Link: /samsung-galaxy-a30-black-64-gb/pN&lid= .. .. cid=MOBFE4CSBDN9XETN Rating: 4.4
Device: Samsung Galaxy M20 (Ocean Blue, 32 GB) Link: /samsung-galaxy-m20-ocean-blue-32-gb/p/.. .. JGFRTYMC Rating: 4.1
Device: Samsung Galaxy M10 (Blue, 16 GB) Link: /samsung-galaxy-m10-blue-16-gb/p/.. .. 6JYE8YG Rating: 4.1
Device: Samsung Galaxy M30 (Gradation Black, 64 GB) Link:/samsung-galaxy-m30-gradation-black-64-gb/p/.. .. CDPXGUP Rating: 4.6

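Yes, you can do this easily with select. Note that one item has no rating. You also don't need to access the same elements on two separate passes to produce the names and the links.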

import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.flipkart.com/samsung-mobile-store?otracker=nmenu_sub_Electronics_0_Samsung'
r = requests.get(url)
soup = bs(r.content, 'lxml')

names, links = zip(*[(item['title'], 'https://www.flipkart.com' + item['href']) for item in soup.select('._2cLu-l')])
ratings = [item.text for item in soup.select('.niH0FQ  .hGSR34')]  # 1 rating missing for a product

print(list(names))
print(list(links))
print(ratings)
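The names, links = zip(*[...]) line relies on the zip(*...) "unzip" idiom. It builds one list of (name, link) pairs and then splits it into two sequences. The standalone sketch below shows the idiom with made-up pairs, plus one caveat that matters for the question: if the selector matches nothing, the unpacking raises ValueError instead of returning two empty lists. The helper name split_pairs is hypothetical.

```python
def split_pairs(pairs):
    """Turn [(name, link), ...] into ([names...], [links...]) via zip(*...)."""
    # zip(*pairs) is zip(pair0, pair1, ...): it yields a tuple of all first
    # elements, then a tuple of all second elements.
    names, links = zip(*pairs)
    return list(names), list(links)


print(split_pairs([("Phone A", "/a"), ("Phone B", "/b")]))

# With no matches, zip(*[]) yields nothing, so unpacking into two names fails.
try:
    split_pairs([])
except ValueError as exc:
    print("no matches:", exc)
```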

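If you want to combine them into a DataFrame and account for the missing rating, you can use the following approach (extend the if/else to the first two items if needed).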
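The DataFrame code itself is not included here. Below is a minimal sketch of the idea, assuming each product's link and rating live inside a shared per-product container. The container selector "div.card" and the sample HTML are hypothetical, not Flipkart's real markup. The link and rating selectors are the ones from the answer. Looking up the rating inside each card turns a missing rating into None instead of shifting ratings onto the wrong product. The if/else is applied to all three fields, as the answer suggests.

```python
import pandas as pd
from bs4 import BeautifulSoup

BASE_URL = "https://www.flipkart.com"

# Hypothetical markup: "div.card" is an assumed per-product container,
# not a real Flipkart class. The second card has no rating on purpose.
SAMPLE_HTML = """
<div class="card">
  <a class="_2cLu-l" title="Phone A" href="/phone-a/p/1">Phone A</a>
  <div class="niH0FQ"><div class="hGSR34">4.4</div></div>
</div>
<div class="card">
  <a class="_2cLu-l" title="Phone B" href="/phone-b/p/2">Phone B</a>
</div>
"""


def products_frame(html, card_selector="div.card"):
    """Build one row per product card; a missing field becomes None."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select(card_selector):
        # Search within this card only, so fields stay aligned per product.
        link = card.select_one("._2cLu-l")
        rating = card.select_one(".niH0FQ .hGSR34")
        rows.append({
            "name": link["title"] if link is not None else None,
            "link": BASE_URL + link["href"] if link is not None else None,
            "rating": rating.get_text(strip=True) if rating is not None else None,
        })
    return pd.DataFrame(rows, columns=["name", "link", "rating"])


if __name__ == "__main__":
    print(products_frame(SAMPLE_HTML))
```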


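Please post the desired output.

The names list should contain each device's name, the links list each device's link, and the ratings list each device's rating. These are what we call "variables" in programming: they hold the lists of names, links, and ratings respectively.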