Python BeautifulSoup: looping over table data

python web-scraping

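Python newbie here. I'm trying to capture some data from this page. I want to get the item names and item types into two lists; I can figure out later how to merge them into one table. Any help would be great.

The individual lines of code work on their own, but the loop doesn't work for me. This successfully prints the name and type of the first item: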

import urllib.request
import bs4 as bs

sauce = urllib.request.urlopen('https://us.diablo3.com/en/item/helm/').read()
soup = bs.BeautifulSoup(sauce, 'lxml')

item_details =  soup.find('tbody')
print(item_details) 

item_name = item_details.find('div', class_='item-details').h3.a.text
print(item_name)

item_type = item_details.find('ul', class_='item-type').span.text
print(item_type)

This repeats the first item's name and type over and over:

for div in soup.find_all('div', class_='item-details'):
    item_name = item_details.find('div', class_='item-details').h3.a.text
    print(item_name)
    item_type = item_details.find('ul', class_='item-type').span.text
    print(item_type)

This is the output:

Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
...

You need to use find_all (which returns a list) instead of find (which returns a single element). With that change, the output is:

Veil of Steel  -  Magic Helm
Leoric's Crown  -  Legendary Helm
Harlequin Crest  -  Magic Helm
The Undead Crown  -  Magic Helm
...

Written in a more readable form, the code is:

names = item_details.find_all('div', class_='item-details')
types = item_details.find_all('ul', class_='item-type')

for name, type in zip(names, types):
    print(name.h3.a.text, " - ", type.span.text)
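One thing to keep in mind with the zip approach: it pairs the i-th name with the i-th type, so it only lines up if both find_all calls return the same number of elements. A minimal sketch with made-up lists (the values are hypothetical, not scraped) shows that zip silently stops at the shorter list:

```python
# Hypothetical lists standing in for the scraped names and types.
names = ["Veil of Steel", "Leoric's Crown", "Harlequin Crest"]
types = ["Magic Helm", "Legendary Helm"]  # one entry missing on purpose

# zip stops at the shorter input, so the third name is dropped without an error.
paired = list(zip(names, types))
print(paired)

# On Python 3.10+, zip(names, types, strict=True) raises ValueError instead
# of silently dropping the unmatched item.
```

If every item on the page has both a name and a type this is not a problem, but it is worth checking len(names) == len(types) before relying on the pairing.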

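You can do this in a single loop over the item-details sections, without saving the names and types in separate lists and matching them up afterwards: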

item_details = []
for sections in soup.select('.item-details'):
    item_name = sections.select_one('h3[class*="subheader-"]').text.strip()  # partial match subheader-1, subheader-2, ....
    item_type = sections.select_one('ul[class="item-type"]').text.strip()
    item_details.append([item_name, item_type])

print(item_details)

Output:

[['Veil of Steel', 'Magic Helm'], ["Leoric's Crown", 'Legendary Helm'], ...
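The two selectors above behave differently: [class*="subheader-"] is a substring match on the class attribute, while [class="item-type"] only matches when the whole attribute value is exactly item-type. Here is a small sketch on hypothetical markup (the HTML below is invented for illustration and is not the real page source) that compares them with the plain class selector ul.item-type:

```python
import bs4 as bs

# Hypothetical markup; only the class names mirror the ones used in the answer.
html = """
<div class="item-details">
  <h3 class="subheader-2 d3-color-blue">Veil of Steel</h3>
  <ul class="item-type"><li>Magic Helm</li></ul>
  <ul class="item-type extra"><li>Other</li></ul>
</div>
"""
soup = bs.BeautifulSoup(html, "html.parser")

# Substring match: any h3 whose class attribute contains "subheader-".
partial = soup.select('h3[class*="subheader-"]')

# Exact attribute match: the class attribute must be exactly "item-type".
exact = soup.select('ul[class="item-type"]')

# Class selector: matches any ul that has item-type among its classes.
by_class = soup.select('ul.item-type')

print(len(partial), len(exact), len(by_class))
```

If the real page ever adds a second class to the ul, the exact attribute match would stop finding it, so ul.item-type may be the more forgiving choice.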

This works:

import urllib.request
import bs4 as bs

sauce = urllib.request.urlopen('https://us.diablo3.com/en/item/helm/').read()
soup = bs.BeautifulSoup(sauce, 'lxml')

# find_all returns every matching element, so each loop visits a different item
item_names = soup.find_all('div', class_='item-details')
for ele in item_names:
    print(ele.h3.a.text)

item_type = soup.find_all('ul', class_='item-type')
for ele in item_type:
    print(ele.span.text)

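Why your code didn't work:

Your loop does run once for every matching div, but the loop body never uses the loop variable div. It calls item_details.find(...) each time, and find always returns the first match inside item_details, so you keep getting the same element (find versus find_all).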
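To make that concrete, here is a minimal sketch of the original loop with the lookups moved onto the loop variable. It runs against a small hypothetical HTML snippet instead of the live page; the markup is an assumption based on the selectors used in this thread (an h3 > a and a ul.item-type > span inside each div.item-details), and html.parser is used only to avoid needing lxml.

```python
import bs4 as bs

# Hypothetical, simplified stand-in for the page markup (an assumption,
# not the real page source).
SAMPLE_HTML = """
<table><tbody>
<tr><td><div class="item-details">
  <h3 class="subheader-3"><a href="#">Veil of Steel</a></h3>
  <ul class="item-type"><li><span>Magic Helm</span></li></ul>
</div></td></tr>
<tr><td><div class="item-details">
  <h3 class="subheader-3"><a href="#">Leoric's Crown</a></h3>
  <ul class="item-type"><li><span>Legendary Helm</span></li></ul>
</div></td></tr>
</tbody></table>
"""

soup = bs.BeautifulSoup(SAMPLE_HTML, "html.parser")

pairs = []
for div in soup.find_all("div", class_="item-details"):
    # Search inside the current div, not inside the whole table.
    item_name = div.h3.a.text
    item_type = div.find("ul", class_="item-type").span.text
    pairs.append((item_name, item_type))
    print(item_name, "-", item_type)
```

The only real change from the question's loop is that item_details.find(...) became div.h3 and div.find(...), so each iteration looks at a different element.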

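Thanks for this, Tobey. Also correct, but the other answer was more intuitive to me.

This is great! How can I display the results as a 2 x n matrix instead of concatenating them?

@Lucas, what do you want the output to look like? Can you share an example?

I'll eventually export this to a csv file, so I'd like two columns: one headed item_name with all the names below it, and one headed item_type with the types below it.

@Lucas, do you mean

name_list = [name.h3.a.text for name in names]

and

type_list = [type.span.text for type in types]

? That returns two separate lists, one of names and one of types.

I think what I'm really asking is how to get this into a format that can be exported to csv. The csv would contain two columns (like an Excel file with two columns). Instead of printing the names, I'd like to write them to a dataframe (?).

Hey. When I started writing my answer, yours hadn't been posted yet. I spent some time checking that my code worked, and I guess you had posted yours by then. But yes, it's almost the same.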
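The csv question above did not get a direct answer in the thread. One possible approach, sketched here with the standard library csv module, is to write a header row followed by one row per (name, type) pair. The rows, the helper name write_items_csv, and the file name in the final comment are all hypothetical; in practice the rows would come from zip(name_list, type_list).

```python
import csv
import io

# Hypothetical (name, type) pairs standing in for the scraped values.
rows = [
    ("Veil of Steel", "Magic Helm"),
    ("Leoric's Crown", "Legendary Helm"),
]

def write_items_csv(rows, fileobj):
    """Write a two-column csv: a header row, then one row per item."""
    writer = csv.writer(fileobj)
    writer.writerow(["item_name", "item_type"])
    writer.writerows(rows)

# Write to an in-memory buffer so the result can be inspected directly.
buffer = io.StringIO()
write_items_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)

# To write a real file instead (the file name is an assumption):
# with open("helms.csv", "w", newline="", encoding="utf-8") as f:
#     write_items_csv(zip(name_list, type_list), f)
```

If pandas is already in use, a DataFrame built from the same pairs can be written out with its to_csv method instead.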
