Python BeautifulSoup: extracting the text of <b> tags into an array


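I'm trying to extract the text of the b tags inside a particular class (which has multiple instances on the page) into an array. I'm using beautifulsoup4 and python3 to do this.

I'm trying to scrape a web page. This is what my code currently looks like: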

def cattest():
    subcat = soup.find_all('span', {"class": "zg_hrsr_ladder"})[x].findChildren()
    for i, child in enumerate(subcat):
        categories = child.text
        print(categories)

for x in range(0, len(cat)):
    cattest()
This produces the following output:

Beauty & Personal Care
Hair Care
Hair Care Products
Conditioners
Conditioners
Beauty & Personal Care
Personal Care
Personal Care
What I want to do is get the text from the b tags of the zg_hrsr_ladder elements and put it into an array. The expected result would be:

[Conditioners, Personal Care]

Any help on how to achieve this would be greatly appreciated.

You can use a list comprehension and pass 'b' as the argument to findChildren:

In [59]: [element.text for s in soup.find_all('span', {"class": "zg_hrsr_ladder"}) for element in s.findChildren('b')]
Out[59]: ['Conditioners', 'Personal Care']
This is equivalent to:

In [63]: res = []

In [64]: for s in soup.find_all('span', {"class": "zg_hrsr_ladder"}):
    ...:     for element in s.findChildren('b'):
    ...:         res.append(element.text)
    ...:

In [65]: res
Out[65]: ['Conditioners', 'Personal Care']
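
To try the approach above without fetching the live page, here is a minimal sketch that runs against an inline HTML snippet. The markup is an assumption, reconstructed from the output shown in the question (each b wrapping a link), and the helper name ladder_leaf_categories is made up for illustration. It uses find_all, of which findChildren is an older alias.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modelled on the output shown in the question:
# each ladder ends with a <b> that wraps the most specific category link.
html = """
<ul>
  <li><span class="zg_hrsr_ladder">in <a href="#">Beauty &amp; Personal Care</a> &gt;
      <a href="#">Hair Care</a> &gt; <a href="#">Hair Care Products</a> &gt;
      <b><a href="#">Conditioners</a></b></span></li>
  <li><span class="zg_hrsr_ladder">in <a href="#">Beauty &amp; Personal Care</a> &gt;
      <b><a href="#">Personal Care</a></b></span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")


def ladder_leaf_categories(soup):
    """Return the text of every <b> tag found inside a zg_hrsr_ladder span."""
    return [
        b.get_text(strip=True)
        for span in soup.find_all("span", class_="zg_hrsr_ladder")
        for b in span.find_all("b")  # restrict the search to <b> tags only
    ]


categories = ladder_leaf_categories(soup)
print(categories)  # ['Conditioners', 'Personal Care']
```

With markup like this, calling findChildren() with no arguments returns every descendant of the span, so both the b and the link nested inside it are visited. That would explain why "Conditioners" and "Personal Care" each appear twice in the question's output.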

There are several ways to do this. Here are two; pick either one:

from bs4 import BeautifulSoup
import requests

url = "https://www.amazon.ca/Abba-Moisture-Conditioner-Unisex-33-8-Ounce/dp/B000VZS3VW/ref=sr_1_1/145-7226897-1893421?ie=UTF8&qid=1532712550&sr=8-1&keywords=B000VZS3VW"

res = requests.get(url)
soup = BeautifulSoup(res.text,"lxml")

#using .find_next()
subcat = [item.find_next("b").text for item in soup.find_all('span', class_='zg_hrsr_ladder')]
print(subcat)

#using selector
subcat = [item.text for item in soup.select('span.zg_hrsr_ladder > b')]
print(subcat)
Both produce the same result:

['Conditioners', 'Personal Care']
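
The two options agree on this page, but they are not interchangeable in general. The sketch below uses hypothetical markup (not taken from the Amazon page) in which the first ladder has no b tag. It illustrates that find_next() searches forward through the rest of the document rather than only inside the span, whereas the CSS selector matches only b tags that are direct children of a ladder span.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first ladder has no <b>, the second one does.
html = """
<span class="zg_hrsr_ladder">in <a href="#">Beauty &amp; Personal Care</a></span>
<span class="zg_hrsr_ladder">in <a href="#">Hair Care</a> &gt;
  <b><a href="#">Conditioners</a></b></span>
"""

soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span", class_="zg_hrsr_ladder")

# find_next() looks at everything after the span's opening tag, so the
# first span "borrows" the <b> that belongs to the second span.
via_find_next = [span.find_next("b").get_text(strip=True) for span in spans]

# The child combinator (>) only matches <b> tags directly inside a ladder span.
via_select = [b.get_text(strip=True) for b in soup.select("span.zg_hrsr_ladder > b")]

print(via_find_next)  # ['Conditioners', 'Conditioners']
print(via_select)     # ['Conditioners']
```

If some ladders on a page might lack a b tag, the selector version is probably the safer choice. Note also that find_next() returns None when no later b exists, in which case .text would raise an AttributeError.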

Thank you so much, this is exactly what I was looking for. I really appreciate your help.

No problem. Please accept the answer, and upvote it if it helped.

I don't know who downvoted, but for some reason my upvote didn't reset the score to 0.