Python 3.x: How do I get option values with BeautifulSoup?

I'm trying to scrape all of the option values from this website. The HTML looks like this:

<select id="raff_size" name="size" class="required-entry">
    <option value="">Storlek</option>
    <option value="e8f7f9e31e2adb7ca18db3845b95a666">US 6.5</option>
    <option value="450b7ef575236df96a42816c448a03e0">US 7</option>
    <option value="c4d38ec34a0203c9799730fa75760162">US 7.5</option>
    <option value="3678a4bc494138c62c529f15b4103e45">US 8</option>
    <option value="655e5c520a63a7fd1592ea088b051e69">US 8.5</option>
    <option value="cb3f80e1babc079d802ca92d0760b2bd">US 9</option>
    <option value="ffc6670037f7ba5356247cea0537957d">US 9.5</option>
    <option value="7d6cde0d2cb6262febe73a0f5fef924a">US 10</option>
    <option value="6891a4d31dc5516e3b9fb7177bca001d">US 10.5</option>
    <option value="458aa765ade8646f71fb11721788454c">US 11</option>
    <option value="ced2d3d4b613b8a9f9bd8118bff92afe">US 11.5</option>
</select>

size = soup.find('select', {'id': 'raff_size'})

Output:

<option value="">Storlek</option>
<option value="0bd9387c405ac2640932eadd797b1e04">US 6</option>
<option value="33edd81062f8c10efe67d0171150c35a">US 6.5</option>
<option value="6e4561c2ea5721da305d1f45c6d57bd4">US 7</option>
<option value="c286e5f2d8615f2915a286c005f62209">US 7.5</option>
<option value="dfeea836f795400a522d7a2f3ba8892f">US 8</option>
<option value="3b2d898de8ac62d22f40c4533cd45660">US 8.5</option>
<option value="df5f2bc78fc796b8063c1c01b061f177">US 9</option>
<option value="c00c36ac0986eebb6cf4379edc62bff7">US 9.5</option>
<option value="75621b2740d1fab3215c56615630d9ea">US 10</option>
<option value="5a0a97169ada6e204cbbf4477b3b1817">US 10.5</option>
<option value="a11401ac458c781223a96be8ed95ee28">US 11</option>
<option value="5549ce89be4e08c57592273856950f74">US 11.5</option>
<option value="3fdf35ec96de226bcb5f5c80ff99e28b">US 12</option>
</select>

Process finished with exit code 0

Desired output:
e8f7f9e31e2adb7ca18db3845b95a666
450b7ef575236df96a42816c448a03e0
c4d38ec34a0203c9799730fa75760162
3678a4bc494138c62c529f15b4103e45
655e5c520a63a7fd1592ea088b051e69
cb3f80e1babc079d802ca92d0760b2bd
ffc6670037f7ba5356247cea0537957d
7d6cde0d2cb6262febe73a0f5fef924a
6891a4d31dc5516e3b9fb7177bca001d
458aa765ade8646f71fb11721788454c
ced2d3d4b613b8a9f9bd8118bff92afe

How can I get just the values? I tried

size = soup.find('select', {'id': 'raff_size'})
but it returned an error. Any help would be greatly appreciated.

Try this:

from bs4 import BeautifulSoup

sample_html = """
<select id="raff_size" name="size" class="required-entry">
    <option value="">Storlek</option>
    <option value="e8f7f9e31e2adb7ca18db3845b95a666">US 6.5</option>
    <option value="450b7ef575236df96a42816c448a03e0">US 7</option>
    <option value="c4d38ec34a0203c9799730fa75760162">US 7.5</option>
    <option value="3678a4bc494138c62c529f15b4103e45">US 8</option>
    <option value="655e5c520a63a7fd1592ea088b051e69">US 8.5</option>
    <option value="cb3f80e1babc079d802ca92d0760b2bd">US 9</option>
    <option value="ffc6670037f7ba5356247cea0537957d">US 9.5</option>
    <option value="7d6cde0d2cb6262febe73a0f5fef924a">US 10</option>
    <option value="6891a4d31dc5516e3b9fb7177bca001d">US 10.5</option>
    <option value="458aa765ade8646f71fb11721788454c">US 11</option>
    <option value="ced2d3d4b613b8a9f9bd8118bff92afe">US 11.5</option>
</select>"""

# Collect every <option> tag, then keep only the non-empty value attributes
# (this skips the "Storlek" placeholder, whose value is "").
options = BeautifulSoup(sample_html, "html.parser").find_all("option")
values = [o["value"] for o in options if o["value"]]

for value in values:
    print(value)
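The question already locates the dropdown with soup.find(), so the search can also be limited to that element. This way, <option> tags that belong to any other <select> on the page are ignored. The following is a minimal sketch that assumes the page HTML is already in a string. The html variable is a shortened, hypothetical stand-in, and it includes a second, unrelated <select> to show the scoping:

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the real page HTML.
html = """
<select id="raff_size" name="size" class="required-entry">
    <option value="">Storlek</option>
    <option value="e8f7f9e31e2adb7ca18db3845b95a666">US 6.5</option>
    <option value="450b7ef575236df96a42816c448a03e0">US 7</option>
</select>
<select id="other">
    <option value="not-a-size">Something else</option>
</select>"""

soup = BeautifulSoup(html, "html.parser")

# Locate the size dropdown exactly as in the question.
size = soup.find("select", {"id": "raff_size"})

# Search only inside that <select>. .get() returns None instead of raising
# KeyError when an <option> has no value attribute at all.
values = [opt.get("value") for opt in size.find_all("option") if opt.get("value")]

for value in values:
    print(value)
```

Only the two size values are printed. The option inside the second <select> and the empty "Storlek" placeholder are both skipped.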

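I didn't find a simple way to get the values out of the option value attributes. So I decided to treat each line as a string and use re.findall() to find the values between the quotes: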
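The following is a minimal sketch of that line-by-line approach. The sample_html variable and the exact regular expression are assumptions for illustration:

```python
import re

# Assumed input: the select markup from the question as a plain string.
sample_html = """
<select id="raff_size" name="size" class="required-entry">
    <option value="">Storlek</option>
    <option value="e8f7f9e31e2adb7ca18db3845b95a666">US 6.5</option>
    <option value="450b7ef575236df96a42816c448a03e0">US 7</option>
    <option value="c4d38ec34a0203c9799730fa75760162">US 7.5</option>
    <option value="3678a4bc494138c62c529f15b4103e45">US 8</option>
    <option value="655e5c520a63a7fd1592ea088b051e69">US 8.5</option>
    <option value="cb3f80e1babc079d802ca92d0760b2bd">US 9</option>
    <option value="ffc6670037f7ba5356247cea0537957d">US 9.5</option>
    <option value="7d6cde0d2cb6262febe73a0f5fef924a">US 10</option>
    <option value="6891a4d31dc5516e3b9fb7177bca001d">US 10.5</option>
    <option value="458aa765ade8646f71fb11721788454c">US 11</option>
    <option value="ced2d3d4b613b8a9f9bd8118bff92afe">US 11.5</option>
</select>"""

values = []
for line in sample_html.splitlines():
    # Only <option> lines carry the value attribute we are after.
    if "<option" in line:
        # Capture whatever sits between the double quotes of value="...".
        values.extend(re.findall(r'value="([^"]*)"', line))

print(values)
```

This keeps the empty value of the Storlek placeholder as the first element. If that element is not wanted, filter it out with [v for v in values if v]. The pattern assumes double-quoted attributes, so it is more fragile than letting an HTML parser read the attributes.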

This produces:

['',
 'e8f7f9e31e2adb7ca18db3845b95a666',
 '450b7ef575236df96a42816c448a03e0',
 'c4d38ec34a0203c9799730fa75760162',
 '3678a4bc494138c62c529f15b4103e45',
 '655e5c520a63a7fd1592ea088b051e69',
 'cb3f80e1babc079d802ca92d0760b2bd',
 'ffc6670037f7ba5356247cea0537957d',
 '7d6cde0d2cb6262febe73a0f5fef924a',
 '6891a4d31dc5516e3b9fb7177bca001d',
 '458aa765ade8646f71fb11721788454c',
 'ced2d3d4b613b8a9f9bd8118bff92afe']

I completely missed the quote tags, nice solution +1

size = soup.find_all("option")
values = [o["value"] for o in soup if o["value"]]
for value in values:
    print(value)
returns: TypeError: string indices must be integers

Did you get that on the sample HTML or on the actual HTML?

I got it on the actual HTML.
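The snippet in the comment above stores the <option> tags in size, but the list comprehension iterates over soup, which is a likely cause of that TypeError. Iterating a BeautifulSoup object walks its top-level children. Those children include plain strings, such as the newline before <select> or a doctype on a real page, and indexing a string with "value" raises exactly this error. The sketch below uses a shortened sample (an assumption for illustration) to reproduce the error and then shows the fix, which is to iterate over size instead:

```python
from bs4 import BeautifulSoup

# Shortened sample (assumption); note the newline before <select>.
html = """
<select id="raff_size" name="size" class="required-entry">
    <option value="">Storlek</option>
    <option value="e8f7f9e31e2adb7ca18db3845b95a666">US 6.5</option>
</select>"""

soup = BeautifulSoup(html, "html.parser")
size = soup.find_all("option")

# Bug: iterating `soup` yields its top-level children. The first one here is
# the "\n" string, so o["value"] becomes string indexing and fails.
error_message = None
try:
    values = [o["value"] for o in soup if o["value"]]
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)

# Fix: iterate over the <option> tags that were stored in `size`.
values = [o["value"] for o in size if o["value"]]
for value in values:
    print(value)
```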