Python: find_next_siblings is not returning a value. How can I extract the two cells I need when they have no other distinguishing classes?

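I want to extract the item weight and the product dimensions from the content below. What am I missing? My script cannot find what I am looking for. Is there a simpler way to extract the item weight and product dimensions? Thanks.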

import bs4 as bs

content = '''
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Item Weight
</th>
<td class="a-size-base prodDetAttrValue">
0.16 ounces
</td>
</tr>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Product Dimensions
</th>
<td class="a-size-base prodDetAttrValue">
4.8 x 3.4 x 0.5 inches
</td>
</tr>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Batteries Included?
</th>
<td class="a-size-base prodDetAttrValue">
No
</td>
</tr>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Batteries Required?
</th>
<td class="a-size-base prodDetAttrValue">
No
</td>
</tr>
'''
soup = bs.BeautifulSoup(content, features='lxml')


try:
    product = {
        'weight': soup.find(text='Item Weight').parent.find_next_siblings(),
        'dimension': soup.find(text='Product Dimensions').parent.find_next_siblings()
    }
except:
    product = {
        'weight': 'item unavailable',
        'dimension': 'item unavailable'
    }
print(product)

You are using find_next_siblings incorrectly. The td tag is a sibling of the th tag, not a sibling of the parent tr tag.

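from bs4 import BeautifulSoup
import re

# Same markup as in the question (first three rows)
content = '''
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Item Weight
</th>
<td class="a-size-base prodDetAttrValue">
0.16 ounces
</td>
</tr>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Product Dimensions
</th>
<td class="a-size-base prodDetAttrValue">
4.8 x 3.4 x 0.5 inches
</td>
</tr>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Batteries Included?
</th>
<td class="a-size-base prodDetAttrValue">
No
</td>
</tr>
'''
soup = BeautifulSoup(content, 'html.parser')
d = {
    # The regex tolerates the newlines around the label; td is the th's own sibling
    'weight': soup.find('th', text=re.compile(r'\s*Item Weight\s*')).find_next_sibling('td').text.strip(),
    'dimension': soup.find('th', text=re.compile(r'\s*Product Dimensions\s*')).find_next_sibling('td').text.strip()
}
print(d)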
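
On the "simpler way" part of the question, one possible approach is to collect every th/td pair into a dictionary and look the labels up afterwards. This is only a sketch: the helper name table_to_dict and the trimmed two-row sample markup are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed sample markup (assumed for illustration), shaped like the question's rows
html = '''
<table>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Item Weight
</th>
<td class="a-size-base prodDetAttrValue">
0.16 ounces
</td>
</tr>
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Product Dimensions
</th>
<td class="a-size-base prodDetAttrValue">
4.8 x 3.4 x 0.5 inches
</td>
</tr>
</table>
'''


def table_to_dict(markup):
    """Map each th label to the text of the td that follows it (hypothetical helper)."""
    soup = BeautifulSoup(markup, 'html.parser')
    details = {}
    for th in soup.find_all('th'):
        # The value cell is a sibling of the th itself, not of the enclosing tr
        td = th.find_next_sibling('td')
        if td is not None:
            # get_text(strip=True) drops the surrounding newlines
            details[th.get_text(strip=True)] = td.get_text(strip=True)
    return details


details = table_to_dict(html)
print(details.get('Item Weight', 'item unavailable'))
print(details.get('Product Dimensions', 'item unavailable'))
```

Using dict.get with a default keeps a missing label from raising, so no try/except is needed around the lookups.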

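First, if you want only the next sibling, you need to use .find_next_sibling() rather than .find_next_siblings(), which returns a list. Second, the reason you are not getting the values is the way the text inside the tags is represented. If you run: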

print([each_th.text for each_th in soup.find_all('th')])

you will see that the result looks like this:

['\nItem Weight\n', '\nProduct Dimensions\n', '\nBatteries Included?\n', '\nBatteries Required?\n']

So you need to change text='Item Weight' to text='\nItem Weight\n', and so on:

try:
    product = {
        'weight': soup.find(text='\nItem Weight\n').parent.find_next_sibling().text,
        'dimension': soup.find(text='\nProduct Dimensions\n').parent.find_next_sibling().text
    }
except:
    product = {
        'weight': 'item unavailable',
        'dimension': 'item unavailable'
    }

This gives:

{'weight': '\n0.16 ounces\n', 'dimension': '\n4.8 x 3.4 x 0.5 inches\n'}

Now, if you want to remove those newlines, you can call .replace('\n', '') or .strip() on each value as you scrape it.
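
As an alternative to hard-coding the newlines into the search string, the label can be compared after stripping whitespace. The following is a minimal sketch under that assumption; the helper name lookup, its default argument, and the shortened sample markup are hypothetical and chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Shortened sample markup (assumed), with the same newline padding as the question
sample = '''
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Item Weight
</th>
<td class="a-size-base prodDetAttrValue">
0.16 ounces
</td>
</tr>
'''

soup = BeautifulSoup(sample, 'html.parser')


def lookup(soup, label, default='item unavailable'):
    """Return the td text next to the th whose stripped text equals label."""
    # A function passed to find() is called with each tag and must return True/False
    th = soup.find(lambda tag: tag.name == 'th' and tag.get_text(strip=True) == label)
    if th is None:
        return default
    td = th.find_next_sibling('td')
    if td is None:
        return default
    return td.get_text(strip=True)


product = {
    'weight': lookup(soup, 'Item Weight'),
    'dimension': lookup(soup, 'Product Dimensions'),
}
print(product)
```

Checking for None explicitly replaces the bare except, so one missing label falls back to the default without discarding the values that were found.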
