Python: unable to extract the number and the text separately from HTML

python python-3.x

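From the HTML below, I want to get the number and the text separately. I can get the number, but extracting the text raises the error shown below. (Note: the code runs inside a for loop. For a few links split(b'.')[1] works because there is a match, but when that index does not exist it raises an error.)

Error: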

Traceback (most recent call last):
  File "C:/Users/Computers Zone/Google Drive/Python/SANDWICHTRY.py", line 49, in <module>
    sandwich=soup.find('h1',{'class':'headline'}).encode_contents().strip().split(b'.')[1].decode("utf-8")
IndexError: list index out of range

This error occurs when the heading string contains no '.', so the split result has no second element.
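A minimal reproduction of the failure, without any HTML parsing. The second heading (the one without a number) is a made-up example, not taken from the scraped pages. Splitting on b'.' returns a one-element list when the separator is absent, so index 1 does not exist.

```python
# Heading from the answer's sample, plus a hypothetical one with no "rank." prefix.
with_dot = b"1. Old Oak Tap BLT"
without_dot = b"Old Oak Tap BLT"  # assumption: some pages look like this

parts_ok = with_dot.split(b".")
parts_bad = without_dot.split(b".")

print(parts_ok)   # two elements: the rank and the name
print(parts_bad)  # a single element, the whole heading

try:
    parts_bad[1]  # same access pattern as line 49 in the traceback
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)
```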

To fix this, take the result and split the string, but do not assume that there are always two elements:

from bs4 import BeautifulSoup

pages = '<h1 class="headline">1. Old Oak Tap BLT</h1>'

soup = BeautifulSoup(pages, 'lxml')
titles = soup.find('h1', {'class': 'headline'}).encode_contents().split(b'.')

for text in titles:  # go through all existing list elements
    print(text.decode("utf-8").strip())

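If the split produces only one item (that is, there is no b'.' in the string), do not try to access the second element. Alternatively, use a regular expression.

Comments:

I'm glad this solved your problem. If the answer is correct, please mark it as accepted.

Sure, sir. Is there a way to store the number and the letters separately as strings? Including a regular expression makes it more complex. Thank you very much for your time :)

Without a regular expression, as covered in my answer: if you use the second option (if len(titles) == 2), you get the two strings in rank and sandwich.

Yes... but when the len(titles) == 2 check does not pass, it skips both the number and the letters. Would an elif branch be needed then?

Yes, you can use else to assign default values to rank and sandwich.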
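The else suggestion from the last comment could be sketched as follows, without regular expressions. The helper name split_headline, the default of None for a missing rank, and the digit check are all illustrative assumptions, not part of the original answer. It works on the decoded heading text and uses str.partition, which splits on the first '.' only, so a sandwich name that itself contains a period is not cut into extra pieces.

```python
def split_headline(text, default_rank=None):
    """Split 'N. Name' into (rank, name). Hypothetical helper.

    Falls back to (default_rank, whole text) when the heading does not
    start with a number followed by '.'.
    """
    text = text.strip()
    head, sep, tail = text.partition(".")
    if sep and head.strip().isdigit():
        # A '.' was found and the part before it is a number.
        return head.strip(), tail.strip()
    # No usable rank: assign defaults instead of indexing a missing element.
    return default_rank, text


for heading in ["1. Old Oak Tap BLT", "Old Oak Tap BLT"]:
    rank, sandwich = split_headline(heading)
    print(rank, "|", sandwich)
```

With BeautifulSoup, the input would be the heading's text, for example soup.find('h1', {'class': 'headline'}).get_text().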
Code from the question:

soup = BeautifulSoup(pages, 'lxml').find('div', {'id': 'page'})
rank = soup.find('h1', {'class': 'headline'}).encode_contents().strip().split(b'.')[0].decode("utf-8")
print(rank)
# Line 49 in the traceback: raises IndexError when the heading contains no '.'
sandwich = soup.find('h1', {'class': 'headline'}).encode_contents().strip().split(b'.')[1].decode("utf-8")
print(sandwich)

Second option from the answer (unpack only when the split produced exactly two parts):

from bs4 import BeautifulSoup

pages = '<h1 class="headline">1. Old Oak Tap BLT</h1>'

soup = BeautifulSoup(pages, 'lxml')
titles = soup.find('h1', {'class': 'headline'}).encode_contents().split(b'.')

if len(titles) == 2:  # both the rank and the name are present
    rank = titles[0].decode("utf-8").strip()
    sandwich = titles[1].decode("utf-8").strip()
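For comparison, the regular-expression route mentioned in the answer might look like the sketch below. The pattern and the names HEADLINE_RE and parse_headline are assumptions: the pattern expects an optional run of digits followed by '.', then the sandwich name, and it returns None for the rank when no number is present.

```python
import re

# Assumed heading shape: optional "<digits>." prefix, then the name.
HEADLINE_RE = re.compile(r"^\s*(?:(\d+)\s*\.)?\s*(.*?)\s*$", re.DOTALL)


def parse_headline(text):
    """Return (rank, name); rank is None when the heading has no number."""
    match = HEADLINE_RE.match(text)  # every part is optional, so this always matches
    rank, sandwich = match.groups()
    return rank, sandwich


print(parse_headline("1. Old Oak Tap BLT"))
print(parse_headline("Old Oak Tap BLT"))
```

Because the rank group is optional, the same call handles headings with and without a number, so no length check on a split result is needed.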