Python: Extracting the text node inside a tag that has child elements in beautifulsoup4


python web-scraping


The HTML I am parsing and scraping contains the following code:

<li> <span> 929</span> Serve Returned </li>

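How can I extract only the text node of <li>, in this case "Serve Returned", using Beautifulsoup? .string doesn't work because <li> has a child element, and .text also returns the text inside <span>.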
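For illustration, the behaviour described in the question can be reproduced with a minimal sketch like the one below. It assumes the built-in "html.parser" backend; the variable names string_value and text_value are only illustrative.

```python
from bs4 import BeautifulSoup

html = "<li> <span> 929</span> Serve Returned </li>"
soup = BeautifulSoup(html, "html.parser")  # assumption: built-in parser
li = soup.find("li")

# .string is None here because <li> has more than one child node
string_value = li.string

# .text concatenates every descendant string, including the <span>'s
text_value = li.text

print(repr(string_value))
print(repr(text_value))
```

Neither attribute isolates "Serve Returned" on its own, which is what the answers address.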

I used the str.replace method for this:

>>> li = soup.find('li')  # or however you need to drill down to the <li> tag
>>> # remove the <span> text from the full text, then trim the leftover whitespace
>>> mytext = li.text.replace(li.find('span').text, "").strip()
>>> print(mytext)
Serve Returned


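In the output below, the first element is the "text" before the span (here just whitespace). This method helps you find text before and after (and in between) any child elements.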

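import bs4
html = r"<li> <span> 929</span> Serve Returned </li>"
soup = bs4.BeautifulSoup(html, "html.parser")  # name the parser explicitly
# recursive=False restricts the search to direct children of <li>
print(soup.li.find_all(string=True, recursive=False))

[' ', ' Serve Returned ']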
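The list above still contains whitespace-only entries. One possible way to turn it into a single clean string is sketched below. The helper name direct_text is hypothetical (not part of Beautiful Soup), and the sketch assumes the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup


def direct_text(tag):
    """Join the text nodes that are direct children of `tag`.

    Hypothetical helper: text inside nested tags is ignored and
    whitespace-only nodes are dropped.
    """
    pieces = tag.find_all(string=True, recursive=False)
    return " ".join(piece.strip() for piece in pieces if piece.strip())


html = "<li> <span> 929</span> Serve Returned </li>"
soup = BeautifulSoup(html, "html.parser")  # assumption: built-in parser

result = direct_text(soup.li)
print(result)  # Serve Returned
```

Unlike the str.replace approach, this does not depend on the span's text being absent from the rest of the <li> content.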