Extracting the text of a subsection of a site using Python
I'm stuck trying to extract the text between the li tags. Below is part of the HTML page source:
<div class="item_desc_text">
<ul class="fk-key-features">
<li>1.2 GHz Qualcomm Snapdragon 400 Quad Core Processor and 1 GB RAM</li><li>Android v4.4 (KitKat) OS</li>
<li>Wi-Fi Enabled</li><li>8 GB Internal Memory</li><li>Dual SIM (GSM + GSM)</li><li>HD Recording</li>
<li>5 MP Primary Camera and 1.3 MP Secondary Camera</li><li>4.5-inch HD Display</li>
</ul>
</div>
How should I go about this?

Here is one way:
>>> from bs4 import BeautifulSoup as bs
>>> data = '''
... <div class="item_desc_text">
... <ul class="fk-key-features">
... <li>1.2 GHz Qualcomm Snapdragon 400 Quad Core Processor and 1 GB RAM</li><li>Android v4.4 (KitKat) OS</li>
... <li>Wi-Fi Enabled</li><li>8 GB Internal Memory</li><li>Dual SIM (GSM + GSM)</li><li>HD Recording</li>
... <li>5 MP Primary Camera and 1.3 MP Secondary Camera</li><li>4.5-inch HD Display</li>
... </ul>
... </div>
... '''
>>> # Name the parser explicitly so the result does not depend on which parsers are installed
>>> soup = bs(data, 'html.parser')
>>> # Select only the <ul> with the class fk-key-features, then walk its <li> children
>>> ul = soup.find('ul', attrs={'class': 'fk-key-features'})
>>> for item in ul.find_all('li'):
...     print(item.get_text().strip())
...
1.2 GHz Qualcomm Snapdragon 400 Quad Core Processor and 1 GB RAM
Android v4.4 (KitKat) OS
Wi-Fi Enabled
8 GB Internal Memory
Dual SIM (GSM + GSM)
HD Recording
5 MP Primary Camera and 1.3 MP Secondary Camera
4.5-inch HD Display
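
As an alternative to calling find and then find_all, the same selection can probably be written as a single CSS selector with BeautifulSoup's select method. This is only a sketch: the shortened HTML and the second list with the class other-list are made up for illustration, to show that list items outside the target ul are skipped.

```python
from bs4 import BeautifulSoup

# Shortened sample markup; the "other-list" <ul> is a hypothetical
# stand-in for the many unrelated lists on the real page.
html = """
<div class="item_desc_text">
<ul class="fk-key-features">
<li>Wi-Fi Enabled</li><li>8 GB Internal Memory</li>
</ul>
</div>
<ul class="other-list"><li>Unrelated item</li></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Match every <li> that is a descendant of <ul class="fk-key-features">.
features = [li.get_text(strip=True) for li in soup.select("ul.fk-key-features li")]
print(features)
```

select returns an empty list when nothing matches, so the loop simply produces no items instead of raising an error.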
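The solution you provided is OK, but when the whole page is loaded it does not return the correct text between the li tags, because the site contains many ul tags. I have updated the question, please check it again.
@user3455672 You have to select the ul you want. I have updated the answer to select the ul with the class fk-key-features.
for item in ul.find_all('li'): AttributeError: 'NoneType' object has no attribute 'find_all'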
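
The AttributeError in the last comment means that soup.find returned None, that is, no ul with the class fk-key-features was found in the HTML that was actually parsed. One possible cause, which is an assumption since the thread does not show how the page was downloaded, is that the fetched page differs from the source seen in the browser. A minimal sketch that guards against this case follows; extract_key_features is a hypothetical helper name, not something from the thread.

```python
from bs4 import BeautifulSoup


def extract_key_features(html):
    """Return the text of each <li> in the fk-key-features list, or [] if absent."""
    soup = BeautifulSoup(html, "html.parser")
    ul = soup.find("ul", class_="fk-key-features")
    if ul is None:
        # find() returns None when nothing matches; calling find_all on it
        # is what raises the AttributeError quoted above.
        return []
    return [li.get_text(strip=True) for li in ul.find_all("li")]


# A page without the target list yields an empty result instead of an exception.
print(extract_key_features('<ul class="something-else"><li>x</li></ul>'))
print(extract_key_features('<ul class="fk-key-features"><li> HD Recording </li></ul>'))
```

If the first call pattern keeps returning an empty list for the real page, printing the downloaded HTML and searching it for fk-key-features is a quick way to check whether the list is present at all.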