Extracting the text of a page sub-section with Python


I am stuck trying to extract the text between li tags. Below is part of the HTML page source.

<div class="item_desc_text">
    <ul class="fk-key-features">
      <li>1.2 GHz Qualcomm Snapdragon 400 Quad Core Processor and 1 GB RAM</li><li>Android v4.4 (KitKat) OS</li>
      <li>Wi-Fi Enabled</li><li>8 GB Internal Memory</li><li>Dual SIM (GSM + GSM)</li><li>HD Recording</li>
      <li>5 MP Primary Camera and 1.3 MP Secondary Camera</li><li>4.5-inch HD Display</li>
    </ul>
</div>

How should I do this?

Here is one way:

>>> from bs4 import BeautifulSoup as bs
>>> data = '''
... <div class="item_desc_text">
...     <ul class="fk-key-features">
...       <li>1.2 GHz Qualcomm Snapdragon 400 Quad Core Processor and 1 GB RAM</li><li>Android v4.4 (KitKat) OS</li>
...       <li>Wi-Fi Enabled</li><li>8 GB Internal Memory</li><li>Dual SIM (GSM + GSM)</li><li>HD Recording</li>
...       <li>5 MP Primary Camera and 1.3 MP Secondary Camera</li><li>4.5-inch HD Display</li>
...     </ul>
... </div>
... '''
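>>> soup = bs(data, 'html.parser')  # name the parser explicitly to avoid a warning
>>> ul = soup.find('ul', attrs={'class':'fk-key-features'})
>>> for item in ul.find_all('li'):
...     print(item.get_text().strip())  # print() call, as required by Python 3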
...
1.2 GHz Qualcomm Snapdragon 400 Quad Core Processor and 1 GB RAM
Android v4.4 (KitKat) OS
Wi-Fi Enabled
8 GB Internal Memory
Dual SIM (GSM + GSM)
HD Recording
5 MP Primary Camera and 1.3 MP Secondary Camera
4.5-inch HD Display
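
When a page contains several ul elements, a CSS selector is another way to restrict the match to the list you want. The following is a minimal sketch, not the answerer's own code: the shortened HTML and the second `ul` with class `other` are made up for illustration, and `features` is a hypothetical variable name.

```python
from bs4 import BeautifulSoup

# Shortened sample markup; the second <ul> is an invented distractor
# standing in for the other lists found on a full page.
html = """
<div class="item_desc_text">
    <ul class="fk-key-features">
      <li>Wi-Fi Enabled</li><li>8 GB Internal Memory</li>
    </ul>
</div>
<ul class="other"><li>Unrelated item</li></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Select only the <li> elements that sit inside <ul class="fk-key-features">.
features = [li.get_text(strip=True) for li in soup.select("ul.fk-key-features li")]
print(features)
```

Because `select` returns an empty list when nothing matches, this form does not fail if the list is missing from the page.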
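The solution you provided is OK, but when the whole page is loaded it does not return the correct text between the li tags, because the site has many ul tags. I have updated the question, please check it again.

@user3455672 You have to select the ul you want. I updated the answer to select the ul with class fk-key-features.

for item in ul.find_all('li'): AttributeError: 'NoneType' object has no attribute 'find_all'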
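
The AttributeError above is what happens when `soup.find` matches nothing: it returns `None`, and calling `find_all` on `None` fails. One possible cause, which this thread does not confirm, is that the HTML actually downloaded does not contain a `ul` with that class. A minimal sketch of a guard follows; `extract_features` is a hypothetical helper name, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup


def extract_features(html, css_class="fk-key-features"):
    """Return the text of each <li> in the first <ul> with the given class.

    Hypothetical helper for illustration; returns an empty list when no
    matching <ul> exists instead of raising AttributeError.
    """
    soup = BeautifulSoup(html, "html.parser")
    ul = soup.find("ul", class_=css_class)
    if ul is None:  # find() returns None when nothing matches
        return []
    return [li.get_text(strip=True) for li in ul.find_all("li")]


print(extract_features('<ul class="fk-key-features"><li>HD Recording</li></ul>'))
print(extract_features('<ul class="other"><li>x</li></ul>'))
```

If the helper returns an empty list for the real page, printing the downloaded HTML and searching it for `fk-key-features` is a quick way to check whether the list is present at all.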