Python 3.x: BeautifulSoup regex filter not working


I want the content of the div with class 'hide info-json' whose parent li tag has class 'info-wrap' or 'info-wrap no-meta', but not 'info-wrap hide'.

HTML example:

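<li class="info-wrap">
    <div class="hide info-json">
        <p>Content That I Want - JSON Data </p>
    </div>
</li>

<li class="info-wrap hide">
    <div class="hide info-json">
        <p>Content That I Don't Want </p>
    </div>
</li>

<li class="info-wrap no-meta">
    <div class="hide info-json">
        <p>Content That I Want - JSON Data  </p>
    </div>
</li>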

Here is my code:

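import json
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(res.text, "lxml")  # res is a requests response fetched earlier
for divTags in soup.findAll('li', class_=re.compile('^(?!.*hide).*info-wrap.*$')):
    for infoList in divTags.find_all('div', {'class': 'hide info-json'}):
        Curinfo = json.loads(infoList.text)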

But it returns nothing.

If I remove this regex, it works fine. How can I do this?

Using a regex isn't mandatory for me; all I want is the content described above.

Thanks
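One thing worth checking, per the BeautifulSoup docs on multi-valued attributes: `class` is stored as a list, and a regex passed as `class_` is tried against each class name on its own, falling back to the space-joined string only if no single name matches. A negative lookahead for `hide` therefore never sees `hide` next to `info-wrap`. A minimal sketch, assuming a lookahead pattern of the shape `^(?!.*hide).*info-wrap.*$` (the exact pattern used may differ):

```python
import re
from bs4 import BeautifulSoup

html = '<li class="info-wrap hide"><div class="hide info-json">x</div></li>'
soup = BeautifulSoup(html, 'html.parser')

# The regex is tested against "info-wrap" and "hide" separately.
# "info-wrap" alone satisfies it, so the hidden li is still returned.
pattern = re.compile(r'^(?!.*hide).*info-wrap.*$')
matches = soup.find_all('li', class_=pattern)
print(len(matches))  # 1
```

Because one class name on its own satisfies the pattern, a single class regex cannot express "has info-wrap but not hide"; that condition has to be checked against the whole class list.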

import re

html = """<li class="info-wrap">
    <div class="hide info-json">
        <p>Content That I Want - JSON Data </p>
    </div>
</li>

<li class="info-wrap hide">
    <div class="hide info-json">
        <p>Content That I Don't Want </p>
    </div>
</li>

<li class="info-wrap no-meta">
    <div class="hide info-json">
        <p>Content That I Want - JSON Data  </p>
    </div>
</li>"""

l = re.findall(r"""<li\s+class="info-wrap(\s+no-meta)?"\s*>\s*
               <div\s+class="hide\s+info-json"\s*>
               \s*(.*?)\s*
               </div>\s*
               </li>
               """,html, flags=re.VERBOSE|re.IGNORECASE|re.DOTALL)
l = [item[1] for item in l]
print(l)

Prints:

['<p>Content That I Want - JSON Data </p>', '<p>Content That I Want - JSON Data  </p>']
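A small variant of the same regex approach: making the optional no-meta part a non-capturing group `(?:...)` leaves only one capturing group, so `re.findall` returns the content strings directly and the `item[1]` step is no longer needed. A sketch on a shortened copy of the sample HTML (the A/B/C contents are placeholders):

```python
import re

html = """<li class="info-wrap">
    <div class="hide info-json">
        <p>A</p>
    </div>
</li>
<li class="info-wrap hide">
    <div class="hide info-json">
        <p>B</p>
    </div>
</li>
<li class="info-wrap no-meta">
    <div class="hide info-json">
        <p>C</p>
    </div>
</li>"""

# (?:\s+no-meta)? does not capture, so findall yields the (.*?) group as strings.
pattern = re.compile(r"""<li\s+class="info-wrap(?:\s+no-meta)?"\s*>\s*
                         <div\s+class="hide\s+info-json"\s*>
                         \s*(.*?)\s*
                         </div>\s*
                         </li>""", flags=re.VERBOSE | re.IGNORECASE | re.DOTALL)
print(pattern.findall(html))  # ['<p>A</p>', '<p>C</p>']
```

Like the original, this only matches the class attribute in exactly these two spellings; a different class order or extra classes would be missed.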

Use :not (bs4 4.7.1+) to filter out the unwanted class:

import requests
from bs4 import BeautifulSoup as bs

html = '''<li class="info-wrap">
    <div class="hide info-json">
        <p>Content That I Want - JSON Data </p>
    </div>
</li>

<li class="info-wrap hide">
    <div class="hide info-json">
        <p>Content That I Don't Want </p>
    </div>
</li>

<li class="info-wrap no-meta">
    <div class="hide info-json">
        <p>Content That I Want - JSON Data  </p>
    </div>
</li>'''

soup = bs(html, 'lxml')
print([p.text for p in soup.select('.info-wrap:not(.hide) p')])
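Since the question ultimately runs `json.loads` on the div's text, the same `:not()` filter can target the div directly. A sketch, assuming each div holds a raw JSON string (the `{"id": ...}` payloads below are made up for illustration):

```python
import json
from bs4 import BeautifulSoup

html = '''<li class="info-wrap"><div class="hide info-json">{"id": 1}</div></li>
<li class="info-wrap hide"><div class="hide info-json">{"id": 2}</div></li>
<li class="info-wrap no-meta"><div class="hide info-json">{"id": 3}</div></li>'''

soup = BeautifulSoup(html, 'html.parser')
# li.info-wrap:not(.hide) keeps "info-wrap" and "info-wrap no-meta" and drops
# "info-wrap hide"; "> div.hide.info-json" selects the direct child div.
records = [json.loads(div.get_text())
           for div in soup.select('li.info-wrap:not(.hide) > div.hide.info-json')]
print(records)  # [{'id': 1}, {'id': 3}]
```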

If I use

for divTags in soup.findAll(lambda tag: tag.name == 'li' and tag.get('class') == ['info-wrap']):

it ignores the li tag with class info-wrap no-meta. for divTags in soup.findAll('li'
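Comparing `tag.get('class')` to `['info-wrap']` with `==` only accepts a class list that is exactly that one name, which is why `info-wrap no-meta` is skipped. Membership tests on the class list avoid this. A sketch using a plain filter function (the name `wanted_li` and the A/B/C contents are illustrative):

```python
from bs4 import BeautifulSoup

html = '''<li class="info-wrap"><div class="hide info-json">A</div></li>
<li class="info-wrap hide"><div class="hide info-json">B</div></li>
<li class="info-wrap no-meta"><div class="hide info-json">C</div></li>'''

soup = BeautifulSoup(html, 'html.parser')

def wanted_li(tag):
    # bs4 exposes class as a list of names, so test membership, not equality
    classes = tag.get('class', [])
    return tag.name == 'li' and 'info-wrap' in classes and 'hide' not in classes

texts = [li.find('div', class_='info-json').get_text() for li in soup.find_all(wanted_li)]
print(texts)  # ['A', 'C']
```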

…read it carefully, then look at the HTML again. It won't return matches. See.