Python: scraping data between 2 ul tags

Tags: python, html, web-scraping, html-parsing

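Hi, I am trying to scrape the content between tags. Below I have attached part of the source I want to scrape. If you look closely, there are 3 ul tags. The first ul tag has class="listGroup". I am trying to extract the text of the second "ul" tag, the idea being that it is followed by another "ul" tag with class "listGroup". Please share how I can do this.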

<ul class="listGroup" id="ul_e6d09fbd-19fe-49ac-9b47-bd857c0d411b"><li class="acces-listitems"><a href="https://order.store.mayoclinic.com/books/gnweb43?utm_source=MC-DotOrg-PS&amp;utm_medium=Link&amp;utm_campaign=FamilyHealth-Book&amp;utm_content=FHB">Book: Mayo Clinic Family Health Book, 5th Edition</a></li><li class="acces-hide-listitems"><a href="https://order.store.mayoclinic.com/hl/hldiged?utm_source=MC-DotOrg-PS&amp;utm_medium=Link&amp;utm_campaign=HealthLetter-Digital&amp;utm_content=HLDE">Newsletter: Mayo Clinic Health Letter — Digital Edition</a></li></ul>
<ul>
<li>Osteoporosis</li>
<li>Kidney stones</li>
<li>Excessive urination</li>
<li>Abdominal pain</li>
<li>Tiring easily or weakness</li>
<li>Depression or forgetfulness</li>
<li>Bone and joint pain</li>
<li>Frequent complaints of illness with no apparent cause</li>
<li>Nausea, vomiting or loss of appetite</li>
</ul>
<ul>
<li>A noncancerous growth (adenoma) on a gland is the most common cause.</li>
<li>Enlargement (hyperplasia) of two or more parathyroid glands accounts for most other cases.</li>
<li>A cancerous tumor is a very rare cause of primary hyperparathyroidism.</li>
</ul>

You can use the CSS selector ul.listGroup+ul li -> this selects all <li> tags of the <ul> tag immediately following the <ul> tag with class "listGroup".


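This seems like a natural use case for CSS selectors: ul.listGroup+ul li selects all li tags in the first ul tag that comes after each ul tag of class listGroup. Replacing + with ~ selects all li tags of every ul tag (2 in this case) that follows a ul tag with class listGroup.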
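
To make the difference between the two combinators concrete, here is a minimal sketch. It assumes a trimmed-down copy of the markup from the question (the `html` string below is illustrative, not the live page) and collects the results into two lists, `adjacent` and `general`, whose names are made up for this example.

```python
from bs4 import BeautifulSoup

# Trimmed-down, illustrative copy of the markup from the question.
html = """
<ul class="listGroup"><li class="acces-listitems">Book</li></ul>
<ul>
<li>Osteoporosis</li>
<li>Kidney stones</li>
</ul>
<ul>
<li>A noncancerous growth (adenoma) on a gland is the most common cause.</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# "+" (adjacent sibling): only the <ul> immediately after ul.listGroup.
adjacent = [li.get_text(strip=True) for li in soup.select("ul.listGroup + ul li")]

# "~" (general sibling): every later <ul> sibling of ul.listGroup.
general = [li.get_text(strip=True) for li in soup.select("ul.listGroup ~ ul li")]

print(adjacent)
print(general)
```

With this sample, `adjacent` should hold only the two symptom items, while `general` should also include the item from the third list.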

To implement this answer in your script, replace find_all with select and update the selector with the relevant CSS selector:

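    import requests
    import pandas
    from bs4 import BeautifulSoup

    for link in ['/diseases-conditions/hyperparathyroidism/symptoms-causes/syc-20356194']:
        page = requests.get(f"https://www.mayoclinic.org{link}")
        soup = BeautifulSoup(page.content, "html.parser")
        # select() takes a CSS selector; this one matches every <li> inside
        # the <ul> that immediately follows a <ul class="listGroup">.
        for each in soup.select("ul.listGroup+ul li"):
            print(each.text)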
    
Maybe you should consider using regular expressions to capture it.
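
As a rough sketch of what a regular-expression approach could look like: the pattern below assumes the target list is a plain `<ul>` with no attributes that directly follows the closing tag of the `listGroup` list, as in the snippet from the question. The `html` string is a shortened, illustrative copy and the names `block_pattern` and `items` are invented for this example. Regular expressions are brittle against real-world HTML, so treat this as an illustration, not a recommendation over the CSS selector.

```python
import re

# Shortened, illustrative copy of the markup from the question.
html = '''<ul class="listGroup" id="x"><li class="acces-listitems"><a href="#">Book</a></li></ul>
<ul>
<li>Osteoporosis</li>
<li>Kidney stones</li>
</ul>
<ul>
<li>A noncancerous growth (adenoma) on a gland is the most common cause.</li>
</ul>'''

# Capture the body of the plain <ul> that directly follows the closing tag
# of the listGroup <ul>. re.DOTALL lets "." match across newlines.
block_pattern = re.compile(
    r'<ul[^>]*class="listGroup"[^>]*>.*?</ul>\s*<ul>(.*?)</ul>',
    re.DOTALL,
)

match = block_pattern.search(html)
# Pull the text of each attribute-free <li> out of the captured block.
items = re.findall(r"<li>(.*?)</li>", match.group(1), re.DOTALL) if match else []
print(items)
```

If the target `<ul>` ever gains attributes or nested lists, the pattern would need to change, which is the usual argument for using an HTML parser here.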

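You say you are looking for the text of the "second" ul tag, the idea being that it is followed by another "ul" tag that has the class "listGroup"; but in your example the third <ul> tag has no class.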
    import requests
    import pandas
    from bs4 import BeautifulSoup
    for link in ['/diseases-conditions/hyperparathyroidism/symptoms-causes/syc-20356194']:
        page = requests.get(f"https://www.mayoclinic.org{link}")
        soup = BeautifulSoup(page.content, "html.parser")
        for each in soup.find_all("ul"):
            print(each)
    
    txt = '''<ul class="listGroup" id="ul_e6d09fbd-19fe-49ac-9b47-bd857c0d411b"><li class="acces-listitems"><a href="https://order.store.mayoclinic.com/books/gnweb43?utm_source=MC-DotOrg-PS&amp;utm_medium=Link&amp;utm_campaign=FamilyHealth-Book&amp;utm_content=FHB">Book: Mayo Clinic Family Health Book, 5th Edition</a></li><li class="acces-hide-listitems"><a href="https://order.store.mayoclinic.com/hl/hldiged?utm_source=MC-DotOrg-PS&amp;utm_medium=Link&amp;utm_campaign=HealthLetter-Digital&amp;utm_content=HLDE">Newsletter: Mayo Clinic Health Letter — Digital Edition</a></li></ul>
    
    <ul>
    <li>Osteoporosis</li>
    <li>Kidney stones</li>
    <li>Excessive urination</li>
    <li>Abdominal pain</li>
    <li>Tiring easily or weakness</li>
    <li>Depression or forgetfulness</li>
    <li>Bone and joint pain</li>
    <li>Frequent complaints of illness with no apparent cause</li>
    <li>Nausea, vomiting or loss of appetite</li>
    </ul>
    <ul>
    <li>A noncancerous growth (adenoma) on a gland is the most common cause.</li>
    <li>Enlargement (hyperplasia) of two or more parathyroid glands accounts for most other cases.</li>
    <li>A cancerous tumor is a very rare cause of primary hyperparathyroidism.</li>
    </ul>'''
    
    soup = BeautifulSoup(txt, 'html.parser')
    
    for li in soup.select('ul.listGroup + ul li'):
        print(li.text)
    
    Osteoporosis
    Kidney stones
    Excessive urination
    Abdominal pain
    Tiring easily or weakness
    Depression or forgetfulness
    Bone and joint pain
    Frequent complaints of illness with no apparent cause
    Nausea, vomiting or loss of appetite