How do I get a nested href in Python?


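I need to repeat the following search several hundred times:

1. Search the NCBI Identical Protein Groups (IPG) database for an accession, e.g. WP_000177210.1

i.e. https://www.ncbi.nlm.nih.gov/ipg/?term=WP_000177210.1

2. Select the first record in the CDS Region column, the second column of the table

i.e. NC_011415.1 1997353-1998831 -

3. Select FASTA under this sequence name

4. Get the FASTA sequence

i.e. >NC_011415.1:c1998831-1997353 Escherichia coli SE11, complete sequence ATGACTTTATGGATAGTAACGGTGATGGATAACGGGGCAGGGCGCATCGCGTGTGATAGCGTATCGGTAACGGTGTGATAGCGTATCGGTATCGGTAT CGGGCGAG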
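Steps 2 and 4 are linked by a simple text transformation: the CDS record "NC_011415.1 1997353-1998831 -" corresponds to the FASTA identifier "NC_011415.1:c1998831-1997353". A minimal sketch of that conversion follows. The function name `cds_record_to_fasta_id` is hypothetical, and the plus-strand format (no `c`, coordinates in the original order) is an assumption, since the question only shows a minus-strand example.

```python
import re


def cds_record_to_fasta_id(record: str) -> str:
    """Convert a CDS record such as 'NC_011415.1 1997353-1998831 -'
    into a FASTA-style identifier such as 'NC_011415.1:c1998831-1997353'."""
    # accession, start-stop, then an optional strand sign
    match = re.fullmatch(r"\s*(\S+)\s+(\d+)-(\d+)\s*([+-])?\s*", record)
    if match is None:
        raise ValueError(f"unrecognised CDS record: {record!r}")
    accession, start, stop, strand = match.groups()
    if strand == "-":
        # Minus strand, as in the question's example: 'c' prefix, coordinates swapped
        return f"{accession}:c{stop}-{start}"
    # Assumption: plus-strand records keep the coordinates in the original order
    return f"{accession}:{start}-{stop}"


print(cds_record_to_fasta_id("NC_011415.1 1997353-1998831 -"))
```

This only rebuilds the identifier; fetching the sequence itself still requires loading the corresponding page.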

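Code

1. Search IPG, e.g. for WP_000177210.1

2. Select the first record in the second column of the table, "CDS Region in Nucleotide", in this case NC_011415.1 1997353-1998831 -

I am stuck at this step.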

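P.S.

This is my first time working with HTML, and also my first time asking a question here, so I may not have explained the problem well. If anything is unclear, please let me know.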

Without using NCBI's REST API:

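import time
from bs4 import BeautifulSoup
from selenium import webdriver

# Open a Firefox web browser for scraping (put your own geckodriver path here).
# Newer Selenium 4 releases expect webdriver.Firefox(service=Service(path)) instead of executable_path.
browser = webdriver.Firefox(executable_path=r'your\path\geckodriver.exe')

# Load the page in a real browser so that all of its JavaScript runs
browser.get('https://www.ncbi.nlm.nih.gov/ipg/?term=WP_000177210.1')

# Wait before turning the page into soup, so the newly fetched data is present
time.sleep(3)

# Make the soup
soup = BeautifulSoup(browser.page_source, "html.parser")

# Get all links that contain '/nuccore', filtering out the bare '/nuccore' link
links = [a['href'] for a in soup.find_all('a', href=True)
         if '/nuccore' in a['href'] and a['href'] != '/nuccore']

Note: this needs the selenium package, and geckodriver must be installed.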
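Since the search has to be repeated for hundreds of accessions, the URL construction and link filtering can be pulled into small helpers. The sketch below is only an illustration: `ipg_url`, `nuccore_links`, and `SAMPLE_HTML` are hypothetical names, and the sample markup is an assumption about what the rendered table might look like, not a copy of the real page. In practice the HTML string would come from `browser.page_source` after the page has loaded.

```python
from urllib.parse import urlencode, urljoin

from bs4 import BeautifulSoup

BASE = "https://www.ncbi.nlm.nih.gov"

# Hypothetical stand-in for browser.page_source (the real markup may differ)
SAMPLE_HTML = """
<div id="ph-ipg"><table><tbody><tr>
  <td><a href="/nuccore">Nucleotide</a></td>
  <td><a href="/nuccore/NC_011415.1">NC_011415.1</a></td>
</tr></tbody></table></div>
"""


def ipg_url(accession):
    """Build the IPG search URL for one protein accession."""
    return f"{BASE}/ipg/?{urlencode({'term': accession})}"


def nuccore_links(html):
    """Return absolute URLs of links under /nuccore/, skipping the bare /nuccore link."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        urljoin(BASE, a["href"])
        for a in soup.find_all("a", href=True)
        if a["href"].startswith("/nuccore/")
    ]


for accession in ["WP_000177210.1"]:  # extend this list with the other accessions
    print(ipg_url(accession))
print(nuccore_links(SAMPLE_HTML))
```

Requiring the trailing slash in `/nuccore/` is a design choice that drops the bare `/nuccore` navigation link without a separate inequality check.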


import requests
from bs4 import BeautifulSoup

url = "https://www.ncbi.nlm.nih.gov/ipg/"
# Pass the accession as the "term" query parameter
r = requests.get(url, params={"term": "WP_000177210.1"})
if r.status_code == requests.codes.ok:
    soup = BeautifulSoup(r.text, "lxml")

# Try 1 (wrong)
# I tried this first, but it seemed to reach only the first level of hrefs.
for a in soup.find_all('a', href=True):
    if a['href'].startswith("/nuccore"):
        print("Found the URL:", a['href'])

# Try 2 (not sure how to access the nested href)
# Based on the structure shown in the browser's developer tools, I think I need the href
# inside the following nested structure. However, it didn't work.
soup.select("html div #maincontent div div div #ph-ipg div table tbody tr td a")
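A likely reason both attempts come back empty is that the table is filled in by JavaScript after the page loads, so the HTML returned by `requests` may not contain it at all; this is an assumption based on the answer's remark about loading the page with all of its JS. The sketch below uses two hypothetical HTML snippets (one with the table, one without) to show that a much shorter CSS selector is enough once the table is actually present. The element ids are taken from the selector in the question; the rest of the markup and the second accession are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup resembling the page after JavaScript has run
RENDERED_HTML = """
<div id="maincontent"><div id="ph-ipg"><div><table><tbody>
<tr><td>1</td><td><a href="/nuccore/NC_011415.1">NC_011415.1</a></td></tr>
<tr><td>2</td><td><a href="/nuccore/NC_000000.0">NC_000000.0</a></td></tr>
</tbody></table></div></div></div>
"""

# Hypothetical markup resembling the raw response, before the table is filled in
UNRENDERED_HTML = '<div id="maincontent"><div id="ph-ipg"></div></div>'


def first_nuccore_href(html):
    """Return the href of the first /nuccore/ link inside #ph-ipg, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Descendant selectors do not need every intermediate div, tbody, tr and td
    link = soup.select_one('#ph-ipg table a[href^="/nuccore/"]')
    return link["href"] if link is not None else None


print(first_nuccore_href(RENDERED_HTML))
print(first_nuccore_href(UNRENDERED_HTML))
```

If the second call's behaviour matches what you see with the real response, the selector is not the problem and the page needs to be rendered in a browser first.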