Python: Need help retrieving the first occurrence of something with Beautiful Soup
I am trying to search the SEC website for the first occurrence of "10-Q" or "10-K" and retrieve the link under the "Interactive Data" button on the site.
The URL I am trying to retrieve the link from is:
The resulting link should be:
The code I am currently using:
import requests
from bs4 import BeautifulSoup

date1 = "20200506"
ticker = "AAPL"
URL = ('https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=' + ticker +
       '&type=&dateb=' + date1 + '&owner=exclude&count=40')
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='seriesDiv')
rows = results.find_all('tr')
for row in rows:
    document = row.find('td', string='10-Q')
    link = row.find('a', id="interactiveDataBtn")
    if None in (document, link):
        continue
    print(document.text)
    print(link['href'])
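This code returns all the links for 10-Q, but it should work for both 10-Q and 10-K.
Can anyone help me shape this code so that it returns only the link of the first occurrence of a 10-Q or 10-K?
Thanks.

The quickest solution is to use a lambda in the .find() method.
For example: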
import requests
from bs4 import BeautifulSoup

date1 = "20200506"
ticker = "AAPL"
URL = 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=' + ticker + '&type=&dateb=' + date1 + '&owner=exclude&count=40'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='seriesDiv')
rows = results.find_all('tr')
for row in rows:
    # Accept a <td> whose text contains either form type.
    document = row.find(lambda t: t.name=='td' and ('10-Q' in t.text or '10-K' in t.text))
    link = row.find('a', id="interactiveDataBtn")
    # Skip rows that lack a matching cell or an Interactive Data button.
    if None in (document, link):
        continue
    print(document.text)
    print('https://www.sec.gov' + link['href'])
This prints the 10-Q and 10-K links:
10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000052&xbrl_type=v
10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000010&xbrl_type=v
10-K
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000119&xbrl_type=v
10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000076&xbrl_type=v
10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000066&xbrl_type=v
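To see how a function filter passed to .find() behaves without touching the network, here is a minimal sketch on a hand-written HTML fragment. The fragment, its hrefs, and the is_filing_cell name are made up for illustration and only loosely mimic the EDGAR results table. It also compares the cell text exactly instead of using a substring test, which is a variation on the answer above, not the answer's own code.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, loosely modelled on the EDGAR results table.
HTML = """
<div id="seriesDiv"><table>
<tr><th>Filings</th><th>Format</th></tr>
<tr><td>8-K</td><td><a href="/doc/1">Documents</a></td></tr>
<tr><td>10-Q</td><td><a id="interactiveDataBtn" href="/viewer/q2">Interactive Data</a></td></tr>
<tr><td>10-K</td><td><a id="interactiveDataBtn" href="/viewer/k1">Interactive Data</a></td></tr>
<tr><td>10-Q</td><td><a id="interactiveDataBtn" href="/viewer/q1">Interactive Data</a></td></tr>
</table></div>
"""

def is_filing_cell(tag):
    # find() calls this once per tag and keeps the first tag for which it returns True.
    return tag.name == 'td' and tag.get_text(strip=True) in ('10-Q', '10-K')

soup = BeautifulSoup(HTML, 'html.parser')
matches = []
for row in soup.find(id='seriesDiv').find_all('tr'):
    document = row.find(is_filing_cell)
    link = row.find('a', id='interactiveDataBtn')
    # Rows without a matching cell or without the button are skipped.
    if document is None or link is None:
        continue
    matches.append((document.get_text(strip=True), link['href']))

print(matches)
```

The exact comparison would skip a cell such as "10-K/A", which the substring test would accept. Which behaviour is wanted depends on the use case.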
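EDIT: To get only the first match, you can use a dictionary. Each iteration checks whether the key (the string 10-Q or 10-K) is already in the dictionary and adds it if it is not: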
links = dict()
for row in rows:
    document = row.find(lambda t: t.name=='td' and ('10-Q' in t.text or '10-K' in t.text))
    link = row.find('a', id="interactiveDataBtn")
    if None in (document, link):
        continue
    if document.text not in links:
        links[document.text] = 'https://www.sec.gov' + link['href']
print(links)
Prints:
{'10-Q': 'https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000052&xbrl_type=v',
'10-K': 'https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000119&xbrl_type=v'}
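The first-occurrence bookkeeping can also be tried in isolation on plain data. The sketch below assumes the (form type, link) pairs have already been extracted in page order. The pairs and variable names are hypothetical. It shows two readings of "first occurrence": the first link per form type, and the single first row of either type.

```python
# Hypothetical (form type, link) pairs in the order they appear on the page.
found = [
    ('10-Q', '/viewer/q2'),
    ('10-K', '/viewer/k1'),
    ('10-Q', '/viewer/q1'),
]

# First link per form type: setdefault stores a value only if the key is not present yet.
first_per_type = {}
for form, href in found:
    first_per_type.setdefault(form, href)

# First row of either type: next() takes the first item, or None if the list is empty.
first_overall = next(iter(found), None)

print(first_per_type)
print(first_overall)
```

Using setdefault is equivalent to the explicit "if key not in dict" check. If only one link is needed, taking the first item avoids scanning the remaining rows.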
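Thank you for the reply. This does filter for 10-Q and 10-K, which is great, thanks. Now, could you explain how I can get the link for only the first occurrence? Probably by parsing the results into a list, but I can't figure it out.
@Jackey12345 I updated my answer. You can use a dictionary to keep track of the first 10-Q/10-K occurrence you find.
Thank you so much, this is awesome, it works like a charm!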