Need help retrieving the first occurrence of something with Beautiful Soup and Python


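I'm trying to search the SEC website for the first occurrence of a "10-Q" or "10-K" filing and retrieve the link behind the "Interactive Data" button on the site.

The URL I'm trying to retrieve the link from is:

The resulting link should be:

The code I'm currently using: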

import requests
from bs4 import BeautifulSoup

date1 = "20200506"
ticker = "AAPL"

URL = 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=' + ticker + '&type=&dateb=' + date1 + '&owner=exclude&count=40'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

results = soup.find(id='seriesDiv')

rows = results.find_all('tr')

for row in rows:
    document = row.find('td', string='10-Q')
    link = row.find('a', id="interactiveDataBtn")
    if None in (document, link):
        continue
    print(document.text)
    print(link['href'])
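This code returns all the links for 10-Q, but it should work for both 10-Q and 10-K.

Can someone help me reshape this code so that it returns only the link for the first occurrence of 10-Q or 10-K?

Thanks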

The quickest solution is to pass a lambda to the .find() method.

For example:

import requests
from bs4 import BeautifulSoup

date1 = "20200506"
ticker = "AAPL"

URL = 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=' + ticker + '&type=&dateb=' + date1 + '&owner=exclude&count=40'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

results = soup.find(id='seriesDiv')
rows = results.find_all('tr')

for row in rows:
    document = row.find(lambda t: t.name=='td' and ('10-Q' in t.text or '10-K' in t.text))
    link = row.find('a', id="interactiveDataBtn")
    if None in (document, link):
        continue
    print(document.text)
    print('https://www.sec.gov' + link['href'])
This prints the links for both 10-Q and 10-K:

10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000052&xbrl_type=v
10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000010&xbrl_type=v
10-K
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000119&xbrl_type=v
10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000076&xbrl_type=v
10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000066&xbrl_type=v
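
One thing to watch with the substring test above: '10-K' in t.text is also true for a cell that reads 10-K/A (an amended filing). If that matters, an exact comparison may be safer. Below is a minimal offline sketch of the same function-filter idea, assuming a simplified, made-up stand-in for the results table (SAMPLE_HTML, WANTED, is_wanted_cell and matching_links are all hypothetical names, and the href values are placeholders, not real SEC links):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the filings table (not real SEC markup).
SAMPLE_HTML = """
<div id="seriesDiv"><table>
<tr><th>Filings</th><th>Format</th></tr>
<tr><td>10-Q</td><td><a id="interactiveDataBtn" href="/viewer?n=1">Interactive Data</a></td></tr>
<tr><td>8-K</td><td><a id="interactiveDataBtn" href="/viewer?n=2">Interactive Data</a></td></tr>
<tr><td>10-K/A</td><td><a id="interactiveDataBtn" href="/viewer?n=3">Interactive Data</a></td></tr>
<tr><td>10-K</td><td><a id="interactiveDataBtn" href="/viewer?n=4">Interactive Data</a></td></tr>
</table></div>
"""

# Form types we want to match exactly (assumption: amendments are unwanted).
WANTED = {"10-Q", "10-K"}


def is_wanted_cell(tag):
    """Return True for a <td> whose stripped text is exactly a wanted form type."""
    return tag.name == "td" and tag.get_text(strip=True) in WANTED


def matching_links(html):
    """Collect (form type, href) pairs for rows that have both a wanted cell and a button."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for row in soup.find(id="seriesDiv").find_all("tr"):
        cell = row.find(is_wanted_cell)  # any callable works as a filter
        link = row.find("a", id="interactiveDataBtn")
        if cell is None or link is None:
            continue
        found.append((cell.get_text(strip=True), link["href"]))
    return found


print(matching_links(SAMPLE_HTML))
```

On this sample the 8-K and 10-K/A rows are skipped, so only the 10-Q and 10-K rows are returned. A named function is used instead of a lambda purely for readability; .find() accepts either.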

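EDIT: To get only the first match, you can use a dictionary. Each iteration checks whether the key (the string 10-Q or 10-K) is already in the dictionary and, if it is not, adds it: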

links = dict()
for row in rows:
    document = row.find(lambda t: t.name=='td' and ('10-Q' in t.text or '10-K' in t.text))
    link = row.find('a', id="interactiveDataBtn")
    if None in (document, link):
        continue
    if document.text not in links:
        links[document.text] = 'https://www.sec.gov' + link['href']

print(links)
Prints:

{'10-Q': 'https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000052&xbrl_type=v', 
 '10-K': 'https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000119&xbrl_type=v'}
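
The same first-occurrence bookkeeping can be sketched without any scraping, which makes it easy to check in isolation. This is only an illustration: first_links and the filings list are hypothetical, and the pairs stand in for the (document.text, link['href']) values taken from each row. It uses dict.setdefault, which stores a value only when the key is missing, and stops early once every wanted form type has been seen:

```python
def first_links(pairs, wanted=("10-Q", "10-K")):
    """Map each wanted form type to the first href seen for it."""
    links = {}
    for form_type, href in pairs:
        if form_type in wanted:
            # setdefault keeps the existing value if the key is already present
            links.setdefault(form_type, href)
        if len(links) == len(wanted):
            break  # every wanted type has its first link; no need to scan further
    return links


# Made-up rows in page order (newest first); hrefs are placeholders.
filings = [
    ("10-Q", "/viewer?n=1"),
    ("10-Q", "/viewer?n=2"),
    ("8-K", "/viewer?n=3"),
    ("10-K", "/viewer?n=4"),
    ("10-Q", "/viewer?n=5"),
]

print(first_links(filings))
```

Because the page lists filings newest first, the "first occurrence" kept here is the most recent filing of each type.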

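Thanks for the reply. This does filter for 10-Q and 10-K, which is great. Now, could you explain how I can get the link for only the first occurrence? Maybe by parsing the results into a list, but I can't figure it out.

@Jackey12345 I've updated my answer. You can use a dictionary to keep track of the first 10-Q/10-K occurrence you find.

Thank you so much, this is great. It works like a charm!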