Python: Extracting the text in a span element with BeautifulSoup

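Here is the problem: I am trying to use BeautifulSoup to extract some data from the SEC database. I am new to Python, but I was able to write the code below.

The idea is to read a list of quote (ticker) symbols from a .txt file and extract each company's "CIK" number for further use.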

import requests
from bs4 import BeautifulSoup
list_path = r"C:\Users\User1\Downloads\Quote list.txt"

with open(list_path, "r") as flist:
    for quote in flist:
        quote = quote.replace("\n", "")
        url = (r"https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" + quote + 
        r"&type=10&dateb=&owner=exclude&count=100")
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for company_info in soup.find_all("span", {"class" :"companyName"}):
            cik_code = company_info.string
            print(cik_code)

So far, the code above prints None for the string cik_code. The element in the HTML looks like this:

<span class="companyName dm-selected dm-test">
      AAON INC 
      <acronym title="Central Index Key">CIK</acronym>
      #: 
      <a href="/cgi-bin/browse-edgar?
      action=getcompany&amp;CIK=0000824142&amp;owner=exclude&amp;count=100" 
      class="">0000824142 (see all company filings)</a>
</span>


The CIK code is the number at the end, 0000824142, right before "(see all company filings)".

How can I set the string cik_code to that number?

I think you just need to go into the <a> tag inside the <span> tag:

for company_info in soup.find_all('span', {'class': 'companyName'}):
    cik_code = company_info.find_next('a').text.split(' ', maxsplit=1)[0]
    print(cik_code)

Explanation:

```
company_info.find_next('a')
```
returns:

  <a href="/cgi-bin/browse-edgar?
  action=getcompany&amp;CIK=0000824142&amp;owner=exclude&amp;count=100" 
  class="">0000824142 (see all company filings)</a>

Taking .text of that tag gives:

0000824142 (see all company filings)

```
.split(' ', maxsplit=1)[0]
```
returns:

0000824142

I don't know why, but this works. Thanks, bro!

I just added an explanation to give you a better idea.
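
As for why the original code printed None: BeautifulSoup's .string only returns text when a tag has exactly one child. The span in the question has several children (bare text, an <acronym>, and an <a>), so .string is None for it, while the <a> inside holds a single text node. The sketch below reproduces this offline on markup adapted from the question, so no request to sec.gov is needed. The helper name extract_ciks is made up for illustration, and it uses span.find("a") rather than find_next("a") on the assumption that you only want a link located inside the span itself.

```python
from bs4 import BeautifulSoup

# Markup adapted from the question (the href is joined onto one line).
SAMPLE_HTML = """
<span class="companyName dm-selected dm-test">
      AAON INC
      <acronym title="Central Index Key">CIK</acronym>
      #:
      <a href="/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0000824142&amp;owner=exclude&amp;count=100"
      class="">0000824142 (see all company filings)</a>
</span>
"""


def extract_ciks(html):
    """Return the CIK from each span.companyName (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    ciks = []
    for span in soup.find_all("span", class_="companyName"):
        link = span.find("a")  # search only inside this span
        if link is None:
            continue  # no link in this span, nothing to extract
        # Link text looks like "0000824142 (see all company filings)";
        # keep only the first whitespace-separated token.
        ciks.append(link.get_text(strip=True).split(maxsplit=1)[0])
    return ciks


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
span = soup.find("span", class_="companyName")

print(span.string)    # None, because the span has more than one child
print(span.a.string)  # the <a> has a single text child, so .string works
print(extract_ciks(SAMPLE_HTML))
```

If the page layout differs from this sample (for example, no link inside the span), the helper simply skips that span instead of raising an error.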