Python: getting a specific value with BeautifulSoup (parsing)


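I am trying to extract information from a website

using Python (BeautifulSoup).

I want to extract the following data (numbers only):

EPS (Basic)

From:

From the XML:

I built the following code: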

import pandas as pd
from bs4 import BeautifulSoup
import urllib.request as ur

url_is = 'https://www.marketwatch.com/investing/stock/aapl/financials/income/quarter'

# Download the page and parse it with the lxml parser
read_data = ur.urlopen(url_is).read()
soup_is = BeautifulSoup(read_data, 'lxml')

# Prints the text of every row with class "mainRow", not only EPS (Basic)
cells = soup_is.find_all('tr', {'class': 'mainRow'})
for cell in cells:
    print(cell.text)
But I am not able to extract only the EPS (Basic) figures.


Is there a way to extract only that data and arrange it by column?

Try the CSS selector below, which checks that the td tag contains the text EPS (Basic).

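from bs4 import BeautifulSoup
import urllib.request as ur

url_is = 'https://www.marketwatch.com/investing/stock/aapl/financials/income/quarter'
read_data = ur.urlopen(url_is).read()
soup_is = BeautifulSoup(read_data, 'lxml')

# Match the title cell whose text contains "EPS (Basic)".
# ':-soup-contains' is the current spelling; older soupsieve releases use ':contains'.
row = soup_is.select_one('tr.mainRow>td.rowTitle:-soup-contains("EPS (Basic)")')

# Walk up to the parent <tr> and keep every non-empty cell
print([cell.text for cell in row.parent.select('td') if cell.text != ''])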
Output

[' EPS (Basic)', '2.47', '2.20', '3.05', '5.04', '2.58']
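
To try the row lookup without hitting the network, the sketch below applies the same idea to a small inline HTML fragment. The fragment and the helper name `row_values` are made up for illustration; they only mimic the `mainRow`/`rowTitle` structure assumed in the answer, and the real page markup may differ. Instead of a CSS pseudo-class, it compares the stripped cell text in Python and converts the remaining cells to floats, so only the numbers are left.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that imitates the table structure assumed above.
SAMPLE_HTML = """
<table>
  <tr class="mainRow">
    <td class="rowTitle"> Sales/Revenue</td>
    <td>10.5</td><td>11.0</td><td></td>
  </tr>
  <tr class="mainRow">
    <td class="rowTitle"> EPS (Basic)</td>
    <td>2.47</td><td>2.20</td><td>3.05</td><td></td>
  </tr>
</table>
"""


def row_values(html, title):
    """Return the numeric cells of the row whose title cell equals `title`."""
    soup = BeautifulSoup(html, "html.parser")
    for title_cell in soup.select("tr.mainRow > td.rowTitle"):
        if title_cell.get_text(strip=True) == title:
            # Cells to the right of the title; empty ones are skipped
            siblings = title_cell.find_next_siblings("td")
            texts = [td.get_text(strip=True) for td in siblings]
            return [float(t) for t in texts if t]
    return None  # no row with that title


eps = row_values(SAMPLE_HTML, "EPS (Basic)")
print(eps)
```

Returning `None` for a missing row makes the failure explicit, whereas `select_one(...)` followed by `.parent` raises an `AttributeError` when nothing matches.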

Printing with a DataFrame

import pandas as pd
from bs4 import BeautifulSoup
import urllib.request as ur

url_is = 'https://www.marketwatch.com/investing/stock/aapl/financials/income/quarter'
read_data = ur.urlopen(url_is).read()
soup_is = BeautifulSoup(read_data, 'lxml')

# ':-soup-contains' is the current spelling; older soupsieve releases use ':contains'.
row = soup_is.select_one('tr.mainRow>td.rowTitle:-soup-contains("EPS (Basic)")')
data = [cell.text for cell in row.parent.select('td') if cell.text != '']

# A flat list becomes a single column, so transpose it to get one row
df = pd.DataFrame(data)
print(df.T)
Output

              0     1     2     3     4     5
0   EPS (Basic)  2.47  2.20  3.05  5.04  2.58
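
The values in that DataFrame are still strings. If they should behave as numbers, one option is to use the label as the row index and convert the rest with `pd.to_numeric`. This is a minimal sketch: the list literal is copied from the output above, and the column names `Q1` to `Q5` are placeholders, since the real period headers are not shown here.

```python
import pandas as pd

# List as produced by the scraping step above (copied from the printed output)
data = [' EPS (Basic)', '2.47', '2.20', '3.05', '5.04', '2.58']

label = data[0].strip()            # 'EPS (Basic)' without the leading space
values = pd.to_numeric(data[1:])   # strings -> float64 values

# Placeholder column names; swap in the real period headers if available
columns = [f"Q{i}" for i in range(1, len(values) + 1)]

df = pd.DataFrame([values], index=[label], columns=columns)
print(df)
```

With numeric dtypes, operations such as `df.loc['EPS (Basic)'].mean()` or sorting by value work directly.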