Python 3.x: How do I scrape values from an HTML table?
Tags: python-3.x, html-table, beautifulsoup, yahoo-finance


I want to scrape text data from Yahoo Finance. For any given stock, I would like to be able to scrape the Total Revenue:

soup.find('span', string='Total Revenue').find_next().text

For the example above, see the L'Oréal page. The relevant HTML is:

                         <div class="rw-expnded" data-reactid="44" data-test="fin-row">
                            <div class="D(tbr) fi-row Bgc($hoverBgColor):h" data-reactid="45">
                               <div class="D(tbc) Ta(start) Pend(15px)--mv2 Pend(10px) Bxz(bb) Py(8px) Bdends(s) Bdbs(s) Bdstarts(s) Bdstartw(1px) Bdbw(1px) Bdendw(1px) Bdc($seperatorColor) Pos(st) Start(0) Bgc($lv2BgColor) fi-row:h_Bgc($hoverBgColor) Pstart(15px)--mv2 Pstart(10px)" data-reactid="46">
                                  <div class="D(ib) Va(m) Ell Mt(-3px) W(215px)--mv2 W(200px)" data-reactid="47" title="Total Revenue"><span class="Va(m)" data-reactid="48">Total Revenue</span></div>
                                  <div class="W(3px) Pos(a) Start(100%) T(0) H(100%) Bg($pfColumnFakeShadowGradient) Pe(n) Pend(5px)" data-reactid="49"></div>
                               </div>
                               <div class="D(tbc) Ta(c) Pstart(6px) Pend(4px) Bxz(bb) Py(8px) BdB Bdc($seperatorColor) Miw(100px) Miw(156px)--pnclg Bgc($lv1BgColor) fi-row:h_Bgc($hoverBgColor)" data-reactid="50" data-test="fin-col"><span data-reactid="51">29,873,600</span></div>
                               <div class="D(tbc) Ta(c) Pstart(6px) Pend(4px) Bxz(bb) Py(8px) BdB Bdc($seperatorColor) Miw(100px) Miw(156px)--pnclg" data-reactid="52" data-test="fin-col"><span data-reactid="53">29,873,600</span></div>
                               <div class="D(tbc) Ta(c) Pstart(6px) Pend(4px) Bxz(bb) Py(8px) BdB Bdc($seperatorColor) Miw(100px) Miw(156px)--pnclg Bgc($lv1BgColor) fi-row:h_Bgc($hoverBgColor)" data-reactid="54" data-test="fin-col"><span data-reactid="55">26,937,400</span></div>
                               <div class="D(tbc) Ta(c) Pstart(6px) Pend(4px) Bxz(bb) Py(8px) BdB Bdc($seperatorColor) Miw(100px) Miw(156px)--pnclg" data-reactid="56" data-test="fin-col"><span data-reactid="57">26,023,700</span></div>
                               <div class="D(tbc) Ta(c) Pstart(6px) Pend(4px) Bxz(bb) Py(8px) BdB Bdc($seperatorColor) Miw(100px) Miw(156px)--pnclg Pend(10px) Bgc($lv1BgColor) fi-row:h_Bgc($hoverBgColor)" data-reactid="58" data-test="fin-col"><span data-reactid="59">25,837,100</span></div>
                            </div>
                            <div class="D(b)" data-reactid="60"></div>
                         </div>
I then tried to get the element that follows Total Revenue:

soup.find('span', string='Total Revenue').find_next().text
but it returns an empty string:
''


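If there is a simpler way, for example an API, please do not hesitate to tell me. I tried yfinance, but it does not work for the financial statements of non-US stocks.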

You are only getting the next div, which is empty. You need the next span, so pass the tag name as an argument to find_next.

Use the following:

soup.find('span', string='Total Revenue').find_next('span').text
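To see why the tag name matters, here is a minimal offline sketch. It assumes a trimmed-down copy of the row from the question (the class names are shortened and the `html` string is illustrative, not the live page) and uses the built-in `html.parser` so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the Yahoo Finance row shown in the question.
html = (
    '<div data-test="fin-row"><div>'
    '<div>'
    '<div title="Total Revenue"><span>Total Revenue</span></div>'
    '<div class="shadow"></div>'  # the empty div that sits right after the label
    '</div>'
    '<div data-test="fin-col"><span>29,873,600</span></div>'
    '<div data-test="fin-col"><span>26,937,400</span></div>'
    '</div></div>'
)

soup = BeautifulSoup(html, 'html.parser')
label = soup.find('span', string='Total Revenue')

# Without a tag name, find_next() stops at the very next tag: the empty div.
next_any = label.find_next()
print(repr(next_any.name), repr(next_any.text))  # 'div' ''

# With a tag name, it skips ahead to the next span, which holds the value.
next_span = label.find_next('span')
print(next_span.text)  # 29,873,600
```

The empty string in the question is therefore the text of that empty div, not a parsing failure.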


You did not specify which tag find_next should look for.

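# Needs network access, so this will not run in an online IDE
import requests
from bs4 import BeautifulSoup

ticker = 'OR.PA'  # same ticker as in the URL of the L'Oréal page
# f-string, so that {ticker} is actually substituted into the URL
URL = f"https://finance.yahoo.com/quote/{ticker}/financials?p={ticker}"
r = requests.get(URL)

soup = BeautifulSoup(r.content, 'lxml')

# Start from the div titled "Total Revenue", go up to its cell,
# then move to the next cell, which holds the first value
total_rev_tag = soup.find('div', {'title': "Total Revenue"}).parent.next_sibling
total_rev_txt_1 = total_rev_tag.find('span').text   # first span inside that cell
total_rev_txt_2 = total_rev_tag.find_next().text    # next tag after the cell is that span
print('total_rev_txt_1', total_rev_txt_1)
print('total_rev_txt_2', total_rev_txt_2)

# Start from the label span and ask explicitly for the next span
total_rev_txt_3 = soup.find('span', string='Total Revenue').find_next('span').text
print('total_rev_txt_3', total_rev_txt_3)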
Output:

total_rev_txt_1 29,873,600
total_rev_txt_2 29,873,600
total_rev_txt_3 29,873,600
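If every column of the row is needed rather than only the first, one possible approach is to climb to the enclosing `data-test="fin-row"` element and collect its `data-test="fin-col"` cells. The sketch below is an assumption-laden illustration: `row_values` is a hypothetical helper, the `sample` markup is a shortened copy of the row from the question, and it assumes every cell holds a comma-grouped integer (a live page may contain placeholders such as "-", which this does not handle).

```python
from bs4 import BeautifulSoup

def row_values(soup, label):
    """Return the values of the financial row titled `label` as integers.

    Hypothetical helper: relies on the data-test attributes seen in the
    question's markup and returns an empty list if the row is not found.
    """
    title_div = soup.find('div', {'title': label})
    if title_div is None:
        return []
    row = title_div.find_parent('div', {'data-test': 'fin-row'})
    if row is None:
        return []
    cells = row.find_all('div', {'data-test': 'fin-col'})
    # "29,873,600" -> 29873600
    return [int(cell.get_text(strip=True).replace(',', '')) for cell in cells]

# Shortened copy of the row from the question, used instead of a live request.
sample = (
    '<div data-test="fin-row"><div>'
    '<div><div title="Total Revenue"><span>Total Revenue</span></div><div></div></div>'
    '<div data-test="fin-col"><span>29,873,600</span></div>'
    '<div data-test="fin-col"><span>29,873,600</span></div>'
    '<div data-test="fin-col"><span>26,937,400</span></div>'
    '<div data-test="fin-col"><span>26,023,700</span></div>'
    '<div data-test="fin-col"><span>25,837,100</span></div>'
    '</div></div>'
)

values = row_values(BeautifulSoup(sample, 'html.parser'), 'Total Revenue')
print(values)  # [29873600, 29873600, 26937400, 26023700, 25837100]
```

Matching on `data-test` rather than on the long generated class names is a deliberate choice here, since those class strings look more likely to change between page versions.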
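Have you tried this library? After installing it, use the following to get the total revenue (and all other income statement items): Ticker('OR.PA').income_statement()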