
Python: Stripping a column out of an HTML table with BeautifulSoup


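I know there are already a lot of BeautifulSoup questions, but after trying a few things I still can't figure out how to parse the data I need from this HTML table.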

My table looks like this:

<table class="W(100%) M(0)" data-test="historical-prices" data-reactid="33">
    <thead data-reactid="34">
        <tr class="C($tertiaryColor) Fz(xs) Ta(end)" data-reactid="35">
            <th class="Ta(start) W(100px) Fw(400) Py(6px)" data-reactid="36"><span data-reactid="37">Date</span></th>
            <th class="Fw(400) Py(6px)" data-reactid="38"><span data-reactid="39">Open</span></th>
            <th class="Fw(400) Py(6px)" data-reactid="40"><span data-reactid="41">High</span></th>
            <th class="Fw(400) Py(6px)" data-reactid="42"><span data-reactid="43">Low</span></th>
            <th class="Fw(400) Py(6px)" data-reactid="44"><span data-reactid="45">Close*</span></th>
            <th class="Fw(400) Py(6px)" data-reactid="46"><span data-reactid="47">Adj Close**</span></th>
            <th class="Fw(400) Py(6px)" data-reactid="48"><span data-reactid="49">Volume</span></th>
        </tr>
    </thead>
    <tbody data-reactid="50">
        <tr class="BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)" data-reactid="51">
            <td class="Py(10px) Ta(start) Pend(10px)" data-reactid="52"><span data-reactid="53">Oct 10, 2019</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="54"><span data-reactid="55">2,918.55</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="56"><span data-reactid="57">2,948.46</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="58"><span data-reactid="59">2,917.12</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="60"><span data-reactid="61">2,938.13</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="62"><span data-reactid="63">2,938.13</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="64"><span data-reactid="65">3,217,250,000</span></td>
        </tr>
        <tr class="BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)" data-reactid="66">
            <td class="Py(10px) Ta(start) Pend(10px)" data-reactid="67"><span data-reactid="68">Oct 09, 2019</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="69"><span data-reactid="70">2,911.10</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="71"><span data-reactid="72">2,929.32</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="73"><span data-reactid="74">2,907.41</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="75"><span data-reactid="76">2,919.40</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="77"><span data-reactid="78">2,919.40</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="79"><span data-reactid="80">2,726,820,000</span></td>
        </tr>
</table>

You can convert the HTML into a list of dictionaries for quick lookup:

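from bs4 import BeautifulSoup
d = BeautifulSoup(html, 'html.parser')  # html: a string holding the table markup above
# The header row holds <th> cells; every data row holds <td> cells
header, *data = [[i.text for i in b.find_all('th' if not b.td else 'td')] for b in d.find_all('tr')]
result = [dict(zip(header, i)) for i in data]
vals = [i['Adj Close**'] for i in result]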
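For a fully self-contained version of the list-of-dicts approach, here is a sketch that runs against a trimmed two-column copy of the table (trimmed only for brevity) and uses Python's built-in html.parser instead of lxml. The comma stripping before float() is an addition for when you need numbers rather than strings; it is not part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's table: only the Date and Adj Close** columns are kept.
html = """
<table data-test="historical-prices">
  <thead><tr><th><span>Date</span></th><th><span>Adj Close**</span></th></tr></thead>
  <tbody>
    <tr><td><span>Oct 10, 2019</span></td><td><span>2,938.13</span></td></tr>
    <tr><td><span>Oct 09, 2019</span></td><td><span>2,919.40</span></td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect the text of every <th> or <td> cell, one list per <tr>.
rows = [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in soup.find_all("tr")]

# The first row is the header; the rest are data rows.
header, *data = rows
records = [dict(zip(header, row)) for row in data]

# Values are strings with thousands separators, so remove the commas before converting.
adj_close = [float(r["Adj Close**"].replace(",", "")) for r in records]
print(adj_close)  # [2938.13, 2919.4]
```

Looking a column up by its header text, rather than by position, keeps the code working if the site reorders its columns.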
Alternatively, with pandas:

import pandas as pd
df = pd.DataFrame(result)
vals = df['Adj Close**']
Output:

0    2,938.13
1    2,919.40
Name: Adj Close**, dtype: object
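Alternatively, you can use the :nth-of-type selector (if you know the column index, specify it directly; otherwise the code below shows how to derive it from the header text). This requires bs4 4.7.1+.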

from bs4 import BeautifulSoup as bs

html = '''<table class="W(100%) M(0)" data-test="historical-prices" data-reactid="33">
    <thead data-reactid="34">
        <tr class="C($tertiaryColor) Fz(xs) Ta(end)" data-reactid="35">
            <th class="Ta(start) W(100px) Fw(400) Py(6px)" data-reactid="36"><span data-reactid="37">Date</span></th>
            <th class="Fw(400) Py(6px)" data-reactid="38"><span data-reactid="39">Open</span></th>
            <th class="Fw(400) Py(6px)" data-reactid="40"><span data-reactid="41">High</span></th>
            <th class="Fw(400) Py(6px)" data-reactid="42"><span data-reactid="43">Low</span></th>
            <th class="Fw(400) Py(6px)" data-reactid="44"><span data-reactid="45">Close*</span></th>
            <th class="Fw(400) Py(6px)" data-reactid="46"><span data-reactid="47">Adj Close**</span></th>
            <th class="Fw(400) Py(6px)" data-reactid="48"><span data-reactid="49">Volume</span></th>
        </tr>
    </thead>
    <tbody data-reactid="50">
        <tr class="BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)" data-reactid="51">
            <td class="Py(10px) Ta(start) Pend(10px)" data-reactid="52"><span data-reactid="53">Oct 10, 2019</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="54"><span data-reactid="55">2,918.55</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="56"><span data-reactid="57">2,948.46</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="58"><span data-reactid="59">2,917.12</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="60"><span data-reactid="61">2,938.13</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="62"><span data-reactid="63">2,938.13</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="64"><span data-reactid="65">3,217,250,000</span></td>
        </tr>
        <tr class="BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)" data-reactid="66">
            <td class="Py(10px) Ta(start) Pend(10px)" data-reactid="67"><span data-reactid="68">Oct 09, 2019</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="69"><span data-reactid="70">2,911.10</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="71"><span data-reactid="72">2,929.32</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="73"><span data-reactid="74">2,907.41</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="75"><span data-reactid="76">2,919.40</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="77"><span data-reactid="78">2,919.40</span></td>
            <td class="Py(10px) Pstart(10px)" data-reactid="79"><span data-reactid="80">2,726,820,000</span></td>
        </tr>
</table>'''
soup = bs(html, 'lxml')
index = [th.text for th in soup.select('[data-test="historical-prices"] th')].index('Adj Close**') + 1
data = [td.text for td in soup.select(f'[data-test="historical-prices"] td:nth-of-type({index})')]
print(data)
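The title asks about removing a column rather than reading it. A minimal sketch of that, assuming every row has exactly one cell per header (no colspan): look the column position up by header text, as above, then call decompose() on the matching cell in each row. drop_column is a hypothetical helper name, and the table is a trimmed three-column copy of the question's. It relies only on find_all() and list indexing rather than a CSS selector.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's table (three of the seven columns).
html = """
<table data-test="historical-prices">
  <thead><tr><th><span>Date</span></th><th><span>Adj Close**</span></th><th><span>Volume</span></th></tr></thead>
  <tbody>
    <tr><td><span>Oct 10, 2019</span></td><td><span>2,938.13</span></td><td><span>3,217,250,000</span></td></tr>
    <tr><td><span>Oct 09, 2019</span></td><td><span>2,919.40</span></td><td><span>2,726,820,000</span></td></tr>
  </tbody>
</table>
"""


def drop_column(soup, header_text):
    """Hypothetical helper: delete every cell in the column whose header equals header_text."""
    table = soup.find("table", attrs={"data-test": "historical-prices"})
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    col = headers.index(header_text)  # 0-based position; raises ValueError if the header is missing
    for tr in table.find_all("tr"):
        # Only the row's direct <th>/<td> children count as its cells.
        cells = tr.find_all(["th", "td"], recursive=False)
        if col < len(cells):
            cells[col].decompose()  # removes the cell and everything inside it from the tree
    return soup


soup = drop_column(BeautifulSoup(html, "html.parser"), "Volume")
remaining = [[c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
             for tr in soup.find_all("tr")]
print(remaining)
```

decompose() modifies the parsed tree in place, so str(soup) afterwards gives the table markup without the Volume column.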