Python: Insert a new column in Pandas based on an element found with BeautifulSoup


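I have a DataFrame built from several tables. The number of tables varies between 3 and 6 and changes every day. I have trimmed the details, but the HTML looks like the sample below. Each table has a title in an element with the id "list-title" and its rows in a table with the class "list2", which I have already extracted successfully with pandas.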


<span id="list-title">
ABC(11111)
<br>
</span>

<table class="list2">
<tbody>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
</tbody>
</table>

<span id="list-title">
DEF(22222)
<br>
</span>

<table class="list2">
<tbody>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
</tbody>
</table>

<span id="list-title">
XYZ(33333)
<br>
</span>

<table class="list2">
<tbody>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
  <tr class="bg3"></tr>
</tbody>
</table>
What I need to do is create a new column and fill it with the information from each table's "list-title".


broker = soup.find('div', id='mainContents').find_all(id='list-title')[1]
broker = broker.get_text()
df = pd.concat([df_tmp1, df_tmp2], axis=1)
df.insert(5, 'New_Column', broker)

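The code above adds the second title as the "broker" for all of the tables, rather than giving each table its own title. I tried using a for loop, for example: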

for i in range(3):
    df = df[i] = pd.io.html.read_html(filename, encoding='Shift JIS', attrs={'class': 'list2'})
    broker = soup.find('div', id='mainContents').find_all(id='list-title')[i]
    broker = broker.get_text()
    df.insert(5, 'New_Column, broker)

But this results in an error. I think I need to insert the column before concatenating the tables, but I don't know how.

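It looks like you just need to loop over the imported tables and the brokers together, and add the broker column to each DataFrame before concatenating: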

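# read_html returns a list with one DataFrame per matching table
tables = pd.io.html.read_html(filename, encoding='Shift JIS', attrs={'class':'list2'})
brokers = soup.find('div', id='mainContents').find_all(id='list-title')

# Pair each table with its own title and store the title on every row
for t, b in zip(tables, brokers):
    t['broker'] = b.get_text()

# Concatenate the list of tables, not an undefined df
df = pd.concat(tables)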
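
To make the pairing concrete, here is a minimal self-contained sketch of the same idea. The sample HTML, the cell values, the column names "name" and "value", and the helper table_to_frame are all made up for illustration; the tables are built by hand from the parsed rows instead of through read_html only so that the sketch needs nothing beyond bs4 and pandas.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up sample page: two titled tables with invented cell values.
html = """
<div id="mainContents">
  <span id="list-title">ABC(11111)<br></span>
  <table class="list2"><tbody>
    <tr class="bg3"><td>a1</td><td>10</td></tr>
    <tr class="bg3"><td>a2</td><td>20</td></tr>
  </tbody></table>
  <span id="list-title">DEF(22222)<br></span>
  <table class="list2"><tbody>
    <tr class="bg3"><td>d1</td><td>30</td></tr>
  </tbody></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
main = soup.find("div", id="mainContents")


def table_to_frame(table):
    """Turn one <table> into a DataFrame of cell strings (hypothetical helper)."""
    rows = [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr")
    ]
    # Column names are assumptions; the original question does not show them.
    return pd.DataFrame(rows, columns=["name", "value"])


tables = [table_to_frame(t) for t in main.find_all("table", class_="list2")]
brokers = main.find_all(id="list-title")

# Tag every row of each table with that table's own title, then stack them.
for t, b in zip(tables, brokers):
    t["broker"] = b.get_text(strip=True)

df = pd.concat(tables, ignore_index=True)
print(df)
```

Note that zip stops at the shorter of its inputs, so if the page ever has a different number of titles and tables, the extras are dropped silently. Comparing len(tables) with len(brokers) before the loop is a cheap way to catch that.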

Hope this works for you.

Is this a typo: df.insert(5, 'New_Column, broker)? You are missing the closing ' after New_Column. It should be: df.insert(5, 'New_Column', broker)
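Hi Elliot, thank you very much! That is exactly what I was looking for! I had never used zip before, but it is good to know I can cut out so much code with it. Thanks again.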