Reading a table from the web with Python [python, beautifulsoup]


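I'm new to Python and am trying to extract data from one specific table on a website (the last table on the page, Shareholding Pattern).

I'm using the BeautifulSoup library for this, but I can't work out how to do it.

Below is my code snippet so far. I can't select the right table because the page has several tables, and they all share the same classes and IDs, which makes it hard to filter out the one I want.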

import requests
import urllib.request
from bs4 import BeautifulSoup

url = "https://www.screener.in/company/ABB/consolidated/"

# Download the page and parse it
r = requests.get(url)
print(r.status_code)
html_content = r.text
soup = BeautifulSoup(html_content, "html.parser")
# print(soup)
# data_table = soup.find('table', class_="data-table")
# print(data_table)

# Attempt to locate the table through its heading
table_needed = soup.find("<h2>ShareholdingPattern</h2>")
# sub = table_needed.contents[0]
print(table_needed)


Just use requests and pandas. Grab the last table and dump it to a .csv file.

Here's how:

import pandas as pd
import requests

df = pd.read_html(
    requests.get("https://www.screener.in/company/ABB/consolidated/").text,
    flavor="bs4",
)
df[-1].to_csv("last_table.csv", index=False)

Output from the .csv file:
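If you would rather stay with BeautifulSoup, as in the question, one possible approach is to find the heading first and then take the next table after it. The sketch below runs on a small made-up HTML snippet rather than the real page: the markup, the heading text "Shareholding Pattern" and all the figures are assumptions for illustration, and the live site may be structured differently. Note that soup.find() expects a tag name (plus optional filters), not an HTML string, which is why the call in the question returns None.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: two tables sharing the same class,
# each preceded by an <h2> heading. Values are invented.
html = """
<section>
  <h2>Quarterly Results</h2>
  <table class="data-table">
    <tr><th>Quarter</th><th>Sales</th></tr>
    <tr><td>Q1</td><td>100</td></tr>
  </table>
</section>
<section>
  <h2>Shareholding Pattern</h2>
  <table class="data-table">
    <tr><th>Holder</th><th>Mar 2020</th></tr>
    <tr><td>Promoters</td><td>75.0</td></tr>
    <tr><td>Public</td><td>25.0</td></tr>
  </table>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# Match the <h2> by its text; the lambda guards against headings
# whose .string is None.
heading = soup.find("h2", string=lambda s: s and "Shareholding Pattern" in s)

# The first <table> that appears after that heading in document order.
table = heading.find_next("table")

# Flatten the table into a list of rows, one list of cell texts per <tr>.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]
print(rows)
```

This keys on the heading text instead of the table's class or ID, so it does not matter that all the tables share them.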

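Dear baduker, thank you for your answer and for the great idea of using pandas this way! It's good to see pandas used so widely here. Many thanks for your excellent work.

Thank you, this solution works, but could you explain how it works? I can't see the table specified anywhere, yet it extracts exactly the table I want. @baduker

Since you want the last table, df[-1] uses indexing to grab the first table from the end of the list. That's your table.

@baduker I need help with one more thing. How do I select specific columns (i.e. the last 2 columns)? I added index_col but it doesn't work: df=pd.read_html(requests.get(, index_col=-2, flavor="bs4")

You can import pandas and use read_html(requests.get("Yourwebsite.com").text)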
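On the two follow-up questions: pd.read_html returns a list with one DataFrame per table it finds, so [-1] picks the last one. As for index_col, it chooses which column becomes the row index, not which columns to keep. To keep only the last two columns, positional slicing with .iloc is one option. Here is a minimal sketch on a made-up HTML snippet (the tables and numbers are invented, not taken from the real page). It wraps the text in StringIO because recent pandas releases warn when a raw HTML string is passed directly. The flavor argument is left at its default here; flavor="bs4" as in the answer should work the same way if bs4 and html5lib are installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page text: two tables, the
# second playing the role of the shareholding table. Values are invented.
html = """
<table>
  <thead><tr><th>Year</th><th>Sales</th></tr></thead>
  <tbody><tr><td>2020</td><td>100</td></tr></tbody>
</table>
<table>
  <thead>
    <tr><th>Holder</th><th>Mar 2020</th><th>Jun 2020</th><th>Sep 2020</th></tr>
  </thead>
  <tbody>
    <tr><td>Promoters</td><td>75.0</td><td>75.0</td><td>75.0</td></tr>
    <tr><td>Public</td><td>25.0</td><td>25.0</td><td>25.0</td></tr>
  </tbody>
</table>
"""

# One DataFrame per <table> element, in page order.
tables = pd.read_html(StringIO(html))

# Negative index: the last table on the page.
last_table = tables[-1]

# All rows, last two columns, selected by position.
last_two = last_table.iloc[:, -2:]

print(len(tables))
print(list(last_two.columns))
```

With the real page, the same .iloc[:, -2:] slice can be applied to df[-1] from the answer before calling to_csv.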