How can I make my Python code read all the tables when scraping a website?

Tags: python, beautifulsoup, python-requests

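I'm new to Python, and this site has helped me a lot this semester. I hope you can help me again.

I need to scrape the tables from a website.

The tables are Most Actives, Gainers, and Losers.

Right now I have this code working: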

     import requests
     from bs4 import BeautifulSoup

     url = 'http://money.cnn.com/data/hotstocks/index.html'
     response = requests.get(url)
     html = response.content

     # Name the parser explicitly so bs4 does not have to guess one.
     soup = BeautifulSoup(html, 'html.parser')

     all_stock = soup.find('div', attrs={'id': 'wsod_hotStocks'})

     # find returns only the first matching table.
     table = all_stock.find('table', attrs={'class': 'wsod_dataTable wsod_dataTableBigAlt'})

     for row in table.find_all('tr'):
         for cell in row.find_all('td'):
             print(cell.text)

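But this only gets me the Most Actives table, and I'm not sure what I need to do so that my code also gets the other two tables on the page.

I would really appreciate any insight into what I'm doing wrong and how to fix it.

I don't know whether I have to write separate code to scrape each table or whether I can adapt what I already have.

Here is the HTML from the website, so you can see what I'm working with.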

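Actually, you can use pandas.read_html(), which reads all of the tables on the page into a convenient format.

Note: it returns the tables as a list. Each element of that list (called df here) is a DataFrame that you can access by index, for example print(df[0]).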
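As a minimal sketch of the behavior described above: read_html returns one DataFrame per table it finds. To keep the example runnable offline, it parses a small inline HTML string instead of the live page. The SAMPLE_HTML markup, the company names, and the numbers are made up for illustration, and read_html needs an HTML parser such as lxml (or bs4 together with html5lib) to be installed.

```python
import io

import pandas as pd

# Hypothetical stand-in for the page: three small tables with made-up values.
SAMPLE_HTML = """
<div id="wsod_hotStocks">
  <table>
    <tr><th>Company</th><th>Price</th></tr>
    <tr><td>AAA</td><td>10.5</td></tr>
  </table>
  <table>
    <tr><th>Company</th><th>Price</th></tr>
    <tr><td>BBB</td><td>20.25</td></tr>
  </table>
  <table>
    <tr><th>Company</th><th>Price</th></tr>
    <tr><td>CCC</td><td>3.75</td></tr>
  </table>
</div>
"""

# read_html returns a list with one DataFrame per <table> element.
tables = pd.read_html(io.StringIO(SAMPLE_HTML))

print(len(tables))
for i, table in enumerate(tables):
    print(f"Table {i}:")
    print(table)
```

Indexing the list (tables[0], tables[1], tables[2]) gives the individual tables. Passing the page URL to read_html in place of the StringIO object should work the same way, provided the site serves the tables as plain HTML.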

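Remove the line that selects a single table (the table = all_stock.find('table', ...) assignment) and loop over all_stock.find_all('tr') instead, so that the rows of every table inside the div are printed. The line to remove, the updated loop, and the full code appear in that order in the snippets near the end of this page.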

Only a small change to your existing code is needed: use find_all instead of find, and loop over the iterable it returns.

import requests
from bs4 import BeautifulSoup

url = 'http://money.cnn.com/data/hotstocks/index.html'
response = requests.get(url)
html = response.content

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(html, 'html.parser')

all_stock = soup.find('div', attrs={'id': 'wsod_hotStocks'})

# find_all returns every matching table, not just the first one.
tables = all_stock.find_all('table', attrs={'class': 'wsod_dataTable wsod_dataTableBigAlt'})

for table in tables:
    print("Next_Table!!")
    for row in table.find_all('tr'):
        for cell in row.find_all('td'):
            print(cell.text)
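To make the difference between find and find_all concrete, here is a small offline sketch. It assumes a simplified version of the page structure: the SAMPLE_HTML string, its values, and the extract_tables helper are hypothetical and exist only for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with made-up values, standing in for the real page.
SAMPLE_HTML = """
<div id="wsod_hotStocks">
  <table class="wsod_dataTable wsod_dataTableBigAlt">
    <tr><td>AAA</td><td>10.5</td></tr>
  </table>
  <table class="wsod_dataTable wsod_dataTableBigAlt">
    <tr><td>BBB</td><td>20.25</td></tr>
  </table>
  <table class="wsod_dataTable wsod_dataTableBigAlt">
    <tr><td>CCC</td><td>3.75</td></tr>
  </table>
</div>
"""


def extract_tables(html):
    """Return one list of rows per table; each row is a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", attrs={"id": "wsod_hotStocks"})
    result = []
    for table in container.find_all("table"):
        rows = []
        for row in table.find_all("tr"):
            cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
            if cells:  # skip rows that have no <td> cells (e.g. header rows)
                rows.append(cells)
        result.append(rows)
    return result


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first_only = soup.find("table")       # a single Tag: the first match only
every_table = soup.find_all("table")  # a list-like ResultSet of all matches

tables = extract_tables(SAMPLE_HTML)
print(len(every_table))
print(tables)
```

Keeping the rows grouped per table, instead of printing every cell in one stream, makes it easier to tell afterwards which values came from which table.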

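You already know how to use .findAll to loop over all the table rows and cells. Why not use the same approach to loop over all the tables?

I second this. If the end goal is to save them with pandas, it is better to use pandas from the start.

@rpanai Yes, it's very simple, for example df[0].to_csv("data.csv", index=False)

Oh, thank you!!! Someone told me we could use pandas, but our professor hasn't taught it, so I was a bit worried about using it. One question: when I use the code you provided and look at the CSV, it only has the first table. Do I need to add more code?

@CarlaMaldonado If you print(df), you will get everything I mentioned above. You can access each table as a DataFrame by its index.

@aԋ625; aҽaԃcαη Ohhhh, of course, now I get it. Sorry, and yes, thank you so much!!!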
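The CSV in the exchange above held only the first table because df[0] selects a single element of the list. One way to save all of them is to loop over the list and write one file per table. The sketch below uses hand-built DataFrames as stand-ins for the result of read_html. The values, the temporary output folder, and the table_N.csv naming scheme are assumptions made for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-ins for the list that pandas.read_html returns.
tables = [
    pd.DataFrame({"Company": ["AAA"], "Price": [10.5]}),
    pd.DataFrame({"Company": ["BBB"], "Price": [20.25]}),
    pd.DataFrame({"Company": ["CCC"], "Price": [3.75]}),
]

out_dir = Path(tempfile.mkdtemp())  # temporary folder, used only for this demo

written = []
for i, table in enumerate(tables):
    path = out_dir / f"table_{i}.csv"  # hypothetical file naming scheme
    table.to_csv(path, index=False)    # same call as in the comment, once per table
    written.append(path)

print([p.name for p in written])
```

If a single file is preferred, pd.concat(tables) would stack the tables first, although the table boundaries are then lost unless a column identifying the source table is added.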

table = all_stock.find('table', attrs={'class': 'wsod_dataTable wsod_dataTableBigAlt'})

for row in all_stock.find_all('tr'):
    for cell in row.find_all('td'):
        print(cell.text)

import requests
from bs4 import BeautifulSoup

url = 'http://money.cnn.com/data/hotstocks/index.html'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html, features='html.parser')

all_stock = soup.find('div', attrs={'id': 'wsod_hotStocks'})

for row in all_stock.find_all('tr'):
    for cell in row.find_all('td'):
        print(cell.text)
