Python: how do I make a simple web scraper and export the information to a spreadsheet?

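I want to make a web scraper for this website:

It has a table with the information I want. For each row, I want to get the odds number and then, depending on the team, subtract it from or add it to the average margin figure in the prediction column. I then want to store that number along with one of the team names.

This seems like a very simple web scraping program, but I have no experience with this and would appreciate some advice. A lot of tutorials use Python and Beautiful Soup, so I figured I would use that, but I am not sure how to store the information in something like a spreadsheet. Thanks!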

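You're right, Beautiful Soup is a good choice for extracting the data. Once you have the rows, put them into a pandas DataFrame; from there you can write them out to a file that a spreadsheet program can open.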

import pandas as pd
import requests
from bs4 import BeautifulSoup
import re

url = 'https://www.ncaagamesim.com/college-basketball-predictions.asp'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table')

# Get column names
headers = table.find_all('th')
cols = [ x.text for x in headers ]

# Get all rows in table body
table_rows = table.find_all('tr')

rows = []
# Grab the text of each td, and put into a rows list
for each in table_rows[1:]:
    odd_avail = True
    data = each.find_all('td')
    time = data[0].text.strip()
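    try:
        # When odds are posted, the cell text is the matchup, a non-breaking
        # space, and then odds text ending in "by <margin>"
        matchup, odds = data[1].text.strip().split('\xa0')
        odd_margin = float(odds.split('by')[-1].strip())
    except ValueError:
        # No odds posted for this game (the unpacking or float() failed)
        matchup = data[1].text.strip()
        odd_margin = '-'
        odd_avail = False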
    odd_team_win = data[1].find_all('img')[-1]['title']

    sim_team_win = data[2].find('img')['title']
    # Raw string so that \d and \. are read as regex escapes, not string escapes
    sim_margin = float(re.findall(r"\d+\.\d+", data[2].text)[-1])

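    if odd_avail:
        # Same predicted winner: compare the two margins directly.
        # Different winners: the simulation margin counts against the odds favourite.
        if odd_team_win == sim_team_win:
            diff = sim_margin - odd_margin
        else:
            diff = -1*odd_margin - sim_margin
    else:
        diff = '-'
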
    row = {
        cols[0]: time,
        'Matchup': matchup,
        'Odds Winner': odd_team_win,
        'Odds': odd_margin,
        'Simulation Winner': sim_team_win,
        'Simulation Margin': sim_margin,
        'Diff': diff,
    }
    rows.append(row)

df = pd.DataFrame(rows)
df.to_csv('odds.csv', index=False)
Output:

print(df.to_string())
       Time                                     Matchup       Odds Winner  Odds Simulation Winner  Simulation Margin  Diff
0      2 PM                Buffalo  @ Western Michigan            Buffalo   9.5           Buffalo                7.3  -2.2
1      3 PM                 Akron  @ Northern Illinois              Akron     9             Akron                6.5  -2.5
2   4:30 PM             Kent State  @ Central Michigan         Kent State     6        Kent State                8.8   2.8
3      5 PM                       St. Katherine  @ UNLV              UNLV     -              UNLV               37.0     -
4   5:30 PM  Alabama State  @ Mississippi Valley State      Alabama State   6.5     Alabama State                5.9  -0.6
5      7 PM                Wisconsin (5) @ Michigan (4)          Michigan   3.5         Wisconsin                1.2  -4.7
6      7 PM       Eastern Illinois  @ SIU Edwardsville   Eastern Illinois     6  Eastern Illinois                7.4   1.4
7      7 PM                       Butler  @ St. John's         St. John's     2        St. John's                7.5   5.5
8      7 PM                 Saint Joseph's  @ Davidson           Davidson  12.5          Davidson               14.8   2.3
9      7 PM                        Ole Miss  @ Florida            Florida   3.5           Florida                8.5     5
10     7 PM                Ball State  @ Bowling Green      Bowling Green   7.5     Bowling Green                2.7  -4.8
11     7 PM                       Miami (Ohio)  @ Ohio               Ohio   8.5              Ohio                8.0  -0.5
12     7 PM                 Eastern Michigan  @ Toledo             Toledo    11            Toledo               10.6  -0.4
13     7 PM                    Miami  @ Boston College              Miami     3    Boston College                4.9  -7.9
14     7 PM                      Duke  @ Virginia Tech               Duke   1.5     Virginia Tech                8.2  -9.7
15  7:30 PM                            TCU  @ Oklahoma           Oklahoma     8          Oklahoma                8.3   0.3
16     8 PM               Kansas (22) @ Oklahoma State             Kansas   3.5            Kansas                0.7  -2.8
17  8:30 PM            Alcorn State  @ Grambling State    Grambling State   7.5      Alcorn State                2.8 -10.3
18     9 PM                            Syracuse  @ UNC                UNC   3.5               UNC                2.6  -0.9
19     9 PM                    Providence  @ Marquette          Marquette     3         Marquette                9.0     6
20     9 PM                        Alabama  @ Kentucky           Kentucky     2           Alabama                4.0    -6
21     9 PM                    UC Riverside  @ USC (12)               USC  14.5               USC               14.3  -0.2
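
The sign convention behind the Diff column can be checked in isolation. The following is a minimal sketch, not part of the original answer: the function name margin_diff is hypothetical, it returns None instead of the '-' placeholder when no odds are posted, and it rounds to one decimal place (an assumption made here to avoid floating-point noise such as -2.2000000000000002).

```python
def margin_diff(odds_winner, odds_margin, sim_winner, sim_margin):
    """Return the simulation margin minus the odds margin, both taken
    from the point of view of the team the odds favour.

    Returns None when no odds margin is available.
    """
    if odds_margin is None:
        return None
    if odds_winner == sim_winner:
        signed_sim = sim_margin
    else:
        # The simulation picks the other team, so its margin counts
        # against the odds favourite.
        signed_sim = -sim_margin
    # Rounding to one decimal is an assumption of this sketch.
    return round(signed_sim - odds_margin, 1)


# Values taken from rows 0, 5 and 3 of the output above.
print(margin_diff("Buffalo", 9.5, "Buffalo", 7.3))     # -2.2
print(margin_diff("Michigan", 3.5, "Wisconsin", 1.2))  # -4.7
print(margin_diff("UNLV", None, "UNLV", 37.0))         # None
```

Keeping the arithmetic in its own function makes it easy to test against a few known rows before running the scraper against the live page.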

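Help us help you: please improve your question so that we can better reproduce your problem. Thanks.
It would be great if you could share your own coding attempt, even if you are new to the topic.
Thank you so much! This is exactly what I was looking for.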
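
If pandas is not available, the export step can also be done with the standard library. The sketch below is an alternative to the DataFrame.to_csv call in the answer, not the answer's own method: the helper name write_rows and the file name odds_sample.csv are hypothetical, and the two sample rows are adapted from the output above.

```python
import csv
import io

# Sample rows in the same shape as the dictionaries built by the scraper.
rows = [
    {"Time": "2 PM", "Matchup": "Buffalo @ Western Michigan", "Diff": -2.2},
    {"Time": "5 PM", "Matchup": "St. Katherine @ UNLV", "Diff": "-"},
]


def write_rows(rows, fileobj):
    """Write a list of same-keyed dicts as CSV, with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer so the result can be inspected directly.
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)

# To write a real file instead (hypothetical file name):
# with open("odds_sample.csv", "w", newline="", encoding="utf-8") as f:
#     write_rows(rows, f)
```

A CSV file opens directly in most spreadsheet programs. If a native Excel file is needed, pandas also provides DataFrame.to_excel, which requires an Excel writer engine such as openpyxl to be installed.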