Scraping a daily table to CSV with BeautifulSoup in Python


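I'm new to Python and am looking for some help with BeautifulSoup. I'm trying to scrape some baseball data from the web and store it in a CSV file. I'd like to loop through each calendar date in the URL to get the data for the games played on each day.

I'm sure there are some errors, but this is what I have so far: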

# Python 2 code: urllib2 and a CSV file opened in binary mode
import csv
import urllib2
from bs4 import BeautifulSoup

with open('covers.csv', 'wb') as f:
    writer = csv.writer(f)
    # Days 1..31 of May 2014; range(31) would start at day 0
    for i in range(1, 32):
        #I'd like to loop through actual dates instead of my 'i' here
        url = "http://contests.covers.com/Handicapping/consensusPick/daily-consensus-picks.aspx?sport=5&date=5/{}/2014".format(i)
        u = urllib2.urlopen(url)
        try:
            html = u.read()
        finally:
            u.close()
        soup = BeautifulSoup(html, 'html.parser')
        # 'class' is a reserved word in Python, so bs4 uses the keyword class_
        for mytable in soup.find_all(class_="thepicks"):
            for trs in mytable.find_all('tr'):
                tds = trs.find_all('td')
                # One CSV row per table row, cell text encoded for the Python 2 csv module
                row = [elem.text.encode('utf-8') for elem in tds]
                writer.writerow(row)

Result

['Time', 'Away', 'Line', 'Picks', 'Pct', 'Home', 'Line', 'Picks', 'Pct', 'Detail', 'Odds']
['7:15 PM', 'Miami', '+133', '388', '29.02%', 'St. Louis', '-144', '949', '70.98%', 'View', 'View']
['7:08 PM', 'Tampa Bay', '+106', '444', '31.76%', 'Detroit', '-115', '954', '68.24%', 'View', 'View']
['7:35 PM', 'Arizona', '+145', '439', '33.13%', 'Atlanta', '-157', '886', '66.87%', 'View', 'View']
['5:05 PM', 'Philadelphia', '+180', '432', '34.70%', 'Pittsburgh', '-196', '813', '65.30%', 'View', 'View']
['9:05 PM', 'Houston', '+165', '507', '37.56%', 'LA Angels', '-179', '843', '62.44%', 'View', 'View']
['11:05 AM', 'Chi. Cubs', '+141', '388', '40.42%', 'Washington', '-153', '572', '59.58%', 'View', 'View']
['4:05 PM', 'Toronto', '+114', '541', '40.89%', 'Oakland', '-123', '782', '59.11%', 'View', 'View']
['7:10 PM', 'Seattle', '+161', '599', '45.14%', 'Chi. White Sox', '-175', '728', '54.86%', 'View', 'View']
['7:10 PM', 'Milwaukee', '+102', '614', '46.80%', 'Cincinnati', '-110', '698', '53.20%', 'View', 'View']
['3:10 PM', 'NY Yankees', '+100', '630', '50.28%', 'Minnesota', '-108', '623', '49.72%', 'View', 'View']
['7:05 PM', 'Kansas City', '+103', '706', '55.50%', 'Cleveland', '-111', '566', '44.50%', 'View', 'View']
['6:40 PM', 'San Francisco', '-108', '827', '60.63%', 'San Diego', '+100', '537', '39.37%', 'View', 'View']
['7:10 PM', 'Texas', '-153', '916', '67.60%', 'NY Mets', '+141', '439', '32.40%', 'View', 'View']
['8:10 PM', 'LA Dodgers', '-215', '946', '69.41%', 'Colorado', '+197', '417', '30.59%', 'View', 'View']
['Time', 'Away', 'Total', 'Home', 'Over', 'Pct', 'Under', 'Pct', 'Detail', 'Odds']
['7:10 PM', 'Seattle', '7.5', 'Chi. White Sox', '299', '38.93%', '469', '61.07%', 'View', 'View']
['8:10 PM', 'LA Dodgers', '9.5', 'Colorado', '230', '43.15%', '303', '56.85%', 'View', 'View']
['7:10 PM', 'Milwaukee', '7.5', 'Cincinnati', '373', '47.40%', '414', '52.60%', 'View', 'View']
['7:15 PM', 'Miami', '7.5', 'St. Louis', '360', '48.19%', '387', '51.81%', 'View', 'View']
['11:05 AM', 'Chi. Cubs', '7', 'Washington', '257', '48.22%', '276', '51.78%', 'View', 'View']
['7:35 PM', 'Arizona', '7.0', 'Atlanta', '379', '50.40%', '373', '49.60%', 'View', 'View']
['4:05 PM', 'Toronto', '8', 'Oakland', '392', '52.34%', '357', '47.66%', 'View', 'View']
['7:08 PM', 'Tampa Bay', '8', 'Detroit', '421', '54.89%', '346', '45.11%', 'View', 'View']
['3:10 PM', 'NY Yankees', '8', 'Minnesota', '402', '55.76%', '319', '44.24%', 'View', 'View']
['5:05 PM', 'Philadelphia', '7.5', 'Pittsburgh', '426', '57.26%', '318', '42.74%', 'View', 'View']
['7:10 PM', 'Texas', '6.5', 'NY Mets', '278', '57.68%', '204', '42.32%', 'View', 'View']
['6:40 PM', 'San Francisco', '7', 'San Diego', '482', '58.14%', '347', '41.86%', 'View', 'View']
['9:05 PM', 'Houston', '8', 'LA Angels', '478', '58.51%', '339', '41.49%', 'View', 'View']
['7:05 PM', 'Kansas City', '7.5', 'Cleveland', '461', '60.58%', '300', '39.42%', 'View', 'View']
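
Rows like the ones above can be handed straight to the csv module. The following is a minimal sketch for Python 3, not the code from the answer: the helper name rows_to_csv_text and the sample rows are illustrative assumptions, and an in-memory buffer stands in for covers.csv so the example does not touch the disk.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialise a list of rows to CSV text using the csv module."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


# Sample rows shaped like the scraped output (shortened for illustration)
sample = [
    ["Time", "Away", "Line"],
    ["7:15 PM", "St. Louis", "+133"],
]
csv_text = rows_to_csv_text(sample)
```

When writing to a real file in Python 3, the file is opened in text mode with open('covers.csv', 'w', newline='') rather than 'wb', and the cell text does not need to be encoded by hand.
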
If you need to add a column:

row = [elem.text.strip().encode('utf-8') for elem in tds]
row.append("7/4/2014")
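
As a small sketch of the same idea applied to a whole table, the date can be appended to every data row while header rows get a column name instead. This is written for Python 3; the helper name add_date_column, the label Date, and the rule that a row starting with Time is a header are assumptions made for illustration, based only on the output shown above.

```python
def add_date_column(rows, date_text, header_label="Date"):
    """Return copies of the rows with one extra trailing column.

    Rows whose first cell is 'Time' are treated as header rows (an
    assumption based on the sample output) and receive header_label;
    every other row receives date_text.
    """
    result = []
    for row in rows:
        extra = header_label if row and row[0] == "Time" else date_text
        result.append(list(row) + [extra])
    return result


rows = [
    ["Time", "Away", "Line"],
    ["7:15 PM", "Miami", "+133"],
]
dated = add_date_column(rows, "7/4/2014")
```

The original rows are left unchanged, so the same scraped data can be tagged with a different date string if needed.
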
If you need to modify existing columns (for example, to remove the columns with the text View):
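
One possible way to drop those columns is sketched below for Python 3. It is an assumption about how this could be done, not the answer's own code: the helper name drop_columns is hypothetical, the unwanted columns are identified by their header names (Detail and Odds in the output above), and a row starting with Time is taken to be a header row.

```python
def drop_columns(rows, unwanted=("Detail", "Odds")):
    """Drop the columns whose header name is listed in `unwanted`.

    The column positions to keep are recomputed at every header row,
    because the two tables in the sample output have different widths.
    """
    keep = None
    cleaned = []
    for row in rows:
        if row and row[0] == "Time":
            keep = [i for i, name in enumerate(row) if name not in unwanted]
        if keep is None:
            cleaned.append(list(row))  # no header seen yet: leave the row as it is
        else:
            cleaned.append([row[i] for i in keep if i < len(row)])
    return cleaned


rows = [
    ["Time", "Away", "Line", "Picks", "Pct", "Home", "Line", "Picks", "Pct", "Detail", "Odds"],
    ["7:15 PM", "Miami", "+133", "388", "29.02%", "St. Louis", "-144", "949", "70.98%", "View", "View"],
    ["Time", "Away", "Total", "Home", "Over", "Pct", "Under", "Pct", "Detail", "Odds"],
    ["7:10 PM", "Seattle", "7.5", "Chi. White Sox", "299", "38.93%", "469", "61.07%", "View", "View"],
]
cleaned = drop_columns(rows)
```

Selecting by header name rather than by the cell text View keeps the header and data rows aligned.
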


Use datetime.datetime.now() + datetime.timedelta(1) to get tomorrow's date.

Thanks, I'll try to implement this. Also, not all of the td cells are being scraped. It only grabs data from the exact location of the tag; otherwise it does not scrape the data.

I suggest using urlparse.urlunsplit and urllib.urlencode to build the URL, for example urlparse.urlunsplit(('http', 'contests.covers.com', 'Handicapping/consensusPick/daily-consensus-picks.aspx', urllib.urlencode({'sport': 5, 'date': '7/4/2014'}), '')).

Now, if I could just figure out how to add text in an adjacent column on each row with the date the games were played... hmm :)

row.append("7/4/2014")?

Great answer. @user3808992, please mark it as correct.
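
The two suggestions from the comments, stepping through dates with timedelta and building the URL from its parts, can be combined roughly as follows. This is a sketch for Python 3, where urlencode and urlunsplit live in urllib.parse; the generator name daily_urls and the May 2014 date range are assumptions chosen to match the question, and nothing is downloaded here.

```python
import datetime
from urllib.parse import urlencode, urlunsplit


def daily_urls(start, end):
    """Yield (date, url) pairs for every calendar day from start to end inclusive."""
    day = start
    while day <= end:
        # The URL in the question uses dates like 5/1/2014 (no zero padding),
        # so the string is assembled by hand rather than with strftime.
        date_text = "{}/{}/{}".format(day.month, day.day, day.year)
        # urlencode percent-encodes the slashes in the date as %2F
        query = urlencode({"sport": 5, "date": date_text})
        url = urlunsplit((
            "http",
            "contests.covers.com",
            "/Handicapping/consensusPick/daily-consensus-picks.aspx",
            query,
            "",
        ))
        yield day, url
        day += datetime.timedelta(days=1)


urls = list(daily_urls(datetime.date(2014, 5, 1), datetime.date(2014, 5, 31)))
```

Each date comes back alongside its URL, so the same value can be formatted and appended to the rows scraped from that page.
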