Python BeautifulSoup scraping and database

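I am using BeautifulSoup to parse a website.

Now my question is: I want to write all of this into a database such as sqlite, including the minutes in which the goals were scored. I can get that information from the links I collect, but only when the score is not ? - ?, because otherwise there are no goals yet.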

from pprint import pprint
import urllib2

from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.livescore.com/soccer/champions-league/'))

data = []
for match in soup.select('table.league-table tr'):
    try:
        team1, team2 = match.find_all('td', class_=['fh', 'fa'])
    except ValueError:  # helps to skip irrelevant rows
        continue

    score = match.find('a', class_='scorelink').text.strip()
    data.append({
        'team1': team1.text.strip(),
        'team2': team2.text.strip(),
        'score': score
    })

pprint(data)

href_tags = soup.find_all('a', {'class':"scorelink"})

links = []

for x in xrange(1, len(href_tags)):
    insert = href_tags[x].get("href")
    links.append(insert)

print links

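First of all, what is the point of a score if the teams in the match have not scored?

The idea is to iterate over every row of every table that has the league-table class. For each row, get the team names and the score. Collect the results into a list of dictionaries: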

from pprint import pprint
import urllib2

from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.livescore.com/soccer/champions-league/'))

data = []
for match in soup.select('table.league-table tr'):
    try:
        team1, team2 = match.find_all('td', class_=['fh', 'fa'])
    except ValueError:  # helps to skip irrelevant rows
        continue

    score = match.find('a', class_='scorelink').text.strip()
    data.append({
        'team1': team1.text.strip(),
        'team2': team2.text.strip(),
        'score': score
    })

pprint(data)
Prints:

[
    {'score': u'? - ?', 'team1': u'Atletico Madrid', 'team2': u'Malmo FF'},
    {'score': u'? - ?', 'team1': u'Olympiakos', 'team2': u'Juventus'},
    {'score': u'? - ?', 'team1': u'Liverpool', 'team2': u'Real Madrid'},
    {'score': u'? - ?', 'team1': u'PFC Ludogorets Razgrad', 'team2': u'Basel'},
    ...
]
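Note that it currently appends every match, even one that has not been played yet. If you only need the matches that have a score, check that the score is not equal to '? - ?' before appending the row.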
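
The check described above can be sketched as follows. This is a minimal illustration rather than the original answer's code: the sample rows, the NOT_PLAYED constant and the played_only helper are hypothetical names introduced here, and the rows only imitate the shape of the dictionaries built by the scraping loop.

```python
# Hypothetical sample rows in the same shape the scraping loop produces.
data = [
    {'team1': 'Atletico Madrid', 'team2': 'Malmo FF', 'score': '? - ?'},
    {'team1': 'CSKA Moscow', 'team2': 'Manchester City', 'score': '2 - 2'},
    {'team1': 'Celtic', 'team2': 'Maribor', 'score': '0 - 1'},
]

NOT_PLAYED = '? - ?'  # placeholder score shown for matches without a result


def played_only(matches):
    """Return only the matches whose score is a real result."""
    return [m for m in matches if m['score'] != NOT_PLAYED]


played = played_only(data)
for m in played:
    print('%s %s %s' % (m['team1'], m['score'], m['team2']))
```

Inside the scraping loop itself, the same test would presumably be a single "if score == '? - ?': continue" placed just before data.append(...).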

The output in this case is:

[{'score': u'2 - 2', 'team1': u'CSKA Moscow', 'team2': u'Manchester City'},
 {'score': u'3 - 0', 'team1': u'Zenit St. Petersburg', 'team2': u'Standard Liege'},
 {'score': u'4 - 0', 'team1': u'APOEL Nicosia', 'team2': u'AaB'},
 {'score': u'3 - 0', 'team1': u'BATE Borisov', 'team2': u'Slovan Bratislava'},
 {'score': u'0 - 1', 'team1': u'Celtic', 'team2': u'Maribor'},
 {'score': u'2 - 0', 'team1': u'FC Porto', 'team2': u'Lille'},
 {'score': u'1 - 0', 'team1': u'Arsenal', 'team2': u'Besiktas'},
 {'score': u'3 - 1', 'team1': u'Athletic Bilbao', 'team2': u'SSC Napoli'},
 {'score': u'4 - 0', 'team1': u'Bayer Leverkusen', 'team2': u'FC Koebenhavn'},
 {'score': u'3 - 0', 'team1': u'Malmo FF', 'team2': u'Salzburg'},
 {'score': u'1 - 0', 'team1': u'PFC Ludogorets Razgrad *', 'team2': u'Steaua Bucuresti'}]
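As for writing to the database, you can use the sqlite3 module and executemany() with named parameters; the code is listed at the end of this page.

Of course there are other things to improve or discuss, but I think this is a good start for you.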

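What is your question? Can you give an example of what the parser returns in the goal and no-goal cases? Otherwise I really don't know how to help you.

Hello! Right, here is the thing: on one side I have the links to all the games. On the other side I have the matches, with who won and who did not. With that in mind, I want to compare the two, and when the score is not ? - ?, I should download that link, parse the times at which the goals were scored, and insert them into the database.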
import sqlite3

conn = sqlite3.connect('data.db')
conn.execute("""
    CREATE TABLE IF NOT EXISTS matches (
        id    integer primary key autoincrement not null,
        team1  text,
        team2 text,
        score text
    )""")

cursor = conn.cursor()
cursor.executemany("""
    INSERT INTO 
        matches (team1, team2, score) 
    VALUES 
        (:team1, :team2, :score)""", data)
conn.commit()
conn.close()
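
The question also asks for the minutes in which the goals were scored. One possible way to store them, sketched under assumptions: a second table with one row per goal, linked to matches by id. The goals table, its columns and the goal_minutes value are hypothetical and not part of the original answer. Parsing the match detail page is left out because its markup is not shown in the post.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # in-memory database, only for this sketch
conn.execute("""
    CREATE TABLE IF NOT EXISTS matches (
        id    integer primary key autoincrement not null,
        team1 text,
        team2 text,
        score text
    )""")
# Hypothetical second table: one row per goal, pointing back at its match.
conn.execute("""
    CREATE TABLE IF NOT EXISTS goals (
        id       integer primary key autoincrement not null,
        match_id integer not null references matches(id),
        minute   integer not null
    )""")

match = {'team1': 'Celtic', 'team2': 'Maribor', 'score': '0 - 1'}
goal_minutes = [75]  # made-up value standing in for the parsed detail page

cursor = conn.cursor()
cursor.execute(
    "INSERT INTO matches (team1, team2, score) VALUES (:team1, :team2, :score)",
    match)
match_id = cursor.lastrowid  # id of the match row that was just inserted
cursor.executemany(
    "INSERT INTO goals (match_id, minute) VALUES (?, ?)",
    [(match_id, minute) for minute in goal_minutes])
conn.commit()

# Read the goals back together with the teams they belong to.
rows = conn.execute("""
    SELECT m.team1, m.team2, g.minute
    FROM goals g JOIN matches m ON m.id = g.match_id
    ORDER BY g.minute""").fetchall()
conn.close()
```

Keeping goals in their own table avoids packing several minutes into one text column, and matches without a result simply have no rows there.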