Python: using these tables and associating their respective teams together

Tags: python, python-3.x, html-parsing, beautifulsoup

I recently got help with scraping the Yahoo NHL scoreboard page so that it prints each team followed by its scores. Here is my code:

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = urlopen("http://sports.yahoo.com/nhl/scoreboard?d=2013-01-19")

content = url.read()

soup = BeautifulSoup(content)

def yahooscores():
    results = {}

    for table in soup.find_all('table', class_='scores'):
        for row in table.find_all('tr'):
            scores = []
            name = None
            for cell in row.find_all('td', class_='yspscores'):
                link = cell.find('a')
                if link:
                    name = link.text
                elif cell.text.isdigit():
                    scores.append(cell.text)
            if name is not None:
                results[name] = scores

    for name, scores in results.items():
        print ('%s: %s' % (name, ', '.join(scores)) + '.')

yahooscores()

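First off: I wrapped all of this in a function because I have to keep changing the URL to get the values for every day in January.

The problem is that, while I can print the scores and team names just fine, the output I'm actually trying to get is this: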

Ottawa: 1, 1, 2.
Winnipeg: 1, 0, 0.

Pittsburgh: 2, 0, 1.
Philadelphia: 0, 1, 0.

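My code doesn't do that. I've been trying to get there, but what complicates things is that the tables all share the same class, "scores", and I can't find anything that distinguishes one from another.

In short: group the teams that played each other, and leave a blank line between games.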

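The problem is that you put each team's results into a dict, and a dict does not record which table on the page (i.e. which game) each team came from. In Python versions before 3.7 a plain dict does not even guarantee insertion order, so the teams come out in arbitrary order.

To fix this, you can print the results directly instead of storing them, and print an extra newline in the outer for loop: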

def yahooscores():
    for table in soup.find_all('table', class_='scores'):

        for row in table.find_all('tr'):
            scores = []
            name = None
            for cell in row.find_all('td', class_='yspscores'):
                link = cell.find('a')
                if link:
                    name = link.text
                elif cell.text.isdigit():
                    scores.append(cell.text)
            if name is not None:
                print ('%s: %s' % (name, ', '.join(scores)) + '.')

        # Blank line after each table (i.e. after each game).
        print()

yahooscores()

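Alternatively, if you want to store the scores and display them later, you can also store the teams for each game and use them to group the results: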

def yahooscores():
    results = {}

    games = []

    for table in soup.find_all('table', class_='scores'):
        teams = []

        for row in table.find_all('tr'):
            scores = []
            name = None
            for cell in row.find_all('td', class_='yspscores'):
                link = cell.find('a')
                if link:
                    name = link.text
                elif cell.text.isdigit():
                    scores.append(cell.text)
            if name is not None:
                results[name] = scores
                teams.append(name)

        games.append(teams)

    for teams in games:
        for name in teams:
            scores = results[name]
            print ('%s: %s' % (name, ', '.join(scores)) + '.')
        print()

yahooscores()

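The problem is that you're treating the tables as one flat list of teams, rather than as a list of score tables, each of which contains two teams.

The clean way to fix this is to change how you parse the page so that you loop over games and, for each game, store a pair of names and scores.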
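
A minimal sketch of that cleaner approach, for illustration only. It assumes, as the question's code does, that each table with class scores holds one game and that team rows use td cells with class yspscores. The function names parse_games and format_games are hypothetical, and soup is the BeautifulSoup object built in the question.

```python
def parse_games(soup):
    """Return a list of games; each game is a list of (team, scores) tuples.

    Assumption: one <table class="scores"> per game, as in the question.
    """
    games = []
    for table in soup.find_all('table', class_='scores'):
        teams = []
        for row in table.find_all('tr'):
            name, scores = None, []
            for cell in row.find_all('td', class_='yspscores'):
                link = cell.find('a')
                if link:
                    name = link.text
                elif cell.text.isdigit():
                    scores.append(cell.text)
            if name is not None:
                teams.append((name, scores))
        if teams:  # skip tables that contain no team rows
            games.append(teams)
    return games


def format_games(games):
    """Render one line per team, with a blank line between games."""
    blocks = []
    for teams in games:
        lines = ['%s: %s.' % (name, ', '.join(scores)) for name, scores in teams]
        blocks.append('\n'.join(lines))
    return '\n\n'.join(blocks)
```

With the question's soup object, print(format_games(parse_games(soup))) would print the teams grouped by game. Keeping the parsing and the formatting separate means the grouped data can be reused for every day in January without re-reading the page.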

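But there is also a quick and dirty solution: if you keep the teams in order, you can pair them up after the fact.

A plain dict has no guaranteed order before Python 3.7, but an OrderedDict preserves insertion order. So just add import collections and change results = {} to results = collections.OrderedDict().

(Although, since the only thing you ever do with this dict is iterate over its items(), I don't know why you want a dictionary at all. Just do results = [], replace results[name] = scores with results.append((name, scores)), and then iterate over results instead of results.items().)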
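
To make the two options concrete, here is a small illustrative sketch. The scraped list is made-up sample data standing in for rows collected from the page, and the names results_dict and results_list are hypothetical.

```python
from collections import OrderedDict

# Hypothetical rows, in the order they would be scraped from the page.
scraped = [('Ottawa', ['1', '1', '2']), ('Winnipeg', ['1', '0', '0'])]

# Option 1: an OrderedDict remembers the order in which keys were inserted.
results_dict = OrderedDict()
for name, scores in scraped:
    results_dict[name] = scores

# Option 2: a plain list of (name, scores) tuples keeps the same order
# without needing a mapping at all.
results_list = []
for name, scores in scraped:
    results_list.append((name, scores))
```

Both containers yield the same (name, scores) pairs in page order. One difference to keep in mind: if the same team name were stored twice, the dict would keep only the last entry, while the list would keep both.
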
Now, if you want to print them in pairs... well, you can easily build an iterator of pairs from any iterable. For example:

def pairs(iterable):
    return zip(*[iter(iterable)]*2)

for (name1, score1), (name2, score2) in pairs(results.items()):
    print ('%s: %s' % (name1, ', '.join(score1)) + '.')
    print ('%s: %s' % (name2, ', '.join(score2)) + '.')
    print()
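
The zip(*[iter(iterable)]*2) idiom can look cryptic, so here is an equivalent spelled-out sketch with made-up sample rows. The key point is that the same iterator object is passed to zip twice, so each output tuple consumes two consecutive items.

```python
def pairs(iterable):
    # One shared iterator, handed to zip twice: zip pulls from it
    # alternately, so each tuple holds two consecutive items.
    it = iter(iterable)
    return zip(it, it)


# Hypothetical sample data in page order.
rows = [('Ottawa', ['1', '1', '2']), ('Winnipeg', ['1', '0', '0']),
        ('Pittsburgh', ['2', '0', '1']), ('Philadelphia', ['0', '1', '0'])]

paired = list(pairs(rows))
```

Here paired contains two tuples, one per game. Note that with an odd number of items the final unpaired item is silently dropped, which matters if a row fails to parse.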

Or, if you don't understand what that means, a hack like this works too:

pair_done = False
for name, scores in results.items():
    print ('%s: %s' % (name, ', '.join(scores)) + '.')
    if pair_done:
        print()
    pair_done = not pair_done

...or:

for i, (name, scores) in enumerate(results.items()):
    print ('%s: %s' % (name, ', '.join(scores)) + '.')
    if i % 2:
        print()

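Can you provide a sample of what you're getting, so we can see what's going wrong? Is it that you can't separate the teams by game? When I run this program, I get the four lines you want, plus some other lines (for other teams), in arbitrary order. So presumably what you want is Ottawa and Winnipeg grouped together, separate from Pittsburgh and Philadelphia, with a blank line in between... but I don't know what that grouping is based on. (I can imagine why you might want Pittsburgh and Philadelphia together, but Ottawa and Winnipeg are in different conferences, different provinces, different everything...)

@abarnert Did you visit the page in a browser? Presumably the goal is to display the teams of each game with a line break in between. =)

Yes, I did visit it in a browser. What I see is "No games scheduled". The only place any teams appear is the standings on the right, where the teams are organized by division.

@Raksice: Ah, I see. If I use the "canonical URL" found on that page, it displays correctly in any browser. Otherwise it shows up with minimal formatting in Firefox, and with formatting but without the actual scores in Chrome... Looking at it in Firefox, I can see what the OP probably wants.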