Scraping nested div and span elements into a DataFrame with Python


I am trying to scrape the results of a sports tournament into a pandas DataFrame, with one row per fighter.

Here is my code:

import re
import requests
from bs4 import BeautifulSoup

page = requests.get("http://www.bjjcompsystem.com/tournaments/1221/categories/1532871")
soup = BeautifulSoup(page.content, 'lxml')

body = list(soup.children)[1]
alldivs = list(body.children)[3]
sections = list(alldivs.children)[5]
division = list(sections.children)[1]
div_name = division.get_text().replace('\n','')
bracket = list(sections.children)[3]

import pandas as pd
data = []
div_name = division.get_text().replace('\n','')

bracket = list(sections.children)[3]
for i in bracket:

    bracket_title = [bt.get_text() for bt in bracket.select(".bracket-title")]
    location = [l.get_text() for l in bracket.select(".bracket-match-header__where")]
    time = [t.get_text() for t in bracket.select(".bracket-match-header__when")]
    fighter_rank = [fr.get_text() for fr in bracket.select(".match-card__competitor-n")]
    competitor_desc = [cd.get_text() for cd in bracket.select(".match-card__competitor-description")]
    loser_name = [ln.get_text() for ln in bracket.select(".match-competitor--loser")]

    data.append((div_name,bracket_title,location,time,fighter_rank,competitor_desc,loser_name))

df = pd.DataFrame(pd.DataFrame(data, columns=['Division','Bracket','Location','Time','Rank','Fighter','Loser']))
df
However, this leaves every cell in every row holding a list. I modified it to the following code:

import pandas as pd
data = []
div_name = division.get_text().replace('\n','')

bracket2 = soup.find_all('div', class_='tournament-category__brackets')

for i in bracket2:

    bracketNo = i.find_all('div', class_='bracket-title')

    section = i.find_all('div', class_='tournament-category__bracket tournament-category__bracket-15')

    for a in section:
        cats = a.find_all('div', class_='tournament-category__match')

        for j in cats:
            fight = j.find_all('div', class_='bracket-match-header') 
            for k in fight:
                where = k.find('div', class_='bracket-match-header__where').get_text().replace('\n',' ')
                when = k.find('div', class_='bracket-match-header__when').get_text().replace('\n',' ')

            match = j.find_all('div', class_='match-card match-card--yellow')

            for b in match:

                rank = b.find_all('span', class_='match-card__competitor-n') 
                fighter = b.find_all('div', class_='match-card__competitor-name') 
                gym = b.find_all('div', class_='match-card__club-name') 
                loser = b.find_all('span', class_='match-competitor--loser') 

                data.append((div_name,bracketNo,when,where,rank,fighter,gym,loser,))

df1 = pd.DataFrame(pd.DataFrame(data, columns=['Division','Bracket','Time','Location','Rank','Fighter','Gym','Loser']))
df1
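There is only 1 division, so the division is the same on every row. There are 5 bracket categories (1/4, 2/4, 3/4, 4/4, and the finals). I want the time and location that belong to each bracket. Each cell for rank, fighter, and gym currently holds two values, and I would like one per row. The pieces going into the dataframe have different lengths, which is causing some of the problems.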
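To make the list-in-a-cell symptom concrete, here is a minimal sketch with invented values (the division name, ranks, and fighter names are placeholders, not data from the page). Appending one tuple that contains lists gives one row whose cells are lists, while appending one tuple per competitor gives one row each.

```python
import pandas as pd

# Placeholder values standing in for what the scraper collected.
ranks = ["16", "53"]
fighters = ["Fighter A", "Fighter B"]
cols = ["Division", "Rank", "Fighter"]

# One tuple holding whole lists: a single row, and each cell is a list.
wide = pd.DataFrame([("Division X", ranks, fighters)], columns=cols)

# One tuple per competitor: one row each, with scalar cells.
long = pd.DataFrame(
    [("Division X", r, f) for r, f in zip(ranks, fighters)],
    columns=cols,
)

print(wide)
print(long)
```

Pairing with `zip` only works when the lists line up one-to-one, which is why collecting the values per match (rather than per bracket) matters here.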

Ideally, I would like the dataframe to look like this:

Division                    Bracket      Time                   Location        Rank  Fighter                  Gym               Loser
Master 1 Male BLACK Middle  Bracket 1/4  Wed 08/21 at 10:08 AM  FIGHT 1: Mat 5  16    Jeffery Bynum Hammon     Caique Jiu-Jitsu  None
Master 1 Male BLACK Middle  Bracket 1/4  Wed 08/21 at 10:08 AM  FIGHT 1: Mat 5  53    Fábio Junior Batista da  Evolve MMA        Fábio Junior Batista da Evolve MMA
Master 1 Male BLACK Middle  Bracket 2/4  Wed 08/21 at 10:07 AM  FIGHT 1: Mat 6  14    André Felipe Maciel Fre  Carlson Gracie    None
Master 1 Male BLACK Middle  Bracket 2/4  Wed 08/21 at 10:07 AM  FIGHT 1: Mat 6  50    Jerardo Linares          Cleber Jiu Jitsu  Jerardo Linares Cleber Jiu Jitsu
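Any suggestions would be very helpful. I tried to write nested loops that follow the structure, but the HTML tree is quite complicated for me. Minimal reformatting of the df afterwards would be ideal, since I will later loop this over multiple pages. Thanks in advance.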
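A rough sketch of the nested-loop idea, run against a hand-written fragment rather than the live site. The class names are taken from the code above, but the nesting in `SAMPLE_HTML`, the fighter and gym names, and the helpers `text_or_none` and `match_rows` are assumptions made for illustration; the real page may be structured differently.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical, simplified markup: class names come from the question,
# the nesting and the values are invented.
SAMPLE_HTML = """
<div class="tournament-category__match">
  <div class="bracket-match-header">
    <div class="bracket-match-header__where">FIGHT 1: Mat 5</div>
    <div class="bracket-match-header__when">Wed 08/21 at 10:08 AM</div>
  </div>
  <div class="match-card">
    <div class="match-card__competitor">
      <span class="match-card__competitor-n">16</span>
      <div class="match-card__competitor-name">Fighter A</div>
      <div class="match-card__club-name">Gym A</div>
    </div>
    <div class="match-card__competitor">
      <span class="match-card__competitor-n">53</span>
      <div class="match-card__competitor-name">Fighter B</div>
      <div class="match-card__club-name">Gym B</div>
    </div>
  </div>
</div>
"""


def text_or_none(parent, css):
    """Return the stripped text of the first match for css, or None."""
    node = parent.select_one(css)
    return node.get_text(strip=True) if node else None


def match_rows(html):
    """Build one dict per competitor, repeating the match-level fields."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for match in soup.select(".tournament-category__match"):
        where = text_or_none(match, ".bracket-match-header__where")
        when = text_or_none(match, ".bracket-match-header__when")
        for comp in match.select(".match-card__competitor"):
            rows.append({
                "Location": where,
                "Time": when,
                "Rank": text_or_none(comp, ".match-card__competitor-n"),
                "Fighter": text_or_none(comp, ".match-card__competitor-name"),
                "Gym": text_or_none(comp, ".match-card__club-name"),
            })
    return rows


rows = match_rows(SAMPLE_HTML)
df = pd.DataFrame(rows)
print(df.to_string())
```

Reading the time and location once per match and the rank, name, and gym once per competitor is what keeps every column the same length.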

Edit: next step, looping this program over multiple pages:

pages = [ #sample, no brackets
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1533466', #example of category__bracket-1
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1533387', #example of category__bracket-3
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1533372', #example of category__bracket-7
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1533022', #example of category__bracket-15
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1532847',
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1532871',  #example of category__bracket-15 plus finals
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1532889', #example of bracket with two losers in a match, so throws an error in fight 32 on fighter a name
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1532856', #example of no winner on fight 11 so throws error on fight be name
]
First, I define multiple links. This is a subset of the 411 different divisions.

results = pd.DataFrame()
for page in pages:
    response = requests.get(page)
    soup = BeautifulSoup(response.text, 'html.parser')

    division = soup.find('span', {'class':'category-title__label category-title__age-division'}).text.strip()
    label = soup.find('i', {'class':'fa fa-mars'}).parent.text.strip()
    belt = soup.find('i', {'class':'fa fa-belt'}).parent.text.strip()
    weight = soup.find('i', {'class':'fa fa-weight'}).parent.text.strip()

    # PARSE BRACKETS
    brackets = soup.find_all(['div', {'class':'tournament-category__bracket tournament-category__bracket-15'},
                              'div', {'class':'tournament-category__bracket tournament-category__bracket-1'},
                             'div', {'class':'tournament-category__bracket tournament-category__bracket-3'},
                             'div', {'class':'tournament-category__bracket tournament-category__bracket-7'}])
    #results = pd.DataFrame()
    for bracket in brackets:
...etc

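Is there a way to make the program account for the different division sizes? The example at the top has 4 brackets of 15 matches each, plus the finals. Other divisions have a single bracket with 1, 3, 7, or 15 matches rather than multiple brackets. Without splitting all the links by size and rewriting the program for each, I am wondering whether there is an if/then statement or a try/except I could add.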
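One possible way to cover every bracket size with a single query is a CSS attribute-substring selector, which matches any `div` whose class attribute contains `tournament-category__bracket-` regardless of the number that follows. The sketch below runs on invented markup; `SAMPLE_HTML` and the helper name `find_brackets` are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mixing several bracket sizes in one container.
SAMPLE_HTML = """
<div class="tournament-category__brackets">
  <div class="bracket-title">Bracket 1/2</div>
  <div class="tournament-category__bracket tournament-category__bracket-15">A</div>
  <div class="bracket-title">Bracket 2/2</div>
  <div class="tournament-category__bracket tournament-category__bracket-7">B</div>
  <div class="tournament-category__bracket tournament-category__bracket-1">C</div>
</div>
"""


def find_brackets(html):
    """Return every bracket div, whatever its size suffix is."""
    soup = BeautifulSoup(html, "html.parser")
    # [class*="..."] tests for a substring of the class attribute, so the
    # trailing "-" excludes the outer "tournament-category__brackets" div.
    return soup.select('div[class*="tournament-category__bracket-"]')


brackets = find_brackets(SAMPLE_HTML)
for b in brackets:
    print(b["class"][-1], b.get_text(strip=True))
```

Because `select` returns an empty list when nothing matches, a page without brackets simply skips the loop, so no if/then per size is needed for this step.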

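This is a tricky one, because some of the elements carry an attribute that identifies the loser of the match and, for some reason, others do not. So we have to find a way to fill in those missing nulls.

Nonetheless, I think I got it filled in correctly. Just iterate through each match of each bracket and append them all into one table. To fill in the missing "Loser" column, I sorted by fight number, looked at the rows where "Loser" is missing, and checked which fighter shows up again in a later match. Logically, if a fighter has another match later on, then his opponent was the loser.

Code: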
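The back-fill rule described above can be sketched in isolation on a toy table. The fighter names and fights below are invented, and `fill_missing_losers` is a hypothetical helper. Note one substitution: this sketch orders fights by extracting the number with a regular expression and converting it to an integer, rather than using a natural-sort library.

```python
import pandas as pd

# Invented data: fight 1 has no loser flagged, fight 2 does.
results = pd.DataFrame({
    "Where":    ["FIGHT 2: Mat 5", "FIGHT 1: Mat 5", "FIGHT 1: Mat 5", "FIGHT 2: Mat 5"],
    "Fighter":  ["Ann", "Ann", "Bea", "Cat"],
    "Opponent": ["Cat", "Bea", "Ann", "Ann"],
    "Loser":    ["Cat", None, None, "Cat"],
})


def fill_missing_losers(df):
    """Fill missing 'Loser' values from who appears again in a later row."""
    out = df.copy()
    # Pull the number out of "FIGHT <n>: ..."; rows without one become 0.
    out["Fight Number"] = (
        out["Where"].str.extract(r"FIGHT (\d+):", expand=False)
        .fillna("0")
        .astype(int)
    )
    out = out.sort_values("Fight Number", kind="stable").reset_index(drop=True)
    for idx, row in out.iterrows():
        if pd.isna(row["Loser"]):
            later_fighters = out.loc[idx + 1:, "Fighter"].values
            advanced = row["Fighter"] in later_fighters
            # Whoever fights again won, so the other competitor lost.
            out.at[idx, "Loser"] = row["Opponent"] if advanced else row["Fighter"]
    return out


filled = fill_missing_losers(results)
print(filled.to_string())
```

The rule assumes single elimination with complete data: if a winner's later match is missing from the table, the fighter would be marked as the loser by mistake.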

import requests
from bs4 import BeautifulSoup
import pandas as pd
import natsort as ns


pages = [ #sample, no brackets
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1533466', #example of category__bracket-1
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1533387', #example of category__bracket-3
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1533372', #example of category__bracket-7
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1533022', #example of category__bracket-15
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1532847',
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1532871',  #example of category__bracket-15 plus finals
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1532889', #example of bracket with two losers in a match, so throws an error in fight 32 on fighter a name
    'http://www.bjjcompsystem.com/tournaments/1221/categories/1532856', #example of no winner on fight 11 so throws error on fight be name
]

for url in pages:
    try:
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')

        division = soup.find('span', {'class':'category-title__label category-title__age-division'}).text.strip()
        label = soup.find('i', {'class':'fa fa-mars'}).parent.text.strip()
        belt = soup.find('i', {'class':'fa fa-belt'}).parent.text.strip()
        weight = soup.find('i', {'class':'fa fa-weight'}).parent.text.strip()


        # PARSE BRACKETS
        #brackets = soup.find_all('div', {'class':'tournament-category__bracket tournament-category__bracket-15'})
        brackets = soup.select('div[class*="tournament-category__bracket tournament-category__bracket-"]')

        results = pd.DataFrame()
        for bracket in brackets:
            try:
                bracketTitle = bracket.find_previous_sibling('div').text
            except:
                bracketTitle = 'Bracket 1/1'

            rows = bracket.find_all('div', {'class':'row'})
            for row in rows:
                matches = row.find_all('div', {'class':'tournament-category__match'})
                for match in matches:
                    #match = matches[0]#delete
                    bye = False
                    try:
                        match.find("div", {"class": "match-card__bye"}).text
                        where = match.find("div", {"class": "match-card__bye"}).text
                        when = match.find("div", {"class": "match-card__bye"}).text
                        loser = match.find("div", {"class": "match-card__bye"}).text
                        fighter_b_name = match.find("div", {"class": "match-card__bye"}).text
                        fighter_b_rank = match.find("div", {"class": "match-card__bye"}).text
                        fighter_b_club = match.find("div", {"class": "match-card__bye"}).text
                        bye = True

                    except:
                        where = match.find('div',{'class':'bracket-match-header__where'}).text
                        when = match.find('div',{'class':'bracket-match-header__when'}).text

                    fighter_a_desc = match.find_all('div',{'class':'match-card__competitor'})[0]
                    try:
                        fighter_a_name = fighter_a_desc.find('div', {'class':'match-card__competitor-name'}).text
                    except:
                        fighter_a_name = 'UNKNOWN'
                    try:
                        fighter_a_rank = fighter_a_desc.find('span', {'class':'match-card__competitor-n'}).text
                    except:
                        fighter_a_rank = 'N/A'
                    try:
                        fighter_a_club = fighter_a_desc.find('div', {'class':'match-card__club-name'}).text
                    except:
                        fighter_a_club = 'N/A'

                    cols = ['Bracket Title','Divison','Label','Belt','Weight','Where','When','Rank','Fighter','Opponent', 'Opponent Rank' ,'Gym','Loser']

                    if bye == False:
                        fighter_b_desc = match.find_all('div',{'class':'match-card__competitor'})[1]
                        try:
                            fighter_b_name = fighter_b_desc.find('div', {'class':'match-card__competitor-name'}).text
                        except:
                            fighter_b_name = 'UNKNOWN'
                        try:
                            fighter_b_rank = fighter_b_desc.find('span', {'class':'match-card__competitor-n'}).text
                        except:
                            fighter_b_rank = 'N/A'
                        try:
                            fighter_b_club = fighter_b_desc.find('div', {'class':'match-card__club-name'}).text
                        except:
                            fighter_b_club = 'N/A'

                        try:
                            loser = match.find('span', {'class':'match-card__competitor-description match-competitor--loser'}).find('div', {'class':'match-card__competitor-name'}).text
                        except:
                            loser = None
                            #print ('Loser could not be idenetified by html class')
                        temp_df_b = pd.DataFrame([[bracketTitle,division, label, belt, weight, where, when, fighter_b_rank, fighter_b_name, fighter_a_name, fighter_a_rank, fighter_b_club ,loser]], columns=cols)

                    temp_df = pd.DataFrame([[bracketTitle,division, label, belt, weight, where, when, fighter_a_rank, fighter_a_name, fighter_b_name, fighter_b_rank, fighter_a_club ,loser]], columns=cols)

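                    # DataFrame.append was removed in pandas 2.0, so combine with pd.concat.
                    # A bye has no second competitor, so temp_df_b is only added for real matches.
                    frames = [results, temp_df] if bye else [results, temp_df, temp_df_b]
                    results = pd.concat(frames, sort=True).reset_index(drop=True)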


        # IDENTIFY LOSERS THAT WHERE NOT FOUND BY HTML ATTRIBUTES
        results['Fight Number'] = results['Where'].str.split('FIGHT ', expand=True)[1].str.split(':', expand=True)[0].fillna(0)
        results['Fight Number'] = pd.Categorical(results['Fight Number'], ordered=True, categories= ns.natsorted(results['Fight Number'].unique()))
        results = results.sort_values('Fight Number')  
        results = results.drop_duplicates().reset_index(drop=True)     

        for idx, row in results.iterrows():
            if pd.isna(row['Loser']):  # true for both None and NaN
                idx_save = idx
                check = idx + 1
                fighter_check_name = row['Fighter']
                if fighter_check_name in list(results.loc[check:, 'Fighter']):
                    results.at[idx_save,'Loser'] = row['Opponent']
                else:
                    results.at[idx_save,'Loser'] = row['Fighter']

        print ('Processed url: %s' %url)
    except Exception as exc:
        # Parsing failures also land here, so report the actual exception
        print ('Error processing url: %s (%s)' % (url, exc))
Output: I'm only showing the first 25 rows. There are 116 in total.

print (results.head(25).to_string())
     Belt Bracket Title   Divison                             Fighter                               Gym Label                              Loser                           Opponent Opponent Rank Rank  Weight                   When           Where Fight Number
0   BLACK   Bracket 2/4  Master 1                Marcelo França Mafra                          CheckMat  Male                                BYE                                BYE           BYE    4  Middle                    BYE             BYE            0
1   BLACK   Bracket 4/4  Master 1            Dealonzio Jerome Jackson                  Team Lloyd Irvin  Male                                BYE                                BYE           BYE    5  Middle                    BYE             BYE            0
2   BLACK   Bracket 2/4  Master 1                  Oliver Leys Geddes                 Gracie Elite Team  Male                                BYE                                BYE           BYE    6  Middle                    BYE             BYE            0
3   BLACK   Bracket 1/4  Master 1         Gabriel Procópio da Fonseca                Brazilian Top Team  Male                                BYE                                BYE           BYE    9  Middle                    BYE             BYE            0
4   BLACK   Bracket 2/4  Master 1      Igor Mocaiber Peralva de Mello       Cicero Costha Internacional  Male                                BYE                                BYE           BYE   10  Middle                    BYE             BYE            0
5   BLACK   Bracket 1/4  Master 1               Sandro Gabriel Vieira                    Cantagalo Team  Male                                BYE                                BYE           BYE    1  Middle                    BYE             BYE            0
6   BLACK   Bracket 4/4  Master 1  Paulo Cesar Schauffler de Oliveira                 Gracie Elite Team  Male                                BYE                                BYE           BYE    8  Middle                    BYE             BYE            0
7   BLACK   Bracket 3/4  Master 1                 Paulo César Ledesma                    Atos Jiu-Jitsu  Male                                BYE                                BYE           BYE    7  Middle                    BYE             BYE            0
8   BLACK   Bracket 3/4  Master 1       Vitor Henrique Silva Oliveira                           GF Team  Male                                BYE                                BYE           BYE    2  Middle                    BYE             BYE            0
9   BLACK   Bracket 4/4  Master 1                 Clark Rouson Gracie                 Gracie Allegiance  Male                                BYE                                BYE           BYE    3  Middle                    BYE             BYE            0
10  BLACK   Bracket 4/4  Master 1              Phillip V. Fitzpatrick                          CheckMat  Male                Jonathan M. Perrine                Jonathan M. Perrine            29   45  Middle  Wed 08/21 at 10:06 AM  FIGHT 1: Mat 8            1
11  BLACK   Bracket 2/4  Master 1          André Felipe Maciel Freire                   Carlson Gracie   Male                    Jerardo Linares                    Jerardo Linares            50   14  Middle  Wed 08/21 at 10:07 AM  FIGHT 1: Mat 6            1
12  BLACK   Bracket 2/4  Master 1                     Jerardo Linares                  Cleber Jiu Jitsu  Male                    Jerardo Linares         André Felipe Maciel Freire            14   50  Middle  Wed 08/21 at 10:07 AM  FIGHT 1: Mat 6            1
13  BLACK   Bracket 1/4  Master 1        Fábio Junior Batista da Mata                        Evolve MMA  Male       Fábio Junior Batista da Mata              Jeffery Bynum Hammond            16   53  Middle  Wed 08/21 at 10:08 AM  FIGHT 1: Mat 5            1
14  BLACK   Bracket 4/4  Master 1                 Jonathan M. Perrine                    Gracie Humaita  Male                Jonathan M. Perrine             Phillip V. Fitzpatrick            45   29  Middle  Wed 08/21 at 10:06 AM  FIGHT 1: Mat 8            1
15  BLACK   Bracket 1/4  Master 1               Jeffery Bynum Hammond                  Caique Jiu-Jitsu  Male       Fábio Junior Batista da Mata       Fábio Junior Batista da Mata            53   16  Middle  Wed 08/21 at 10:08 AM  FIGHT 1: Mat 5            1
16  BLACK   Bracket 3/4  Master 1                      David Benzaken                          Teampact  Male              Evan Franklin Barrett              Evan Franklin Barrett            54   15  Middle  Wed 08/21 at 10:07 AM  FIGHT 1: Mat 7            1
17  BLACK   Bracket 3/4  Master 1               Evan Franklin Barrett           Zenith BJJ - Las Vegas   Male              Evan Franklin Barrett                     David Benzaken            15   54  Middle  Wed 08/21 at 10:07 AM  FIGHT 1: Mat 7            1
18  BLACK   Bracket 2/4  Master 1                     Nathan S Santos           Zenith BJJ - Las Vegas   Male                    Nathan S Santos              Jose A. Llanas-Campos            30   46  Middle  Wed 08/21 at 10:16 AM  FIGHT 2: Mat 6            2
19  BLACK   Bracket 3/4  Master 1                       Javier Arroyo               Team Shawn Hammonds  Male                      Javier Arroyo        Kaisar Adilevich Saulebayev            43   27  Middle  Wed 08/21 at 10:18 AM  FIGHT 2: Mat 7            2
20  BLACK   Bracket 4/4  Master 1              Manuel Ray Gonzales II                      Ralph Gracie  Male                Steven J. Patterson                Steven J. Patterson            13   49  Middle  Wed 08/21 at 10:10 AM  FIGHT 2: Mat 8            2
21  BLACK   Bracket 2/4  Master 1               Jose A. Llanas-Campos                 Ribeiro Jiu-Jitsu  Male                    Nathan S Santos                    Nathan S Santos            46   30  Middle  Wed 08/21 at 10:16 AM  FIGHT 2: Mat 6            2
22  BLACK   Bracket 4/4  Master 1                 Steven J. Patterson                         Brasa CTA  Male                Steven J. Patterson             Manuel Ray Gonzales II            49   13  Middle  Wed 08/21 at 10:10 AM  FIGHT 2: Mat 8            2
23  BLACK   Bracket 3/4  Master 1         Kaisar Adilevich Saulebayev  Charles Gracie Jiu-Jitsu Academy  Male                      Javier Arroyo                      Javier Arroyo            27   43  Middle  Wed 08/21 at 10:18 AM  FIGHT 2: Mat 7            2
24  BLACK   Bracket 1/4  Master 1                  Matthew Romino Fox                  Team Lloyd Irvin  Male  Thiago Alves Cavalcante Rodrigues  Thiago Alves Cavalcante Rodrigues            33   48  Middle  Wed 08/21 at 10:15 AM  FIGHT 2: Mat 5            2
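On pandas 2.0 and later, `DataFrame.append` no longer exists, so growing a frame one row at a time has to be done differently. A common alternative, sketched here under simplified assumptions, is to collect plain rows in a list and build the DataFrame once at the end. The `matches` input format (dicts with keys `bracket`, `a`, `b`, `loser`), the reduced column set, and the helper `build_results` are hypothetical stand-ins for the values the scraper extracts.

```python
import pandas as pd

COLS = ["Bracket Title", "Fighter", "Opponent", "Loser"]


def build_results(matches):
    """Turn match dicts into one row per fighter; 'b' is None for a bye."""
    rows = []
    for m in matches:
        if m["b"] is None:
            # A bye has a single competitor, so it yields a single row.
            rows.append([m["bracket"], m["a"], "BYE", "BYE"])
            continue
        # A real match yields two rows, one from each fighter's side.
        rows.append([m["bracket"], m["a"], m["b"], m["loser"]])
        rows.append([m["bracket"], m["b"], m["a"], m["loser"]])
    return pd.DataFrame(rows, columns=COLS)


# Invented sample input.
sample = [
    {"bracket": "Bracket 1/4", "a": "Ann", "b": None, "loser": None},
    {"bracket": "Bracket 1/4", "a": "Bea", "b": "Cat", "loser": "Cat"},
]
table = build_results(sample)
print(table.to_string())
```

Keeping the list outside the page loop would also let rows from all pages accumulate into a single table instead of being reset for each URL.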

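Consider using something like a markdown table generator so you can share the expected output and preserve the spacing. Also, do you only want the bracket info (not the finals), and what should happen with the BYE rows?

@QHarr I haven't tried using a markdown table generator. I'll check it out for future reference. Yes, I did try to handle the byes but ran into problems. The answer below was able to include them. Thank you very much.

Interesting approach +

@chitown88 Wow, this answer is great. I'm going to study it. I'm glad you were able to handle the fighters who received a bye. I tried that but ran into more problems. I also like how the loser column is handled. The way you nested the fighter description and its elements makes sense too. Thank you so much for the thorough answer!

I appreciate it, and I'm glad it worked. Read through it and let me know if you have any questions or need me to explain anything.

@chitown88 Hi, thanks again for your help! I have a follow-up question: I am now trying to loop this program over 411 divisions. Some divisions are different sizes. Our original example used 4 brackets of 15 matches plus the finals. However, some divisions have only 1, 3, 7, or 15 matches. I've added code at the top with links to the different divisions and how I'm looping your program over multiple pages. Is there a way to specify different categories for the brackets variable (i.e. category 1, 3, 7, or 15)? And if there are finals, run the code, and if not, pass/continue?

Yes, we can do that. I'll write up a response when I get a chance to sit down.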