How to scrape a table from a website using Python

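I'm familiar with Python, but I have never tried scraping data from a website before. I've gone through the BeautifulSoup documentation, but I don't know enough HTML to get what I want. I'm trying to retrieve data from a table on this website.

If I wanted to list the rank, title, and year of each film, how would I do it?

This is as far as I've gotten. It's not much, but it's a start.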

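    import requests
    from bs4 import BeautifulSoup

    url = 'https://en.wikipedia.org/wiki/List_of_highest-grossing_films'
    resp = requests.get(url)  # the module is `requests`, not `request`

    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, 'html.parser')
        l = soup.find_all('a')  # every link on the page, not just film titles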

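I used find_all('a') because all the titles are clickable. But a lot of things on the page are clickable, so this is probably not the best choice. And the other information I want isn't clickable. I don't know what to do.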

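Here you go; this works on any Wikipedia page. Use the whichtable= option to select the table, since a Wikipedia page can contain more than one. The program creates a nice CSV file that can be loaded as a dataframe.

Output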

    Rank   Peak                                          Title   Worldwide gross  Year  Reference(s)
0      1      1                              Avengers: Endgame    $2,797,800,564  2019    [# 1][# 2]
1      2      1                                         Avatar    $2,789,679,794  2009    [# 3][# 4]
2      3      1                                        Titanic    $2,187,463,944  1997    [# 5][# 6]
3      4      3                   Star Wars: The Force Awakens    $2,068,223,624  2015    [# 7][# 8]
4      5      4                         Avengers: Infinity War    $2,048,359,754  2018   [# 9][# 10]
5      6      3                                 Jurassic World    $1,671,713,208  2015  [# 11][# 12]
6      7      7                                  The Lion King    $1,656,405,082  2019   [# 13][# 2]
7      8      3                                   The Avengers    $1,518,812,988  2012  [# 14][# 15]
8      9      4                                      Furious 7    $1,516,045,911  2015  [# 16][# 17]
9     10      5                        Avengers: Age of Ultron    $1,405,403,694  2015  [# 18][# 17]
10    11      9                                  Black Panther    $1,346,913,161  2018  [# 19][# 20]
11    12      3  Harry Potter and the Deathly Hallows – Part 2    $1,341,693,157  2011  [# 21][# 22]
12    13      9                       Star Wars: The Last Jedi    $1,332,539,889  2017  [# 23][# 24]
13    14     12                 Jurassic World: Fallen Kingdom    $1,309,484,461  2018  [# 25][# 10]
14    15      5                                         Frozen   F$1,290,000,000  2013  [# 26][# 27]
15    16     10                           Beauty and the Beast    $1,263,521,126  2017  [# 28][# 29]
16    17     15                                  Incredibles 2    $1,242,805,359  2018  [# 30][# 10]
17    18     11                        The Fate of the Furious  F8$1,238,764,765  2017  [# 31][# 29]
18    19      5                                     Iron Man 3    $1,214,811,252  2013  [# 32][# 33]
19    20     10                                        Minions    $1,159,398,397  2015  [# 34][# 12]
20    21     12                     Captain America: Civil War    $1,153,304,495  2016  [# 35][# 36]
21    22     20                                        Aquaman    $1,148,161,807  2018  [# 37][# 10]
22    23     23                      Spider-Man: Far From Home    $1,131,927,996  2019   [# 38][# 2]
23    24     22                                 Captain Marvel    $1,128,274,794  2019  [# 39][# 40]
24    25      4                 Transformers: Dark of the Moon    $1,123,794,079  2011  [# 41][# 22]
25    26      2  The Lord of the Rings: The Return of the King    $1,120,237,002  2003  [# 42][# 43]
26    27      7                                        Skyfall    $1,108,561,013  2012  [# 44][# 45]
27    28     10                Transformers: Age of Extinction    $1,104,054,072  2014  [# 46][# 47]
28    29      7                          The Dark Knight Rises    $1,084,939,099  2012  [# 48][# 49]
29    30     30                                    Toy Story 4    $1,073,394,593  2019   [# 50][# 2]
30    31   4TS3                                    Toy Story 3    $1,066,969,703  2010  [# 51][# 52]
31    32      3     Pirates of the Caribbean: Dead Man's Chest    $1,066,179,725  2006  [# 53][# 54]
32    33     33                                          Joker    $1,057,193,906  2019  [# 55][# 56]
33    34     20                   Rogue One: A Star Wars Story    $1,056,057,273  2016    [14][# 57]
34    35     34                                        Aladdin    $1,050,693,953  2019   [# 58][# 2]
35    36      6    Pirates of the Caribbean: On Stranger Tides    $1,045,713,802  2011  [# 59][# 52]
36    37     24                                Despicable Me 3    $1,034,799,409  2017  [# 60][# 29]
37    38      1                                  Jurassic Park    $1,029,939,903  1993  [# 61][# 62]
38    39     22                                   Finding Dory    $1,028,570,889  2016  [# 63][# 64]
39    40      2      Star Wars: Episode I – The Phantom Menace    $1,027,044,677  1999   [# 65][# 6]
40    41      5                            Alice in Wonderland    $1,025,467,110  2010  [# 66][# 67]
41    42     24                                       Zootopia    $1,023,784,195  2016  [# 68][# 36]
42    43     14              The Hobbit: An Unexpected Journey    $1,021,103,568  2012  [# 69][# 70]
43    44      4                                The Dark Knight    $1,004,934,033  2008  [# 71][# 72]
44    45    2PS       Harry Potter and the Philosopher's Stone      $975,051,288  2001  [# 73][# 74]
45    46  19DM2                                Despicable Me 2      $970,761,885  2013  [# 75][# 33]
46    47      2                                  The Lion King      $968,483,777  1994  [# 76][# 62]
47    48     30                                The Jungle Book      $966,550,600  2016  [# 77][# 78]
48    49      5       Pirates of the Caribbean: At World's End      $963,420,425  2007  [# 79][# 80]
49    50     40                 Jumanji: Welcome to the Jungle      $962,126,927  2017  [# 81][# 20]
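
The program that produced this output is not reproduced above. As a rough idea of what such a program might look like, here is a minimal sketch, assuming BeautifulSoup with the built-in `html.parser` and the standard `csv` module. The function names `extract_table` and `rows_to_csv` and the inline sample HTML are hypothetical and only for illustration; the `whichtable` parameter mirrors the option mentioned in the answer, but this is not the answer's original code.

```python
import csv
import io

from bs4 import BeautifulSoup


def extract_table(html, whichtable=0):
    """Return the rows of the chosen <table> as lists of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")
    table = tables[whichtable]  # raises IndexError if the page has fewer tables
    rows = []
    for tr in table.find_all("tr"):
        # Header cells (<th>) and data cells (<td>) are both kept, in order.
        cells = [cell.get_text(" ", strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Hypothetical stand-in for a downloaded page that contains two tables.
SAMPLE_HTML = """
<table><tr><th>Ignored</th></tr><tr><td>x</td></tr></table>
<table>
  <tr><th>Rank</th><th>Title</th><th>Year</th></tr>
  <tr><td>1</td><th><i><a href="#">Avengers: Endgame</a></i></th><td>2019</td></tr>
  <tr><td>2</td><th><i><a href="#">Avatar</a></i></th><td>2009</td></tr>
</table>
"""

rows = extract_table(SAMPLE_HTML, whichtable=1)
csv_text = rows_to_csv(rows)
print(csv_text)
```

For a live page, the text of `requests.get(url)` could be passed in place of `SAMPLE_HTML`, and the CSV text written to a file opened with `newline=""`; `pandas.read_csv` can then load that file as a dataframe. Real Wikipedia tables often use merged cells and footnote markers, which this sketch does not handle.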

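You can try the following:

    # Parse the HTML
    soup = BeautifulSoup(resp.content, 'html.parser')
    # Grab the first table
    table = soup.find_all('table')[0]

At this point you can start looping over the table object to fill either several lists, one per column, or a single list containing every table row, and then create a dataframe from it.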
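
The loop described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the inline `SAMPLE_HTML` is a simplified, hypothetical stand-in for the real page (whose markup may differ), and the names `wanted`, `positions`, and `films` are made up for the example. It builds a single list with one dictionary per table row, keeping only the rank, title, and year columns that the question asks for.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for resp.content.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Rank</th><th>Peak</th><th>Title</th><th>Worldwide gross</th><th>Year</th></tr>
  <tr><td>1</td><td>1</td><th><a href="/wiki/Avengers:_Endgame">Avengers: Endgame</a></th><td>$2,797,800,564</td><td>2019</td></tr>
  <tr><td>2</td><td>1</td><th><a href="/wiki/Avatar">Avatar</a></th><td>$2,789,679,794</td><td>2009</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find_all("table")[0]  # grab the first table

wanted = ["Rank", "Title", "Year"]

# The first row holds the column names; the remaining rows hold the films.
header, *body = table.find_all("tr")
names = [cell.get_text(strip=True) for cell in header.find_all(["th", "td"])]
positions = [names.index(name) for name in wanted]

films = []
for tr in body:
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if len(cells) < len(names):
        continue  # skip rows that do not line up with the header
    films.append({name: cells[pos] for name, pos in zip(wanted, positions)})

for film in films:
    print(film["Rank"], film["Title"], film["Year"])
```

Looking columns up by header name rather than by fixed position keeps the loop working if columns are added or reordered. If pandas is available, `pandas.DataFrame(films)` turns the list of dictionaries into a dataframe.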

You would do well to read some guides or tutorials on this topic.