Trying to extract individual elements from table data using the BeautifulSoup Python module


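I'm new to Python and am currently using BeautifulSoup to try to extract some table data. I can't work out how to get the individual elements out of the td cells. This is what I have so far: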

from bs4 import BeautifulSoup
import requests

source = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/region-ALL/tournament-LCS%20Summer%202020/week-ALL/').text

soup = BeautifulSoup(source, 'lxml')

td = soup.find_all('td', {'class': 'text-center'})

print(td)
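This does print all of the td elements I want, but I can't figure out how to extract each element from them.

Any help would be greatly appreciated. Thank you.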

Try the following:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/region-ALL/tournament-LCS%20Summer%202020/week-ALL/').text

soup = BeautifulSoup(source, 'lxml')

td = soup.find_all('td', {'class': 'text-center'})

print(*[text.get_text(strip=True) + '\n' for text in td])
Prints:

S10
 NA
 14
 35.7%
 0.91
 1744
 -48
 33:19
 11.2
 12.4
 5.5
 7.0
 50.0
 64.3
 2.71
 54.2
 1.00
 57.1
 1.14
And so on...
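
To pick out one specific value rather than print everything, the cells can be indexed directly or grouped by row first. The sketch below is only an illustration: `SAMPLE_HTML` is a hypothetical, trimmed-down stand-in for the page's markup (the real table has many more columns), and it uses the built-in `html.parser` so it runs without a network request or `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the values are taken from the
# output shown above, but the exact markup is an assumption.
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>Name</th><th>Season</th><th>Region</th><th>Games</th><th>Win rate</th></tr>
  </thead>
  <tbody>
    <tr>
      <td>100 Thieves</td>
      <td class="text-center">S10</td>
      <td class="text-center">NA</td>
      <td class="text-center">14</td>
      <td class="text-center">35.7%</td>
    </tr>
    <tr>
      <td>CLG</td>
      <td class="text-center">S10</td>
      <td class="text-center">NA</td>
      <td class="text-center">14</td>
      <td class="text-center">35.7%</td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns a list-like ResultSet, so a single cell can be indexed.
cells = soup.find_all("td", {"class": "text-center"})
first_value = cells[0].get_text(strip=True)

# Grouping cells by their parent <tr> keeps each team's values together.
rows = []
for tr in soup.find("tbody").find_all("tr"):
    rows.append([td.get_text(strip=True) for td in tr.find_all("td")])

team_name = rows[0][0]  # first column of the first data row
win_rate = rows[0][4]   # fifth column of the first data row

print(first_value, team_name, win_rate)
```

Grouping by row tends to be easier to work with than one flat list of cells, because a positional index then means "column N of this team" rather than an offset into the whole table.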


The following script extracts the data and saves it to a CSV file:

import requests
from bs4 import BeautifulSoup
import pandas as pd

res = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/region-ALL/tournament-LCS%20Summer%202020/week-ALL/')

soup = BeautifulSoup(res.text, 'html.parser')

table = soup.find("table", class_="table_list playerslist tablesaw trhover")

columns = [i.get_text(strip=True) for i in table.find("thead").find_all("th")]

data = []

table.find("thead").extract()

for tr in table.find_all("tr"):
    data.append([td.get_text(strip=True) for td in tr.find_all("td")])

df = pd.DataFrame(data, columns=columns)

df.to_csv("data.csv", index=False)
Output:

               Name Season Region Games Win rate   K:D   GPM   GDM Game duration Kills / game Deaths / game Towers killed Towers lost   FB%   FT% DRAPG  DRA% HERPG  HER% DRA@15 TD@15  GD@15 NASHPG NASH%   CSM   DPM  WPM  VWPM  WCPM
0       100 Thieves    S10     NA    14    35.7%  0.91  1744   -48         33:19         11.2          12.4           5.5         7.0  50.0  64.3  2.71  54.2  1.00  57.1   1.14   0.4   -378   0.64  42.9  33.2  1937  3.0  1.19  1.31
1               CLG    S10     NA    14    35.7%  0.81  1705  -120         35:25         10.6          13.2           4.9         7.9  28.6  28.6  1.93  31.5  0.57  28.6   0.64  -0.6  -1297   0.57  30.4  32.6  1826  3.2  1.17  1.37
2            Cloud9    S10     NA    14    78.6%  1.91  1922   302         28:52         15.0           7.9           8.3         3.1  64.3  64.3  3.07  72.5  1.43  71.4   1.29   0.7   2410   1.00  78.6  33.3  1921  3.0  1.10  1.26
3          Dignitas    S10     NA    14    28.6%  0.86  1663  -147         32:44          8.9          10.4           3.9         8.1  42.9  35.7  2.14  41.7  0.57  28.6   0.79  -0.7   -796   0.36  25.0  32.5  1517  3.1  1.28  1.23
4     Evil Geniuses    S10     NA    14    50.0%  0.85  1738    -0         34:09         11.1          13.1           6.5         6.0  64.3  57.1  2.36  48.5  1.00  53.6   1.00   0.5    397   0.50  46.5  32.3  1895  3.2  1.36  1.34
5          FlyQuest    S10     NA    14    57.1%  1.28  1770    65         34:55         13.4          10.4           6.5         5.2  71.4  35.7  2.86  53.4  1.00  50.0   0.79  -0.1     69   0.71  69.2  32.7  1801  3.2  1.16  1.72
6  Golden Guardians    S10     NA    14    50.0%  0.96  1740     6         36:13         10.7          11.1           6.3         6.1  50.0  35.7  3.29  62.8  0.86  42.9   1.43   0.1    711   0.50  43.6  33.7  1944  3.2  1.27  1.53
7         Immortals    S10     NA    14    21.4%  0.54  1609  -246         33:54          7.5          14.0           4.3         7.9  35.7  35.7  2.29  39.9  1.00  53.6   0.79  -0.4  -1509   0.36  25.0  31.4  1734  3.3  1.37  1.47
8       Team Liquid    S10     NA    14    78.6%  1.31  1796   135         35:07         11.4           8.6           7.9         4.4  42.9  64.3  2.36  43.6  0.93  50.0   1.14   0.2    522   1.21  78.6  33.1  1755  3.5  1.27  1.42
9               TSM    S10     NA    14    64.3%  1.12  1768    52         34:20         11.6          10.4           7.2         5.7  50.0  78.6  2.79  51.9  1.21  64.3   0.93   0.1   -129   0.86  57.1  32.6  1729  3.2  1.33  1.33
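
If the goal is to look up one value by team and column name, the header row can be zipped with each data row to build a dictionary. This is a minimal sketch without pandas, under some assumptions: `SAMPLE_HTML` is a hypothetical, shortened version of the table, and `table_to_records` is a helper name invented for this example, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the real table (assumed markup).
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>Name</th><th>Season</th><th>Region</th><th>Games</th><th>Win rate</th><th>K:D</th></tr>
  </thead>
  <tbody>
    <tr><td>100 Thieves</td><td>S10</td><td>NA</td><td>14</td><td>35.7%</td><td>0.91</td></tr>
    <tr><td>Cloud9</td><td>S10</td><td>NA</td><td>14</td><td>78.6%</td><td>1.91</td></tr>
  </tbody>
</table>
"""


def table_to_records(html):
    """Return {team name: {column header: cell text}} for the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    headers = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]

    records = {}
    for tr in table.find("tbody").find_all("tr"):
        values = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(values) != len(headers):
            continue  # skip rows that do not line up with the header
        row = dict(zip(headers, values))
        records[row["Name"]] = row
    return records


records = table_to_records(SAMPLE_HTML)
print(records["Cloud9"]["Win rate"])
```

Every value comes back as a string, so anything numeric (for example `"78.6%"`) would still need converting before doing arithmetic on it.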


What's missing? The first row and the first column?

Thank you very much for your reply, I really appreciate it. I'm at work right now, but I can't wait to try it once I'm off!

Amazing. Thank you, I'll try it after work today and see it in action!