Python: unable to get all the names from a table

python python-3.x web-scraping

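I've written a script in Python to get all the names from a table on a webpage. The names in that table are available in the page source, so they are static content. However, when I run the script below, I only get a few of them (up to "2012 Topps Heritage Run"), even though the list contains many more.

How can I get all the names from the table under the "Company Sets" header using requests?

This is what I've tried so far: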

import requests
from bs4 import BeautifulSoup

url = "https://www.psacard.com/psasetregistry/baseball/company-sets/16"

res = requests.get(url)
soup = BeautifulSoup(res.text,"lxml")
for item in soup.select(".dataTable tr td a[href*='/baseball/company-sets/']"):
    print(item.text)

Can you try this:

print([inner_tag.find('a').text for inner_tag in soup.findAll('table')[0].findAll('td') if inner_tag.find('a')])

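Explanation:

There are actually two tables on the page, and your code extracts values from both of them. That is why the last value you get is from 2012. The code above extracts values only from the table named "Company set".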
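To see how names can be grouped by the table they belong to, here is a minimal, dependency-free sketch that uses the standard library's `html.parser` instead of BeautifulSoup. It is not the answer's code: the HTML snippet, the set names ("Set A" and so on), and the helper names `LinkTextByTable` and `link_texts_by_table` are all made up for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the real page: two tables whose links
# share the same href pattern, so a broad selector would match both.
SAMPLE_HTML = """
<table class="dataTable">
  <tr><td><a href="/baseball/company-sets/1">Set A</a></td></tr>
  <tr><td><a href="/baseball/company-sets/2">Set B</a></td></tr>
</table>
<table class="dataTable">
  <tr><td><a href="/baseball/company-sets/3">Set C</a></td></tr>
</table>
"""


class LinkTextByTable(HTMLParser):
    """Collect the text of links inside <td> cells, grouped per <table>."""

    def __init__(self):
        super().__init__()
        self.tables = []      # one list of link texts per <table> seen
        self.in_cell = False  # currently inside a <td>
        self.in_link = False  # currently inside an <a> within a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "td":
            self.in_cell = True
        elif tag == "a" and self.in_cell and self.tables:
            self.in_link = True
            self.tables[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Append text to the most recent link of the most recent table.
        if self.in_link:
            self.tables[-1][-1] += data


def link_texts_by_table(html):
    """Return a list with one list of stripped link texts per table."""
    parser = LinkTextByTable()
    parser.feed(html)
    parser.close()
    return [[text.strip() for text in table] for table in parser.tables]


names_by_table = link_texts_by_table(SAMPLE_HTML)
first_table_only = names_by_table[0]
all_tables = [name for table in names_by_table for name in table]

print(first_table_only)
print(all_tables)
```

Printing both lists side by side makes it easy to check whether a result mixes rows from more than one table.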

You can combine requests with pandas' read_html:

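import io

import pandas as pd
import requests

url = 'https://www.psacard.com/psasetregistry/baseball/company-sets/16'
# Send a browser-like User-Agent header with the request
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url, headers=headers)
# Parse every <table> in the response into a list of DataFrames.
# The bytes are wrapped in a file-like object because recent pandas
# versions warn when raw HTML is passed directly.
tables = pd.read_html(io.BytesIO(r.content))
df = tables[0]
# Drop the first row of the table
df.drop(df.index[[0]], inplace=True)
print(df)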