Python: Reading the contents of a hidden HTML table


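On this web page there is a "Show study locations" tab. When I click it, the page displays the full list of locations and the URL changes to the one I use in this program. When I run the program to print the full list of locations, I get the following result: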

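# Python 2 code, as posted; the bs4 import is assumed from the question's tags
import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('https://clinicaltrials.gov/ct2/show/study/NCT01718158?term=NCT01718158&rank=1&show_locs=Y#locn').read())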

for row in soup('table')[5].findAll('tr'):
    tds = row('td')
    if len(tds)<2:
        continue
    print tds[0].string, tds[1].string  #, '\n'.join(filter(unicode.strip, tds[1].strings))

Local Institution None
Local Institution None
Local Institution None
Local Institution None
Local Institution None

and so on. I want to print the entire list of locations.

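The rows for the local institutions have only one table cell in them, but you are skipping those rows.

Perhaps you need to extract data from all cells, and only skip rows that have no <td> cells: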

for row in soup('table')[5].findAll('tr'):
    tds = row('td')
    if not tds:
        continue
    print u' '.join([cell.string for cell in tds if cell.string])

This produces:

United States, California
Va Long Beach Healthcare System
Long Beach, California, United States, 90822  
United States, Georgia
Gastrointestinal Specialists Of Georgia Pc
Marietta, Georgia, United States, 30060  
# .... 
Local Institution
Taipei, Taiwan, 100  
Local Institution
Taoyuan, Taiwan, 333  
United Kingdom
Local Institution
London, Greater London, United Kingdom, SE5 9RS  
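
For readers on Python 3, the same idea can be sketched roughly as follows. This is not the answerer's code: the sample markup, the function name extract_rows and the table_index parameter are all hypothetical, and the snippet parses an inline string so it runs without network access. It also uses get_text() instead of .string, because .string returns None for a cell that has more than one child node, which would explain the None values in the question's output.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modeled on the output shown above.
# The real page's structure may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Locations</th></tr>
  <tr><td>United States, California</td></tr>
  <tr><td>Va Long Beach Healthcare System</td></tr>
  <tr><td>Long Beach, California, United States, 90822</td></tr>
  <tr><td><span>Local Institution</span><br/>Recruiting</td></tr>
  <tr><td>Taipei, Taiwan, 100</td></tr>
</table>
"""


def extract_rows(html, table_index=0):
    """Return one text line per table row that contains <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find_all("table")[table_index]
    lines = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if not cells:
            # Skip only rows without any <td> cells (e.g. header rows).
            continue
        # get_text() joins all nested strings, so cells with several
        # child nodes still yield text instead of None.
        text = " ".join(cell.get_text(" ", strip=True) for cell in cells)
        if text:
            lines.append(text)
    return lines


if __name__ == "__main__":
    for line in extract_rows(SAMPLE_HTML):
        print(line)
```

Against the real page, the table index would be 5 as in the question, assuming the layout has not changed since it was asked.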

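It looks like the content may be modified based on the user agent, or it may be populated by JavaScript. wget --no-check-certificate https://clinicaltrials.gov/ct2/show/study/NCT01718158?term=NCT01718158&rank=1&show_locs=Y gives me a file without any of the locations you are looking for.

Thank you very much. It worked!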
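
One rough way to test the user-agent hypothesis raised in the comment above is to request the page with different User-Agent headers and check whether a known location string appears in the raw HTML. The sketch below uses only the Python 3 standard library; the helper names build_request, fetch_html and has_marker are hypothetical, and the page itself may have changed since the question was asked.

```python
import urllib.request

# URL taken from the question, without the #locn fragment.
URL = ("https://clinicaltrials.gov/ct2/show/study/NCT01718158"
       "?term=NCT01718158&rank=1&show_locs=Y")


def build_request(url, user_agent):
    """Build a request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent):
    """Download the page and return its body as text (needs network access)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def has_marker(html, marker):
    """Report whether a known piece of text is present in the raw HTML."""
    return marker in html
```

If has_marker(fetch_html(URL, agent), "Long Beach") differs between a browser-like agent string and a tool-like one, the server is varying its response by user agent. If the marker is missing in both cases, the list is more likely filled in by JavaScript.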