Python 2.7: How to get specific tag data from a URL with urllib2


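I am very new to Python 2.7, and my task is to read a table from a URL.

I can fetch the URL and get the data out of the table. The problem is that I only need the data, but I am getting the tags as well. Please help. Thanks in advance.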

from bs4 import BeautifulSoup
import urllib2

# Download the page and parse it
response = urllib2.urlopen('https://www.somewebsite.com/')
html = response.read()
soup = BeautifulSoup(html)

# Locate the table by its class attribute
tabulka = soup.find("table", {"class" : "defaultTableStyle tableFontMD tableNoBorder"})

records = []
for row in tabulka.findAll('tr'):
    col = row.findAll('td')

    # Prints the <td> elements themselves, tags included
    print col

You have to use the .text attribute:

from bs4 import BeautifulSoup
import urllib2

response = urllib2.urlopen('https://www.somewebsite.com/')
html = response.read()
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly

tabulka = soup.find("table", {"class" : "defaultTableStyle tableFontMD tableNoBorder"})

for row in tabulka.findAll('tr'):
    col = row.findAll('td')

    # .text gives each cell's text without the surrounding tags
    print [coli.text for coli in col]
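To try the same idea without a network connection, here is a minimal sketch that runs against an inline HTML string. The sample markup, the `SAMPLE_HTML` name and the `table_rows` helper are made up for illustration, since the real page is not shown in the question. It matches the table on a single class name and uses `get_text(strip=True)`, which returns the same text as `.text` but with surrounding whitespace trimmed. The `__future__` import is only there so the snippet also runs on Python 3.

```python
from __future__ import print_function  # lets print() work on Python 2.7 as well

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real table is not shown in the thread.
SAMPLE_HTML = """
<table class="defaultTableStyle tableFontMD tableNoBorder">
  <tr><th>Type</th><th>Name</th></tr>
  <tr><td> Bug </td><td>Parser</td></tr>
  <tr><td>Task</td><td>Docs</td></tr>
</table>
"""


def table_rows(html):
    """Return the text of every <td> cell, one list per table row."""
    soup = BeautifulSoup(html, "html.parser")
    # Matching on one class name is enough to find the table in this sample.
    table = soup.find("table", class_="defaultTableStyle")
    if table is None:
        return []
    rows = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if cells:  # skip rows that only contain <th> header cells
            rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


for row in table_rows(SAMPLE_HTML):
    print(row)
```

Returning an empty list when the table is missing avoids the `AttributeError` that calling `findAll` on `None` would raise if the class name ever changes on the page.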

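Thanks for your answer. One question: every element I get looks like [u'Type', u'Name', u'Discovered', ], but there is no u in the HTML.

The u prefix means the value is a unicode string. It is not part of the text. You can change it to

coli.text.encode('utf-8')

to get rid of it.

Thanks a lot. That leads me to my last question: one cell in that table row contains a tag with a link. How can I read that link?

Use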

response = urllib2.urlopen('https://www.somewebsite.com/quesitons')

Actually, the HTML format is like this: text. I can get the text of every td, but I also need the link. How can I get that as well?
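One possible way to get the link along with the text, as a sketch only: look for an `<a>` element inside each cell and read its `href` attribute. The markup below is a guess, because the real HTML did not survive in this thread, and `SAMPLE_HTML` and `cells_with_links` are hypothetical names. The `__future__` import is only there so the snippet also runs on Python 3.

```python
from __future__ import print_function  # lets print() work on Python 2.7 as well

from bs4 import BeautifulSoup

# Hypothetical markup; the thread's real HTML was not preserved.
SAMPLE_HTML = """
<table>
  <tr><td><a href="/items/1">First</a></td><td>Open</td></tr>
  <tr><td>No link here</td><td>Closed</td></tr>
</table>
"""


def cells_with_links(html):
    """Return, per row, a list of (text, href) pairs; href is None when a cell has no link."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.find_all("tr"):
        pairs = []
        for cell in row.find_all("td"):
            link = cell.find("a", href=True)  # first <a> in the cell that has an href
            href = link["href"] if link is not None else None
            pairs.append((cell.get_text(strip=True), href))
        rows.append(pairs)
    return rows


for row in cells_with_links(SAMPLE_HTML):
    print(row)
```

Passing `href=True` skips anchors without an `href`, so `link["href"]` cannot raise a `KeyError`. If a cell may hold several links, `cell.find_all("a", href=True)` would return all of them.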