Python: finding a specific tag with BeautifulSoup


I'm trying to get the town and state for a given ZIP code using the following site:

http://www.zip-info.com/cgi-local/zipsrch.exe?zip=10023&Go=Go

With the following code, I get all of the tr tags:

from bs4 import BeautifulSoup
import requests

r = requests.get("http://www.zip-info.com/cgi-local/zipsrch.exe?zip=10023&Go=Go")
data = r.text
soup = BeautifulSoup(data)
# find_all('tr') returns every <tr> on the page, not just the row with the result
print(soup.find_all('tr'))

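How do I find one specific tr tag? In the examples, you already know the text you are looking for. What do I do if I don't know the text in advance?

Edit

I have now added the following, but it returns nothing: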

for tag in soup.find_all(re.compile("^td align=")):
    print (tag.name)
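
A likely reason this loop prints nothing: a compiled regular expression passed as the first argument to find_all() is matched against tag names only, and no tag is named "td align=". The sketch below shows the difference on a small, made-up fragment (the markup and the align values are assumptions, not the real page) by filtering on the tag name and the attribute separately.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
html = """
<table>
  <tr><td align="center">New York</td><td align="center">NY</td><td>10023</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# The regex is compared with each tag's name ("table", "tr", "td"),
# so a pattern that includes the attribute can never match.
by_name = soup.find_all(re.compile("^td align="))

# Filter on the tag name and on the attribute separately instead.
with_align = soup.find_all("td", align=True)               # any <td> that has an align attribute
centered = soup.find_all("td", attrs={"align": "center"})  # <td> whose align equals "center"

print(len(by_name))
print([td.get_text() for td in with_align])
```

Passing align=True selects every td that carries the attribute, whatever its value; the attrs dictionary form is the safer choice when an attribute name clashes with a Python keyword or a find_all() parameter.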

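Looking at the HTML source, I would navigate to that point with a mix of find() and find_all() calls, because I can't tell the target apart from the other elements by position, attributes, or anything else: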

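from bs4 import BeautifulSoup
import requests

l = list()

r = requests.get("http://www.zip-info.com/cgi-local/zipsrch.exe?zip=10023&Go=Go")
data = r.text
soup = BeautifulSoup(data)

# Iterating over a Tag yields its direct children, so this loop walks
# the children of the first <table> on the page.
for table in soup.find('table'):
    # Take the fourth <center> element found under that child.
    center = table.find_all('center')[3]
    # Take the last <tr>; iterating over it yields its cells.
    for tr in center.find_all('tr')[-1]:
        l.append(tr.string)

# Drop the last collected value and print the rest.
print(l[0:-1])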

Run it like this:

python script.py

This yields:

[u'New York', u'NY']
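
The same navigation idea can be tried offline. The sketch below runs on a deliberately simplified, made-up table (the real page nests its markup differently, so the structure here is an assumption) and shows the two behaviours the script relies on: find_all() returns a list-like result that can be indexed, and iterating over a Tag yields its direct children.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page nests its tables differently.
html = (
    "<table>"
    "<tr><th>Mailing Name</th><th>State Code</th><th>ZIP Code</th></tr>"
    "<tr><td>New York</td><td>NY</td><td>10023</td></tr>"
    "</table>"
)
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table")            # first <table> in the document
last_row = table.find_all("tr")[-1]   # the result can be indexed like a list

# Iterating over a Tag yields its direct children, here the three <td> cells.
values = [str(cell.string) for cell in last_row]

# Keep everything except the last value (the ZIP code).
print(values[0:-1])
```

Note that the sample has no whitespace between tags. With indented markup, iterating over a row also yields the whitespace strings between cells, so last_row.find_all("td") is the more robust choice there.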

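After looking at the HTML of the site you linked, I would say the best way to locate the row is by text, rather than by class, id, and so on.

First, you can easily identify the header row by its text, using the keyword "Mailing"; from there you can easily get the next row, which contains the values you want.

Here is my code: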

import re
from urllib.request import urlopen  # Python 3 replacement for urllib2
import bs4

soup = bs4.BeautifulSoup(urlopen("http://www.zip-info.com/cgi-local/zipsrch.exe?zip=10023&Go=Go"))
# find the header, then find the next tr, which contains your data
tr = soup.find(string=re.compile("Mailing")).find_next("tr")
# zip_code rather than zip, so the built-in zip() is not shadowed
name, code, zip_code = [td.text.strip() for td in tr.find_all("td")]
print(name)
print(code)
print(zip_code)

When printed, they look like this:

New York
NY
10023
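
One weak point of text-based locating is that find() returns None when the keyword is absent, so the chained call fails with an AttributeError. Below is a minimal sketch of the same technique with that case handled. The helper name row_after_header and the sample markup are made up for illustration; the real page's layout may differ.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page may be laid out differently.
SAMPLE_HTML = """
<table>
  <tr><th>Mailing Name</th><th>State Code</th><th>ZIP Code</th></tr>
  <tr><td>New York</td><td>NY</td><td>10023</td></tr>
</table>
"""


def row_after_header(html, keyword="Mailing"):
    """Return the cell texts of the first <tr> that follows the text
    matching keyword, or None if the keyword or the row is missing."""
    soup = BeautifulSoup(html, "html.parser")
    header_text = soup.find(string=re.compile(keyword))
    if header_text is None:
        return None
    row = header_text.find_next("tr")
    if row is None:
        return None
    return [td.get_text(strip=True) for td in row.find_all("td")]


result = row_after_header(SAMPLE_HTML)
missing = row_after_header("<p>no table here</p>")
print(result)
print(missing)
```

Returning None instead of raising lets the caller decide how to treat a ZIP code that the site does not recognise.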

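Hmm, which tag do you want to find? I don't see any names, so I can't tell how to identify a specific tag.

Thanks! I can't get it to work, though. I'm trying to break your code into pieces to understand it. What is elem[1]?

@dwstein: enumerate() returns a tuple whose first item is the index and whose second item is the value. I used it to avoid printing the third value in the table.

If there is any way to break up that long line a bit more, that would be very helpful. I don't understand what all the numbers in brackets are.

@dwstein: I changed the approach to use separate for loops and to keep all the data in a list.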
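
The enumerate() behaviour described in the comments can be sketched as follows. The earlier version of the answer is not shown in this thread, so the list contents and the index check are assumptions meant only to illustrate what elem[0] and elem[1] refer to.

```python
# Hypothetical cell values, standing in for the three cells of the result row.
cells = ["New York", "NY", "10023"]

kept = []
for elem in enumerate(cells):
    # elem is an (index, value) tuple:
    # elem[0] is the position in the list, elem[1] is the cell text.
    if elem[0] < 2:          # skip the third value (index 2)
        kept.append(elem[1])

print(kept)
```

Unpacking the tuple in the loop header, as in "for index, value in enumerate(cells)", avoids the bracketed numbers altogether and is usually easier to read.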