Python: trying to parse HTML to get an email address (python, html, regex, html-parsing, beautifulsoup)


I'm using BeautifulSoup to get an email address, but I've run into a problem: I don't know where to start parsing the output below to pull the address out.

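> # input:  url
> # output: address
>
> import urllib2
> from bs4 import BeautifulSoup
>
> def urlSC(url):
>     # Download the page and parse it
>     soup = BeautifulSoup(urllib2.urlopen(url).read())
>     # word = soup.prettify()
>     # Collect every <a> tag on the page
>     word = soup.find_all('a')
>     print word
>     return word

Output: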

>     [<a href="default.aspx"><img alt="·Î°í" border="0" src="image/logo.gif"/></a>, <a href="http://www.ctodayusa.com"><img
> border="0" src="image/ctodayusa.jpg"><a></a>
>     </img></a>, <a></a>, <a href="mailto:rev_han777@yahoo.com" id="hlEmail">rev_han777@yahoo.com</a>, <a id="hlHomepage"></a>, <a
> href="javascript:img_up('','','');"><img border="0" class="img"
> src="upload/" vspace="10" width="1"/></a>, <a
> href="javascript:img_up('','','');"><img border="0" class="img"
> src="upload/" vspace="10" width="1"/></a>, <a
> href="javascript:openWin('http://maps.yahoo.com/maps_result?addr=2100
> De armoun Rd.&amp;csz=99515&amp;country=us')" id="hlMap"><img
> border='0"' src="images/globe.gif"> 위치</img></a>, <a
> href="javascript:print()"><img border="0" src="images/printer.gif">
> 프린트</img></a>, <a href="javascript:mail_go('rev_han777@yahoo.com',
> '2Y5E9%2bk0h%2b4P%2f0H3jEJTq9VUG%2f0gaj40')" id="hlSendMail"><img
> border="0" src="images/mails.gif"> 메일보내기</img></a>, <a
> href="javascript:history.go(-1)"><img border="0"
> src="images/list.gif">
>     </img></a>, <a href="UpdateAddress.aspx?OrgID=4102" id="hlModify"><img alt="" border="0" src="Images/Modify.gif"/></a>]

I want this email address: rev_han777@yahoo.com

Get the <a> element by its id, then take everything after mailto: in its href attribute value:

link = soup.find('a', id='hlEmail')
print link['href'][7:]
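
The slice [7:] works because mailto: is seven characters long. As a rough sketch of the same idea in Python 3, the block below uses the standard library's html.parser in place of BeautifulSoup (a swapped-in technique, not the answer's method) so that it runs without extra packages. The names AnchorHrefFinder and email_from_anchor are hypothetical, and dropping a trailing ?subject=... part is an added assumption.

```python
from html.parser import HTMLParser


class AnchorHrefFinder(HTMLParser):
    """Record the href of the first <a> tag whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Ignore non-anchor tags and stop looking once a match is stored.
        if tag != "a" or self.href is not None:
            return
        attributes = dict(attrs)
        if attributes.get("id") == self.target_id:
            self.href = attributes.get("href")


def email_from_anchor(html, anchor_id="hlEmail"):
    """Return the address in the anchor's mailto: href, or None if absent."""
    finder = AnchorHrefFinder(anchor_id)
    finder.feed(html)
    href = finder.href
    if not href or not href.lower().startswith("mailto:"):
        return None
    # Drop the scheme, then any "?subject=..." query string (assumption).
    return href[len("mailto:"):].split("?", 1)[0]


# One anchor copied from the output in the question.
sample = '<a href="mailto:rev_han777@yahoo.com" id="hlEmail">rev_han777@yahoo.com</a>'
print(email_from_anchor(sample))
```

Checking for the mailto: prefix before slicing means an anchor with a missing or different href yields None instead of a truncated string.
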

Demo:

>>> from bs4 import BeautifulSoup
>>> import urllib2
>>> url = "http://www.koreanchurchyp.com/ViewDetail.aspx?OrgID=4102"
>>> soup = BeautifulSoup(urllib2.urlopen(url))
>>> link = soup.find('a', id='hlEmail')
>>> print link['href'][7:]
rev_han seven seven seven at yahoo.com  #  obfuscated intentionally

Comments:
This url:
What, no regex?...) +1.
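
For comparison with the "no regex?" remark, here is a minimal sketch of a regular-expression approach in Python 3. It is not part of the answer, and a parser is usually the more robust choice for HTML. The pattern MAILTO_RE and the function mailto_addresses are hypothetical names, and the pattern assumes the href value is quoted and starts with mailto:.

```python
import re

# Assumed pattern: a quoted href that starts with "mailto:", capturing
# everything up to the closing quote or a "?" query separator.
MAILTO_RE = re.compile(r"""href\s*=\s*["']mailto:([^"'?]+)""", re.IGNORECASE)


def mailto_addresses(html):
    """Return the unique mailto: addresses in html, in order of appearance."""
    seen = []
    for address in MAILTO_RE.findall(html):
        address = address.strip()
        if address and address not in seen:
            seen.append(address)
    return seen


# A shortened version of the anchors shown in the question's output.
sample = (
    '<a href="default.aspx"><img src="image/logo.gif"/></a>'
    '<a href="mailto:rev_han777@yahoo.com" id="hlEmail">rev_han777@yahoo.com</a>'
    "<a href=\"javascript:mail_go('rev_han777@yahoo.com', 'x')\" id=\"hlSendMail\"></a>"
)
print(mailto_addresses(sample))
```

Unlike the lookup by id, this collects every mailto: link on the page, which may help when the anchor has no stable id. It deliberately skips the javascript:mail_go(...) link, since that href does not start with mailto:.
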