Python regular expression doesn't work


Hi, I'm having a bit of trouble with a regex.

Here is some of the source:

    <div class="resultHeader googleHeader">
                            Wyniki z Google
                    </div>

                <div class="boxResult2  ">
                                                                <div class="box ">
                <div class="result">
                    <div class="link"> <a href="http://www.google.com/glass/start/"><b>Google Glass</b></a> </div>
                    <div class="source">
                        http://www.google.com/glass/start/

                            - <a rel="nofollow" href="query.html?hl=pl&amp;qt=related:http%3A%2F%2Fwww.google.com%2Fglass%2Fstart%2F">Podobne strony</a>
                                            </div><!-- source END -->
                                            <div class="desc">Thanks for exploring with us. The journey doesn&#39;t end here. You&#39;ll start to see <br />
future versions of <b>Glass</b> when they&#39;re ready (for now, no peeking).</div>
                                    </div><!-- result End -->
            </div><!-- box End -->
                                                                <div class="box ">
                <div class="result">
                    <div class="link"> <a href="http://pl.wikipedia.org/wiki/Google_Glass"><b>Google Glass</b> – Wikipedia, wolna encyklopedia</a> </div>
                    <div class="source">
                        http://pl.wikipedia.org/wiki/Google_Glass

                            - <a rel="nofollow" href="query.html?hl=pl&amp;qt=related:http%3A%2F%2Fpl.wikipedia.org%2Fwiki%2FGoogle_Glass">Podobne strony</a>
                                            </div><!-- source END -->
                                            <div class="desc"><b>Google Glass</b> to okulary o rozszerzonej rzeczywistości stworzone przez firmę <br />
Google. Okulary te mają docelowo mieć funkcje standardowego smartfona, ale&nbsp;...</div>
                                    </div><!-- result End -->
            </div><!-- box End -->

This is what I wrote: '...

Since you are writing in Python, may I suggest a Python-based solution:

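from bs4 import BeautifulSoup

html = 'YOUR STRING'  # the page source shown in the question
# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(html, "html.parser")

# Each result's main link sits inside <div class="link">.
divs = soup.find_all("div", {"class": "link"})

for tag in divs:
    a = tag.find_all("a")
    for t in a:
        if t.has_attr('href'):
            print(t['href'])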
With your sample input, this outputs:

http://www.google.com/glass/start/
http://pl.wikipedia.org/wiki/Google_Glass
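
If installing a third-party package is not an option, roughly the same idea can be sketched with the standard library's `html.parser`. This is only a minimal sketch: the class name `LinkCollector` and the shortened `sample` string are made up for illustration, and it assumes the result links always sit inside a `<div>` whose class list contains `link`, as in the source above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags nested inside <div class="link">."""

    def __init__(self):
        super().__init__()
        # One entry per currently open <div>: True if it is a "link" div.
        self.div_stack = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self.div_stack.append("link" in classes)
        elif tag == "a" and any(self.div_stack) and attrs.get("href"):
            # Only keep anchors that are inside some open "link" div.
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


# Shortened, hypothetical excerpt of the markup from the question.
sample = '''
<div class="result">
  <div class="link"> <a href="http://www.google.com/glass/start/"><b>Google Glass</b></a> </div>
  <div class="source">
    - <a rel="nofollow" href="query.html?hl=pl&amp;qt=related:x">Podobne strony</a>
  </div>
</div>
'''

collector = LinkCollector()
collector.feed(sample)
print(collector.hrefs)
```

The "Podobne strony" anchor is skipped because it lives in the `source` div, not the `link` div.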

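This seems to work for me. Could you elaborate on the context in which you are using the regex?
Why not use an X/HTML parser?
@Filburt "cannot" -> "should not". You can parse a subset of valid HTML with regular expressions, but using a parser is much better than trying to work out whether the HTML you want to parse falls within that subset.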
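
To make the "subset" point concrete, here is a rough sketch of a regex that handles only markup shaped exactly like the sample in the question. The pattern and the names `LINK_RE` and `extract_links` are assumptions for illustration, not the asker's original expression: it assumes the class attribute is exactly `link`, that `href` is the first attribute of the `<a>` tag, and that attribute values are double-quoted. Any page that deviates from that falls outside the subset and will silently produce no match.

```python
import re

# Matches <div class="link"> immediately followed by <a href="...">.
# Deliberately narrow: this is not a general HTML parser.
LINK_RE = re.compile(
    r'<div\s+class="link">\s*<a\s+href="([^"]+)"',
    re.IGNORECASE,
)


def extract_links(html):
    """Return the href of the first <a> in each <div class="link">."""
    return LINK_RE.findall(html)


# Shortened, hypothetical excerpt of the markup from the question.
sample = '''
<div class="link"> <a href="http://www.google.com/glass/start/"><b>Google Glass</b></a> </div>
<div class="source">
  - <a rel="nofollow" href="query.html?hl=pl">Podobne strony</a>
</div>
<div class="link"> <a href="http://pl.wikipedia.org/wiki/Google_Glass"><b>Google Glass</b></a> </div>
'''

for url in extract_links(sample):
    print(url)
```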