Php Python regex ignoring new lines - Php, Python, Html, Regex, Beautifulsoup - Fatal编程技术网

My web page looks like this:

<td valign="top">

    <table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
        <tr>
            <td colspan="2">
                <div align="center">
                <a href="/title/name.php" target="_blank">
                <img src="./movie/image.jpg" alt="TitleName" border="0" height="100" width="225" />
                </a>
                </div>
            </td>
        </tr>
        <tr>
            <td colspan="2"><h1 align="center"><a href="./title.php?titleid=12">Title - secondname</a></h1></td>
        </tr>
        <tr>
            <td><span class="style10">Cat1 :</span></td>
            <td>1st name</td>
        </tr>
        <tr>
            <td width="32%"><span class="style10">Cat2 :</span></td>
            <td width="68%"><b><i><a href="./secondname.php" target="_blank">secondname</a></i></b></td>
        </tr>
        <tr>
            <td><span class="style10">cat4 :</span></td>
            <td>Bla bla</td>
        </tr>
        <tr>
            <td><span class="style10">Cat3 :</span></td>
            <td>thirdName2</td>
        </tr>
    </table>

</td>
<td valign="top">

    <table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
        <tr>
            <td colspan="2">
                <div align="center">
                <a href="/title/name.php" target="_blank">
                <img src="./movie/image.jpg" alt="TitleName" border="0" height="100" width="225" />
                </a>
                </div>
            </td>
        </tr>
        <tr>
            <td colspan="2"><h1 align="center"><a href="./title.php?titleid=12">Title - secondname</a></h1></td>
        </tr>
        <tr>
            <td><span class="style10">Cat1 :</span></td>
            <td>1st name</td>
        </tr>
        <tr>
            <td width="32%"><span class="style10">Cat2 :</span></td>
            <td width="68%"><b><i><a href="./secondname.php" target="_blank">secondname</a></i></b></td>
        </tr>
        <tr>
            <td><span class="style10">cat4 :</span></td>
            <td>Bla bla</td>
        </tr>
        <tr>
            <td><span class="style10">Cat3 :</span></td>
            <td>thirdName2</td>
        </tr>
    </table>

</td>

I want to use a Python regular expression to extract certain values from this page. After … I want to get … from …

I tried this:
regex='class="main_tb3"*\n\n


Please help me.

You can use the following regular expression:


For the href value:
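A minimal sketch of the regex route, assuming the values wanted are the `href` and the link text inside each `<h1>`. The part that deals with the line breaks is the `re.DOTALL` flag, which makes `.` match `\n` as well, so a non-greedy `.*?` can skip from `class="main_tb3"` down to the `<h1>` several lines later. `HTML`, `PATTERN` and `find_titles` are hypothetical names, and the sample is trimmed from the markup in the question.

```python
import re

# Hypothetical sample: one listing trimmed from the markup in the question.
HTML = """
<td valign="top">
    <table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
        <tr>
            <td colspan="2"><h1 align="center"><a href="./title.php?titleid=12">Title - secondname</a></h1></td>
        </tr>
        <tr>
            <td><span class="style10">Cat1 :</span></td>
            <td>1st name</td>
        </tr>
    </table>
</td>
"""

# From each class="main_tb3" table, skip ahead (across line breaks) to the
# first <h1><a href="...">, capturing the href and the link text.
# ".*?" is non-greedy, so it stops at the nearest <h1>, not the last one.
PATTERN = r'class="main_tb3".*?<h1[^>]*>\s*<a\s+href="([^"]*)"[^>]*>(.*?)</a>'


def find_titles(markup, flags=re.DOTALL):
    """Return a list of (href, title) tuples, one per main_tb3 table."""
    return re.findall(PATTERN, markup, flags)


if __name__ == "__main__":
    # With re.DOTALL, "." also matches "\n", so the pattern can span lines.
    print(find_titles(HTML))
    # Without it, "." stops at the first newline and nothing matches.
    print(find_titles(HTML, flags=0))
```

The pattern is tied to this exact markup (attribute order, double quotes, an `<a>` directly inside the `<h1>`); if the page changes, it simply stops matching without raising an error.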

You will find it much simpler to install an HTML parsing library such as BeautifulSoup and do something like this:

from bs4 import BeautifulSoup

html = """
<td valign="top">

    <table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
        <tr>
            <td colspan="2">
                <div align="center">
                <a href="/title/name.php" target="_blank">
                <img src="./movie/image.jpg" alt="TitleName" border="0" height="100" width="225" />
                </a>
                </div>
            </td>
        </tr>
        <tr>
            <td colspan="2"><h1 align="center"><a href="./title.php?titleid=12">Title - secondname</a></h1></td>
        </tr>
        <tr>
            <td><span class="style10">Cat1 :</span></td>
            <td>1st name</td>
        </tr>
        <tr>
            <td width="32%"><span class="style10">Cat2 :</span></td>
            <td width="68%"><b><i><a href="./secondname.php" target="_blank">secondname</a></i></b></td>
        </tr>
        <tr>
            <td><span class="style10">cat4 :</span></td>
            <td>Bla bla</td>
        </tr>
        <tr>
            <td><span class="style10">Cat3 :</span></td>
            <td>thirdName2</td>
        </tr>
    </table>

</td>
<td valign="top">

    <table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
        <tr>
            <td colspan="2">
                <div align="center">
                <a href="/title/name.php" target="_blank">
                <img src="./movie/image.jpg" alt="TitleName" border="0" height="100" width="225" />
                </a>
                </div>
            </td>
        </tr>
        <tr>
            <td colspan="2"><h1 align="center"><a href="./title.php?titleid=12">Title - secondname</a></h1></td>
        </tr>
        <tr>
            <td><span class="style10">Cat1 :</span></td>
            <td>1st name</td>
        </tr>
        <tr>
            <td width="32%"><span class="style10">Cat2 :</span></td>
            <td width="68%"><b><i><a href="./secondname.php" target="_blank">secondname</a></i></b></td>
        </tr>
        <tr>
            <td><span class="style10">cat4 :</span></td>
            <td>Bla bla</td>
        </tr>
        <tr>
            <td><span class="style10">Cat3 :</span></td>
            <td>thirdName2</td>
        </tr>
    </table>

</td>"""

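soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly

# One <table class="main_tb3"> per listing
for table in soup.find_all("table", class_="main_tb3"):
    print(table.find('a').get('href'))  # first link in the table: /title/name.php
    print(table.find('h1').text)        # title text: Title - secondname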
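Extending the same BeautifulSoup approach, a sketch that also collects the label/value rows (`Cat1`, `Cat2`, `cat4`, `Cat3`) of each table into a dictionary. `parse_listings` is a hypothetical helper, and it assumes every label sits in a `<span class="style10">` whose `<td>` is followed by the value `<td>`, as in the sample markup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one listing trimmed from the markup in the question.
html = """
<table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
    <tr>
        <td colspan="2"><h1 align="center"><a href="./title.php?titleid=12">Title - secondname</a></h1></td>
    </tr>
    <tr>
        <td><span class="style10">Cat1 :</span></td>
        <td>1st name</td>
    </tr>
    <tr>
        <td width="32%"><span class="style10">Cat2 :</span></td>
        <td width="68%"><b><i><a href="./secondname.php" target="_blank">secondname</a></i></b></td>
    </tr>
</table>
"""


def parse_listings(markup):
    """Return one dict per main_tb3 table: title, its href, and label/value rows."""
    soup = BeautifulSoup(markup, "html.parser")
    listings = []
    for table in soup.find_all("table", class_="main_tb3"):
        link = table.h1.a  # the <a> inside the <h1> title cell
        item = {"title": link.get_text(strip=True), "href": link.get("href")}
        # Label rows look like <td><span class="style10">Cat1 :</span></td><td>value</td>
        for span in table.find_all("span", class_="style10"):
            label = span.get_text(strip=True).rstrip(":").strip()
            value_cell = span.find_parent("td").find_next_sibling("td")
            item[label] = value_cell.get_text(strip=True)
        listings.append(item)
    return listings


if __name__ == "__main__":
    for listing in parse_listings(html):
        print(listing)
```

`get_text(strip=True)` flattens nested tags, so the `Cat2` value comes out as `secondname` even though it is wrapped in `<b><i><a>`.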

I'll just leave this here: => You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. Please use a parser to parse HTML.
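If adding a third-party package is not an option, the standard library's `html.parser.HTMLParser` is an event-driven parser that can do the same job. A minimal sketch, assuming only the `href` and text of the link inside each `<h1>` are needed; `TitleCollector`, `extract_titles` and `SAMPLE` are hypothetical names, and the sample deliberately spreads the `<h1>` over several lines to show that line breaks are just text data to a parser.

```python
from html.parser import HTMLParser

# Hypothetical sample, reformatted so the <h1> spans several lines.
SAMPLE = """
<table class="main_tb3">
  <tr>
    <td colspan="2">
      <h1 align="center">
        <a href="./title.php?titleid=12">Title - secondname</a>
      </h1>
    </td>
  </tr>
</table>
"""


class TitleCollector(HTMLParser):
    """Collect (href, text) for every <a href="..."> that sits inside an <h1>."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.current_href = None  # href of the <a> we are currently inside, if any
        self.current_text = []
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
        elif tag == "a" and self.in_h1:
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            self.titles.append((self.current_href, "".join(self.current_text).strip()))
            self.current_href = None
        elif tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        # Newlines and indentation arrive here as plain text; only keep
        # the text that belongs to the link we are tracking.
        if self.current_href is not None:
            self.current_text.append(data)


def extract_titles(markup):
    parser = TitleCollector()
    parser.feed(markup)
    parser.close()
    return parser.titles


if __name__ == "__main__":
    print(extract_titles(SAMPLE))
```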