Extracting IMDb search results with PHP
I need help extracting the photo URL from the string below:
<tr class="findResult odd">
<td class="primary_photo"><a href="/title/tt0499549/?ref_=fn_tt_tt_1" ><img src="http://ia.media-imdb.com/images/M/MV5BMTYwOTEwNjAzMl5BMl5BanBnXkFtZTcwODc5MTUwMw@@._V1_SX32_CR0,0,32,44_AL_.jpg" /></a></td>
<td class="result_text"><a href="/title/tt0499549/?ref_=fn_tt_tt_1" >Avatar</a> (2009) </td>
</tr>
<tr class="findResult even">
<td class="primary_photo"><a href="/title/tt0417299/?ref_=fn_tt_tt_2" ><img src="http://ia.media-imdb.com/images/M/MV5BMTM3MTc3OTc0NF5BMl5BanBnXkFtZTcwOTQ0OTM1MQ@@._V1._CR34,0,295,440_SX32_CR0,0,32,44_AL_.jpg" /></a></td>
<td class="result_text"><a href="/title/tt0417299/?ref_=fn_tt_tt_2" >Avatar: The Last Airbender</a> (2005) (TV Series) </td>
</tr>
I could use the PHP HTML DOM parser, but I am learning regular expressions. Here is my code:
preg_match_all('!class="result_text"\s*>\s*<a href="/title/tt(?<imdbid>\d{7})/[^>]*>(?<title>.*?)</a>\s*(\([^\d{4}]\)\s*)?(\((?<year>\d{4})(.*?|)\)|)(?<type>[^<]*)!ims', $str, $matches);
Consider trying:
preg_match_all('!<img src="(?<imageURL>[^"]*)"\s*/?>.*?class="result_text"\s*>\s*<a href="/title/tt(?<imdbid>\d{7})/[^>]*>(?<title>.*?)</a>\s*(\([^\d{4}]\)\s*)?(\((?<year>\d{4})(.*?|)\)|)(?<type>[^<]*)!ims', $str, $matches);
In general, using a regular expression just to extract a few URLs of a known format, from a document you trust, is not a bad idea.
That said, I don't like the idea of parsing two adjacent tags with a single regular expression, but since you are learning:
<img\s[^>]*src="(?<imageURL>[^"]*)"\s*\/> # IMG tag
.*? # Anything in-between IMG and A
<a\s[^>]*?href="\/title\/tt
(?<imdbid>\d{7}) # Got the imdbid
\/[^>]*>(?<title>.*?) # Got title
<\/a> # End of A tag
\s*\(
(?<year>\d{4}) # Year
\)\s*(?:\( # Type is optional
(?<type>[^<]*) # Type
\))? # End of optional group
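Use an HTML parser.
I could use an HTML DOM parser class, but I am learning regular expressions :(
If you want to learn regular expressions, practice on files other than HTML or XML.
Thank you very much for the detailed explanation. This is also a great site for practice.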
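For comparison, the parser-based route suggested in the comments could look roughly like the sketch below. It uses PHP's bundled DOMDocument and DOMXPath classes rather than any particular third-party DOM parser class. The function name extractResults and the wrapping <table> element are assumptions added for illustration; the rows themselves are the ones from the question.

```php
<?php
// Sample rows from the question, wrapped in a <table> so the fragment parses cleanly.
$html = <<<'HTML'
<table>
<tr class="findResult odd">
<td class="primary_photo"><a href="/title/tt0499549/?ref_=fn_tt_tt_1" ><img src="http://ia.media-imdb.com/images/M/MV5BMTYwOTEwNjAzMl5BMl5BanBnXkFtZTcwODc5MTUwMw@@._V1_SX32_CR0,0,32,44_AL_.jpg" /></a></td>
<td class="result_text"><a href="/title/tt0499549/?ref_=fn_tt_tt_1" >Avatar</a> (2009) </td>
</tr>
<tr class="findResult even">
<td class="primary_photo"><a href="/title/tt0417299/?ref_=fn_tt_tt_2" ><img src="http://ia.media-imdb.com/images/M/MV5BMTM3MTc3OTc0NF5BMl5BanBnXkFtZTcwOTQ0OTM1MQ@@._V1._CR34,0,295,440_SX32_CR0,0,32,44_AL_.jpg" /></a></td>
<td class="result_text"><a href="/title/tt0417299/?ref_=fn_tt_tt_2" >Avatar: The Last Airbender</a> (2005) (TV Series) </td>
</tr>
</table>
HTML;

// Hypothetical helper: returns one array per search-result row.
function extractResults(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $results = [];
    foreach ($xpath->query('//tr[contains(@class, "findResult")]') as $row) {
        $img  = $xpath->query('.//td[@class="primary_photo"]//img', $row)->item(0);
        $link = $xpath->query('.//td[@class="result_text"]/a', $row)->item(0);
        if ($img === null || $link === null) {
            continue; // Skip rows that lack a photo or a title link.
        }
        // Pull the numeric id out of an href such as /title/tt0499549/...
        preg_match('/tt(\d+)/', $link->getAttribute('href'), $id);
        $results[] = [
            'imageURL' => $img->getAttribute('src'),
            'imdbid'   => $id[1] ?? null,
            'title'    => trim($link->textContent),
        ];
    }
    return $results;
}

$results = extractResults($html);
print_r($results);
```

The XPath queries rely only on the class names visible in the sample, so they would need adjusting if the real page uses different markup.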
$re = "/<img\\s[^>]*src=\"(?<imageURL>[^\"]*)\"\\s*\\/> # IMG tag
.*? # Anything in-between IMG and A
<a\\s[^>]*?href=\"\\/title\\/tt
(?<imdbid>\\d{7}) # Got the imdbid
\\/[^>]*>(?<title>.*?) # Got title
<\\/a> # End of A tag
\\s*\\(
(?<year>\\d{4}) # Year
\\)\\s*(?:\\( # Type is optional
(?<type>[^<]*) # Type
\\))? # End of optional group/isx";
$str = "<tr class=\"findResult odd\">\n <td class=\"primary_photo\"><a href=\"/title/tt0499549/?ref_=fn_tt_tt_1\" ><img src=\"http://ia.media-imdb.com/images/M/MV5BMTYwOTEwNjAzMl5BMl5BanBnXkFtZTcwODc5MTUwMw@@._V1_SX32_CR0,0,32,44_AL_.jpg\" /></a></td>\n <td class=\"result_text\"><a href=\"/title/tt0499549/?ref_=fn_tt_tt_1\" >Avatar</a> (2009) </td>\n </tr>\n <tr class=\"findResult even\">\n <td class=\"primary_photo\"><a href=\"/title/tt0417299/?ref_=fn_tt_tt_2\" ><img src=\"http://ia.media-imdb.com/images/M/MV5BMTM3MTc3OTc0NF5BMl5BanBnXkFtZTcwOTQ0OTM1MQ@@._V1._CR34,0,295,440_SX32_CR0,0,32,44_AL_.jpg\" /></a></td>\n <td class=\"result_text\"><a href=\"/title/tt0417299/?ref_=fn_tt_tt_2\" >Avatar: The Last Airbender</a> (2005) (TV Series) </td>\n </tr>";
preg_match_all($re, $str, $matches);
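The call above fills $matches but does not show how to read it. A minimal sketch, assuming the same pattern and sample rows: passing PREG_SET_ORDER groups the captures per match, so each row can be read by its named groups. The $rows array and the printf format are illustrative choices, not part of the original answer, and the pattern is written in a nowdoc only to avoid double escaping.

```php
<?php
// Same pattern as above, kept in a nowdoc so backslashes need no doubling.
$re = <<<'REGEX'
/<img\s[^>]*src="(?<imageURL>[^"]*)"\s*\/>   # IMG tag
.*?                                          # Anything between IMG and A
<a\s[^>]*?href="\/title\/tt
(?<imdbid>\d{7})                             # IMDb id
\/[^>]*>(?<title>.*?)                        # Title
<\/a>                                        # End of A tag
\s*\(
(?<year>\d{4})                               # Year
\)\s*(?:\(                                   # Type is optional
(?<type>[^<]*)                               # Type
\))?                                         # End of optional group
/isx
REGEX;

$str = <<<'HTML'
<tr class="findResult odd">
<td class="primary_photo"><a href="/title/tt0499549/?ref_=fn_tt_tt_1" ><img src="http://ia.media-imdb.com/images/M/MV5BMTYwOTEwNjAzMl5BMl5BanBnXkFtZTcwODc5MTUwMw@@._V1_SX32_CR0,0,32,44_AL_.jpg" /></a></td>
<td class="result_text"><a href="/title/tt0499549/?ref_=fn_tt_tt_1" >Avatar</a> (2009) </td>
</tr>
<tr class="findResult even">
<td class="primary_photo"><a href="/title/tt0417299/?ref_=fn_tt_tt_2" ><img src="http://ia.media-imdb.com/images/M/MV5BMTM3MTc3OTc0NF5BMl5BanBnXkFtZTcwOTQ0OTM1MQ@@._V1._CR34,0,295,440_SX32_CR0,0,32,44_AL_.jpg" /></a></td>
<td class="result_text"><a href="/title/tt0417299/?ref_=fn_tt_tt_2" >Avatar: The Last Airbender</a> (2005) (TV Series) </td>
</tr>
HTML;

$rows = [];
// PREG_SET_ORDER yields one array per match instead of one array per group.
if (preg_match_all($re, $str, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $rows[] = [
            'imageURL' => $m['imageURL'],
            'imdbid'   => $m['imdbid'],
            'title'    => $m['title'],
            'year'     => $m['year'],
            // The type group is optional, so it may be absent from the match.
            'type'     => $m['type'] ?? '',
        ];
    }
}

foreach ($rows as $row) {
    printf("tt%s %s (%s) %s\n", $row['imdbid'], $row['title'], $row['year'], $row['type']);
}
```

Without the flag, preg_match_all defaults to PREG_PATTERN_ORDER, where $matches['title'] is a list of all titles and the rows have to be re-assembled by index.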