Fetching any user-specified HTML tag from the HTML document at a live URL using PHP regular expressions

php html tags

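I want to fetch any meta, title, script, or link tags that are present on an HTML page. This is the program I wrote (it is not correct, but it should give the experts an idea of what I am after).

Thanks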

Before you parse HTML with regular expressions, you should read the first answer on that subject.

Try using DOMDocument, like this:

<?php

function get_tags($tags, $url) {

    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();

    // Load the url's contents into the DOM
    $xml->loadHTMLFile($url);

    // Empty array to hold all matching tags to return
    $tags_found = array();

    // Loop through each <$tags> tag in the DOM and add it to the $tags_found array
    foreach ($xml->getElementsByTagName($tags) as $tag) {
        $tags_found[] = array('tag' => $tags, 'text' => $tag->nodeValue);
    }

    // Return the tags that were found
    return $tags_found;
}

print_r(get_tags('title', 'http://stackoverflow.com'));

?>
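
The question asks for several tag types, so the same DOMDocument approach can be run over a list of names. The following is only a minimal sketch under a few assumptions: the HTML is already in a string (so it uses `loadHTML` instead of `loadHTMLFile` and runs without network access), and the function name `get_tags_from_string` and the sample markup are made up for illustration.

```php
<?php

// Hypothetical helper (not from the answer): collect the text of several
// tag types from an HTML string.
function get_tags_from_string(array $tag_names, string $html): array {
    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);

    $xml = new DOMDocument();
    $xml->loadHTML($html);

    // Discard the collected warnings and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags_found = array();
    foreach ($tag_names as $name) {
        foreach ($xml->getElementsByTagName($name) as $tag) {
            $tags_found[] = array('tag' => $tag->tagName, 'text' => $tag->nodeValue);
        }
    }
    return $tags_found;
}

// Sample markup, invented for this example
$html = '<html><head><title>Example page</title>'
      . '<script>var a = 1;</script></head><body><p>Hi</p></body></html>';

print_r(get_tags_from_string(array('title', 'script'), $html));
```

Results are grouped by the order of the names passed in, not by document order.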

Since these tags cannot be nested, you do not need a parser:
#<(meta|title|script|link)(?: .*?)?(?:/>|>(.*?)<(?:/\1)>)#is

If you want to use this in a function, you have to write $tag_name in place of "meta|title|script|link".
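
One way to wire the pattern into a function is sketched below. It leaves the pattern itself unchanged and only substitutes the tag list, so it still matches only tags that are either closed with `/>` or have an explicit closing tag. The function name `get_tags_regex`, the use of `preg_quote`, and the sample markup are assumptions added for illustration.

```php
<?php

// Hypothetical helper (not from the answer): regex-based lookup for a list of tag names.
function get_tags_regex(array $names, string $html): array {
    // Build the alternation, e.g. "meta|title", escaping each name for use inside #...#
    $quoted = array();
    foreach ($names as $name) {
        $quoted[] = preg_quote($name, '#');
    }
    $tag_name = implode('|', $quoted);

    // The pattern from the answer, with $tag_name in place of the fixed list
    $pattern = '#<(' . $tag_name . ')(?: .*?)?(?:/>|>(.*?)<(?:/\1)>)#is';

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $tags_found = array();
    foreach ($matches as $m) {
        // Group 2 is absent for self-closed tags such as <meta ... />
        $tags_found[] = array('tag' => $m[1], 'text' => isset($m[2]) ? $m[2] : '');
    }
    return $tags_found;
}

// Sample markup, invented for this example
$html = '<title>Example page</title><meta name="description" content="demo" />';

print_r(get_tags_regex(array('meta', 'title'), $html));
```

Matches come back in document order, so the title entry appears before the meta entry for this input.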
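This answer will actually give the name of the tag as the first array value instead of "array", and it will also stop the warnings.

Using regular expressions to get DOM elements is not recommended.

I just tested this code and it works like a charm on my machine, without any warnings. In any case, what you get is a warning, not an error. Upvotes and accepts are welcome :)

That is a different question; please post a new one.

@Tudor, no, it is not a different question; he asked how to get the metas in the question. Posting an answer now.

@Liam, the original question said he wanted the contents of some tags, and now he also needs the values of some specific attributes. You can look at this in two ways: as a new question, or as a further requirement of the first one. Personally I like to keep things as simple as possible, so in my opinion two questions would be best in this case.

Sure, I have edited accordingly. Glad it worked for you.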

function get_tags($tag, $url) {
    // Allow for improperly formatted HTML
    libxml_use_internal_errors(true);

    // Instantiate the DOMDocument class to parse the HTML DOM
    $xml = new DOMDocument();

    // Load the file into the DOMDocument object
    $xml->loadHTMLFile($url);

    // Empty array to hold all tags to return
    $tags = array();

    // Loop through all tags of the given type and store details in the array
    foreach ($xml->getElementsByTagName($tag) as $tag_found) {
        if ($tag_found->tagName == "meta") {
            $tags[] = array("meta_name" => $tag_found->getAttribute("name"), "meta_value" => $tag_found->getAttribute("content"));
        } else {
            $tags[] = array('tag' => $tag_found->tagName, 'text' => $tag_found->nodeValue);
        }
    }

    // Return the tags that were found
    return $tags;
}
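
The comments mention needing the values of specific attributes. A more general variant could return every attribute of each matching element rather than special-casing meta. This is only a sketch: the function name `get_tag_attributes` and the sample markup are hypothetical, and it reads from a string with `loadHTML` so that it runs offline.

```php
<?php

// Hypothetical helper (not from the answer): return all attributes of each
// element with the given tag name.
function get_tag_attributes(string $tag, string $html): array {
    // Allow for improperly formatted HTML
    libxml_use_internal_errors(true);

    $xml = new DOMDocument();
    $xml->loadHTML($html);
    libxml_clear_errors();

    $result = array();
    foreach ($xml->getElementsByTagName($tag) as $node) {
        $attrs = array();
        // $node->attributes is a DOMNamedNodeMap of DOMAttr nodes
        foreach ($node->attributes as $attr) {
            $attrs[$attr->nodeName] = $attr->nodeValue;
        }
        $result[] = $attrs;
    }
    return $result;
}

// Sample markup, invented for this example
$html = '<html><head><meta name="description" content="demo">'
      . '<link rel="stylesheet" href="style.css"></head><body></body></html>';

print_r(get_tag_attributes('link', $html));
```

Each element becomes one associative array keyed by attribute name, so the caller can pick out `href`, `content`, or whatever else it needs.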