PHP: How to get the style attributes and line numbers of HTML elements?


php css web-crawler


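I tried the approach below, but it does not work correctly: it returns every kind of tag, including tags that have no style attribute.

I load the website with cURL, store the HTML body from cURL in a variable named $bodyhtml, and then use preg_match_all to find all style attributes on the page, but it does not work as expected. My preg_match_all call: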

$preg_match_all = preg_match_all('/(<[^>]+) style=".*?"/i', $bodyhtml, $matches);

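What is the best way to get the values of the style attributes? And, if possible, what is the best way to get the line of the document on which each one was found?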

Thanks for your help so far. I ended up using DOM, and it works fine. Thanks!

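I have been trying to find a function or class that can find the line of a DOM node or an HTML string in a document, but no luck so far.

Does anyone have a good way to do this?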

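Having said that this should be easy to do, I thought I would have a go, and now I have to admit it is not as straightforward as I first imagined. The code below is close, and the OP or someone else may want to spend some time studying it to see where it can be improved.

There is very little in the way of comments, so I may get downvoted for it, but I am leaving it here in the hope that the OP can make it more accurate. It is not far off.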

$url='http://stackoverflow.com/questions/34998468/how-to-get-style-attributes-and-line-number-of-htm-elements#34998468';
$tmp=tempnam( sys_get_temp_dir(), 'html' );
file_put_contents( $tmp, file_get_contents( $url ) );

$dom=new DOMDocument;
$dom->loadHTMLFile( $tmp );

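$xp=new DOMXPath( $dom );/* the xpath query could be improved */
/* the leading dot makes the query relative to the body node; a bare // searches the whole document */
$col=$xp->query( './/*[@style]', $dom->getElementsByTagName('body')->item(0) );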

if( $col ){
    $data=array();
    /* iterate through nodes found by xpath query */
    foreach( $col as $node ){
        $tag=$node->tagName;
        $value=$node->nodeValue;
        $style=$node->getAttribute('style');
        /* create array for later use */
        $data[]=(object)array( 'tag'=>$tag, 'style'=>$style, 'html'=>trim( strip_tags( $value ) ) );
    }

    /* connect to the new file */
    $spl=new SplFileObject( $tmp );

    /* iterate through array found from xpath */
    foreach( $data as $key => $obj ){
        $str=$obj->html;
        $i=1;

        if( !empty( $str ) && strlen( $str ) > 1 ){/* ignore empty strings */
            $spl->fseek( 0 );

            while( !$spl->eof() ) {/* read the html source file line by line + make matches */
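                /* fgetss() was removed in PHP 8, so strip the tags from each line explicitly */
                if( stristr( strip_tags( $spl->fgets() ), $str ) ) {
                    /* BR was an undefined constant; PHP_EOL ends the line instead */
                    echo 'line: '.$i.', tag: '.$obj->tag.', html:'.$str.', style:'.$obj->style.PHP_EOL;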
                    break;  
                }
                $i++;
            }
        }
    }
}
/* release the file handle before deleting the temporary file */
$dom = $xp = $col = $spl = null;
@unlink( $tmp );
$tmp = null;

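Plain and simple: don't use regular expressions for this. DOM parsers exist for a reason, and this is it...

Rather than using a regex, why not use DOMDocument, possibly together with DOMXPath, to accomplish this? It is probably simpler.

Thanks, I will look at that. Do you know how to get the line of an element in the document via DOM? Is it possible?

For sure, there isn't a built-in function that does exactly that directly, but it should be easy enough to implement.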
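
For what it's worth, PHP's DOM extension does provide DOMNode::getLineNo(), which reports the line on which a node was defined in the parsed source, so the line-by-line text matching may not be needed at all. Below is a minimal sketch under a few assumptions: the sample markup stands in for the cURL response body ($bodyhtml in the question), and findStyledElements() is a hypothetical helper name, not something from the thread. Line numbers are relative to the string handed to loadHTML(), and they come from libxml, so it is worth checking them against your own documents, especially very large or badly malformed ones.

```php
<?php
// Hypothetical sample markup standing in for the cURL response body
// ($bodyhtml in the question).
$bodyhtml = <<<HTML
<html>
<body>
<p style="color: red;">First</p>
<div>
  <span style="font-weight: bold">Second</span>
</div>
</body>
</html>
HTML;

/**
 * Hypothetical helper: returns the tag name, style value and source line
 * of every element that carries a style attribute.
 */
function findStyledElements(string $html): array
{
    // Collect libxml warnings internally instead of printing them;
    // real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $found = [];

    // Select every element, at any depth, that has a style attribute.
    foreach ($xpath->query('//*[@style]') as $node) {
        $found[] = [
            'line'  => $node->getLineNo(),          // line reported by libxml
            'tag'   => $node->nodeName,
            'style' => $node->getAttribute('style'),
        ];
    }

    return $found;
}

foreach (findStyledElements($bodyhtml) as $hit) {
    printf("line %d: <%s> style=\"%s\"\n", $hit['line'], $hit['tag'], $hit['style']);
}
```

Because the XPath query only selects elements that actually have a style attribute, tags without one never show up in the result, which was the problem with the regex attempt.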