Extracting HTML data from HTML code in PHP

Tags: php, web-scraping


I want to extract some data from a page.

The data I need is the text between the HTML tags in the following lines:

<div class="tgme_page_title">تست</div>    
<div class="tgme_page_extra">4 members</div>
<a class="tgme_action_button_new" href="tg://join?invite=GYJezj_NevMyTZP5KchgPA">
    Join Group
</a>   
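How can I do this?

Thanks for your time.

Use an HTML DOM parser.

Here is a code example that finds the elements by class and extracts their values (I have not tested it):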

$html = '<div class="tgme_page_title">تست</div><div class="tgme_page_extra">4 members</div><a class="tgme_action_button_new" href="tg://join?invite=GYJezj_NevMyTZP5KchgPA">Join Group</a>';

$dom = new DOMDocument();
// Prepend a charset hint so loadHTML() reads the non-ASCII title as UTF-8.
$dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
$finder = new DOMXPath($dom);

// item(0) returns the first match. The curly-brace offset syntax ($nodes{0}) was removed in PHP 8.
$classname = "tgme_page_title";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
$data1 = $nodes->item(0)->nodeValue;

$classname = "tgme_page_extra";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
$data2 = $nodes->item(0)->nodeValue;

$classname = "tgme_action_button_new";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
$data3 = $nodes->item(0)->nodeValue;
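
The three lookups in the answer repeat the same XPath query, so they can be folded into one small function. The following is only a sketch under a few assumptions: `firstTextByClass` is a hypothetical helper name that does not appear in the original answer, and the charset `<meta>` prefix is one common way to make `loadHTML()` treat the input as UTF-8. The helper returns null when no element matches, which makes a missing class easy to tell apart from an empty value.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the trimmed
// text of the first element carrying $class, or null when nothing matches.
function firstTextByClass(DOMXPath $xpath, string $class): ?string
{
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$html = '<div class="tgme_page_title">تست</div>'
      . '<div class="tgme_page_extra">4 members</div>'
      . '<a class="tgme_action_button_new" href="tg://join?invite=GYJezj_NevMyTZP5KchgPA">Join Group</a>';

$dom = new DOMDocument();
// Assumption: the input is UTF-8; the meta prefix tells the parser so.
$dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
$xpath = new DOMXPath($dom);

$title   = firstTextByClass($xpath, 'tgme_page_title');
$extra   = firstTextByClass($xpath, 'tgme_page_extra');
$button  = firstTextByClass($xpath, 'tgme_action_button_new');
$missing = firstTextByClass($xpath, 'no_such_class'); // no match, so null
```

Checking for null before using a result avoids the "trying to get property of non-object" error that the direct `->item(0)->nodeValue` chain raises when a class is absent from the page.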

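Here is the documentation:

$html = '<div class="tgme_page_extra">4 members</div>';
echo strip_tags($html);

You need to use PHP's cURL functions (curl_init(), curl_exec()) to retrieve the data from the URL.

Please explain how I can do that. I'm not familiar with PHP.

That returns null.

Well, see the documentation. In some cases it isn't "textContent", but there is a nice function for extracting the text between tags.

What about ->nodeValue?

Again nothing; it returns null.

Please try this function. It works for me.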
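
Since the comments mention cURL for retrieving the page, here is a minimal sketch of that step. It assumes the curl extension is enabled. `fetchHtml` is a hypothetical helper name and the URL in the usage comment is a placeholder; neither comes from the original thread.

```php
<?php
// Hypothetical helper (not from the original thread): downloads a page and
// returns its body as a string, or null if the transfer fails.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Placeholder URL; substitute the page you want to scrape.
// $html = fetchHtml('https://example.com/some-page');
```

The returned string can then be passed to `DOMDocument::loadHTML()` in place of the hard-coded `$html` value.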