PHP: find an element in HTML and explode it to store the value

I want to retrieve an HTML element from a page:

<h2 id="resultCount" class="resultCount">

    <span>

        Showing 1 - 12 of 40,923 Results

    </span>

</h2>

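At the moment the total number of results is not retrieved correctly, so the condition that depends on it is never met, even when it should be.

Does anyone have a solution to this problem, or another way of doing it?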

A regular expression will do this:

...
preg_match("/of ([0-9,]+) Results/", $htmlResultCount[0], $matches);
$europeFormatCount = intval(str_replace(",", "", $matches[1]));
...
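The snippet above reads $matches[1] without checking whether the pattern matched at all. A minimal sketch of the same idea with that check added, run against the sample markup from the question, might look like this. The function name parseResultCount and the $html string are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: returns the total as an int, or null when the text is absent.
function parseResultCount(string $html): ?int
{
    // Same pattern as in the answer above; preg_match returns 1 only on a match.
    if (preg_match('/of ([0-9,]+) Results/', $html, $matches) !== 1) {
        return null;
    }
    // Strip the thousands separators before casting: "40,923" becomes 40923.
    return (int) str_replace(',', '', $matches[1]);
}

// Stand-in for the fetched page (assumption: same markup as in the question).
$html = <<<HTML
<h2 id="resultCount" class="resultCount">
    <span>
        Showing 1 - 12 of 40,923 Results
    </span>
</h2>
HTML;

$total = parseResultCount($html);
```

Returning null instead of 0 lets the caller tell "no results text on the page" apart from a genuine count.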

Please try this code:

define("MAX_RESULT_ALL_PAGES", 1200);  

// new dom object
$dom = new DOMDocument();

// HTML string
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
$html_string = file_get_contents($queryUrl);

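// Ask the parser to drop redundant white space (must be set before loading)
$dom->preserveWhiteSpace = false;

// Load the HTML; real-world pages are rarely valid, so keep parser warnings quiet
libxml_use_internal_errors(true);
$dom->loadHTML($html_string);
libxml_clear_errors();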

//Get all h2 tags
$nodes = $dom->getElementsByTagName('h2');

// Store total result count
$totalCount = 0;

// Loop over all h2 tags and pick the one whose class is "resultCount"
foreach ($nodes as $node) {
    if ($node->hasAttributes()) {
        foreach ($node->attributes as $attribute) {
            if ($attribute->name === 'class' && $attribute->value == 'resultCount') {
                $inner_html = str_replace(',', '', trim($node->nodeValue));
                $inner_html_array = explode(' ', $inner_html);

                // The sixth token of "Showing 1 - 12 of 40923 Results" is the total
                $totalCount += (int) $inner_html_array[5];
            }
        }
    }
}

// If the result count is greater than 1200, do this
if ($totalCount > MAX_RESULT_ALL_PAGES) {
      $queryUrl = AMAZON_SEARCH_URL.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
}
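Since the heading in the question carries id="resultCount", it can also be looked up directly instead of looping over every h2 and every attribute. The following is only a sketch under that assumption, using DOMXPath and matching on the text rather than on a fixed token position. The function name findResultCount is hypothetical.

```php
<?php
// Hypothetical helper: finds the h2 with id="resultCount" and parses its total.
function findResultCount(string $html): ?int
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $node  = $xpath->query('//h2[@id="resultCount"]')->item(0);
    if ($node === null) {
        return null; // heading not present on this page
    }

    // textContent includes the text of the nested <span>.
    if (preg_match('/of\s+([0-9,]+)\s+Results/', $node->textContent, $m) !== 1) {
        return null;
    }
    return (int) str_replace(',', '', $m[1]);
}
```

Matching on the text avoids depending on the exact number of spaces between words, which the explode(' ', ...) approach is sensitive to.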

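I would simply fetch the page as a string (rather than parsing it as HTML) and use a regular expression to get the total number of results. The code would look like this: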

define('MAX_RESULT_ALL_PAGES', 1200);

$queryUrl    = AMAZON_TOTAL_BOOKS_COUNT . $searchMonthUrlParam . $searchYearUrlParam . $searchTypeUrlParam . urlencode($keyword) . '&page=' . $pageNum;
$queryResult = file_get_contents($queryUrl);

if (preg_match('/of\s+([0-9,]+)\s+Results/', $queryResult, $matches)) {
    $totalResults = (int) str_replace(',', '', $matches[1]);
} else {
    throw new \RuntimeException('Total number of results not found');
}

if ($totalResults > MAX_RESULT_ALL_PAGES) {
    $queryUrl = AMAZON_SEARCH_URL . $searchMonthUrlParam . $searchYearUrlParam . $searchTypeUrlParam . urlencode($keyword) . '&page=' . $pageNum;
    // ...
}

Try this:

$match =array();
preg_match('/(?<=of\s)(?:\d{1,3}+(?:,\d{3})*)(?=\sResults)/', $htmlResultCount, $match);
$europeFormatCount = str_replace(',','',$match[0]);
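The pattern above relies on a lookbehind and a lookahead, so the whole match in $match[0] is just the number. A short sketch of how it behaves, with a guard for the no-match case, could look like this. The $text value is an assumed sample taken from the markup in the question.

```php
<?php
// Assumed sample text, as it appears inside the <span> in the question.
$text = 'Showing 1 - 12 of 40,923 Results';

$match = [];
if (preg_match('/(?<=of\s)\d{1,3}+(?:,\d{3})*(?=\sResults)/', $text, $match) === 1) {
    // Lookarounds consume no characters, so $match[0] holds only "40,923".
    $count = (int) str_replace(',', '', $match[0]);
} else {
    // Text not found, or the number is not grouped in thousands with commas.
    $count = null;
}
```

Note that this pattern expects exactly one whitespace character on each side of the number and comma-grouped digits, so it is stricter than the \s+ variant shown earlier.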

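Where do the results come from? Working out the count from text on the page seems like a strange approach...

The results are taken from the search results on the Amazon page. I am forced to retrieve the value this way because it is the only way to know how many results there are in total.

Try the regular expression $results = preg_match_all('/([\d,\.]+)\s*?Results/', $resultCountArray[5])

This code does not work. I cannot do this because the HTML page is generated by Amazon, and I cannot change the HTML DOM that Amazon generates.

I have modified the answer. Could you try it? By the way, what change to the Amazon-generated DOM do you think is needed? As far as I can tell, this code does not require any such change. Could you explain?

I retrieve the response to the query sent to Amazon, and I need to know the total number of search results. If there are fewer than 1200 results I can retrieve them all at once, but if there are more than 1200 I have to process them differently to get all the results. Do you see?

Could you please try running this code and let me know if you find any problem?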