PHP: counting the words in a string


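I am printing a string that contains the HTML content of a given URL. What I want to do is find out which words the string contains and how many times each one occurs.

For example:

Today | 1

How | 1

Hello | 1

Code:

Something like this: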

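  $s = "lorem ipsum dolor sit amet, consectetur adipiscing elit, sit sed do lorem eiusmod tempor";
  // Split on runs of non-word characters; -1 means "no limit" and
  // PREG_SPLIT_NO_EMPTY drops empty strings between adjacent separators.
  $w = preg_split('/\W+/', $s, -1, PREG_SPLIT_NO_EMPTY);
  $words = [];

  foreach ($w as $word) {
    // Start each new word at 0, then count this occurrence.
    if (!isset($words[$word])) $words[$word] = 0;
    $words[$word]++;
  }
  print_r($words);

Output: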

Array
(
    [lorem] => 2
    [ipsum] => 1
    [dolor] => 1
    [sit] => 2
    [amet] => 1
    [consectetur] => 1
    [adipiscing] => 1
    [elit] => 1
    [sed] => 1
    [do] => 1
    [eiusmod] => 1
    [tempor] => 1
)

Is this what you are looking for?
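
The same counting can probably be done with PHP built-ins instead of a manual loop. The following is a minimal sketch, not the answer's own method: it assumes the text is plain ASCII, because `str_word_count()` only recognises letters, apostrophes and hyphens as word characters by default. The sort step is an addition for readability.

```php
<?php
$s = "lorem ipsum dolor sit amet, consectetur adipiscing elit, sit sed do lorem eiusmod tempor";

// With format 1, str_word_count() returns the words as an array
// (letters, apostrophes and hyphens count as word characters).
$words = str_word_count($s, 1);

// array_count_values() maps each distinct word to its number of occurrences.
$counts = array_count_values($words);

// Sort by count, highest first, keeping the word keys.
arsort($counts);

foreach ($counts as $word => $count) {
    echo "$word | $count\n";
}
```

For text with accented or non-Latin letters, a regular-expression split is likely the safer choice.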

Taking $cResult as the input:

$word_counts = [];

// remove scripts and styles completely, then strip tags
$cResult = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $cResult);
$cResult = preg_replace('#<style(.*?)>(.*?)</style>#is', '', $cResult);
$cResult = strip_tags($cResult);

// strip all characters that are not letters:
$word_array_raw = explode(' ',preg_replace('/[^A-Za-z ]/', ' ', $cResult)); 

// loop through array:
foreach ($word_array_raw as $word) {
    $word = trim($word);
    if($word) {
        isset($word_counts[$word]) ? $word_counts[$word]++ : $word_counts[$word] = 1;
    }
}

// Array with all stats sorted in descending order:
arsort($word_counts); 

// Output format you wanted:
foreach ($word_counts as $word=>$count) { 
    echo "$word | $count<br>";
}

Hope it helps.
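
The steps in this answer can also be wrapped in a reusable function. The sketch below is one possible variant under stated assumptions, not the answer's exact code. The function name `count_words_in_html` is hypothetical. It assumes UTF-8 input and splits on `\p{L}` so that non-ASCII letters survive. It also pads tags with a space and decodes HTML entities, two steps the original does not perform.

```php
<?php
// Hypothetical helper: name, Unicode handling, tag padding and entity
// decoding are illustrative assumptions, not part of the original answer.
function count_words_in_html(string $html): array
{
    // Remove <script> and <style> blocks completely (fall back to the
    // input if the regex engine reports an error and returns null).
    $text = preg_replace('#<script\b[^>]*>.*?</script>#is', ' ', $html) ?? $html;
    $text = preg_replace('#<style\b[^>]*>.*?</style>#is', ' ', $text) ?? $text;

    // Put a space before every tag so "<p>a</p><p>b</p>" does not become "ab".
    $text = strip_tags(str_replace('<', ' <', $text));

    // Turn entities such as &amp; back into characters so "amp" is not counted.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Split on anything that is not a Unicode letter (assumes valid UTF-8).
    $words = preg_split('/[^\p{L}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Count occurrences and sort in descending order of frequency.
    $counts = array_count_values($words);
    arsort($counts);

    return $counts;
}

// Usage with a small made-up page:
$html = '<html><head><style>p{color:red}</style>'
      . '<script>document.getElementById("x");</script></head>'
      . '<body><p>Read more</p><p>Read Birmingham &amp; more</p></body></html>';

$result = count_words_in_html($html);
foreach ($result as $word => $count) {
    echo "$word | $count\n";
}
```

Like the original, this counts "More" and "more" as different words; folding case first would merge them.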

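I'm not sure how your code relates to your question, but you could try splitting the string on whitespace, then looping over the resulting array and using each word as a key in a second array, incrementing its value each time. If you want to split on any word boundary instead, use a regular-expression split on \b. That's just off the top of my head.

Is there a way to remove the HTML tag names from the result? I have already tried stripping the tags, but that only removes the brackets; the word itself is still there. For example, getElementById.

This is the current output of the code (with strip_tags) run on the "visitbirmingham" URL: More | 118 Read | 113 the | 92 Birmingham | 72 ... and so on. There are no tag names. getElementById is not a tag name but a JS function. I have now changed the code to remove the JS and CSS from the HTML.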
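
The approach suggested in the first comment (split, loop, use each word as an array key) might look roughly like the sketch below. The sample text and the variable names are made up for illustration. The `\b` variant assumes that only tokens made entirely of word characters should be counted.

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$text = "how are you today, how are things";

// Variant 1: split on runs of whitespace, then use each word as an array key.
$byWhitespace = [];
foreach (preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) as $word) {
    $byWhitespace[$word] = ($byWhitespace[$word] ?? 0) + 1;
}

// Variant 2: split on word boundaries (\b). The pieces between words
// (spaces, punctuation) are returned too, so keep only the \w+ tokens.
$byBoundary = [];
foreach (preg_split('/\b/', $text, -1, PREG_SPLIT_NO_EMPTY) as $token) {
    if (preg_match('/^\w+$/', $token)) {
        $byBoundary[$token] = ($byBoundary[$token] ?? 0) + 1;
    }
}

print_r($byWhitespace); // "today," keeps its trailing comma here
print_r($byBoundary);   // "today" is counted without the comma
```

Splitting on whitespace alone leaves punctuation attached to words, which is why the boundary-based variant tends to give cleaner counts.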