PHP: counting word frequency across multiple files
Have you tried wrapping the whole thing in a loop over the files?
<?php
$filename = "largefile.txt";
/* get content of $filename in $content */
$content = strtolower(file_get_contents($filename));
/* split $content into an array of words, i.e. on every non-letter character */
$wordArray = preg_split('/[^a-z]/', $content, -1, PREG_SPLIT_NO_EMPTY);
/* "stop words", filter them */
$filteredArray = array_filter($wordArray, function($x){
return !preg_match("/^(.|a|an|and|the|this|at|in|or|of|is|for|to)$/",$x);
});
/* build an associative array: each word in $filteredArray as key, its frequency count as value */
$wordFrequencyArray = array_count_values($filteredArray);
/* Sort by count from highest to lowest, keeping keys */
arsort($wordFrequencyArray);
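The single-file steps above can be tried out without creating any files by applying them to a string instead of a filename. This is a minimal sketch; the function name `wordFrequencies` is hypothetical, and the stop-word pattern is copied unchanged from the code above.

```php
<?php
/* Hypothetical helper: the single-file steps above, applied to an in-memory string. */
function wordFrequencies(string $text): array {
    /* lowercase, then split on every character that is not a-z */
    $words = preg_split('/[^a-z]/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    /* drop single letters and the stop words from the original code */
    $filtered = array_filter($words, function ($x) {
        return !preg_match('/^(.|a|an|and|the|this|at|in|or|of|is|for|to)$/', $x);
    });
    /* word => count, highest count first */
    $counts = array_count_values($filtered);
    arsort($counts);
    return $counts;
}

print_r(wordFrequencies("The cat and the hat. A cat sat!"));
```

Here "the", "and" and "a" are filtered out, and "cat" is counted twice.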
Put what you have into a function and call that function in a loop for each filename in the array:
<?php
$wordFrequencyArray = array();
/* Add the word counts of $file to $wordFrequencyArray (passed by reference) */
function countWords($file, array &$wordFrequencyArray) {
    /* get content of $file in $content; skip files that cannot be read */
    $raw = file_get_contents($file);
    if ($raw === false) {
        return;
    }
    $content = strtolower($raw);
    /* split $content into an array of words, i.e. on every non-letter character */
    $wordArray = preg_split('/[^a-z]/', $content, -1, PREG_SPLIT_NO_EMPTY);
    /* "stop words", filter them */
    $filteredArray = array_filter($wordArray, function($x){
        return !preg_match("/^(.|a|an|and|the|this|at|in|or|of|is|for|to)$/",$x);
    });
    /* add this file's per-word counts to the running totals */
    foreach (array_count_values($filteredArray) as $word => $count) {
        if (!isset($wordFrequencyArray[$word])) $wordFrequencyArray[$word] = 0;
        $wordFrequencyArray[$word] += $count;
    }
}
$filenames = array('file1.txt', 'file2.txt', 'file3.txt', 'file4.txt' /* ... */);
foreach ($filenames as $file) {
    countWords($file, $wordFrequencyArray);
}
/* Sort by count from highest to lowest, keeping keys */
arsort($wordFrequencyArray);
print_r($wordFrequencyArray);
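Instead of accumulating into a shared array, each file's counts can also be computed on their own and merged afterwards, which keeps the per-file function free of side effects. A minimal sketch under these assumptions: the helper names `countsFor` and `mergeCounts` are hypothetical, the filtering is copied from the code above, and in-memory strings stand in for file contents.

```php
<?php
/* Hypothetical helper: word => count for one text, same filtering as above. */
function countsFor(string $text): array {
    $words = preg_split('/[^a-z]/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $filtered = array_filter($words, function ($x) {
        return !preg_match('/^(.|a|an|and|the|this|at|in|or|of|is|for|to)$/', $x);
    });
    return array_count_values($filtered);
}

/* Hypothetical helper: add up several word => count arrays, highest total first. */
function mergeCounts(array ...$perFile): array {
    $total = array();
    foreach ($perFile as $counts) {
        foreach ($counts as $word => $count) {
            $total[$word] = ($total[$word] ?? 0) + $count;
        }
    }
    arsort($total);
    return $total;
}

/* In-memory strings stand in for file contents here; for real files,
   read each name in $filenames with file_get_contents() first. */
$texts = array("red fish, blue fish", "one fish, two fish");
$total = mergeCounts(...array_map('countsFor', $texts));
print_r($total);
```

"fish" appears twice in each text, so its merged total is 4; every other word ends up with 1.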