PHP: calculating the similarity of posts
I am using PHP 7.3 and I am calculating the similarity of posts:
<?php
$posts = [
    'post_count' => 3,
    'posts' => [
        [
            'ID' => 1,
            'post_content' => "Wrong do point avoid by fruit learn or in death. So passage however besides invited comfort elderly be me. Walls began of child civil am heard hoped my. Satisfied pretended mr on do determine by.",
        ],
        [
            'ID' => 2,
            'post_content' => "Lorem ipsum dolor sit"
        ],
        [
            'ID' => 3,
            'post_content' => "Months on ye at by esteem desire warmth former. Sure that that way gave any fond now. His boy middleton sir nor engrossed affection excellent."
        ],
        [
            'ID' => 4,
            'post_content' => "Lorem ipsum dolor sit"
        ],
    ]
];
print_r($posts);
function getNonSimilarTexts($posts)
{
    $similarityPercentageArr = array();
    for ($i = 0; $i <= $posts['post_count']; $i++) {
        // $posts->the_post();
        $currentPost = $posts['posts'][$i];
        if (!is_null($currentPost['ID'])) {
            for ($y = 0; $y <= $posts['post_count']; $y++) {
                $comparePost = $posts['posts'][$y];
                if (!is_null($comparePost['ID'])) {
                    similar_text(strip_tags($currentPost['post_content']), strip_tags($comparePost['post_content']), $perc);
                    // similarity is 100 if self compare
                    if ($perc != 100) {
                        array_push($similarityPercentageArr, [$currentPost['ID'], $comparePost['ID'], $perc]);
                    }
                }
            }
        }
    }
    return $similarityPercentageArr;
}
$p = getNonSimilarTexts($posts);
print_r($p);
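Regarding your question: looking back at it, the title does not seem to match what you are actually asking.
Isn't simply adding another condition enough?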
<?php
$posts = [
    'post_count' => 3,
    'posts' => [
        [
            'ID' => 1,
            'post_content' => "Wrong do point avoid by fruit learn or in death. So passage however besides invited comfort elderly be me. Walls began of child civil am heard hoped my. Satisfied pretended mr on do determine by.",
        ],
        [
            'ID' => 2,
            'post_content' => "Lorem ipsum dolor sit"
        ],
        [
            'ID' => 3,
            'post_content' => "Months on ye at by esteem desire warmth former. Sure that that way gave any fond now. His boy middleton sir nor engrossed affection excellent."
        ],
        [
            'ID' => 4,
            'post_content' => "Lorem ipsum dolor sit"
        ],
    ]
];
print_r($posts);
function getNonSimilarTexts($posts)
{
    $similarityPercentageArr = array();
    for ($i = 0; $i <= $posts['post_count']; $i++) {
        // $posts->the_post();
        $currentPost = $posts['posts'][$i];
        if (!is_null($currentPost['ID'])) {
            for ($y = 0; $y <= $posts['post_count']; $y++) {
                $comparePost = $posts['posts'][$y];
                if (!is_null($comparePost['ID'])) {
                    similar_text(strip_tags($currentPost['post_content']), strip_tags($comparePost['post_content']), $perc);
                    // skip self-comparisons (similarity is 100) and keep only pairs that are more than 20% similar
                    if ($perc != 100 && $perc > 20) {
                        array_push($similarityPercentageArr, [$currentPost['ID'], $comparePost['ID'], $perc]);
                    }
                }
            }
        }
    }
    return $similarityPercentageArr;
}
$p = getNonSimilarTexts($posts);
print_r($p);
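One thing to keep in mind with the $perc != 100 check: similar_text reports 100 for any two identical strings, not only when a post is compared with itself. A minimal sketch, using the content that posts 2 and 4 share in the sample data (the variable names $a, $b and $matching are made up for this example):

```php
<?php
// Posts 2 and 4 in the sample data have identical content.
$a = 'Lorem ipsum dolor sit';
$b = 'Lorem ipsum dolor sit';

// similar_text() returns the number of matching characters and writes
// the similarity percentage into the third argument (passed by reference).
$matching = similar_text($a, $b, $perc);

var_dump($matching);     // int(21)
var_dump($perc == 100);  // bool(true)
```

So with this condition, exact duplicates such as posts 2 and 4 would likely be filtered out together with the self-comparisons.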
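You can filter right away by changing the condition if ($perc != 100) to if ($perc > 20), so that you keep only the similar posts you want to remove. You can then even skip storing the similarity altogether, because you already have the list of post IDs to remove.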
So, when you have code like this:
if ($perc > 20) {
    $similarityPercentageArr[$currentPost['ID']][] = $comparePost['ID'];
}
Then you can remove all the unwanted posts like this:
$postsToRemove = [];
$postsToKeep = [];
foreach ($similarityPercentageArr as $postId => $similarPostIds) {
    // this post has already appeared as similar somewhere, so its similar posts have already been added
    if (in_array($postId, $postsToRemove)) {
        continue;
    }
    $postsToKeep[] = $postId;
    $postsToRemove = array_merge($postsToRemove, $similarPostIds);
}
Now you have the original post IDs in $postsToKeep, and the IDs of the posts similar to them in $postsToRemove.
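To make the keep/remove split concrete, here is a small sketch run on a hand-written similarity map. The map contents (post IDs 5, 6 and 7 in particular) are hypothetical and only illustrate the shape of $similarityPercentageArr; the final array_unique call is an addition for this example, not part of the answer's code.

```php
<?php
// Hypothetical similarity map:
// post ID => IDs of the posts that are more than 20% similar to it.
$similarityPercentageArr = [
    2 => [4],
    4 => [2],
    5 => [6, 7],
    6 => [5, 7],
    7 => [5, 6],
];

$postsToRemove = [];
$postsToKeep = [];
foreach ($similarityPercentageArr as $postId => $similarPostIds) {
    // Already marked as similar to an earlier post: skip it.
    if (in_array($postId, $postsToRemove)) {
        continue;
    }
    $postsToKeep[] = $postId;
    $postsToRemove = array_merge($postsToRemove, $similarPostIds);
}

// array_merge can add the same ID more than once, so deduplicate.
$postsToRemove = array_values(array_unique($postsToRemove));

print_r($postsToKeep);   // IDs 2 and 5
print_r($postsToRemove); // IDs 4, 6 and 7
```

Note that a post with no similar posts never gets a key in the map, so it ends up in neither array; it simply stays untouched.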
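I would also optimize the code a little, so that similar_text is not called at all when you know you are comparing a post with itself. So instead of if (!is_null($comparePost['ID'])) you would have if (!is_null($comparePost['ID']) && $comparePost['ID'] != $currentPost['ID']).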
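Putting the threshold and the ID check together could look roughly like the sketch below. The function name getSimilarPostIds, the $threshold parameter and the post with ID 9 are assumptions made for this example; the function also takes a flat list of posts (for the sample data above, that would be $posts['posts']) instead of relying on post_count.

```php
<?php
// Sketch: for each post, collect the IDs of the other posts that are
// more than $threshold percent similar to it.
function getSimilarPostIds(array $posts, float $threshold = 20.0): array
{
    $similar = [];
    foreach ($posts as $currentPost) {
        foreach ($posts as $comparePost) {
            // Skip self-comparison by ID, so similar_text() is never called for it.
            if ($comparePost['ID'] == $currentPost['ID']) {
                continue;
            }
            similar_text(
                strip_tags($currentPost['post_content']),
                strip_tags($comparePost['post_content']),
                $perc
            );
            if ($perc > $threshold) {
                $similar[$currentPost['ID']][] = $comparePost['ID'];
            }
        }
    }
    return $similar;
}

$samplePosts = [
    ['ID' => 2, 'post_content' => 'Lorem ipsum dolor sit'],
    ['ID' => 4, 'post_content' => 'Lorem ipsum dolor sit'],
    ['ID' => 9, 'post_content' => '1234567890'], // hypothetical unrelated post
];

$similar = getSimilarPostIds($samplePosts);
print_r($similar); // post 2 lists post 4, and post 4 lists post 2
```

Because the self-comparison is excluded by ID rather than by a 100% score, exact duplicates in different posts are still reported as similar.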