PHP: calculating the similarity of posts


I am using PHP 7.3 and I am calculating the similarity of posts.

<?php

$posts = [
    'post_count' => 3,
    'posts' => [
        [
            'ID' => 1,
            'post_content' => "Wrong do point avoid by fruit learn or in death. So passage however besides invited comfort elderly be me. Walls began of child civil am heard hoped my. Satisfied pretended mr on do determine by.",
        ],
        [
            'ID' => 2,
            'post_content' => "Lorem ipsum dolor sit"
        ],
        [
            'ID' => 3,
            'post_content' => "Months on ye at by esteem desire warmth former. Sure that that way gave any fond now. His boy middleton sir nor engrossed affection excellent."
        ],
        [
            'ID' => 4,
            'post_content' => "Lorem ipsum dolor sit"
        ],
    ]
];

print_r($posts);

function getNonSimilarTexts($posts)
{
    $similarityPercentageArr = array();

    for ($i = 0; $i <= $posts['post_count']; $i++) {
        // $posts->the_post();
        $currentPost = $posts['posts'][$i];
        if (!is_null($currentPost['ID'])) {
            for ($y = 0; $y <= $posts['post_count']; $y++) {
                $comparePost = $posts['posts'][$y];
                if (!is_null($comparePost['ID'])) {
                    similar_text(strip_tags($currentPost['post_content']), strip_tags($comparePost['post_content']), $perc);
                    // a post compared with itself scores 100, so skip it
                    if ($perc != 100) {
                        array_push($similarityPercentageArr, [$currentPost['ID'], $comparePost['ID'], $perc]);
                    }
                }
            }
        }
    }
    return $similarityPercentageArr;
}

$p = getNonSimilarTexts($posts);
print_r($p);

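Regarding your question: looking at it again, the title does not quite seem to match what you are asking.

Isn't it enough to just add another condition?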

<?php

$posts = [
    'post_count' => 3,
    'posts' => [
        [
            'ID' => 1,
            'post_content' => "Wrong do point avoid by fruit learn or in death. So passage however besides invited comfort elderly be me. Walls began of child civil am heard hoped my. Satisfied pretended mr on do determine by.",
        ],
        [
            'ID' => 2,
            'post_content' => "Lorem ipsum dolor sit"
        ],
        [
            'ID' => 3,
            'post_content' => "Months on ye at by esteem desire warmth former. Sure that that way gave any fond now. His boy middleton sir nor engrossed affection excellent."
        ],
        [
            'ID' => 4,
            'post_content' => "Lorem ipsum dolor sit"
        ],
    ]
];

print_r($posts);

function getNonSimilarTexts($posts)
{
    $similarityPercentageArr = array();

    for ($i = 0; $i <= $posts['post_count']; $i++) {
        // $posts->the_post();
        $currentPost = $posts['posts'][$i];
        if (!is_null($currentPost['ID'])) {
            for ($y = 0; $y <= $posts['post_count']; $y++) {
                $comparePost = $posts['posts'][$y];
                if (!is_null($comparePost['ID'])) {
                    similar_text(strip_tags($currentPost['post_content']), strip_tags($comparePost['post_content']), $perc);
                    // skip self-comparisons (100%) and keep only pairs that are more than 20% similar
                    if ($perc != 100 && $perc > 20) {
                        array_push($similarityPercentageArr, [$currentPost['ID'], $comparePost['ID'], $perc]);
                    }
                }
            }
        }
    }
    return $similarityPercentageArr;
}

$p = getNonSimilarTexts($posts);
print_r($p);
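
One thing to watch with a check like `$perc != 100`: it cannot tell a self-comparison apart from two different posts with identical content, such as posts 2 and 4 in the sample data. The following is only a small sketch to illustrate that; the variable names are made up for the example.

```php
<?php
// Posts 2 and 4 from the sample data have exactly the same content.
$postTwo  = "Lorem ipsum dolor sit";
$postFour = "Lorem ipsum dolor sit";

// similar_text() writes the similarity percentage into its third argument.
similar_text($postTwo, $postFour, $perc);

// Identical strings score 100, just like a post compared with itself,
// so a check such as `$perc != 100` would skip this duplicate pair as well.
$isTreatedAsSelfCompare = ($perc == 100);
var_dump($isTreatedAsSelfCompare);
```

This prints `bool(true)`, so exact duplicates would never end up in the result. Comparing the post IDs instead of the percentage is one way around that.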

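You can filter right away by changing the condition if ($perc != 100) to if ($perc > 20), so that you keep only the similar posts you want to remove. You can then even skip storing the similarity altogether, because you already have the list of post IDs to remove.

So when you have code like this: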

if ($perc > 20) {
    $similarityPercentageArr[$currentPost['ID']][] = $comparePost['ID'];
}

You can then work out all the unwanted posts like this:

$postsToRemove = [];
$postsToKeep = [];

foreach ($similarityPercentageArr as $postId => $similarPostIds) {
    // this post has already appeared as similar somewhere, so its similar posts have already been added 
    if (in_array($postId, $postsToRemove)) {
        continue;
    }

    $postsToKeep[] = $postId;
    $postsToRemove = array_merge($postsToRemove, $similarPostIds);
}

Now you have the original post IDs in $postsToKeep and the IDs of their similar posts in $postsToRemove.
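
To make the grouping step concrete, here is a minimal sketch of the same loop wrapped in a function and run on a made-up similarity map. The function name `splitKeepAndRemove` and the sample map are assumptions for illustration only; they are not part of the original code.

```php
<?php
/**
 * Splits a map of "post ID => IDs of similar posts" into posts to keep
 * and posts to remove. The function name is hypothetical.
 */
function splitKeepAndRemove(array $similarityMap): array
{
    $postsToKeep = [];
    $postsToRemove = [];

    foreach ($similarityMap as $postId => $similarPostIds) {
        // Already marked as similar to an earlier post, so skip it.
        if (in_array($postId, $postsToRemove, true)) {
            continue;
        }
        $postsToKeep[] = $postId;
        $postsToRemove = array_merge($postsToRemove, $similarPostIds);
    }

    return [$postsToKeep, $postsToRemove];
}

// Made-up map: posts 1, 2 and 3 resemble each other, and so do 5 and 6.
$similarityMap = [
    1 => [2, 3],
    2 => [1, 3],
    3 => [1, 2],
    5 => [6],
    6 => [5],
];

[$postsToKeep, $postsToRemove] = splitKeepAndRemove($similarityMap);
print_r($postsToKeep);   // IDs 1 and 5
print_r($postsToRemove); // IDs 2, 3 and 6
```

Note that a post with no similar posts never appears in the map, so it ends up in neither list; only the IDs in $postsToRemove need to be acted on.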

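I would also optimize the code a little so that similar_text is not called at all when you know you are comparing a post with itself. So instead of if (!is_null($comparePost['ID'])) you would have if (!is_null($comparePost['ID']) && $comparePost['ID'] != $currentPost['ID']).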
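
Putting the ID check and the threshold together, the comparison loop might look roughly like the sketch below. The function name `findSimilarPostIds`, the `$threshold` parameter and the sample posts are assumptions for the example, and it assumes every post has a non-null ID, so the is_null check is left out.

```php
<?php
/**
 * Builds a map of "post ID => IDs of posts more than $threshold percent similar".
 * The function name and the $threshold parameter are hypothetical.
 */
function findSimilarPostIds(array $posts, float $threshold = 20.0): array
{
    $similarityMap = [];

    foreach ($posts as $currentPost) {
        foreach ($posts as $comparePost) {
            // Skip self-comparisons by ID, so similar_text() is never called for them.
            if ($comparePost['ID'] === $currentPost['ID']) {
                continue;
            }
            similar_text(
                strip_tags($currentPost['post_content']),
                strip_tags($comparePost['post_content']),
                $perc
            );
            if ($perc > $threshold) {
                $similarityMap[$currentPost['ID']][] = $comparePost['ID'];
            }
        }
    }

    return $similarityMap;
}

// Made-up posts: 2 and 3 are identical, 1 shares no characters with them.
$posts = [
    ['ID' => 1, 'post_content' => 'QWXYZ'],
    ['ID' => 2, 'post_content' => 'Lorem ipsum dolor sit'],
    ['ID' => 3, 'post_content' => 'Lorem ipsum dolor sit'],
];

$similarityMap = findSimilarPostIds($posts);
print_r($similarityMap); // post 2 maps to [3], post 3 maps to [2]
```

Because self-comparisons are excluded by ID rather than by percentage, exact duplicates such as posts 2 and 3 here are still reported as similar.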
