PHP: preventing duplicate clean URLs (can this be improved?)


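For a client at work we have built a website. The site has an offers page that can contain variants of the same type/build, so they ran into the problem of duplicate clean URLs.

I just wrote a function that prevents this by appending a number to the URL. If that clean URL exists as well, it counts up.

For example: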

domain.nl/product/machine

domain.nl/product/machine-1

domain.nl/product/machine-2

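Update! Added return $cleanurl; regarding the recursion and returning a value.

The function I wrote works fine, but I would like to know whether I took the right approach and whether it can be improved. The code: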

public function prevent_double_cleanurl($cleanurl)
{

    // makes sure it doesnt check against itself
    if($this->ID!=NULL) $and = " AND product_ID <> ".$this->ID;

    $sql = "SELECT product_ID, titel_url FROM " . $this->_table . " WHERE titel_url='".$cleanurl."' " . $and. " LIMIT 1";

    $result = $this->query($sql);

    // if a matching url is found
    if(!empty($result))
    {
        $url_parts = explode("-", $result[0]['titel_url']);
        $last_part = end($url_parts);

        // maximum of 2 digits
        if((int)$last_part && strlen($last_part)<3)
        {
            // if a 1 or 2 digit number is found - add to it
            array_pop($url_parts);
            $cleanurl = implode("-", $url_parts);

            (int)$last_part++;
        }
        else
        {
            // add a suffix starting at 1
            $last_part='1';
        }
        // recursive check
        $cleanurl = $this->prevent_double_cleanurl($cleanurl.'-'.$last_part);
    }

    return $cleanurl;
}

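Depending on how likely it is that a clean URL is used multiple times, your approach may not be the best choice. Suppose foo through foo-10 already exist: the recursion would hit the database at least 10 times.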

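You also don't seem to sanitize the data you put into the SQL query. Are you escaping it with the escape function of the mysql extension, or with its mysqli, PDO, or whatever counterpart?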
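One way to keep user input out of the SQL text is a parameterized query. The sketch below only builds the statement and its parameter list. `build_conflict_query` and the table name `products` are hypothetical (neither appears in the original class), and it assumes the connection is a PDO handle that accepts positional `?` placeholders. A table name cannot be bound as a parameter, so it has to come from trusted code.

```php
<?php
// Hypothetical helper: builds the duplicate check as a parameterized query.
// Returns array($sql, $params), to be passed to PDO::prepare() and execute().
function build_conflict_query($table, $cleanurl, $id = null)
{
    // The placeholder keeps $cleanurl out of the SQL string entirely.
    $sql = "SELECT product_ID, titel_url FROM " . $table . " WHERE titel_url = ?";
    $params = array($cleanurl);

    // Only exclude the current record when there is one
    // (this also avoids the undefined $and of the original).
    if ($id !== null) {
        $sql .= " AND product_ID <> ?";
        $params[] = (int) $id;
    }

    return array($sql . " LIMIT 1", $params);
}

// 'products' is an assumed table name, used for illustration only.
list($sql, $params) = build_conflict_query('products', "machine' OR '1'='1", 7);

// With a PDO connection in $pdo (assumed), this would run as:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
```

The hostile-looking value stays in `$params` and never becomes part of the SQL text, which is the point of binding instead of concatenating.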

Revised code:

public function prevent_double_cleanurl($cleanurl) {
    // splits "foo-12" into base "foo" and num "12"
    $cleanurl_pattern = '#^(?<base>.*?)(-(?<num>\d+))?$#S';

    if (preg_match($cleanurl_pattern, $cleanurl, $matches)) {
        $base = $matches['base'];
        // 'num' is missing from $matches when there is no numeric suffix
        $num = !empty($matches['num']) ? (int) $matches['num'] : 0;
    } else {
        $base = $cleanurl;
        $num = 0;
    }

    // NOTE: escape $base for your database API before building the query.
    // No LIMIT here: every candidate is needed to find the highest number.
    $sql = "SELECT product_ID, titel_url FROM " . $this->_table
        . " WHERE titel_url = '" . $base . "' OR titel_url LIKE '" . $base . "-%'";
    $result = $this->query($sql);

    $taken = false;
    foreach ($result as $row) {
        if ($this->ID && $row['product_ID'] == $this->ID) {
            // the given cleanurl already has an ID,
            // so we better not touch it
            return $cleanurl;
        }

        if (preg_match($cleanurl_pattern, $row['titel_url'], $matches)) {
            $_base = $matches['base'];
            $_num = !empty($matches['num']) ? (int) $matches['num'] : 0;
        } else {
            $_base = $row['titel_url'];
            $_num = 0;
        }

        if ($base != $_base) {
            // make sure we're not accidentally comparing "foo-123" and "foo-bar-123"
            continue;
        }

        if ($row['titel_url'] == $cleanurl) {
            // somebody else already uses exactly this clean URL
            $taken = true;
        }

        if ($_num > $num) {
            $num = $_num;
        }
    }

    if (!$taken) {
        // no conflict, keep the URL as given
        return $cleanurl;
    }

    // next free number
    $num++;
    return $base . '-' . $num;
}

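I don't know what values your clean URLs can take. The last time I did something like this, my base could look like some-article-revision-5, where the 5 was part of the actual name and not a duplicate index. To tell the two apart, and to let LIKE filter out false positives, I made the clean URLs look like $base--$num. The double dash could only occur between the base and the duplicate index, which made things simpler...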
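The double-dash convention can be sketched as follows. `split_clean_url` and `join_clean_url` are hypothetical names, and the sketch assumes that "--" never occurs inside the base itself.

```php
<?php
// Split "base--num" into its base and duplicate index (0 when there is none).
// Assumption: "--" only ever separates the base from the duplicate index.
function split_clean_url($cleanurl)
{
    if (preg_match('#^(?<base>.+?)--(?<num>\d+)$#D', $cleanurl, $matches)) {
        return array($matches['base'], (int) $matches['num']);
    }
    return array($cleanurl, 0);
}

// Join them again, omitting the suffix for the first (unnumbered) entry.
function join_clean_url($base, $num)
{
    return $num > 0 ? $base . '--' . $num : $base;
}

// The trailing 5 belongs to the name, so it is not treated as a counter.
$first = split_clean_url('some-article-revision-5');
// Here the 2 after the double dash is the duplicate index.
$second = split_clean_url('some-article-revision-5--2');
```

With this scheme a query such as LIKE 'some-article-revision-5--%' only returns numbered duplicates of that exact base.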

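I have no way of testing this, so that part is up to you, but this is how I would do it. I put plenty of comments in there to explain my reasoning and the flow of the code.

Basically, the recursion is unnecessary, and it causes more database queries than you need.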

<?
public function prevent_double_cleanurl($cleanurl)
{
    $sql = sprintf("SELECT product_ID, titel_url FROM %s WHERE titel_url LIKE '%s%%'", 
        $this->_table, $cleanurl);
    if($this->ID != NULL){ $sql.= sprintf(" AND product_ID <> %d", $this->ID); }

    $results = $this->query($sql);

    $suffix = 0;
    $baseurl = true;
    foreach($results as $row)
    {
        // Consider the case when we get to the "first" row added to the db:
        //  For example: $row['titel_url'] == $cleanurl == 'domain.nl/product/machine'
        if($row['titel_url'] == $cleanurl)
        {
            $baseurl = false;   // The $cleanurl is already in the db, "this" is not a base URL
            continue;           // Continue with the next iteration of the foreach loop
        }

        // This could be done using regex, but if this works its fine.
        // Make sure to test for the case when you have both of the following pages in your db:
        //
        //  some-hyphenated-page
        //  some-hyphenated-page-name
        //
        // You don't want the counters to get mixed up
        $url_parts = explode("-", $row['titel_url']);
        $last_part = array_pop($url_parts);
        $cleanrow = implode("-", $url_parts);

        // To get into this block, three things need to be true
        //  1. $last_part must be a numeric string (PHP Duck Typing bleh)
        //  2. When represented as a string, $last_part must not be longer than 2 digits
        //  3. The string passed to this function must match the string resulting from the (n-1) 
        //      leading parts of the result of exploding the table row
        if((is_numeric($last_part)) && (strlen($last_part)<=2) && ($cleanrow == $cleanurl))
        {
            $baseurl = false;                           // If there are records in the database, the 
                                                        //  passed $cleanurl isn't the first, so it 
                                                        //  will need a suffix
            $suffix = max($suffix, (int)$last_part);    // After this foreach loop is done, $suffix 
                                                        //  will contain the highest suffix in the 
                                                        //  database we'll need to add 1 to this to 
                                                        //  get the result url
        }
    }

    // If $baseurl is still true, then we never got into the 3-condition block above, so we never 
    //  found a matching record in the database -> return the cleanurl that was passed here, no need
    //  to add a suffix
    if($baseurl)
    {
        return $cleanurl;
    }
    // At least one database record exists, so we need to add a suffix.  The suffix we add will be
    //  the highest we found in the database plus 1.
    else
    {
        return sprintf("%s-%d", $cleanurl, ($suffix + 1));
    }
}
My solution uses the SQL wildcard % to reduce the number of queries from n to 1.

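Make sure the problem case I describe in the code comments (some-hyphenated-page next to some-hyphenated-page-name) works as expected. Hyphens or other characters in the machine name could lead to unexpected results.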
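That problem case can be checked without a database. The sketch below applies the same single-pass idea to a plain array, standing in for the rows that one LIKE query returns. `next_clean_url` is a hypothetical helper, and the two-digit limit mirrors the code above.

```php
<?php
// Hypothetical helper: given the titel_url values returned by one
// LIKE 'base%' query, pick the clean URL to use for $cleanurl.
function next_clean_url($cleanurl, array $existing)
{
    $taken = false;
    $highest = 0;
    // Matches exactly the base, or the base plus "-" and a 1-2 digit counter.
    $pattern = '#^' . preg_quote($cleanurl, '#') . '(?:-(\d{1,2}))?$#D';

    foreach ($existing as $url) {
        if (!preg_match($pattern, $url, $m)) {
            continue; // same prefix but a different page, e.g. "...-page-name"
        }
        $taken = true;
        if (isset($m[1])) {
            $highest = max($highest, (int) $m[1]);
        }
    }

    // Keep the URL when nothing conflicts, otherwise use the next counter.
    return $taken ? $cleanurl . '-' . ($highest + 1) : $cleanurl;
}

$rows = array(
    'some-hyphenated-page',
    'some-hyphenated-page-2',
    'some-hyphenated-page-name',
    'some-hyphenated-page-name-7',
);
$result = next_clean_url('some-hyphenated-page', $rows);
```

Anchoring the pattern at both ends is what keeps the counter of some-hyphenated-page-name from leaking into the counter of some-hyphenated-page.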

I also use sprintf to format the query. Make sure you sanitize anything that is passed in as a string, such as $cleanurl.


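As @rodneyrehm pointed out, PHP is very liberal about what it considers a numeric string. You might consider switching to a stricter check and seeing how that works.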
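To illustrate the difference, here is a small sketch of a stricter test. `is_duplicate_suffix` is a hypothetical helper, and the regex is one possible replacement for is_numeric, not necessarily the one the answer had in mind.

```php
<?php
// Hypothetical helper: accept only a 1- or 2-digit counter such as "7" or "12".
function is_duplicate_suffix($part)
{
    // The D modifier makes $ match only at the very end of the string.
    return preg_match('/^\d{1,2}$/D', $part) === 1;
}

// is_numeric() accepts scientific notation, so "3e3" passes.
$loose = is_numeric('3e3');
// The regex only accepts plain digits, so "3e3" is rejected.
$strict = is_duplicate_suffix('3e3');
```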

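Are you sure it works correctly? Where is the $titel_url you are returning?

You are absolutely right. I updated it in my project and ran into the same problem. I have updated the code above as well. Thanks!

PS: I use 100 columns in my text editor these days, sorry about the overflow; stackoverflow limits code boxes to around 85.

is_int($last_part) will always be false here. $last_part is an element of the array returned by explode, and those are all strings. You are probably looking for is_numeric, but note that is_numeric('3e3') == true. So you may want to do this with a regex.

You're right, I read the notes too quickly. PHP's duck typing drives me crazy. PS: 0xFF == 255 => TRUE in PHP.

Very close to my suggestion, except that it doesn't work, for two reasons. 1) MAX will sort lexicographically, not numerically, and since he doesn't zero-pad, it returns the wrong result once there are more than 9 entries. 2) foo-% will match the rewritten entries, but because of the hyphen it will not match the original, unrewritten base URL.

I fixed my example so that it no longer uses MAX [duh, I could have thought of that myself...] and added the false-positive check for foo-%.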
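The lexicographic ordering problem from the comments can be reproduced in plain PHP. This sketch assumes byte-wise string ordering, which is roughly how MAX() behaves on a text column under a simple collation. The URLs are made up for illustration.

```php
<?php
$urls = array('foo-1', 'foo-9', 'foo-10');

// Compared as strings, "foo-9" sorts after "foo-10" because '9' > '1'.
$asStrings = $urls;
sort($asStrings, SORT_STRING);
$lexicalMax = end($asStrings);

// Comparing the numeric suffixes as integers gives the real highest counter.
$numericMax = 0;
foreach ($urls as $url) {
    $suffix = substr($url, strrpos($url, '-') + 1);
    $numericMax = max($numericMax, (int) $suffix);
}
```

This is why the highest counter is better computed in PHP (or on a numeric column) than with MAX() over the URL strings.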