PHP: How to repair a serialized string that has been corrupted by an incorrect byte count length?


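I am using Hotaru CMS with the Image Upload plugin. I get this error when I try to attach an image to a post; otherwise there is no error:

unserialize() [function.unserialize]: Error at offset

The offending code (the line the error points to is marked with **):

Data from the table (note the image info at the end). I am not a PHP expert, so I was wondering what you think.

tempdata_value: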

a:10:{s:16:"submit_editorial";b:0;s:15:"submit_orig_url";s:13:"www.bbc.co.uk";s:12:"submit_title";s:14:"No title found";s:14:"submit_content";s:12:"dnfsdkfjdfdf";s:15:"submit_category";i:2;s:11:"submit_tags";s:3:"bbc";s:9:"submit_id";b:0;s:16:"submit_subscribe";i:0;s:15:"submit_comments";s:4:"open";s:5:"image";s:19:"C:fakepath100.jpg";}
EDIT: I think I have found the serialize bit:

/**
     * Save submission step data
     *
     * @return bool
     */
    public function saveSubmitData($h)
    {
        // delete everything in this table older than 30 minutes:
        $this->deleteTempData($h->db);

        $sid = preg_replace('/[^a-z0-9]+/i', '', session_id());
        $key = md5(microtime() . $sid . rand());
        $sql = "INSERT INTO " . TABLE_TEMPDATA . " (tempdata_key, tempdata_value, tempdata_updateby) VALUES (%s,%s, %d)";
        $h->db->query($h->db->prepare($sql, $key, serialize($h->vars['submitted_data']), $h->currentUser->id));
        return $key;
    }
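The unserialize() [function.unserialize]: Error at offset notice is caused by invalid serialized data, in this case an invalid length.

Quick Fix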

What you can do is recalculate the length of the elements in the serialized array.

Your current serialized data

$data = 'a:10:{s:16:"submit_editorial";b:0;s:15:"submit_orig_url";s:13:"www.bbc.co.uk";s:12:"submit_title";s:14:"No title found";s:14:"submit_content";s:12:"dnfsdkfjdfdf";s:15:"submit_category";i:2;s:11:"submit_tags";s:3:"bbc";s:9:"submit_id";b:0;s:16:"submit_subscribe";i:0;s:15:"submit_comments";s:4:"open";s:5:"image";s:19:"C:fakepath100.jpg";}';

Example without recalculation

var_dump(unserialize($data));

Output

Notice: unserialize() [function.unserialize]: Error at offset 337 of 338 bytes
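
Because unserialize() reports failure by returning false, which is also a value that can legitimately be stored, a small wrapper makes the failing case easier to detect. This is only a sketch: the name tryUnserialize and the $ok flag are made up for illustration and are not part of Hotaru CMS.

```php
<?php
// Hypothetical helper: returns the decoded value and reports success through $ok.
function tryUnserialize(string $data, &$ok = null)
{
    // @ hides the notice/warning; allowed_classes => false avoids instantiating objects.
    $value = @unserialize($data, ['allowed_classes' => false]);
    // false means failure unless the input really is a serialized boolean false.
    $ok = ($value !== false) || ($data === serialize(false));
    return $value;
}

$broken = 'a:1:{s:5:"image";s:19:"C:fakepath100.jpg";}';
tryUnserialize($broken, $ok);
var_dump($ok);   // bool(false)

tryUnserialize('b:0;', $ok);
var_dump($ok);   // bool(true)
```

Checking the flag instead of the return value keeps a stored false from being mistaken for a corrupted string.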
Recalculating

// Note: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0;
// on current PHP versions use preg_replace_callback() instead.
$data = preg_replace('!s:(\d+):"(.*?)";!e', "'s:'.strlen('$2').':\"$2\";'", $data);
var_dump(unserialize($data));

Output

array
  'submit_editorial' => boolean false
  'submit_orig_url' => string 'www.bbc.co.uk' (length=13)
  'submit_title' => string 'No title found' (length=14)
  'submit_content' => string 'dnfsdkfjdfdf' (length=12)
  'submit_category' => int 2
  'submit_tags' => string 'bbc' (length=3)
  'submit_id' => boolean false
  'submit_subscribe' => int 0
  'submit_comments' => string 'open' (length=4)
  'image' => string 'C:fakepath100.jpg' (length=17)
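Recommendation

Instead of relying on this kind of quick fix, I would advise you to update the question with:

  • How you are serializing your data

  • How you are saving it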

================================================================================================================

The Error

The error was generated because a double quote " was used instead of a single quote ', which is why C:\fakepath\100.png was converted to C:fakepath100.jpg
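
The effect of the lost backslashes on the byte count can be reproduced in isolation. The sketch below assumes, for illustration only, that the backslashes disappear after serialize() has already written the length; the step that actually removes them in Hotaru CMS is not shown in the question.

```php
<?php
// Single quotes keep the backslashes, so the path really is 19 bytes long.
$path = 'C:\fakepath\100.jpg';
$good = serialize(['image' => $path]);
echo $good, PHP_EOL;   // a:1:{s:5:"image";s:19:"C:\fakepath\100.jpg";}

// Simulate a later step that drops the backslashes from the serialized text.
$bad = str_replace('\\', '', $good);
echo $bad, PHP_EOL;    // a:1:{s:5:"image";s:19:"C:fakepath100.jpg";}

var_dump(strlen($path));                 // int(19)
var_dump(strlen('C:fakepath100.jpg'));   // int(17)
var_dump(@unserialize($bad));            // bool(false)
```

The declared length stays at 19 while the value shrinks to 17 bytes, which is exactly the mismatch in the stored tempdata_value.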

To fix the error

You need to change $h->vars['submitted_data'] (note the single quote '):

Replace

 $h->vars['submitted_data']['image'] = "C:\fakepath\100.png" ;

With

 $h->vars['submitted_data']['image'] = 'C:\fakepath\100.png' ;

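Additional Filter

You can also add this simple filter before you call serialize

function satitize(&$value, $key)
{
    // Only escape strings; addslashes() would otherwise turn booleans and integers into strings
    if (is_string($value)) {
        $value = addslashes($value);
    }
}

array_walk($h->vars['submitted_data'], "satitize");

If you have UTF characters you can also run

 // Note: utf8_encode() assumes ISO-8859-1 input and is deprecated as of PHP 8.2
 $h->vars['submitted_data'] = array_map("utf8_encode",$h->vars['submitted_data']);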
How to detect the problem in future serialized data

  findSerializeError ( $data1 ) ;

Output

Diffrence 9 != 7
    -> ORD number 57 != 55
    -> Line Number = 315
    -> Section Data1  = pen";s:5:"image";s:19:"C:fakepath100.jpg
    -> Section Data2  = pen";s:5:"image";s:17:"C:fakepath100.jpg
                                            ^------- The Error (Element Length)

The findSerializeError function

function findSerializeError($data1) {
    echo "<pre>";
    // Recompute every string length. The /e modifier was removed in PHP 7, so a callback is used.
    $data2 = preg_replace_callback('!s:(\d+):"(.*?)";!', function ($m) {
        return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
    }, $data1);
    $max = max(strlen($data1), strlen($data2));

    echo $data1 . PHP_EOL;
    echo $data2 . PHP_EOL;

    for ($i = 0; $i < $max; $i++) {
        // Curly-brace offsets ($data1{$i}) were removed in PHP 8; ?? '' covers offsets past the end.
        $c1 = $data1[$i] ?? '';
        $c2 = $data2[$i] ?? '';

        if ($c1 !== $c2) {
            echo "Diffrence ", $c1, " != ", $c2, PHP_EOL;
            echo "\t-> ORD number ", ord($c1), " != ", ord($c2), PHP_EOL;
            echo "\t-> Line Number = $i" . PHP_EOL;

            // Show up to 20 characters of context on each side of the difference
            $start = max($i - 20, 0);
            $length = 40;

            $point = $max - $i;
            if ($point < 20) {
                $rlength = 1;
                $rpoint = -$point;
            } else {
                $rpoint = $length - 20;
                $rlength = 1;
            }

            echo "\t-> Section Data1  = ", substr_replace(substr($data1, $start, $length), "<b style=\"color:green\">{$c1}</b>", $rpoint, $rlength), PHP_EOL;
            echo "\t-> Section Data2  = ", substr_replace(substr($data2, $start, $length), "<b style=\"color:red\">{$c2}</b>", $rpoint, $rlength), PHP_EOL;
        }
    }
}
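
When the result is needed for logging rather than for display, the same comparison can return data instead of printing HTML. The following is a minimal sketch; listLengthMismatches is a hypothetical name, and like the pattern above it assumes that no string value contains the sequence "; itself.

```php
<?php
// Hypothetical helper: list every s:N:"..." element whose declared length N is wrong.
function listLengthMismatches(string $data): array
{
    $mismatches = [];
    preg_match_all('!s:(\d+):"(.*?)";!s', $data, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
    foreach ($matches as $m) {
        $declared = (int) $m[1][0];
        $actual   = strlen($m[2][0]);
        if ($declared !== $actual) {
            $mismatches[] = [
                'offset'   => $m[1][1],   // byte offset of the declared length
                'declared' => $declared,
                'actual'   => $actual,
                'value'    => $m[2][0],
            ];
        }
    }
    return $mismatches;
}

$data = 'a:2:{s:15:"submit_comments";s:4:"open";s:5:"image";s:19:"C:fakepath100.jpg";}';
print_r(listLengthMismatches($data));
```

For the shortened sample string this reports a single element, declared as 19 bytes but actually 17 bytes long.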

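The cause of this error is that your charset is wrong.

Set the charset after the opening tag:

header('Content-Type: text/html; charset=utf-8');

And set the utf8 charset on your database connection:

// Note: the mysql_* extension was removed in PHP 7; with mysqli use $mysqli->set_charset('utf8')
mysql_query("SET NAMES 'utf8'");

Another reason unserialize() fails is that the serialized data was put into the database incorrectly. Since serialize() returns binary data and PHP variables do not care about the encoding, putting it into a TEXT or VARCHAR() column will cause this error.

Solution: store the serialized data in a BLOB column in your table.

I don't have enough reputation to comment, so I hope this is seen by people using the "correct" answer above:

Since PHP 5.5, the /e modifier in preg_replace() has been deprecated completely, and the preg_replace() call above will produce an error. The PHP documentation recommends using preg_replace_callback() in its place.

Please find the following solution as an alternative to the preg_replace() call proposed above:

$fixed_data = preg_replace_callback ( '!s:(\d+):"(.*?)";!', function($match) {      
    return ($match[1] == strlen($match[2])) ? $match[0] : 's:' . strlen($match[2]) . ':"' . $match[2] . '";';
},$bad_data );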
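
As a quick check of the callback version, it can be wrapped in a function and run against both a corrupted and a valid string. The wrapper name fixSerializedLengths and the shortened sample data are assumptions made for this sketch.

```php
<?php
// Hypothetical wrapper around the callback shown above.
function fixSerializedLengths(string $bad_data): string
{
    return preg_replace_callback('!s:(\d+):"(.*?)";!', function ($match) {
        // Leave elements whose declared length is already correct untouched.
        return ($match[1] == strlen($match[2]))
            ? $match[0]
            : 's:' . strlen($match[2]) . ':"' . $match[2] . '";';
    }, $bad_data);
}

$bad  = 'a:2:{s:11:"submit_tags";s:3:"bbc";s:5:"image";s:19:"C:fakepath100.jpg";}';
$good = serialize(['submit_tags' => 'bbc', 'image' => 'C:fakepath100.jpg']);

var_dump(fixSerializedLengths($bad) === $good);    // bool(true)
var_dump(fixSerializedLengths($good) === $good);   // bool(true): valid data is unchanged
```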

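You have to change the collation type to utf8_unicode_ci and the problem will be fixed.

The documentation says unserialize() should return false and raise an E_NOTICE.

But since you got an error, your error reporting is set to be triggered by E_NOTICE.

Here is a fix that lets you detect the false returned by unserialize():

$old_err=error_reporting(); 
error_reporting($old_err & ~E_NOTICE);
$object = unserialize($serialized_data);
error_reporting($old_err);

You might also want to consider using base64 encoding/decoding:

$string=base64_encode(serialize($obj));
unserialize(base64_decode($string));

In my case, I was storing the serialized data in a BLOB field of a MySQL DB, which apparently was not big enough to contain the whole value and truncated it. Such a string obviously cannot be unserialized.
Once that field was converted to MEDIUMBLOB, the problem went away.
It may also be necessary to switch the table option ROW_FORMAT to DYNAMIC or COMPRESSED.

Another cause of this problem can be the column type of the "payload" column of the sessions table. If the session holds a lot of data, a TEXT column is not enough; you will need MEDIUMTEXT or even LONGTEXT.

You can fix a broken serialized string using the following function, with multibyte character handling:

function repairSerializeString($value)
{
    $regex = '/s:([0-9]+):"(.*?)"/';

    return preg_replace_callback(
        $regex, function($match) {
            // Caution: unserialize() expects byte counts; mb_strlen() returns character counts under a multibyte encoding
            return "s:".mb_strlen($match[2]).":\"".$match[2]."\""; 
        },
        $value
    );
}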
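
On the multibyte point, it helps to keep in mind that serialize() writes the length in bytes and unserialize() checks it in bytes. A minimal sketch, assuming UTF-8 data:

```php
<?php
$value = "\u{e9}";                  // "é" in UTF-8: one character, two bytes
var_dump(strlen($value));           // int(2)
echo serialize($value), PHP_EOL;    // s:2:"é";

// Declaring the character count (1) instead of the byte count (2) is rejected.
var_dump(@unserialize("s:1:\"{$value}\";") === false);    // bool(true)
var_dump(unserialize("s:2:\"{$value}\";") === $value);    // bool(true)
```

A repair routine therefore normally needs strlen(); a character-based count only matches when every character is a single byte.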

$badData = 'a:2:{i:0;s:16:"as:45:"d";
Is \n";i:1;s:19:"as:45:"d";
Is \r\n";}';
The regexes proposed above cannot repair this broken serialized string.

You can repair it using the following regex:

$data = preg_replace_callback(
    '/(?<=^|\{|;)s:(\d+):\"(.*?)\";(?=[asbdiO]\:\d|N;|\}|$)/s',
    function($m){
        return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
    },
    $badData
);

var_dump(@unserialize($data));

Output (with CRLF line endings in $badData; with LF line endings the lengths are 16 and 18)

array(2) {
  [0] =>
  string(17) "as:45:"d";
Is \n"
  [1] =>
  string(19) "as:45:"d";
Is \r\n"
}
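
The difference between the simple pattern and the anchored one can be checked side by side. In this sketch the real line breaks in $badData are written as "\n" so that the result does not depend on the line endings of the source file; the variable names $recount, $naive and $anchored are chosen for illustration.

```php
<?php
// Same data as above, with LF line breaks made explicit.
$badData = 'a:2:{i:0;s:16:"as:45:"d";' . "\n" . 'Is \n";i:1;s:19:"as:45:"d";' . "\n" . 'Is \r\n";}';

$recount = function ($m) {
    return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
};

// Simple pattern: stops at the first "; it meets, even inside a value.
$naive = preg_replace_callback('!s:(\d+):"(.*?)";!', $recount, $badData);
var_dump(@unserialize($naive));   // bool(false)

// Anchored pattern: an element must follow the start, { or ; and be followed
// by the next element, a closing brace, or the end of the string.
$anchored = preg_replace_callback(
    '/(?<=^|\{|;)s:(\d+):\"(.*?)\";(?=[asbdiO]\:\d|N;|\}|$)/s',
    $recount,
    $badData
);
var_dump(array_map('strlen', @unserialize($anchored)));   // lengths 16 and 18
```

The lookbehind and lookahead keep the inner s:45:"d"; from being treated as an element of its own, and the s modifier lets the value span the line break.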


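Quick Fix

Recalculate the length of the elements in the serialized array, but do not use preg_replace() with the /e modifier, it is deprecated; better use preg_replace_callback():

Edit: the new version no longer fixes only the wrong length, it also fixes line breaks and counts the correct number of characters with accents (thanks)

// Method excerpt: it belongs inside a class. \Log::error() is the application's logger (for example Laravel's Log facade).
public function unserializeKeySkills($string) {
    // Collapse runs of whitespace (including line breaks) into a single space
    $string = trim(preg_replace('/\s\s+/', ' ', $string));
    // Note: utf8_encode() assumes ISO-8859-1 input and is deprecated as of PHP 8.2
    $string = preg_replace_callback('!s:(\d+):"(.*?)";!', function($m) { return 's:'.strlen($m[2]).':"'.$m[2].'";'; }, utf8_encode($string));
    // unserialize() does not throw on bad input; it returns false, so check the result
    $output = @unserialize($string);
    if ($output === false) {
        \Log::error("unserialize Data : " .print_r($string,true));
    }
    return $output;
}


After trying several things on this page without success, I looked at the page source and noticed that all quotes in the serialized string had been replaced by HTML entities. Decoding these entities helps avoid a lot of headaches.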
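
The HTML entity case can be sketched as follows. The input here is produced with htmlspecialchars() purely to simulate a string copied from page source; how the entities got into the real data is an assumption.

```php
<?php
// Hypothetical input: a serialized string that was HTML-escaped on its way to the page.
$fromPage = htmlspecialchars(serialize(['image' => '100.jpg']));
echo $fromPage, PHP_EOL;   // a:1:{s:5:&quot;image&quot;;s:7:&quot;100.jpg&quot;;}

var_dump(@unserialize($fromPage));   // bool(false): the quotes are no longer quotes

// Decode the entities before unserializing.
$decoded = html_entity_decode($fromPage, ENT_QUOTES);
var_dump(unserialize($decoded) === ['image' => '100.jpg']);   // bool(true)
```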

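The corruption in this question is isolated to a single substring at the end of the serialized string, which was probably replaced manually by someone who lazily wanted to update the image filename. This fact is apparent in my demonstration link below using the OP's posted data; in short, C:fakepath100.jpg does not have a length of 19, it should be 17.

Since the serialized string corruption is limited to an incorrect byte/character count number, the following will do a fine job of updating the corrupted string with the correct byte count value.

The following regex-based replacement will only be effective at remedying byte counts, nothing more. It looks like many of the earlier posts are just copy-pasting a regex pattern from someone else. There is no reason to capture the potentially corrupted byte count number if it is not going to be used in the replacement. Also, adding the s pattern modifier is a reasonable inclusion for string values (it lets . match newlines).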
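
A minimal sketch of a replacement along those lines, written from the description above rather than taken from the original post: the count is matched but not captured, and the s modifier is set. The function name recountSerializedStrings and the shortened sample data are assumptions.

```php
<?php
// Sketch only: the declared count is matched but not captured, since it is recomputed anyway.
function recountSerializedStrings(string $corrupted): string
{
    return preg_replace_callback(
        '/s:\d+:"(.*?)";/s',
        function ($m) {
            return 's:' . strlen($m[1]) . ':"' . $m[1] . '";';
        },
        $corrupted
    );
}

$corrupted = 'a:2:{s:15:"submit_comments";s:4:"open";s:5:"image";s:19:"C:fakepath100.jpg";}';
$repaired  = recountSerializedStrings($corrupted);
echo $repaired, PHP_EOL;
var_dump(unserialize($repaired));
```

As stated above, this only corrects byte counts; a value that itself contains the sequence "; would still be split in the wrong place.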