PHP: Unserializing values from the Wikipedia API fails


I am trying to get data from Wikipedia, but unserializing it fails every time.

The example query should fetch section 20 of the Honda Civic page:

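<?php
// Fetch section 20 of the Honda Civic article in PHP-serialized format
// by running the curl command-line tool through the shell.
exec("curl -s 'http://en.wikipedia.org/w/api.php?action=parse&format=php&page=Honda_Civic&prop=text&section=20'", $output);

// exec() returns the output as an array of lines; join them back
// together with newlines.
$value = "";
$first = true;
foreach ($output as $line) {
   if ($first) {
      $first = false;
   } else {
      $value .= "\n";
   }

   $value .= $line;
}

// Print the raw response between ~~~ markers, then the unserialized value.
print("~~~\n");
print($value);
print("\n~~~\n");
print(unserialize($value));
print("~~~\n");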

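Yes, there are "reference errors" in the returned text, but the data should still unserialize. Any idea what is going on?

If I run this in my real script (as opposed to the simplified script given here), I get the same output, but also the following potentially useful message: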

unserialize(): Error at offset 1583 of 1587 bytes
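When unserialize() reports an offset like this, it can help to look at the raw bytes around that position. The following is a minimal sketch; the helper name showBytesAround and the default 10-byte radius are illustrative choices, not part of any PHP API.

```php
<?php
// Sketch (hypothetical helper): show the bytes around the offset reported in
// "unserialize(): Error at offset N of M bytes", as text and as hex, so an
// unexpected or mangled character near that position becomes visible.
function showBytesAround(string $data, int $offset, int $radius = 10): string
{
    $start = max(0, $offset - $radius);
    $chunk = substr($data, $start, 2 * $radius);
    if ($chunk === '' || $chunk === false) {
        return "no bytes at offset $offset\n";
    }
    return sprintf(
        "bytes %d-%d: %s\nhex: %s\n",
        $start,
        $start + strlen($chunk) - 1,
        $chunk,
        bin2hex($chunk)
    );
}

// Example: a payload whose length prefix is wrong (claims 5 bytes, has 3).
$bad = 's:5:"abc";';
if (@unserialize($bad) === false) {
    echo showBytesAround($bad, 2);
}
```

With the real response, the offset from the error message (1583 here) would be passed in place of 2.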

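You are passing the data through curl and the shell, which modify it in a way that corrupts it.

Instead, fetch the data in a way that does not corrupt it, and you should be fine.

Example code: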

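<?php
// Fetch the same API response directly from PHP, without curl or the
// shell, so the bytes arrive unchanged.
$url = 'http://en.wikipedia.org/w/api.php?action=parse&format=php&page=Honda_Civic&prop=text&section=20';

$buffer = file_get_contents($url);

// unserialize() returns false if the data is corrupt.
$test = unserialize($buffer);

var_dump($test);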

Result:

array(1) {
  'parse' =>
  array(2) {
    'title' =>
    string(11) "Honda Civic"
    'text' =>
    array(1) {
      '*' =>
      string(1476) "<h4><span class="editsection">[<a href="/w/index.php?title=Honda_Civic&amp;action=edit&amp;section=1" title="Edit section: WTCC">edit</a>]</span> <span class="mw-headline" id="WTCC">WTCC</span></h4>\n<p>Honda announced to enter the 2012 <a href="/wiki/World_Touring_Car_Championship" title="World Touring Car Championship">World Touring Car Championship</a> (WTCC) with a racer built on the 2012 Euro Civic 5 door hatchback. The car is powered by a 1.6-liter turbocharged engine, developed by Honda R&amp;D, and "...
    }
  }
}

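What does that mean? When PHP unserializes a string, it parses it according to its own format. That format expects particular things at particular offsets. For example, a string is enclosed in double quotes and prefixed with its length in bytes. The parser therefore jumps to the end of the string using the given length and checks whether a " (double quote) is found at the computed offset. In your case, that was most likely offset 1583, and the quote was not found there.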
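A minimal sketch of the length-prefix mechanism described above: serialize() writes a string as s:<length in bytes>:"<bytes>";, and unserialize() fails as soon as the prefix no longer matches the bytes that follow.

```php
<?php
// Sketch: serialize() writes a string as s:<length in bytes>:"<bytes>";
// unserialize() trusts that length to locate the closing double quote.

$ok = serialize("abc");
echo $ok, "\n"; // s:3:"abc";

// Claim 4 bytes while only 3 are present: after reading 4 bytes (abc")
// the parser expects a closing " but finds ;, so it gives up.
$broken = 's:4:"abc";';
var_dump(@unserialize($broken)); // bool(false); @ hides the diagnostic

var_dump(unserialize($ok)); // string(3) "abc"
```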

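The most likely cause is character encoding: the same text can have a different length in bytes in different encodings. Take, for example, this part at the end of your question: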

Preprocessor generated node count: 1599/1500000
Post‐expand include size: 3103/2048000 bytes
Template argument size: 1880/2048000 bytes

The hyphen in "Post‐expand" is actually the Unicode character U+2010 HYPHEN (‐). It takes three bytes in the serialized string.
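To make the byte counts concrete, here is a short sketch comparing U+2010 HYPHEN with the ASCII hyphen-minus. It uses the "\u{2010}" escape (PHP 7 and later) so the example does not depend on the encoding of the source file itself.

```php
<?php
// Sketch: one visible character, different byte lengths.
$unicodeHyphen = "\u{2010}"; // U+2010 HYPHEN, encoded as UTF-8
$asciiHyphen   = "-";        // U+002D HYPHEN-MINUS

echo strlen($unicodeHyphen), "\n";  // 3 (strlen() counts bytes)
echo bin2hex($unicodeHyphen), "\n"; // e28090
echo strlen($asciiHyphen), "\n";    // 1

// serialize() records the byte length: 4 + 3 + 6 = 13 bytes.
echo serialize("Post\u{2010}expand"), "\n"; // s:13:"Post‐expand";
```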

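However, if the output is mangled on its way through the shell, it can be converted to a different encoding used by the shell. The dash then takes only one byte, because it has been converted to - (hyphen-minus, the ASCII hyphen), which consumes a single byte.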
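The effect on unserialize() can be reproduced without curl or a shell. This sketch serializes a string containing U+2010 and then replaces that three-byte hyphen with an ASCII hyphen. The str_replace() call is an assumption standing in for whatever re-encoding happens in transit; it is not what any particular shell literally does.

```php
<?php
// Sketch: simulate a lossy transport that rewrites U+2010 HYPHEN (3 bytes)
// to an ASCII hyphen-minus (1 byte) without updating the length prefix.

$original = serialize(["text" => "Post\u{2010}expand include size"]);

// Hypothetical mangling step, standing in for the re-encoding in transit.
$mangled = str_replace("\u{2010}", "-", $original);

var_dump(strlen($original) - strlen($mangled)); // int(2)
var_dump(unserialize($original) !== false);     // bool(true)
var_dump(@unserialize($mangled));               // bool(false)
```

The string is now two bytes shorter than its s:26 prefix claims, so the parser reads past the real closing quote and fails near the end of the data, which matches an error offset close to the total length.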

On another system, stdio may not mangle the encoding because it is UTF-8 there, so nothing breaks.

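Another workaround is to tell the curl command-line tool to write to a temporary file, and then load that temporary file with file_get_contents(). Also, try using print_r() or var_dump() instead of print().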
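A minimal sketch of the temporary-file workaround, assuming the curl command-line tool is installed. The helper name fetchViaTempFile is hypothetical, and -L (follow redirects) is an addition not present in the original command. Reading the file back also avoids collecting the response through exec()'s $output array; the PHP manual notes that trailing whitespace is not included in those lines, which is another way bytes can go missing.

```php
<?php
// Sketch: let the curl CLI write the raw response to a temporary file (-o),
// then read it back byte-for-byte with file_get_contents().
// fetchViaTempFile is a hypothetical helper name.

function fetchViaTempFile(string $url): string
{
    $tmp = tempnam(sys_get_temp_dir(), 'wiki');
    if ($tmp === false) {
        throw new RuntimeException('Could not create a temporary file');
    }

    try {
        // -s: silent mode; -L: follow redirects; -o: write the body to $tmp.
        $cmd = 'curl -s -L -o ' . escapeshellarg($tmp) . ' ' . escapeshellarg($url);
        exec($cmd, $unused, $status);
        if ($status !== 0) {
            throw new RuntimeException("curl exited with status $status");
        }

        $body = file_get_contents($tmp);
        if ($body === false) {
            throw new RuntimeException('Could not read the temporary file');
        }
        return $body;
    } finally {
        unlink($tmp); // remove the temporary file on success and on failure
    }
}

// Usage (needs network access):
// $url  = 'http://en.wikipedia.org/w/api.php?action=parse&format=php&page=Honda_Civic&prop=text&section=20';
// $data = unserialize(fetchViaTempFile($url));
// var_dump($data); // shows the array structure, or bool(false) on failure
```

var_dump() is used in the usage lines because print() only outputs "Array" for an array and nothing at all for false, which hides whether unserialize() actually failed.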
