Removing HTML escape sequences from a string with PHP


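I am writing a PHP script that pulls quest data from wowhead: specifically, what each quest starts and ends on (an item or an NPC) and the ID or name of that item or NPC. Below is the relevant part of the script; the rest deals with database insertion. This is the full snippet I came up with, in case anyone is interested. Also, given that this will run about 15,000 times, is this the best way to fetch and store the data?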

<?php

$quests = array();
//$questlimit = 14987;
$questlimit = 5;
$currentquest = 1;
$questsprocessed = 0;
while($questsprocessed != $questlimit)
{
echo "<br>";
echo "  Start of iteration: ".$questsprocessed."  ";
echo "<br>";
echo "  Attempting to process quest: ".$currentquest."  ";
echo "<br>";

$quests[$currentquest] = array();
$baseurl = 'http://wowhead.com/quest=';
$fullurl = $baseurl.$currentquest;

$data = drupal_http_request($fullurl);

$queststartloc1 = strpos($data->data, 'quest_start'); 
$queststartloc2 = strpos($data->data, 'quest_end');

if($queststartloc1 === false) // strict check: strpos() can legitimately return 0
{$currentquest++; echo "No data for this quest"; echo "<br>"; continue;}


$questendloc1 = strpos($data->data, 'quest_end');
$questendloc2 = strpos($data->data, 'x5DDifficulty');

$startcaptureLength = $queststartloc2 - $queststartloc1;
$endcaptureLength = $questendloc2 - $questendloc1;


$quest_start_raw = substr($data->data,$queststartloc1, $startcaptureLength);
$quest_end_raw = substr($data->data, $questendloc1, $endcaptureLength);

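// The /e modifier was removed in PHP 7; decode each \xHH with a callback instead.
$decodeHex = function ($m) { return chr(hexdec($m[1])); };
$startDecoded = preg_replace_callback('~\\\\x([A-Fa-f0-9]{2})~', $decodeHex, $quest_start_raw);
$endDecoded = preg_replace_callback('~\\\\x([A-Fa-f0-9]{2})~', $decodeHex, $quest_end_raw);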
$quests[$currentquest]['Start'] = array();
$quests[$currentquest]['End'] = array();

if(strstr($startDecoded, 'npc'))
  {
   $quests[$currentquest]['Start']['Type'] = "npc";
  preg_match('~npc=(\d+)~', $startDecoded, $startmatch);
  }
else
{
  $quests[$currentquest]['Start']['Type'] = "item";
  preg_match('~item=(\d+)~', $startDecoded, $startmatch);
}


$quests[$currentquest]['Start']['ID'] = $startmatch[1];


if(strstr($endDecoded, 'npc'))
  {
   $quests[$currentquest]['End']['Type'] = "npc";
  preg_match('~npc=(\d+)~', $endDecoded, $endmatch);
  }
else
{
  $quests[$currentquest]['End']['Type'] = "item";
  preg_match('~item=(\d+)~', $endDecoded, $endmatch);
}


$quests[$currentquest]['End']['ID'] = $endmatch[1];

//var_dump($quests[$currentquest]);

echo "  End of iteration: ".$questsprocessed."  ";
echo "<br>";
echo "  Processed quest: ".$currentquest."  ";
echo "<br>";
$currentquest++;
$questsprocessed++;

}
?>

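These are called "escape sequences". They are usually used to represent non-printable characters, but they can encode any character. In PHP, you can decode them like this: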

$text = '
quest_start\\x5DStart\\x3A\\x20\\x5Bitem\\x3D16305\\x5D\\x5B\\x2Ficon\\x5D\\x5B\\x2Fli\\x5D\\x5Bli\\x5D\\x5Bicon\\x20name\\x3Dquest_end\\x5DEnd\\x3A\\x20\\x5Burl\\x3D\\x2Fnpc\\x3D12696\\x5DSenani\\x20Thunderheart\\x5B\\x2Furl\\x5D\\x5B\\x2Ficon\\x5D\\x5B\\x2Fli\\x5D\\x5Bli\\x5DNot\\x20sharable\\x5B\\x2Fli\\x5D\\x5Bli
';

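// The /e modifier was removed in PHP 7; a callback does the same decoding.
$decoded = preg_replace_callback('~\\\\x([A-Fa-f0-9]{2})~', function ($m) { return chr(hexdec($m[1])); }, $text);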

This gives you a string like the following:

 quest_start]Start: [item=16305][/icon][/li][li][icon name=quest_end]End: [url=/npc=12696]Senani Thunderheart[/url][/icon][/li][li]Not sharable[/li][li

(apparently some kind of BB code). To remove all the BB codes, one more replace is needed:

$clean = preg_replace('~(\[.+?\])+~', ' ', $decoded);

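Thanks a lot, this gets me well on my way to finishing the script. My only problem is that the cleaning step seems to wipe out my item numbers. From the decoded message I need Start: item=16305 and End: npc=12696. I mentioned the NPC's name in my post because I had not noticed that the NPC's ID was in there too. Right now the ID is far more useful to me than the name. In a Google search I came across a post where they converted a string into a table by splitting on name-like positions, e.g. john domain:example.com id:123, but I can't seem to find it again.

@user28187: Before cleaning, you can use preg_match('~npc=(\d+)~', $decoded, $match) to extract the number.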
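To make the suggestion above concrete, here is a minimal sketch that pulls both the type and the ID out of the decoded string before any BB-code cleaning. The helper name extractQuestEntity is hypothetical, the sample string is the decoded output shown earlier, and the nullable type hints assume PHP 7.1 or later. It also assumes the first npc=/item= reference in each half is the one of interest.

```php
<?php
// Hypothetical helper: finds the first "npc=<digits>" or "item=<digits>"
// in $segment and returns ['Type' => ..., 'ID' => ...], or null if none.
function extractQuestEntity(string $segment): ?array
{
    if (preg_match('~\b(npc|item)=(\d+)~', $segment, $m) !== 1) {
        return null;
    }
    return ['Type' => $m[1], 'ID' => (int) $m[2]];
}

// Decoded sample taken from the answer above.
$decoded = 'quest_start]Start: [item=16305][/icon][/li][li][icon name=quest_end]'
    . 'End: [url=/npc=12696]Senani Thunderheart[/url][/icon][/li][li]Not sharable[/li][li';

// Split at the quest_end marker so each half describes one endpoint.
$endPos = strpos($decoded, 'quest_end');
if ($endPos === false) {
    $start = extractQuestEntity($decoded);
    $end = null;
} else {
    $start = extractQuestEntity(substr($decoded, 0, $endPos));
    $end = extractQuestEntity(substr($decoded, $endPos));
}
```

Matching npc and item in one alternation avoids the separate strstr() check, and a null return makes a missing match explicit instead of reading an undefined $match[1].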

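I added some checks for empty data (the quest does not exist) and database insertion based on the array's type and ID. The only problem with this script is that roughly every hour it reports "connection timed out after 30 seconds". How do I handle this error and restart the loop?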
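One possible way to approach the timeout, sketched under assumptions: if the message comes from an individual HTTP request failing (rather than from PHP's own execution time limit), the fetch can be wrapped in a small retry loop so that a failed request is repeated instead of aborting the whole run. The helper fetchWithRetry and its parameters are hypothetical and not part of Drupal or PHP; the fetch callback is expected to return null on failure.

```php
<?php
// Hypothetical helper: calls $fetch until it returns a non-null value,
// pausing $delaySeconds between attempts, for at most $maxAttempts tries.
// Returns the fetched value, or null if every attempt failed.
function fetchWithRetry(callable $fetch, int $maxAttempts = 3, int $delaySeconds = 5)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $fetch();
        if ($result !== null) {
            return $result;
        }
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds); // back off before the next try
        }
    }
    return null;
}

// Possible use inside the loop (assumption: the object returned by
// drupal_http_request() has an "error" property set when the request fails):
//
// $data = fetchWithRetry(function () use ($fullurl) {
//     $response = drupal_http_request($fullurl);
//     return isset($response->error) ? null : $response;
// });
// if ($data === null) { $currentquest++; continue; }
```

Skipping the quest after the last failed attempt keeps the loop moving; logging the skipped IDs would let them be retried in a later run.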