Removing HTML escape sequences from a string with PHP

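I'm writing a PHP script that pulls quest data from wowhead: specifically where each quest starts and ends (an item or an NPC), plus that starter's or ender's ID or name. This is the relevant part of the script; the rest deals with database insertion. In case anyone is interested, here is the full code snippet I came up with. Also, given that this will run about 15,000 times, is this the best way to fetch and store the data?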

<?php

$quests = array();
//$questlimit = 14987;
$questlimit = 5;
$currentquest = 1;
$questsprocessed = 0;
while($questsprocessed != $questlimit)
{
echo "<br>";
echo "  Start of iteration: ".$questsprocessed."  ";
echo "<br>";
echo "  Attempting to process quest: ".$currentquest."  ";
echo "<br>";

$quests[$currentquest] = array();
$baseurl = 'http://wowhead.com/quest=';
$fullurl = $baseurl.$currentquest;

$data = drupal_http_request($fullurl);

$queststartloc1 = strpos($data->data, 'quest_start'); 
$queststartloc2 = strpos($data->data, 'quest_end');

// strpos() can legitimately return 0, so compare against false strictly.
if($queststartloc1 === false)
{$currentquest++; echo "No data for this quest"; echo "<br>"; continue;}


$questendloc1 = strpos($data->data, 'quest_end');
$questendloc2 = strpos($data->data, 'x5DDifficulty');

$startcaptureLength = $queststartloc2 - $queststartloc1;
$endcaptureLength = $questendloc2 - $questendloc1;


$quest_start_raw = substr($data->data,$queststartloc1, $startcaptureLength);
$quest_end_raw = substr($data->data, $questendloc1, $endcaptureLength);

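// The /e modifier was removed in PHP 7, so decode the \xNN escapes with a callback instead.
$decodeHex = function ($m) { return chr(hexdec($m[1])); };
$startDecoded = preg_replace_callback('~\\\\x([A-Fa-f0-9]{2})~', $decodeHex, $quest_start_raw);
$endDecoded = preg_replace_callback('~\\\\x([A-Fa-f0-9]{2})~', $decodeHex, $quest_end_raw);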
$quests[$currentquest]['Start'] = array();
$quests[$currentquest]['End'] = array();

if(strstr($startDecoded, 'npc'))
  {
   $quests[$currentquest]['Start']['Type'] = "npc";
  preg_match('~npc=(\d+)~', $startDecoded, $startmatch);
  }
else
{
  $quests[$currentquest]['Start']['Type'] = "item";
  preg_match('~item=(\d+)~', $startDecoded, $startmatch);
}


$quests[$currentquest]['Start']['ID'] = $startmatch[1];


if(strstr($endDecoded, 'npc'))
  {
   $quests[$currentquest]['End']['Type'] = "npc";
  preg_match('~npc=(\d+)~', $endDecoded, $endmatch);
  }
else
{
  $quests[$currentquest]['End']['Type'] = "item";
  preg_match('~item=(\d+)~', $endDecoded, $endmatch);
}


$quests[$currentquest]['End']['ID'] = $endmatch[1];

//var_dump($quests[$currentquest]);

echo "  End of iteration: ".$questsprocessed."  ";
echo "<br>";
echo "  Processed quest: ".$currentquest."  ";
echo "<br>";
$currentquest++;
$questsprocessed++;

}
?>

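These are called "escape sequences". They are usually used to represent non-printable characters, but they can encode any character. In PHP you can decode them like this: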

$text = '
quest_start\\x5DStart\\x3A\\x20\\x5Bitem\\x3D16305\\x5D\\x5B\\x2Ficon\\x5D\\x5B\\x2Fli\\x5D\\x5Bli\\x5D\\x5Bicon\\x20name\\x3Dquest_end\\x5DEnd\\x3A\\x20\\x5Burl\\x3D\\x2Fnpc\\x3D12696\\x5DSenani\\x20Thunderheart\\x5B\\x2Furl\\x5D\\x5B\\x2Ficon\\x5D\\x5B\\x2Fli\\x5D\\x5Bli\\x5DNot\\x20sharable\\x5B\\x2Fli\\x5D\\x5Bli
';

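// The /e modifier was removed in PHP 7; preg_replace_callback does the same job.
$decoded = preg_replace_callback('~\\\\x([A-Fa-f0-9]{2})~', function ($m) {
    return chr(hexdec($m[1]));
}, $text);

This will give you a string like the following: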

 quest_start]Start: [item=16305][/icon][/li][li][icon name=quest_end]End: [url=/npc=12696]Senani Thunderheart[/url][/icon][/li][li]Not sharable[/li][li
(apparently some kind of BBCode). To strip all the BBCode tags, you need one more replacement:

$clean = preg_replace('~(\[.+?\])+~', ' ', $decoded);

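Thank you very much, this will go a long way toward finishing this script. My only problem is that the cleanup seems to wipe out my item numbers. From the decoded message I need Start: item=16305 and End: npc=12696. I mentioned the NPC's name in the post because I hadn't noticed that the NPC's ID was in there too. At the moment the ID is far more useful to me than the name. In a Google search I saw a post where they converted a string into a table, splitting on labels in something like name:john domain:example.com id:123, but I can't seem to find it again.

@user28187: before cleaning, you can use preg_match('~npc=(\d+)~', $decoded, $match) to extract the number.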
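
The suggestion above can be extended to capture both the type and the ID for the Start and End entries in one pass, before the BBCode is stripped. The following is only a sketch: the function name `extract_quest_endpoints` is hypothetical, and the pattern assumes the decoded text always looks like the sample in the answer (`Start: [item=...]` or `End: [url=/npc=...]`), with only `npc` and `item` as possible types.

```php
<?php
// Hypothetical helper (not from the original thread): pull the type and ID
// of the Start and End entries out of the decoded string, before cleaning.
function extract_quest_endpoints($decoded)
{
    $endpoints = array();
    // Matches "Start: [item=16305" as well as "End: [url=/npc=12696".
    $pattern = '~\b(Start|End):\s*\[(?:url=/)?(npc|item)=(\d+)~';
    if (preg_match_all($pattern, $decoded, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $endpoints[$m[1]] = array('Type' => $m[2], 'ID' => (int) $m[3]);
        }
    }
    return $endpoints; // empty array when nothing matched
}

// Sample taken from the decoded string shown in the answer.
$sample = 'quest_start]Start: [item=16305][/icon][/li][li][icon name=quest_end]'
        . 'End: [url=/npc=12696]Senani Thunderheart[/url][/icon][/li]';
$endpoints = extract_quest_endpoints($sample);
```

For the sample string, `$endpoints['Start']` holds the type `item` with ID 16305 and `$endpoints['End']` holds the type `npc` with ID 12696. An empty result can be treated the same way as a quest with no data.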
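I've added some checks for empty data (the quest doesn't exist) and database insertion based on the type and ID stored in the array. The only problem with this script is that roughly once an hour it reports "connection timed out after 30 seconds". How can I handle this error and restart the loop?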
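
One possible way to cope with the occasional timeout is to retry the request a few times and skip the quest if it still fails, rather than letting the whole loop die. This is a rough sketch, not something from the thread: `fetch_with_retry`, `$maxAttempts` and `$delaySeconds` are made-up names, and it assumes the response object carries a non-empty `error` property on failure and a `data` property on success (which is how `drupal_http_request` appears to behave, but check this against your Drupal version).

```php
<?php
// Hypothetical retry wrapper: calls $fetch($url) up to $maxAttempts times
// and returns the first response that has data and no error, or null.
function fetch_with_retry(callable $fetch, $url, $maxAttempts = 3, $delaySeconds = 0)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $response = $fetch($url);
        // Assumption: failures set ->error, successes set ->data.
        if (is_object($response) && empty($response->error) && isset($response->data)) {
            return $response;
        }
        // Pause before the next attempt, but not after the last one.
        if ($delaySeconds > 0 && $attempt < $maxAttempts) {
            sleep($delaySeconds);
        }
    }
    return null; // every attempt failed
}

// Possible use inside the loop from the question (commented out because
// drupal_http_request only exists inside Drupal):
// $data = fetch_with_retry('drupal_http_request', $fullurl, 3, 5);
// if ($data === null) { $currentquest++; continue; }
```

Skipping a quest after repeated failures keeps the run going; recording the skipped IDs somewhere would make it possible to retry just those later.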