PHP: How to match both htmlentified and non-htmlentified keywords in a full-text MySQL search

Tags: php, mysql, encoding, keyword, html-entities

I have learned to save data in MySQL as raw as possible. Here is an example of how I would store the content in a MySQL database:

title (VARCHAR, 255) => Références
content (TEXT) => <p>A paragraph about r&eacute;f&eacute;rences...</p>
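You could also leave "référence" unreplaced in content, since it is still valid HTML (as long as the encoding declared in the HTML head is the same as the one used in the database, assumed here to be UTF-8):

<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >

For the full-text search, I believe you can use the OR operator in the expression. See the MySQL full-text search documentation for more details. That way you can do it in a single query.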

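In your code, the above would look something like this:

// Escape last, once both variants are built (mysql_* was removed in PHP 7)
$keyword = utf8_decode($_POST['keyword']);
$keyword = '(' . $keyword . '*) OR (' . htmlentities($keyword) . '*)';
$keyword = mysql_real_escape_string($keyword);
(A bit dirty, but I guess you can clean it up ;)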
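
For current PHP versions, the same idea could be sketched as follows. This is only a minimal sketch under assumptions: build_fulltext_terms() is a hypothetical helper name, the input is assumed to be UTF-8, and the table and column names in the usage comment are made up for illustration. Note that MySQL's boolean mode has no OR keyword. Terms without a leading operator are already optional, so listing both variants is enough. The sketch quotes each variant as a phrase and therefore drops the * wildcard, which is not applied inside a quoted phrase.

```php
<?php
// Hypothetical helper: builds a BOOLEAN MODE search string that contains the
// keyword both as typed and in its htmlentities() form.
function build_fulltext_terms(string $keyword): string
{
    // Replace characters that act as operators in boolean mode with spaces
    $raw = str_replace(['"', '+', '-', '<', '>', '(', ')', '~', '*', '@'], ' ', $keyword);
    // Collapse runs of whitespace and trim the ends
    $raw = trim(preg_replace('/\s+/u', ' ', $raw));
    if ($raw === '') {
        return '';
    }

    // Entity-encoded variant, e.g. "é" becomes "&eacute;"
    $encoded = htmlentities($raw, ENT_COMPAT, 'UTF-8');

    // No accented or special characters: a single phrase is enough
    if ($encoded === $raw) {
        return '"' . $raw . '"';
    }

    // Two phrases without operators: a row matching either one is returned
    return '"' . $raw . '" "' . $encoded . '"';
}

// Usage with a prepared statement (table and columns are assumptions):
// $stmt = $pdo->prepare(
//     'SELECT id FROM articles WHERE MATCH(title, content) AGAINST (? IN BOOLEAN MODE)'
// );
// $stmt->execute([build_fulltext_terms($_POST['keyword'])]);
```

Binding the search string as a parameter removes the need for manual escaping.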

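I don't know whether the $keyword variable can contain more than one word in your case, so I added the parentheses to make the grouping explicit.

On another note: I'm not sure whether the boolean full-text search will treat something like r&eacute;f&eacute;rences as a single word. But I guess it's worth a try ;)

Edit: I just found out that the problem mentioned above can be solved by putting double quotes around the keyword, e.g. "r&eacute;f&eacute;rences". That is, if it is a problem at all ;)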

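As a French developer working on French websites, I ran into the same problem and found a workaround that still uses the powerful MATCH function, which I think you will find useful. Please excuse any mistakes in my English; as I said, I'm French.

The problem I ran into is that an "htmlentified" word is not recognized as a single word. For example, once a word such as "r&eacute;f&eacute;rences" is in the database, the MATCH function treats it as three different words.

But as mentioned above, an "htmlentified" word placed between double quotes (") is treated as a single expression.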
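
To make the splitting easier to picture, here is a rough illustration. It is an assumption-laden approximation, not MySQL's actual parser: approximate_tokens() is a hypothetical helper that treats anything other than letters, digits, "_" and "'" as a word separator, which is enough to show why "&" and ";" break an entity-encoded word apart.

```php
<?php
// Hypothetical helper approximating how a full-text parser splits words.
// Assumption: letters, digits, "_" and "'" are word characters; everything
// else (including "&" and ";") separates words.
function approximate_tokens(string $text): array
{
    return preg_split('/[^\p{L}\p{N}_\']+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// The entity-encoded form falls apart into fragments,
// while the raw UTF-8 form stays a single word.
print_r(approximate_tokens('r&eacute;f&eacute;rences'));
print_r(approximate_tokens("r\u{e9}f\u{e9}rences"));
```

Fragments shorter than the configured minimum word length are not indexed at all, which makes matching the encoded form word by word even less reliable.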

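So the solution seems to be to leave expressions that are already between quotes untouched, and to use a regular expression to put the "htmlentified" words between quotes.

Here is the code I wrote to work around this problem until someone fixes the bug in MySQL: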

//At start, my variable is "htmlentified", so I need to replace <<&quot;>> with real double quotes <<">>
$Search_Text=str_replace("&quot;","\"",$Search_Text);

//I used the "ENT_COMPAT" option for html_entities, so I need to escape the single quotes
$Search_Text=str_replace("'","\\'",$Search_Text);

//I search if there are text between double quotes using "preg_match_all" with "PREG_OFFSET_CAPTURE" option to have starting offset for each of them
//Note that at the start of the regex I keep the eventual "+" or "-" or "(" or ")" just before or after quotes
if(preg_match_all('#([-|\+|(]{0,})(\"(.+)\")([)]{0,})#isU',$Search_Text,$Quoted_Blocks,PREG_OFFSET_CAPTURE))
{
    //I Initialise an array that will contain each distinct words and quoted expressions from the search text
    $List_Search_Terms=Array();

    //If the offset of the first quoted expression is not 0, it means that there is some words before that I will put in the array
    if($Quoted_Blocks[0][0][1]>0)
    {
        $Starting_Offset=0;
        $Length=$Quoted_Blocks[0][0][1];

        foreach(explode(" ",trim(substr($Search_Text,$Starting_Offset,$Length))) as $Index=>$Valeur)
        {
            array_push($List_Search_Terms,$Valeur);
        }
    }

    //Then I treat all quoted expressions and words between them that I will put in the array
    for($Index_Bloc=0;$Index_Bloc<count($Quoted_Blocks[0]);$Index_Bloc++)
    {
        //I put the quoted expression unmodified in my array
        array_push($List_Search_Terms,$Quoted_Blocks[0][$Index_Bloc][0]);

        //I verify if there's a following quoted expression for I can treat the words in between
        if(isset($Quoted_Blocks[0][$Index_Bloc+1]))
        {
            $Starting_Offset=$Quoted_Blocks[0][$Index_Bloc][1]+strlen($Quoted_Blocks[0][$Index_Bloc][0]);
            $Length=$Quoted_Blocks[0][$Index_Bloc+1][1]-$Starting_Offset;

            //I put the words in between in the array
            foreach(explode(" ",trim(substr($Search_Text,$Starting_Offset,$Length))) as $Index=>$Valeur)
            {
                array_push($List_Search_Terms,$Valeur);
            }
        }
    }

    //After treating all quoted expressions, I put in the array the words that can remain after the last quoted expression
    $Starting_Offset=$Quoted_Blocks[0][count($Quoted_Blocks[0])-1][1]+strlen($Quoted_Blocks[0][count($Quoted_Blocks[0])-1][0]);

    if($Starting_Offset<strlen($Search_Text))
    {
        foreach(explode(" ",trim(substr($Search_Text,$Starting_Offset))) as $Index=>$Valeur)
        {
            array_push($List_Search_Terms,$Valeur);
        }
    }
}
else
{
    //If there's no quoted expression in the search, I put all words in the array
    $List_Search_Terms=explode(" ",$Search_Text);
}

//Once I have all words and quoted expressions in the array, I can run through it to put "htmlentified" words (not expressions !) between double quotes
foreach($List_Search_Terms as $Index=>$Value)
{
    //I control that the line of the array doesn't contain double-quotes (which I won't touch) and if there are "&" and ";" characters that means an "htmlentified" word
    if(!preg_match('#"#',$Value) && preg_match('#&(.+)\;#isU',$Value))
    {
        //if both conditions match, I put quotes at the beginning and end
        //Note that, as above, the regex keeps any "+", "-", "(" or ")" found just before or after the word
        //The word group is lazy (.+?) so that trailing ")" land in the third group instead of inside the quotes
        $List_Search_Terms[$Index]=preg_replace('#^([-|\+|(]{0,})(.+?)([)]{0,})$#is',"$1\"$2\"$3",$Value);
    }
}

//After that treatment, the array is converted in one single text that can be included in the mysql AGAINST condition
$Final_Search_Text=implode(" ",$List_Search_Terms);

It could probably be optimized a bit, but it does the job!
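
One possible way to shorten the routine, offered only as a sketch: the hypothetical function quote_entity_words() below (the name and the exact regular expressions are my own assumptions, not the code above) splits the search text into quoted phrases and plain words in one pass, then wraps only the words that contain an HTML entity. Unlike the code above, it does not convert &quot; back to double quotes or escape single quotes; those steps would still be needed beforehand.

```php
<?php
// Hypothetical compact variant: wraps entity-encoded words in double quotes
// so that boolean mode treats each of them as one phrase.
function quote_entity_words(string $search): string
{
    // Match either a quoted phrase (with optional leading "+", "-", "(" and
    // trailing ")") or a run of non-whitespace characters.
    preg_match_all('/[-+(]*"[^"]*"\)*|\S+/', $search, $matches);

    $terms = array_map(function (string $term): string {
        // Leave existing phrases and entity-free words untouched
        if (strpos($term, '"') !== false || !preg_match('/&[#\w]+;/', $term)) {
            return $term;
        }
        // Keep leading operators and trailing ")" outside the added quotes
        return preg_replace('/^([-+(]*)(.+?)(\)*)$/s', '$1"$2"$3', $term);
    }, $matches[0]);

    return implode(' ', $terms);
}

echo quote_entity_words('+r&eacute;f&eacute;rences "exact phrase" (-caf&eacute;) plain');
```

The result can be passed to the AGAINST condition in the same way as $Final_Search_Text.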

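Comments:

Thanks. So that means I no longer use htmlentities to output anything on the page? Isn't that a bit unsafe? Just out of curiosity, I have this in my .htaccess:
AddDefaultCharset utf-8
AddCharset utf-8 .html .css .js .xml .json .rss
Also, at the top of my HTML pages I have a charset meta tag. What is the difference between these, if any?

You would still use htmlentities when outputting the title. When generating content, I would htmlentity it and then re-replace the needed characters ("whitelist approach": &eacute; -> …). As for your second question: it is good that Apache knows it is UTF-8 too. It means the same as the meta tag above (I hadn't seen that variant before...), but for cross-browser compatibility you can't declare it too often... In addition, PHP (default_charset), the MySQL driver (mysql_set_charset()) and MySQL itself (at database, table and column level) can/should be set to UTF-8. That way you can avoid calling utf8_decode(), because the data is stored the same way it is output.

Thanks for the insight on UTF-8 consistency. As for my database, it is set to UTF-8. My default_charset in PHP is set to "no value". I can probably change that via .htaccess. I'm not 100% sure about your comment on htmlentity-ing content. Wouldn't that mess up HTML tags like <p>A paragraph...</p>? Would htmlspecialchars be useful in this case?

Thanks a lot, but it seems this would give me too many false positives. As you pointed out…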
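
The "whitelist approach" from the comments could look roughly like the sketch below. It rests on assumptions: escape_with_whitelist() is a hypothetical helper, the list of allowed tags is made up for illustration, and only attribute-free tags are restored. It uses htmlspecialchars() rather than htmlentities(), which only converts &, ", ', < and >, so accented characters stay as plain UTF-8.

```php
<?php
// Hypothetical helper: escape everything, then restore a small set of
// attribute-free tags (the default tag list is an assumption).
function escape_with_whitelist(string $html, array $allowedTags = ['p', 'em', 'strong']): string
{
    // Escape all markup; accented characters are left as plain UTF-8
    $escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');

    // Turn "&lt;p&gt;" and "&lt;/p&gt;" back into real tags, for allowed tags only
    $names = implode('|', array_map('preg_quote', $allowedTags));
    return preg_replace('#&lt;(/?)(' . $names . ')&gt;#i', '<$1$2>', $escaped);
}

echo escape_with_whitelist("<p>Caf\u{e9} & <script>alert(1)</script></p>");
```

Tags outside the list, such as the script element here, remain escaped in the output.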