Finding the matching parts of two strings in PHP
I'm looking for a simple way to find the matching parts of two strings in PHP (specifically in the context of URIs).

For example, consider the two strings:

http://2.2.2.2/~machinehost/deployment_folder/

and

/~machinehost/deployment_folder/users/bob/settings

What I need is to chop the part that both strings share off the second string, which leaves:

users/bob/settings

and then to prepend the first string, forming an absolute URI.

Is there a simple way (in PHP) to compare two arbitrary strings for matching substrings within them?

EDIT: As pointed out, I meant the longest matching substring common to both strings.

I'm not sure I understand your full request, but here is my idea: let A be your URL and B be your "/~machinehost/deployment_folder/users/bob/settings".
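- Search for B in A -> this gives the index i (where i is the position in A of B's first /)
- Let l = length(A)
- You need to cut B from (l - i) to length(B) to get the last part of B (/users/bob/settings)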
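// A PCRE pattern needs delimiters; preg_quote() escapes the regex metacharacters in $B.
$pattern = '#' . preg_quote($B, '#') . '(.*?)#';
$res = array();
preg_match_all($pattern, $A, $res);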
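Edit: I think your last comment invalidates my answer. What you want, though, is to find substrings. So you could start with a heavy-handed algorithm that tries to find B[1:i] in A for each i in {2, ..., length(B)}, and then refine it.

Assuming your strings are $a and $b, you can use: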
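$a = 'http://2.2.2.2/~machinehost/deployment_folder/';
$b = '/~machinehost/deployment_folder/users/bob/settings';
$len_a = strlen($a);
$len_b = strlen($b);
// $p counts the trailing characters of $b that lie outside the overlap.
// Start with the largest overlap that fits inside $a, then shrink it until
// the end of $a equals the start of $b.
for ($p = max(0, $len_b - $len_a); $p < $len_b; $p++) {
    if (substr($a, $len_a - ($len_b - $p)) == substr($b, 0, $len_b - $p)) {
        break;
    }
}
// Append only the part of $b that follows the overlap.
$result = $a . substr($b, $len_b - $p);
echo $result;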
This results in http://2.2.2.2/~machinehost/deployment_folder/users/bob/settings
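The loop above can also be wrapped in a reusable function. The following is only a minimal sketch, assuming the overlap is always a suffix of the first string and a prefix of the second; the name `join_overlapping` is hypothetical and does not come from any of the answers.

```php
<?php
// Hypothetical helper: joins $base and $path, dropping the leading part of
// $path that repeats the end of $base.
function join_overlapping(string $base, string $path): string
{
    $max = min(strlen($base), strlen($path));
    // Try the longest possible overlap first, then shrink it.
    for ($len = $max; $len > 0; $len--) {
        if (substr($base, -$len) === substr($path, 0, $len)) {
            return $base . substr($path, $len);
        }
    }
    // No overlap at all: plain concatenation.
    return $base . $path;
}

echo join_overlapping(
    'http://2.2.2.2/~machinehost/deployment_folder/',
    '/~machinehost/deployment_folder/users/bob/settings'
), "\n";
// prints http://2.2.2.2/~machinehost/deployment_folder/users/bob/settings
```

Because it only compares a suffix with a prefix, this sketch does not find a common substring that sits in the middle of either string.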
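Try this

There doesn't seem to be any ready-made code that meets your needs, so let's look for a simple approach.

For this exercise I used two methods, one to find the longest match and another to chop off the matching part.

The FindLongestMatch() method splits one path into segments and looks for them, segment by segment, in the other path, keeping only the single longest match (no arrays, no sorting).

The RemoveLongestMatch() method returns the suffix, or "remainder", that follows the position of the longest match found.

Here is the full source code: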
<?php
// Walks the segments of $absolutePath from last to first and returns the
// longest run of them that also occurs in $relativePath.
function FindLongestMatch($relativePath, $absolutePath)
{
    static $_separator = '/';
    // Both variables must start out as empty strings before the loop uses them.
    $match = '';
    $longestMatch = '';
    $splitted = array_reverse(explode($_separator, $absolutePath));
    foreach ($splitted as $value)
    {
        // Prepend the next segment to the match built so far.
        $matchTest = $value.$_separator.$match;
        if (IsSubstring($relativePath, $matchTest))
            $match = $matchTest;
        if (!empty($value) && IsNewMatchLonger($match, $longestMatch))
            $longestMatch = $match;
    }
    return $longestMatch;
}
//Removes from the first string the longest match.
function RemoveLongestMatch($relativePath, $absolutePath)
{
    $match = FindLongestMatch($relativePath, $absolutePath);
    $positionFound = strpos($relativePath, $match);
    $suffix = substr($relativePath, $positionFound + strlen($match));
    return $suffix;
}
function IsNewMatchLonger($match, $longestMatch)
{
    return strlen($match) > strlen($longestMatch);
}
// Note: a match at position 0 does not count, because strpos() returns 0 there.
function IsSubstring($string, $subString)
{
    return strpos($string, $subString) > 0;
}
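Maybe you can take the idea behind this code and turn it into something useful for your current project.
Let me know whether it works for you too. By the way, Mr. oreX's answer looks fine as well.

The longest common match can also be found with a regular expression.

The function below takes two strings, builds a regular expression from one of them and runs it against the other.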
/**
 * Determine the longest common match within two strings
 *
 * @param string $str1
 * @param string $str2 Two strings in any order.
 * @param boolean $case_sensitive Set to true to force
 * case sensitivity. Default: false (case insensitive).
 * @return string The longest string - first match.
 */
function get_longest_common_subsequence( $str1, $str2, $case_sensitive = false ) {
    // First check to see if one string is the same as the other.
    if ( $str1 === $str2 ) return $str1;
    if ( ! $case_sensitive && strtolower( $str1 ) === strtolower( $str2 ) ) return $str1;
    // We'll use '#' as our regex delimiter. Any character can be used as we'll quote the string anyway.
    $delimiter = '#';
    // We'll find the shortest string and use that to check substrings and create our regex.
    $l1 = strlen( $str1 );
    $l2 = strlen( $str2 );
    $str = $l1 <= $l2 ? $str1 : $str2;
    $str2 = $l1 <= $l2 ? $str2 : $str1;
    $l = min( $l1, $l2 );
    // Next check to see if one string is a substring of the other.
    if ( $case_sensitive ) {
        if ( strpos( $str2, $str ) !== false ) {
            return $str;
        }
    }
    else {
        if ( stripos( $str2, $str ) !== false ) {
            return $str;
        }
    }
    // Regex for each character will be of the format (?:a(?=b))?
    // We also need to capture the last character, but this prevents us from matching strings with a single character. (?:.|c)?
    $reg = $delimiter;
    for ( $i = 0; $i < $l; $i++ ) {
        $a = preg_quote( $str[ $i ], $delimiter );
        $b = $i + 1 < $l ? preg_quote( $str[ $i + 1 ], $delimiter ) : false;
        $reg .= sprintf( $b !== false ? '(?:%s(?=%s))?' : '(?:.|%s)?', $a, $b );
    }
    $reg .= $delimiter;
    if ( ! $case_sensitive ) {
        $reg .= 'i';
    }
    // Resulting example regex from a string 'abbc':
    // '#(?:a(?=b))?(?:b(?=b))?(?:b(?=c))?(?:.|c)?#i';
    // Perform our regex on the remaining string
    $str = $l1 <= $l2 ? $str2 : $str1;
    if ( preg_match_all( $reg, $str, $matches ) ) {
        // $matches is an array with a single array with all the matches.
        return array_reduce( $matches[0], function( $a, $b ) {
            $al = strlen( $a );
            $bl = strlen( $b );
            // Return the longest string, as long as it's not a single character.
            return $al >= $bl || $bl <= 1 ? $a : $b;
        }, '' );
    }
    // No match - Return an empty string.
    return '';
}
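Anyway, it works with a different approach, and the regular expression could be optimized to handle other cases.

What's the criterion here? Technically, the h in "http" would match the h in "machinehost". You'll have to be more specific than "matching substring".

Sorry, you're absolutely right. I mean the longest possible matching substring.

Hey! Link-only answers aren't a good thing. To be immediately helpful to readers (and to avoid link rot), please at least provide a summary of the solution directly, and keep the link for additional information.

The original code had a bug: if the two strings were equal, it returned an empty string (I fixed this with an edit). Thanks to the PCRE engine, this function is very fast. It is more than 10 times faster than a pure PHP solution, especially on long strings.

I suspect more edge cases will turn up here, particularly with strings of 1 or 2 characters. You have been warned.

I have updated this to check for substrings; a substring may now be a single character and is returned on a match. One problem remains: if the shortest of the strings contains 2 identical characters, the regex will readily ignore the characters between them. Any suggestions for fine-tuning the generated regex to prevent this are appreciated; where possible they will be investigated and incorporated into the answer.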
Sample output of the FindLongestMatch() / RemoveLongestMatch() code (absolute path, relative path, longest match, suffix):

http://2.2.2.2/~machinehost/deployment_folder/
/~machinehost/deployment_folder/users/bob/settings
Longest match: ~machinehost/deployment_folder/
Suffix: users/bob/settings

http://1.1.1.1/root/~machinehost/deployment_folder/
/root/~machinehost/deployment_folder/users/bob/settings
Longest match: root/~machinehost/deployment_folder/
Suffix: users/bob/settings

http://2.2.2.2/~machinehost/deployment_folder/users/
/~machinehost/deployment_folder/users/bob/settings
Longest match: ~machinehost/deployment_folder/users/
Suffix: bob/settings

http://3.3.3.3/~machinehost/~machinehost/subDirectory/deployment_folder/
/~machinehost/subDirectory/deployment_folderX/users/bob/settings
Longest match: ~machinehost/subDirectory/
Suffix: deployment_folderX/users/bob/settings

Example calls to get_longest_common_subsequence():
// Works as intended.
get_longest_common_subsequence( 'abbc', 'abc' ) === 'ab';
// Returns incorrect substring based on string length and recurring substrings.
get_longest_common_subsequence( 'abbc', 'abcdef' ) === 'abc';
// Does not return any matches, as all recurring strings are only a single character long.
get_longest_common_subsequence( 'abc', 'ace' ) === '';
// One of the strings is a substring of the other.
get_longest_common_subsequence( 'abc', 'a' ) === 'a';
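
The incorrect result for 'abbc' and 'abcdef' above comes from the generated regex, not from the idea of searching for the longest common substring. For comparison, here is a minimal sketch of the classic dynamic-programming approach in plain PHP. It is a different technique from the regex answer, and the name `longest_common_substring` is hypothetical. It assumes case-sensitive, byte-wise comparison and returns the first longest match found in the first argument.

```php
<?php
// Hypothetical helper, not taken from any of the answers above.
// Classic O(n*m) dynamic programming: $curr[$j] holds the length of the
// common run that ends at $s[$i - 1] and $t[$j - 1].
function longest_common_substring(string $s, string $t): string
{
    $n = strlen($s);
    $m = strlen($t);
    $best_len = 0; // length of the best match so far
    $best_end = 0; // end offset (exclusive) of that match in $s
    $prev = array_fill(0, $m + 1, 0);
    for ($i = 1; $i <= $n; $i++) {
        $curr = array_fill(0, $m + 1, 0);
        for ($j = 1; $j <= $m; $j++) {
            if ($s[$i - 1] === $t[$j - 1]) {
                // Extend the run that ended one character earlier in both strings.
                $curr[$j] = $prev[$j - 1] + 1;
                if ($curr[$j] > $best_len) {
                    $best_len = $curr[$j];
                    $best_end = $i;
                }
            }
        }
        $prev = $curr;
    }
    return substr($s, $best_end - $best_len, $best_len);
}

echo longest_common_substring('abbc', 'abcdef'), "\n"; // prints ab
echo longest_common_substring(
    'http://2.2.2.2/~machinehost/deployment_folder/',
    '/~machinehost/deployment_folder/users/bob/settings'
), "\n"; // prints /~machinehost/deployment_folder/
```

This keeps only two rows of the table in memory, but it will be slower than a PCRE-based solution on long strings, so it is a correctness baseline rather than a drop-in replacement.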