Escaping special characters in Java regular expressions

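I'm trying to create a regular expression that accepts almost every character on a US keyboard, with only a few exceptions. This is what I have so far (not all-inclusive):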


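Now, I know ^ is the first character I've run into that needs to be escaped with a \ in front of it. When I put in a single \, I get a compile error (invalid escape sequence). When I run the expression against a string, it completely ignores the ^ rule. Does anyone know what I'm doing wrong?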

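You don't have to escape ^ here because you're using a character class. Just use:

^[a-zA-Z0-9!~`@#$%^]

A character class, written [..], lets you put the characters you want inside square brackets, where most special characters are no longer special. Inside the brackets, ^ only has a special meaning (negation) when it is the first character after the opening bracket. The main case where you need an escape is when you use shorthand classes such as \d or \w: because the backslash is itself an escape character in Java string literals, you have to write them as \\d and \\w (but that is because of Java, not the regex engine).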
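The points above can be sketched in a few lines. This is only an illustration: the class name `CaretInClassDemo` and its helper method are made up for this example, and the patterns are the ones discussed in the answer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original thread.
final class CaretInClassDemo {
    // The caret is the last item in the class, so it is an ordinary character.
    static final Pattern ALLOWED = Pattern.compile("[a-zA-Z0-9!~`@#$%^]+");
    // The caret is the first item in the class, so it negates the class.
    static final Pattern NOT_LOWERCASE = Pattern.compile("[^a-z]");
    // The regex \d+ has to be written as "\\d+" in a Java string literal.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    static boolean isAllowed(String text) {
        return ALLOWED.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("a^b"));                       // true
        System.out.println(isAllowed("a b"));                       // false: space is not in the class
        System.out.println(NOT_LOWERCASE.matcher("^").matches());   // true
        System.out.println(NOT_LOWERCASE.matcher("a").matches());   // false
        System.out.println(DIGITS.matcher("2332").matches());       // true
    }
}
```

Writing "\d+" with a single backslash would not compile, which is the "invalid escape sequence" error mentioned in the question.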

For example:

"a".matches("^[a-zA-Z0-9!~`@#$%^]");
"asdf".matches("^[a-zA-Z0-9!~`@#$%^]+"); // for multiple characters

You only need to escape ^ when you want to match it literally, that is, when you want to find text that contains the ^ character.

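If you intend to use "^" with its special meaning (start of line/string), there is no need to escape it. Simply type

"^[a-zA-Z0-9!~`@#$%\\^]"

in your source code. The backslashes near the end of this regular expression do not matter. Because of the backslash's special meaning in Java you have to type two backslashes, but that has nothing to do with how the regular expression is processed. The regex engine receives a single backslash, which it uses to read the following character as a literal, but at that position inside the brackets ^ is a literal anyway.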
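As a rough check of that claim, the sketch below compares the pattern with and without the escaped caret. The class name `EscapedCaretDemo` and the method `sameResult` are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original thread.
final class EscapedCaretDemo {
    // Caret at the end of the class, not escaped.
    static final String PLAIN = "^[a-zA-Z0-9!~`@#$%^]";
    // Same class, but the caret is escaped. The Java literal "\\^" becomes \^ for the regex engine.
    static final String ESCAPED = "^[a-zA-Z0-9!~`@#$%\\^]";

    // Returns true when both patterns agree on whether the input starts with an allowed character.
    static boolean sameResult(String input) {
        boolean plain = Pattern.compile(PLAIN).matcher(input).find();
        boolean escaped = Pattern.compile(ESCAPED).matcher(input).find();
        return plain == escaped;
    }

    public static void main(String[] args) {
        // The escaped version is exactly one character (a single backslash) longer.
        System.out.println(ESCAPED.length() - PLAIN.length());   // 1
        for (String input : new String[] {"^", "a", " ", "\\"}) {
            System.out.println(sameResult(input));                // true for each input
        }
    }
}
```

Note that the escaped pattern still does not accept a backslash as input: the backslash is consumed as the escape for ^ and is not itself a member of the class.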

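To elaborate on your comment about [ and ]:

Brackets have a special meaning in regular expressions because they mark the boundaries of a list of characters given by the pattern (these characters form a so-called character class). Let's break down the regular expression from above to make things clearer.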

^ Matches the start of the text
[ Opening boundary of your character class
a-z Lowercase letters a to z
A-Z Uppercase letters A to Z
0-9 Digits 0 to 9
! Exclamation mark, literally
~ Tilde, literally
` Backtick, literally
@ The @ character, literally
# Hash, literally
$ Dollar, literally
% Percent sign, literally
\\ Backslash. The regular expression engine receives only a single backslash, because the other one is consumed by Java's syntax for Strings. It would mark the following character as a literal, but ^ is a literal at this position in a character class anyway, so these backslashes have no effect.
^ Caret, literally
] Closing boundary of your character class
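The order of the items inside a character class definition is irrelevant (apart from a leading ^, which negates the class). The expression above matches if the first character of the examined text is part of the character class definition. Whether the other characters in the examined text matter depends on how you use the regular expression.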
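How the expression is used makes a visible difference in Java. The sketch below, with the made-up class name `FirstCharDemo` and made-up method names, contrasts `Matcher.find()` with `Matcher.matches()` for the pattern broken down above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original thread.
final class FirstCharDemo {
    // Start anchor followed by exactly one character from the class.
    static final Pattern FIRST_ALLOWED = Pattern.compile("^[a-zA-Z0-9!~`@#$%^]");

    // find() looks for a match anywhere; the ^ anchor pins it to the start of the text.
    static boolean startsWithAllowed(String text) {
        return FIRST_ALLOWED.matcher(text).find();
    }

    // matches() requires the whole text to match, so only one-character inputs can pass.
    static boolean isSingleAllowed(String text) {
        return FIRST_ALLOWED.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(startsWithAllowed("a b"));   // true: only the first character is checked
        System.out.println(isSingleAllowed("a b"));     // false: the rest of the text does not match
        System.out.println(startsWithAllowed(" ab"));   // false: a space is not in the class
        System.out.println(isSingleAllowed("^"));       // true
    }
}
```

`String.matches` behaves like the second method, which is why a quantifier such as + is needed to accept inputs longer than one character.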

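When you start working with regular expressions, you should always use several test texts to match against and verify the behavior. It is also advisable to keep these test cases as unit tests, to gain a high level of confidence that your program behaves correctly.

A simple code sample for testing the expression looks like this: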

public class Test {
    public static void main(String[] args) {
        String regexp = "^[ a-zA-Z0-9!~`@#$%\\\\^\\[\\]]+$";
        String[] testdata = new String[] {
                "abc",
                "2332",
                "some@test",
                "test [ and ] test end",
                // Following sample will not match the pattern.
                "äöüßµøł"
        };
        for (String toExamine : testdata) {
            if (toExamine.matches(regexp)) {
                System.out.println("Match: " + toExamine);
            } else {
                System.out.println("No match: " + toExamine);
            }
        }
    }
}
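Note that I'm using a modified pattern here. It makes sure that all characters in the examined string match your character class. I also extended the character class to allow \ and space as well as [ and ]. The breakdown is: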

^ Matches the start of the text
[ Opening boundary of your character class
(space) Space character, literally
a-z Lowercase letters a to z
A-Z Uppercase letters A to Z
0-9 Digits 0 to 9
! Exclamation mark, literally
~ Tilde, literally
` Backtick, literally
@ The @ character, literally
# Hash, literally
$ Dollar, literally
% Percent sign, literally
\\\\ Backslash, literally. The regular expression engine receives only 2 backslashes, because every other backslash is consumed by Java's syntax for Strings. The first backslash marks the second backslash as occurring literally in the string.
^ Caret, literally
\\[ Opening bracket, literally. The backslash makes the bracket lose its meaning as opening a character class definition.
\\] Closing bracket, literally. The backslash makes the bracket lose its meaning as closing a character class definition.
] Closing boundary of your character class
+ Means any number of characters matching your character class definition can occur, but at least 1 such character needs to be present for a match
$ Matches the end of the text
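One way to see the two escaping layers is to print the Java string: the output is what the regex engine actually receives. This is a small sketch only; the class name `EscapeLayersDemo` and the method `accepts` are hypothetical.

```java
// Hypothetical demo class; the name is not from the original thread.
final class EscapeLayersDemo {
    // The modified pattern from the test program, as written in Java source.
    static final String REGEXP = "^[ a-zA-Z0-9!~`@#$%\\\\^\\[\\]]+$";

    // True when every character of the text belongs to the character class.
    static boolean accepts(String text) {
        return text.matches(REGEXP);
    }

    public static void main(String[] args) {
        // Prints the pattern after Java has processed its own escapes:
        // ^[ a-zA-Z0-9!~`@#$%\\^\[\]]+$
        System.out.println(REGEXP);

        System.out.println(accepts("a\\b"));                    // true: the input is a\b
        System.out.println(accepts("test [ and ] test end"));   // true
        System.out.println(accepts(""));                        // false: + needs at least one character
    }
}
```

The eight backslashes typed in the source shrink to four in the printed pattern: two for the literal backslash and one each in front of [ and ].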

One thing I don't understand is why anyone would use the characters of a US keyboard as a validation criterion.

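You don't need to escape ^

@JoshLevine lol, yes... but I love regex, so... it's easy :P

Do I need to escape any of the characters?

@JoshLevine None of the ones you used there. You are using a character class, so there is no need to escape special characters. I've updated the answer.

Thanks for the help. What about ..., they seem to be giving me trouble now.

@JoshLevine I don't know what character you wrote, I can only see a square. If you need to match it, you need to look up the character code you want and put it into the character class.

Thanks for the help. What about [], they seem to be giving me some trouble now.

I added a description of the brackets and fixed a mistake in my earlier explanation.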