Java regular expression for nested quotes
Pull quotes are not uncommon in free-form text. Suppose we want to identify pull quotes even when they are nested within a sentence. For example, suppose we have a string with nested pull quotes like this:
one two "three four" five six "seven eight "nine" ten eleven" twelve
Is there a Java regular expression that can find the following 3 groups:
three four
seven eight "nine" ten eleven
nine
Following Wiktor's suggestion, I came up with the following. It is not elegant by any measure, but it seems to do the job:
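import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public List<String> parseNestedSingleQuotes(String text) {
    return parseNestedQuotes(text, '\'', '{');
}

public List<String> parseNestedDoubleQuotes(String text) {
    return parseNestedQuotes(text, '"', '{');
}

// Assumes markChar does not otherwise occur in text; it temporarily stands in for opening quotes.
public List<String> parseNestedQuotes(String text, char quoteChar, char markChar) {
    List<String> groups = new ArrayList<String>();
    char[] charArray = text.toCharArray();
    // Quote the character so that a regex metacharacter is matched literally.
    String quote = Pattern.quote(String.valueOf(quoteChar));
    // Pass 1: a quote followed by a word character is treated as an opening quote and marked.
    Matcher m = Pattern.compile(quote + "\\w").matcher(text);
    while (m.find()) {
        charArray[m.start()] = markChar;
    }
    //System.out.println("debug charArray with marks: " + new String(charArray));
    // Pass 2: a word character followed by a quote is treated as a closing quote.
    m = Pattern.compile("\\w" + quote).matcher(text);
    while (m.find()) {
        int endIdx = m.start() + 1; // index of the closing quote
        int startIdx = unmarkLastIndexOf(charArray, endIdx, quoteChar, markChar);
        if (startIdx != -1) {
            groups.add(text.substring(startIdx + 1, endIdx));
        }
    }
    return groups;
}

// Finds the nearest still-marked opening quote before endIdx, restores it, and returns its index (-1 if none).
int unmarkLastIndexOf(char[] charArray, int endIdx, char quoteChar, char markChar) {
    String template = new String(charArray);
    int idx = template.lastIndexOf(markChar, endIdx - 1);
    if (idx != -1) {
        charArray[idx] = quoteChar;
        return idx;
    }
    return -1;
}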
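Any suggestions for improvement?

Regex won't help here; Java regular expressions do not support recursion.
What have you tried so far? Post your code! What happened when you ran it? What did you expect to happen?
How would any logic distinguish "seven eight "nine" ten eleven" from the two quoted parts "seven eight" and "ten eleven"?
Do you want three regexes (one to find each expected value) or a single regex that finds all three?
So, get everything between a quote that has no word char/space before it and is followed by a non-space or word char, and a quote that has no space or word char but is followed by a non-space/word char?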
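To make the point about missing recursion concrete, here is a minimal sketch of what a plain, non-nesting-aware pattern does with the example input. The class `NaiveQuoteRegex` and the method `findFlatQuotes` are hypothetical names used only for this illustration; they are not part of the posted solution. The pattern simply pairs each quote with the next one, so it cannot tell an inner quote from an outer one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveQuoteRegex {

    // Hypothetical helper for illustration only: returns the text between
    // each pair of consecutive double quotes, with no notion of nesting.
    static List<String> findFlatQuotes(String text) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile("\"([^\"]*)\"").matcher(text);
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "one two \"three four\" five six \"seven eight \"nine\" ten eleven\" twelve";
        for (String g : findFlatQuotes(input)) {
            // Brackets make leading and trailing spaces visible.
            System.out.println("[" + g + "]");
        }
        // prints:
        // [three four]
        // [seven eight ]
        // [ ten eleven]
    }
}
```

The pattern still returns three groups, but only the first is one of the desired ones: the nested quote "nine" is lost and the outer quote is split in two, which is why the pairing has to be tracked outside the regex.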
void test_parseNestedQuotes()
{
    String input = "zero 'one two' three 'four five 'six seven' eight' nine";
    System.out.println("nested singleQuote input: " + input);
    List<String> groups = parseNestedSingleQuotes(input);
    System.out.println("nested singleQuote groups:");
    printListOfString(groups);
    assert groups.size() == 3;
    System.out.println("--------");

    input = "one two \"three four\" five six \"seven eight \"nine\" ten eleven\" twelve";
    System.out.println("nested doubleQuote input: " + input);
    groups = parseNestedDoubleQuotes(input);
    System.out.println("nested doubleQuote groups:");
    printListOfString(groups);
    assert groups.size() == 3;
    System.out.println("--------");

    input = "one two \"three four\" five six \"seven eight \"nine\" ten eleven twelve";
    System.out.println("nested doubleQuote input with unmatched pairs: " + input);
    groups = parseNestedDoubleQuotes(input);
    System.out.println("nested doubleQuote groups from unmatched pairs:");
    printListOfString(groups);
    assert groups.size() == 2;
    System.out.println("--------");

    input = "one two (three four) five six";
    System.out.println("no doubleQuote input with parens: " + input);
    groups = parseNestedDoubleQuotes(input);
    System.out.println("no doubleQuote groups from paren pairs:");
    printListOfString(groups);
    assert groups.size() == 0;
    System.out.println("--------");
}

void printListOfString(List<String> list) {
    for (String string : list)
        System.out.println(string);
}
nested singleQuote input: zero 'one two' three 'four five 'six seven' eight' nine
nested singleQuote groups:
one two
six seven
four five 'six seven' eight
--------
nested doubleQuote input: one two "three four" five six "seven eight "nine" ten eleven" twelve
nested doubleQuote groups:
three four
nine
seven eight "nine" ten eleven
--------
nested doubleQuote input with unmatched pairs: one two "three four" five six "seven eight "nine" ten eleven twelve
nested doubleQuote groups from unmatched pairs:
three four
nine
--------
no doubleQuote input with parens: one two (three four) five six
no doubleQuote groups from paren pairs:
--------
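As one possible answer to the request for improvements, the same pairing can be sketched as a single scan with an explicit stack instead of two regex passes and a marker character. This is only a sketch under stated assumptions, not the original author's method: the class `NestedQuoteScanner` and its methods are hypothetical names, a quote counts as opening only when a word character follows it and none precedes it (so an apostrophe inside a word such as don't is ignored, unlike in the code above), and "word character" here means any Unicode letter, digit, or underscore rather than the ASCII-only `\w`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class NestedQuoteScanner {

    // Hypothetical helper: returns quoted groups, inner groups before the
    // outer group that contains them (the order in which they close).
    static List<String> parse(String text, char quoteChar) {
        List<String> groups = new ArrayList<>();
        Deque<Integer> openings = new ArrayDeque<>(); // indices of unmatched opening quotes
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != quoteChar) {
                continue;
            }
            boolean wordBefore = i > 0 && isWordChar(text.charAt(i - 1));
            boolean wordAfter = i + 1 < text.length() && isWordChar(text.charAt(i + 1));
            if (wordAfter && !wordBefore) {
                // Opening quote: remember where it is.
                openings.push(i);
            } else if (wordBefore && !wordAfter && !openings.isEmpty()) {
                // Closing quote: pair it with the most recent unmatched opening quote.
                groups.add(text.substring(openings.pop() + 1, i));
            }
            // Any other quote (isolated, or between two word characters) is skipped.
        }
        return groups;
    }

    // Assumption: Unicode letters and digits plus underscore count as word characters.
    static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    public static void main(String[] args) {
        String input = "one two \"three four\" five six \"seven eight \"nine\" ten eleven\" twelve";
        for (String g : parse(input, '"')) {
            System.out.println(g);
        }
        // prints:
        // three four
        // nine
        // seven eight "nine" ten eleven
    }
}
```

Opening quotes that never find a closing partner simply stay on the stack and are dropped, so the unmatched-pairs input from the test yields the same two groups as before. Because the scan never rewrites the text, no marker character needs to be reserved.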