Java regex for nested quotes - Java, Regex, Quotes - Fatal编程技术网

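Pull quotes are not uncommon in free-form text. Suppose we want to identify pull quotes even when they are nested within a sentence. For example, suppose we have a string with nested pull quotes of the following form:

one two "three four" five six "seven eight "nine" ten eleven" twelve

Is there a Java regex that can find the following 3 groups:

  • three four
  • seven eight "nine" ten eleven
  • nine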
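To see why a single regular expression struggles here, consider a flat, non-recursive pattern that matches a double quote, a run of non-quote characters, and the next double quote. The sketch below applies it to the example string; the class name NaiveQuoteRegex and the method findFlat are hypothetical, chosen only for illustration. Because the pattern always pairs each quote with the very next one, it returns "three four", "seven eight " and " ten eleven" instead of the nested group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveQuoteRegex {
    // Flat pattern: a double quote, any non-quote characters, the next double quote.
    // It has no notion of nesting, so quotes are paired strictly left to right.
    private static final Pattern FLAT = Pattern.compile("\"([^\"]*)\"");

    static List<String> findFlat(String text) {
        List<String> groups = new ArrayList<>();
        Matcher m = FLAT.matcher(text);
        while (m.find()) {
            groups.add(m.group(1)); // text between the paired quotes
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "one two \"three four\" five six \"seven eight \"nine\" ten eleven\" twelve";
        for (String g : findFlat(input)) {
            System.out.println("[" + g + "]"); // brackets make leading/trailing spaces visible
        }
    }
}
```

The split into "seven eight " and " ten eleven" is exactly the ambiguity a pure regex cannot resolve without extra rules about which quotes open and which close.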

  • Based on Wiktor's suggestion, I came up with the following. It is by no means elegant, but it seems to do the job:

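    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    public List<String> parseNestedSingleQuotes(String text) {
        return parseNestedQuotes(text, '\'', '{');
    }
    
    public List<String> parseNestedDoubleQuotes(String text) {
        return parseNestedQuotes(text, '"', '{');
    }
    
    // Returns the contents of every matched quote pair, in order of their closing
    // quotes (nested pairs come before the pairs that enclose them).
    // markChar must not occur in text: it temporarily tags opening quotes.
    public List<String> parseNestedQuotes(String text, char quoteChar, char markChar) {
        List<String> groups = new ArrayList<String>();
        char[] charArray = text.toCharArray();
        // Escape the quote so any character is treated literally by the regex.
        String quote = Pattern.quote(String.valueOf(quoteChar));
    
        // Pass 1: a quote followed by a word character is an opening quote; mark it.
        Matcher m = Pattern.compile(quote + "\\w").matcher(text);
        while( m.find() ) {
            charArray[m.start()] = markChar;
        }
        //System.out.println("debug charArray with marks: " + new String(charArray));
    
        // Pass 2: a quote preceded by a word character is a closing quote;
        // pair it with the nearest still-marked opening quote to its left.
        m = Pattern.compile("\\w" + quote).matcher(text);
        while( m.find() ) {
            int endIdx = m.start()+1;   // index of the closing quote
            int startIdx = unmarkLastIndexOf(charArray, endIdx, quoteChar, markChar);
            if( startIdx != -1 ) {
                groups.add(text.substring(startIdx+1, endIdx));
            }
        }
        return groups;
    }
    
    // Finds the last mark before endIdx, restores it to quoteChar so it cannot be
    // paired again, and returns its index (or -1 if there is none).
    int unmarkLastIndexOf(char[] charArray, int endIdx, char quoteChar, char markChar) {
        for( int idx = endIdx-1; idx >= 0; idx-- ) {
            if( charArray[idx] == markChar ) {
                charArray[idx] = quoteChar;
                return idx;
            }
        }
        return -1;
    }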
    

    Any suggestions for improvement?

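    Regex will not help here at all: Java regex does not support recursion.
    What have you tried so far? Post your code! What happened when you ran it? What did you expect to happen?
    How would any logic tell "seven eight "nine" ten eleven" apart from the two quoted parts "seven eight " and " ten eleven"?
    Do you want three regexes (one to find each expected value), or one regex that finds all three?
    So: capture everything between a quote that has a non-word char or space before it and a non-space or word char after it, and a quote that has a non-space or word char before it and a space or non-word char after it?
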
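Although Java regex has no recursion, it does support lookbehind and lookahead, so the opening/closing heuristic from the last comment can at least be written as a pattern. The sketch below is one possible reading of that heuristic, not necessarily the commenter's exact rule: an opening quote is not preceded by a word character but is followed by one, and a closing quote is preceded by a word character but not followed by one. The class QuoteRoles and method classify are hypothetical names. The pattern only labels each quote; pairing the labels still needs a counter or stack, as in the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteRoles {
    // One alternation with two named groups:
    //   open  = a double quote not preceded by a word char but followed by one
    //   close = a double quote preceded by a word char but not followed by one
    private static final Pattern ROLES = Pattern.compile(
            "(?<open>(?<!\\w)\"(?=\\w))|(?<close>(?<=\\w)\"(?!\\w))");

    // Returns one letter per classified quote: 'O' for opening, 'C' for closing.
    // Quotes that fit neither rule (e.g. surrounded by spaces) are skipped.
    static String classify(String text) {
        StringBuilder roles = new StringBuilder();
        Matcher m = ROLES.matcher(text);
        while (m.find()) {
            roles.append(m.group("open") != null ? 'O' : 'C');
        }
        return roles.toString();
    }

    public static void main(String[] args) {
        String input = "one two \"three four\" five six \"seven eight \"nine\" ten eleven\" twelve";
        System.out.println(classify(input));
    }
}
```

For the question's input this prints OCOOCC: the quote before "nine" is labelled as opening, which rules out the flat reading "seven eight ".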
    void test_parseNestedQuotes()
    {
        String input = "zero 'one two' three 'four five 'six seven' eight' nine";
        System.out.println("nested singleQuote input: " + input);
        List<String> groups = parseNestedSingleQuotes(input);
        System.out.println("nested singleQuote groups:");
        printListOfString(groups);
        assert groups.size() == 3;
        System.out.println("--------");
    
        input = "one two \"three four\" five six \"seven eight \"nine\" ten eleven\" twelve";
        System.out.println("nested doubleQuote input: " + input);
        groups = parseNestedDoubleQuotes(input);
        System.out.println("nested doubleQuote groups:");
        printListOfString(groups);
        assert groups.size() == 3;
        System.out.println("--------");
    
        input = "one two \"three four\" five six \"seven eight \"nine\" ten eleven twelve";
        System.out.println("nested doubleQuote input with unmatched pairs: " + input);
        groups = parseNestedDoubleQuotes(input);
        System.out.println("nested doubleQuote groups from unmatched pairs:");
        printListOfString(groups);
        assert groups.size() == 2;
        System.out.println("--------");
    
        input = "one two (three four) five six";
        System.out.println("no doubleQuote input with parens: " + input);
        groups = parseNestedDoubleQuotes(input);
        System.out.println("no doubleQuote groups from paren pairs:");
        printListOfString(groups);
        assert groups.size() == 0;
        System.out.println("--------");
    }
    
    void printListOfString(List<String> list) { 
        for( String string : list )
            System.out.println(string);
    }
    
    nested singleQuote input: zero 'one two' three 'four five 'six seven' eight' nine
    nested singleQuote groups:
    one two
    six seven
    four five 'six seven' eight
    --------
    nested doubleQuote input: one two "three four" five six "seven eight "nine" ten eleven" twelve
    nested doubleQuote groups:
    three four
    nine
    seven eight "nine" ten eleven
    --------
    nested doubleQuote input with unmatched pairs: one two "three four" five six "seven eight "nine" ten eleven twelve
    nested doubleQuote groups from unmatched pairs:
    three four
    nine
    --------
    no doubleQuote input with parens: one two (three four) five six
    no doubleQuote groups from paren pairs:
    --------
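As one possible response to the request for improvements, here is a sketch rather than a definitive fix: keep the same heuristic as parseNestedQuotes (a quote followed by a word character opens, a quote preceded by one closes), but keep unpaired opening quotes on a stack instead of overwriting them with a marker character. This removes the markChar, so text that happens to contain "{" can no longer be mispaired, and it avoids rebuilding a String for every closing quote. StackQuoteParser and its methods are hypothetical names; isWordChar assumes the ASCII set that \w matches by default in Java.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class StackQuoteParser {
    // Same characters as the default regex \w: ASCII letters, digits, underscore.
    private static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // Returns the contents of each matched quote pair, in order of their
    // closing quotes, so nested pairs come before the pairs enclosing them.
    static List<String> parseNestedQuotes(String text, char quoteChar) {
        List<String> groups = new ArrayList<>();
        Deque<Integer> openers = new ArrayDeque<>(); // indices of unpaired opening quotes
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != quoteChar) {
                continue;
            }
            boolean closes = i > 0 && isWordChar(text.charAt(i - 1));
            boolean opens = i + 1 < text.length() && isWordChar(text.charAt(i + 1));
            // Handle the closing role first so a quote with word chars on both
            // sides (as in don't) cannot pair with itself.
            if (closes && !openers.isEmpty()) {
                int start = openers.pop(); // nearest unpaired opener to the left
                groups.add(text.substring(start + 1, i));
            }
            if (opens) {
                openers.push(i);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "one two \"three four\" five six \"seven eight \"nine\" ten eleven\" twelve";
        for (String g : parseNestedQuotes(input, '"')) {
            System.out.println(g);
        }
    }
}
```

On the four test inputs above it should produce the same groups in the same order as the original, because popping the stack returns the nearest unpaired opening quote to the left, which is what unmarkLastIndexOf searches for.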