Regular expressions in Java: matching BOL and EOL


I am trying to parse Windows ini files with Java on Windows. Suppose the content is:

[section1]
key1=value1
key2=value2
[section2]
key1=value1
key2=value2
[section3]
key1=value1
key2=value2

I use the following code:

Pattern pattSections = Pattern.compile("^\\[([a-zA-Z_0-9\\s]+)\\]$([^\\[]*)", Pattern.DOTALL + Pattern.MULTILINE);
Pattern pattPairs = Pattern.compile("^([a-zA-Z_0-9]+)\\s*=\\s*([^$]*)$", Pattern.DOTALL + Pattern.MULTILINE);
// parse sections
Matcher matchSections = pattSections.matcher(content);
while (matchSections.find()) {
    String keySection = matchSections.group(1);
    String valSection = matchSections.group(2);
    // parse section content
    Matcher matchPairs = pattPairs.matcher(valSection);
    while (matchPairs.find()) {
        String keyPair = matchPairs.group(1);
        String valPair = matchPairs.group(2);
    }
}

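But it does not work correctly:

1. Section 1 does not match. This is probably because it does not start "after an EOL". When I put an empty line before [section1], it matches.

2. valSection returns '\r\nkey1=value1\r\nkey2=value2\r\n'. keyPair returns 'key1', which looks fine. But valPair returns 'value1\r\nkey2=value2\r\n' instead of the desired 'value1'.

What is wrong here?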

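The first regex just works for me (could the problem be in how you read the file?). In the second regex I added a "?" so that the quantifier is used in a reluctant way: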

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {
        String content = "[section1]\r\n" +
        "key1=value1\r\n" +
        "key2=value2\r\n" +
        "[section2]\r\n" +
        "key1=value1\r\n" +
        "key2=value2\r\n" +
        "[section3]\r\n" +
        "key1=value1\r\n" +
        "key2=value2\r\n";

        Pattern pattSections = Pattern.compile(
                "^\\[([a-zA-Z_0-9\\s]+)\\]$([^\\[]*)", Pattern.DOTALL
                        + Pattern.MULTILINE);
        Pattern pattPairs = Pattern.compile(
                "^([a-zA-Z_0-9]+)\\s*=\\s*([^$]*?)$", Pattern.DOTALL
                        + Pattern.MULTILINE);
        // parse sections
        Matcher matchSections = pattSections.matcher(content);
        while (matchSections.find()) {
            String keySection = matchSections.group(1);
            String valSection = matchSections.group(2);
            // parse section content
            Matcher matchPairs = pattPairs.matcher(valSection);
            while (matchPairs.find()) {
                String keyPair = matchPairs.group(1);
                String valPair = matchPairs.group(2);
            }
        }

    }

}
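
To see what the added "?" changes, here is a minimal sketch that runs the greedy and the reluctant variant of the pair pattern against the same section text. The class name GreedyVsReluctantDemo and the helper firstValue are made up for illustration, and only the MULTILINE flag is set, since DOTALL has no effect on these patterns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctantDemo {
    // Returns group 2 of the first match, or null if nothing matches.
    static String firstValue(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String section = "key1=value1\r\nkey2=value2\r\n";
        String greedy = "^([a-zA-Z_0-9]+)\\s*=\\s*([^$]*)$";
        String reluctant = "^([a-zA-Z_0-9]+)\\s*=\\s*([^$]*?)$";

        // Greedy: [^$]* swallows everything up to the end of the input,
        // so the value also contains the second key.
        System.out.println(firstValue(greedy, section).contains("key2")); // true

        // Reluctant: [^$]*? stops at the first position where '$' can match,
        // which is the end of the first line.
        System.out.println(firstValue(reluctant, section)); // value1
    }
}
```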

You do not need the DOTALL flag, because you do not use a dot in your patterns at all.
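
A small sketch of why the flag is irrelevant here: DOTALL only changes what "." matches, while a negated character class such as [^\[] already matches line breaks. The class name DotallDemo and the helper finds are hypothetical.

```java
import java.util.regex.Pattern;

class DotallDemo {
    // Returns true if the regex finds a match in the input under the given flags.
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "a\nb";
        // Without DOTALL, '.' does not match a line terminator.
        System.out.println(finds("a.b", 0, input));              // false
        // With DOTALL, '.' matches any character, including '\n'.
        System.out.println(finds("a.b", Pattern.DOTALL, input)); // true
        // A negated class matches '\n' with or without DOTALL.
        System.out.println(finds("a[^\\[]b", 0, input));         // true
    }
}
```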

I think Java treats \n by itself as the line break, so the \r is not consumed. Your pattern:

^\\[([a-zA-Z_0-9\\s]+)\\]$

will not match, but

^\\[([a-zA-Z_0-9\\s]+)\\]\r$

will.
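
This claim can be checked directly. As far as java.util.regex.Pattern is documented, "\r\n" counts as a single line terminator by default, and only the UNIX_LINES flag restricts line terminators to "\n". The sketch below, with the made-up class name LineTerminatorDemo, suggests that the behaviour described above applies when UNIX_LINES is set rather than in the default mode.

```java
import java.util.regex.Pattern;

class LineTerminatorDemo {
    // Returns true if the regex finds a match in the input under the given flags.
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "[section1]\r\nkey1=value1\r\n";
        String header = "^\\[([a-zA-Z_0-9\\s]+)\\]$";
        String headerCr = "^\\[([a-zA-Z_0-9\\s]+)\\]\r$";

        // Default MULTILINE: '$' matches right before the "\r\n" pair.
        System.out.println(finds(header, Pattern.MULTILINE, input)); // true

        int unix = Pattern.MULTILINE | Pattern.UNIX_LINES;
        // UNIX_LINES: only '\n' ends a line, so the '\r' blocks '$' ...
        System.out.println(finds(header, unix, input));   // false
        // ... unless the pattern matches the '\r' explicitly.
        System.out.println(finds(headerCr, unix, input)); // true
    }
}
```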

I suggest that you also drop MULTILINE and use the following patterns as line separators:

(^|\r\n)
($|\r\n)
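
The separator idea above could be sketched as follows for the section headers. The class name ExplicitSeparatorDemo and the method sectionNames are hypothetical, and the trailing separator is written as a lookahead, which is an assumption of this sketch and not part of the suggestion: it keeps the "\r\n" available as the leading separator of the next line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExplicitSeparatorDemo {
    // No MULTILINE flag: '^' and '$' anchor only at the ends of the input,
    // and line breaks inside the input are matched as a literal "\r\n".
    private static final Pattern SECTION =
            Pattern.compile("(?:^|\r\n)\\[([a-zA-Z_0-9\\s]+)\\](?=$|\r\n)");

    // Collects the names of all section headers found in the content.
    static List<String> sectionNames(String content) {
        List<String> names = new ArrayList<>();
        Matcher m = SECTION.matcher(content);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String content = "[section1]\r\nkey1=value1\r\n[section2]\r\nkey1=value1\r\n";
        System.out.println(sectionNames(content)); // [section1, section2]
    }
}
```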

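You do not exclude newlines when matching the value.

Ad 2. The pattern defined in pattPairs is greedy and therefore matches up to the end of the second key. It is worth reading up on greedy and non-greedy matching and how to compensate for it.

Did you try replacing \r\n with \n first?

You don't like that? Unfortunately, this does not work in my case.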

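In the first pattern, BOL does not match at BOF, and in the second pattern the $ inside [^$] is treated as a literal $, not as EOL.

Did you try running the code as suggested? Maybe you could show a sample of your ini file...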
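
The point about [^$] can be illustrated with a short sketch: inside a character class "$" is an ordinary character, so [^$] also matches line breaks. One way around this, shown here as an assumption and not as the fix adopted in the thread, is to exclude \r and \n from the value explicitly. The class name DollarInClassDemo and the method values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarInClassDemo {
    // The value group excludes '\r' and '\n', so even a greedy quantifier
    // stops at the end of the line. "[ \t]*" replaces "\s*" (a choice made
    // for this sketch) so the padding cannot swallow a line break.
    private static final Pattern PAIR = Pattern.compile(
            "^([a-zA-Z_0-9]+)[ \\t]*=[ \\t]*([^\\r\\n]*)$", Pattern.MULTILINE);

    // Collects the value of every key=value line in the section text.
    static List<String> values(String section) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(section);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Inside a character class '$' is a literal dollar sign.
        System.out.println(Pattern.matches("[^$]*", "a\r\nb")); // true
        System.out.println(Pattern.matches("[^$]*", "a$b"));    // false

        System.out.println(values("\r\nkey1=value1\r\nkey2=value2\r\n")); // [value1, value2]
    }
}
```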