Java regex split


I have a string like this:

Snt:It was the most widespread day of environmental action in the planet's history
====================
-----------
Snt:Five years ago, I was working for just over minimum wage
====================
-----------

I want to split the string on this separator:

====================
-----------

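The Snt: prefix should also be removed from each sentence, of course. What is the best way to do this?

I tried this regular expression, but it does not work: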

String[] content1 =content.split("\\n\\====================\\n\\-----------\\n");

Thanks in advance.

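Because there is no newline after the last line, your pattern does not match the final ==== and ---- separator. Add the end-of-line anchor $ as an alternative to the last \n in the regex: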

String s = "Snt:It was the most widespread day of environmental action in the planet's history\n" +
"====================\n" +
"-----------\n" +
"Snt:Five years ago, I was working for just over minimum wage\n" +
"====================\n" +
"-----------";
String m = s.replaceAll("(?m)^Snt:", "");
String[] tok = m.split("\\n\\====================\\n\\-----------(?:\\n|$)");
System.out.println(Arrays.toString(tok));

Output:

[It was the most widespread day of environmental action in the planet's history, Five years ago, I was working for just over minimum wage]
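To see the difference between the two patterns side by side, here is a minimal, self-contained sketch. The class name SplitComparison and the helper stripAndSplit are made up for illustration; the two regexes and the sample text are copied from the question and the answer above.

```java
import java.util.Arrays;

public class SplitComparison {
    // Sample input from the question; there is no newline after the final dashes.
    static final String SAMPLE =
            "Snt:It was the most widespread day of environmental action in the planet's history\n"
            + "====================\n"
            + "-----------\n"
            + "Snt:Five years ago, I was working for just over minimum wage\n"
            + "====================\n"
            + "-----------";

    // Pattern from the question: insists on a newline after the dashes.
    static final String ORIGINAL = "\\n\\====================\\n\\-----------\\n";

    // Pattern from the answer: accepts either a newline or the end of the input.
    static final String FIXED = "\\n\\====================\\n\\-----------(?:\\n|$)";

    // Hypothetical helper: strips the Snt: prefix from each line, then splits.
    static String[] stripAndSplit(String text, String separatorRegex) {
        return text.replaceAll("(?m)^Snt:", "").split(separatorRegex);
    }

    public static void main(String[] args) {
        // With ORIGINAL, the last element still carries the trailing separator lines.
        System.out.println(Arrays.toString(stripAndSplit(SAMPLE, ORIGINAL)));
        // With FIXED, both elements are clean sentences.
        System.out.println(Arrays.toString(stripAndSplit(SAMPLE, FIXED)));
    }
}
```

With the question's pattern the second element keeps the ==== and ---- lines attached, because the final separator is not followed by a newline and therefore never matches.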

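Given how the data is structured, I would turn the idea around and use a matcher instead of a split. That also lets you match the Snt: prefix cleanly: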

private static final String VAL = "Snt:It was the most widespread day of environmental action in the planet's history\n"
        + "====================\n"
        + "-----------\n"
        + "Snt:Five years ago, I was working for just over minimum wage\n"
        + "====================\n"
        + "-----------";

public static void main(String[] args) {
    List<String> phrases = new ArrayList<String>();
    Matcher mat = Pattern.compile("Snt:(.+?)\n={20}\n-{11}\\s*").matcher(VAL);
    while (mat.find()) {
        phrases.add(mat.group(1));
    }

    System.out.printf("Value: %s%n", phrases); 
}

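The regular expression I use is: Snt:(.+?)\n={20}\n-{11}\s*

This assumes the first word in the file is Snt:. It then captures the phrase that follows, up to the separator, and consumes any trailing whitespace so that the expression is positioned for the next record.

The advantage of this approach is that each match covers exactly one record, instead of having an expression that matches the tail end of one record and possibly the start of the next.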

How about:

Pattern p = Pattern.compile("^Snt:(.*)$", Pattern.MULTILINE);
Matcher m = p.matcher(str);

while (m.find()) {
    String sentence = m.group(1);
}

Instead of hacking around with split and doing extra parsing, this just looks for lines that start with Snt: and captures whatever follows.
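For completeness, a runnable version of the same idea might look like the sketch below. The class name SntLineExtractor and the method extract are hypothetical; the pattern and the Pattern.MULTILINE flag are taken from the snippet above, and the sample text is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SntLineExtractor {
    // MULTILINE makes ^ and $ match at line boundaries, not only at the ends of the input.
    private static final Pattern SNT_LINE = Pattern.compile("^Snt:(.*)$", Pattern.MULTILINE);

    // Hypothetical helper: returns the text after "Snt:" for every line that starts with it.
    static List<String> extract(String str) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SNT_LINE.matcher(str);
        while (m.find()) {
            sentences.add(m.group(1)); // group 1 is the rest of the line after the prefix
        }
        return sentences;
    }

    public static void main(String[] args) {
        String str = "Snt:It was the most widespread day of environmental action in the planet's history\n"
                + "====================\n"
                + "-----------\n"
                + "Snt:Five years ago, I was working for just over minimum wage\n"
                + "====================\n"
                + "-----------";
        for (String sentence : extract(str)) {
            System.out.println(sentence);
        }
    }
}
```

The separator lines never start with Snt:, so they are skipped without any pattern having to describe them.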

Matcher m = Pattern.compile("([^=\\-]+)([=\\-]+[\\t\\n\\s]*)+").matcher(str);   
while (m.find()) {
    String match = m.group(1);
    System.out.println(match);
}

Use content.replaceAll("Snt:", "") and then split.

This may not be the best use of split. Are you reading these lines from a file? Perhaps checking each line returned by a BufferedReader is what you really want to do.

You forgot to use the Pattern.MULTILINE flag so that $ matches the end of a line and not just the end of the string. Anyway, +1: this cannot reasonably be done with split, unless we want to ignore the first element of the result array, since Snt: also needs to be removed.
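The BufferedReader suggestion in the comments could be sketched roughly as follows. This is only an illustration under assumptions: the class SntLineReader and its methods are invented names, and a StringReader stands in for the file, since the question does not say where the text comes from.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SntLineReader {
    private static final String PREFIX = "Snt:";

    // Hypothetical helper: reads line by line and keeps the text after "Snt:".
    static List<String> readSentences(Reader source) throws IOException {
        List<String> sentences = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Separator lines (==== and ----) do not start with the prefix and are skipped.
                if (line.startsWith(PREFIX)) {
                    sentences.add(line.substring(PREFIX.length()));
                }
            }
        }
        return sentences;
    }

    // Convenience wrapper for in-memory text; a file would use a FileReader or Files.newBufferedReader instead.
    static List<String> readSentencesFromString(String text) {
        try {
            return readSentences(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "Snt:It was the most widespread day of environmental action in the planet's history\n"
                + "====================\n"
                + "-----------\n"
                + "Snt:Five years ago, I was working for just over minimum wage\n"
                + "====================\n"
                + "-----------";
        readSentencesFromString(text).forEach(System.out::println);
    }
}
```

No regular expression is needed here, and a missing newline at the end of the input is not a problem because readLine returns the last line either way.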
