How do I split a string in Java by quotes, the site: operator, and unquoted words?

java string parsing

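I receive a request like this from the user:

site:www.example.com "hello world" "hi abc" where are you

I want to extract and save the URL from this string and then remove it from the string, so that what remains looks like this:

"hello world" "hi abc" where are you

Then I want to split the remaining string into two string arrays:

String[] str1 = {"hello world", "hi abc"};
String[] str2 = {"where", "are", "you"};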

How can I do this in Java? The parts of the user query can appear in any order. Some examples:

 "hi" excitement site:www.example.com "hello world" "hi abc" where are you "amazing"
OR
    Hello World friends
OR
 Greeting is an "act of communication" human beings "intentionally"

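This is a very specific problem, but the logic below may help. Note that it assumes the site: token comes first and the unquoted words come last. I suggest refining it as you test with real data.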

import java.util.Arrays;

public class QueryParser {

    public static void main(String[] args) {
        String test1 = "site:www.example.com \"hello world\" \"hi abc\" where are you";
        String regex = "\\b(https?|ftp|file|site):[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
        // Splitting on quotes gives: [url, phrase, " ", phrase, ..., unquoted tail]
        String[] info = test1.split("\"");

        // read url (assumes it is the first token, before any quote)
        String url;
        if (info.length > 1 && info[0].trim().matches(regex))
            url = info[0].trim();
        else
            throw new RuntimeException("Not a valid input");

        // read str1: everything between the url and the unquoted tail
        String[] info1 = Arrays.copyOfRange(info, 1, info.length - 1);
        String str1 = mkString(info1, ",");

        // read str2: the unquoted tail, split on runs of whitespace
        String[] info2 = info[info.length - 1].trim().split("\\s+");
        String str2 = mkString(info2, ",");

        // prints: URL: site:www.example.com STR1: {hello world,hi abc} STR2: {where,are,you}
        System.out.println("URL: " + url + " STR1: " + str1 + " STR2: " + str2);
    }

    // returns a delimited string enclosed in curly braces {}, skipping blank items
    public static String mkString(String[] input, String delimiter) {
        StringBuilder result = new StringBuilder("{");
        for (String item : input) {
            if (item.trim().isEmpty()) {
                continue; // whitespace left between two quoted phrases
            }
            if (result.length() > 1) {
                result.append(delimiter);
            }
            result.append(item);
        }
        return result.append("}").toString();
    }
}
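Since the question says the parts can come in any order, here is a minimal sketch of a different technique from the one above: a single regex with three alternatives that is scanned across the whole query. The names `QueryTokenizer`, `ParsedQuery`, and `parse` are made up for illustration, and the sketch assumes quoted phrases use plain double quotes and that at most one site: token matters (a later one would overwrite an earlier one).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueryTokenizer {
    // Hypothetical holder for the three parts of a query.
    static class ParsedQuery {
        String site;                                      // null when there is no site: token
        final List<String> phrases = new ArrayList<>();   // quoted phrases, quotes stripped
        final List<String> words = new ArrayList<>();     // unquoted words
    }

    // Alternatives are tried left to right at each position:
    // a site: token, a "quoted phrase", or any other run of non-whitespace.
    private static final Pattern TOKEN =
            Pattern.compile("site:(\\S+)|\"([^\"]*)\"|(\\S+)");

    static ParsedQuery parse(String query) {
        ParsedQuery result = new ParsedQuery();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            if (m.group(1) != null) {
                result.site = m.group(1);
            } else if (m.group(2) != null) {
                result.phrases.add(m.group(2));
            } else {
                result.words.add(m.group(3));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ParsedQuery q = parse("\"hi\" excitement site:www.example.com \"hello world\" \"hi abc\" where are you \"amazing\"");
        System.out.println(q.site);
        System.out.println(q.phrases);
        System.out.println(q.words);
    }
}
```

Because each token is classified on its own, the position of the site: token and the mix of quoted and unquoted parts no longer matter. An unbalanced quote simply falls through to the last alternative and ends up among the words.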

I think this code can help you:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

static class ExtractResponse {
    String newStr; // the input with the site: token removed
    String site;   // the text after "site:"
}

public static ExtractResponse extractSite(String origin) {
    // \\s* instead of a literal space, so a site: token at the end of the input also matches
    Pattern pattern = Pattern.compile("site:(\\S*)\\s*");
    Matcher matcher = pattern.matcher(origin);

    ExtractResponse response = new ExtractResponse();
    StringBuffer buffer = new StringBuffer();
    while (matcher.find()) {
        response.site = matcher.group(1); // capture group excludes "site:" and trailing whitespace
        matcher.appendReplacement(buffer, "");
    }
    matcher.appendTail(buffer);

    response.newStr = buffer.toString().trim();
    return response;
}

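It returns a response containing the new string, with the site:* token removed, together with the site URL. For example, using the cases from your question and comments: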

public static void main(String[] args) {
    String str1 = "site:www.example.com \"hello world\" \"hi abc\" where are you";
    String str2 = "\"hello world\" \"hi abc\" site:www.example.com where are you";

    ExtractResponse response1 = extractSite(str1);
    System.out.println(response1.newStr);
    System.out.println(response1.site);

    ExtractResponse response2 = extractSite(str2);
    System.out.println(response2.newStr);
    System.out.println(response2.site);
}

Output:

"hello world" "hi abc" where are you

www.example.com

"hello world" "hi abc" where are you

www.example.com
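Once the site: token is gone, the remaining string still has to be split into the two arrays from the question. A rough sketch of one way to do that is below. `RemainderSplitter`, `quotedPhrases`, and `unquotedWords` are hypothetical names, not part of the answer above, and the sketch assumes balanced plain double quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RemainderSplitter {
    // A double quote, any run of non-quote characters (captured), and a closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns the text inside each pair of double quotes, in order of appearance.
    static String[] quotedPhrases(String text) {
        List<String> phrases = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            phrases.add(m.group(1));
        }
        return phrases.toArray(new String[0]);
    }

    // Replaces every quoted phrase with a space, then splits what is left on whitespace.
    static String[] unquotedWords(String text) {
        String rest = QUOTED.matcher(text).replaceAll(" ").trim();
        // "".split(...) would return one empty element, so handle that case explicitly.
        return rest.isEmpty() ? new String[0] : rest.split("\\s+");
    }
}
```

Calling `quotedPhrases` and `unquotedWords` on the string `"hello world" "hi abc" where are you` should give the phrases `hello world` and `hi abc` in the first array and the words `where`, `are`, `you` in the second.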

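You can do this with methods such as substring and replaceAll.

The user can write the query in any order: "hi" excitement site:www.example.com "hello world" "hi abc" where are you "amazing"

Can you update the question? It is unclear as written. Also, what you are asking about is a very specific use case. I suggest using regular expressions for the other strings as well, like the one I used above for the URL. For quoted strings, the regex should be - (?:(?)?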
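To illustrate the substring and replaceAll suggestion, here is a small sketch. The class and method names are invented for this example. It assumes the text "site:" only ever appears as the operator (a word such as "website:foo" would confuse it) and that the site value ends at the next space.

```java
class SiteSubstring {
    // Returns a two-element array: {site, remainder}.
    // site is null when the query has no "site:" token.
    static String[] extractSite(String query) {
        String prefix = "site:";
        int start = query.indexOf(prefix);
        if (start < 0) {
            return new String[] { null, query.trim() };
        }
        // The site value runs up to the next space, or to the end of the input.
        int end = query.indexOf(' ', start);
        if (end < 0) {
            end = query.length();
        }
        String site = query.substring(start + prefix.length(), end);
        // Drop the token and the whitespace after it, leaving the rest untouched.
        String remainder = query.replaceAll("site:\\S*\\s*", "").trim();
        return new String[] { site, remainder };
    }
}
```

For the query `"hello world" "hi abc" site:www.example.com where are you`, this should return `www.example.com` as the site and `"hello world" "hi abc" where are you` as the remainder.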