Regex to extract text inside nested quotes [Java/JSON]


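In the text below, I want to extract the value inside the quotes for a given key, for example "hash". The value associated with hash runs from the opening quote to the closing quote, in this case: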

00000e96c46d15aeaaf9ef6f88a295a8f17207d4cd9ac074d2314680095befc854d5a00600602af2fe03a24b61566ca2d8a6b858b0af840309ae449316833923

My pattern is:

Scanner s = new Scanner(new File(path.toString()));
Pattern pattern = Pattern.compile("\"hash\": \".*\"");
String nextMatch = s.findWithinHorizon(pattern, 0);
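Explanation of the pattern: I look for a sequence that starts with a quotation mark, followed by the word hash and another quotation mark. Then comes a ":" followed by one space. After that, as much text as possible until another quotation mark appears.

Sadly, this pattern does not work, and I don't understand why.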

{"hash": "00000e96c46d15aeaaf9ef6f88a295a8f17207d4cd9ac074d2314680095befc854d5a00600602af2fe03a24b61566ca2d8a6b858b0af840309ae449316833923", "block": "{\"type\": \"block\", \"transactions\": [], \"timestamp\": \"2017-09-07T07:09:52.628676\", \"reward\": \"D5075B5D43CF97B73BD6483488F1F6A648DC83ADD93A37B0817B17331FD51D989E2CF9FD3C8C0206FB89B84CF9E151B7D2123E4F6D71C95868DFE1F4AA6B9E754A51A8E04BD49F5EEC1931840315BC42844B715250534612DA5E5809BDB14C496AD1D4B00823B80AACB7023667CA6988B538DC505BF291620B28B758D758C5077F3077C5077C5077CF078C5077C5078C5077C5077F3077C57C5077C5077C57D7C57C5070707070707078D8D8D8D8D7D7C5070707070707070707078D8D8D8D8D7D7D7E19ECCC72072939D5D16409843151B55607715F7EA9EFF9191414C88F1E719ED5C5E957379777BD96B9150CE5A54C491AA94EAB58DF129445D89C9F8937C598BA95380A42C22E06ED2DA49B331E99E25554C122A095B2520BA3DCFF6585C8C07CC6DA9D37AD7E71ADE2676704C7A07ACA7994EFC458DF129445D89C9BA95380A42C22E06E99E2555554C122A0957B2520BA957CFCFF65857C07CFC7CFC7CFC7C07CC6AD7AD7AD7AD7AD7AD7AD717AD717AD717AD717AD717EA717EA78676767676767676767676767676767677DA1CB1781426FEF33721B66E727EA7AEF19FB5EDC3E16C6D7F08F04F5067DC9A2D0C001015C1AF848FCD6EEF039C9C5D8E737C0655A97B6BC876854A34AD94FCD29218524C67881BD1A9279EDC12F95720D8A010D9A57DD19A4415BED2687FB462D95DA8436954B5B5B5B5F98935650A1FD7B6B7B8B8B8AD89838B8B8B8C5038B8B8B8AD838B8B8B8C5838B8B8B8B8B8D8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B\", \"difficulty\": \"0\", \"nonce\": \"FEEC6D57F31D8AEE18889026E4E484D96DE6B874013A1932018E809C60C45019033389671DCC2E3138A555705EC95E365D79D3E68A909EFCF15D0D137770131\", \"parent\": \"00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\"}", "type": "block_hash"}

My full code:

public class TryToStream {

    static String url = "SorryICantShowYouThatOne";
    static String charset = "UTF-8";


    public static void main(String[] args) throws IOException, ParseException {
        JSONParser parser = new JSONParser();

        URL getURL = new URL(url + "get?start_at=");
        int counter = 0;
        boolean inputAvail = true;
        //clear textfile
        PrintWriter pw = new PrintWriter("jsonFormatted.txt");


        URL tmpURL = new URL(url + "get?start_at=" + counter);
        URLConnection connection = tmpURL.openConnection();
        InputStream is = connection.getInputStream();
        JSONArray json = (JSONArray) parser.parse(new BufferedReader(new InputStreamReader(is)));
        //   FileOutputStream fos = new FileOutputStream(new File("output2.txt"), true);
        BufferedWriter bw = new BufferedWriter(new FileWriter("jsonFormattedStream.txt"));
        bw.write(json.toJSONString());
        bw.close();

        Iterator iter = json.iterator();
        boolean flagForTesting = true;
        BufferedWriter bw2 = new BufferedWriter(new FileWriter("jsonFormatted.txt"));
        Pattern pattern = Pattern.compile("\"hash\": \"(.*?)\"");

        while (iter.hasNext() && flagForTesting) {

            Matcher matcher = pattern.matcher(iter.next().toString());
            matcher.find();
            System.out.println(matcher.group(1));
            flagForTesting = false;
        }
        bw2.close();


        System.out.println("End");
    }
}
If I try to match with the suggested regex, I don't get a match.

Result of iter.next():

{"block":"{\"type\": \"block\", \"transactions\": [], \"timestamp\": \"2017-09-07T07:09:52.628676\", \"reward\": \"D5075B5D43CF97B73BD6483488F1F6A648DC83ADD93A37B0817B17331FD51D989E2CF9FD3C8C0206FB89B84CF9E151B7D2123E4F6D71C95868DFE1F4AA6B9E754A51A8E04BD49F5EEC1931840315BC42844B715250534612DA5E5809BDB14C496AD1D4B00823B80AACB7023667CA6988B538DC505BF291620B28B758D758C5077F3077C5077C5077CF078C5077C5078C5077C5077F3077C57C5077C5077C57D7C57C5070707070707078D8D8D8D8D7D7C5070707070707070707078D8D8D8D8D7D7D7E19ECCC72072939D5D16409843151B55607715F7EA9EFF9191414C88F1E719ED5C5E957379777BD96B9150CE5A54C491AA94EAB58DF129445D89C9F8937C598BA95380A42C22E06ED2DA49B331E99E25554C122A095B2520BA3DCFF6585C8C07CC6DA9D37AD7E71ADE2676704C7A07ACA7994EFC458DF129445D89C9BA95380A42C22E06E99E2555554C122A0957B2520BA957CFCFF65857C07CFC7CFC7CFC7C07CC6AD7AD7AD7AD7AD7AD7AD717AD717AD717AD717AD717EA717EA78676767676767676767676767676767677DA1CB1781426FEF33721B66E727EA7AEF19FB5EDC3E16C6D7F08F04F5067DC9A2D0C001015C1AF848FCD6EEF039C9C5D8E737C0655A97B6BC876854A34AD94FCD29218524C67881BD1A9279EDC12F95720D8A010D9A57DD19A4415BED2687FB462D95DA8436954B5B5B5B5F98935650A1FD7B6B7B8B8B8AD89838B8B8B8C5038B8B8B8AD838B8B8B8C5838B8B8B8B8B8D8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B8B\", \"difficulty\": \"0\", \"nonce\": \"FEEC6D57F31D8AEE18889026E4E484D96DE6B874013A1932018E809C60C45019033389671DCC2E3138A555705EC95E365D79D3E68A909EFCF15D0D137770131\", \"parent\": \"000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\"}","type":"block_hash","hash":"00000e96c46d15aeaaf9ef6f88a295a8f17207d4cd9ac074d2314680095befc854d5a00600602af2fe03a24b61566ca2d8a6b858b0af840309ae449316833923"}


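Your regex is almost there.

The problem with your regex is that it tries to match everything in the string up to the last quotation mark, so it keeps matching all the way to "block_hash". You just need to tell it to match lazily, so that it stops at the first quotation mark it encounters: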

"hash": ".*?" // notice the question mark!
Now this regex matches:

"hash": "00000e96c46d15aeaaf9ef6f88a295a8f17207d4cd9ac074d2314680095befc854d5a00600602af2fe03a24b61566ca2d8a6b858b0af840309ae449316833923"
If you want to capture what is inside the quotes, I suggest adding a capturing group:

"hash": "(.*?)"
You can use this regex like so:

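Pattern pattern = Pattern.compile("\"hash\": \"(.*?)\"");
Matcher matcher = pattern.matcher(yourString);
// group(1) is only valid after a successful find(); calling it otherwise throws IllegalStateException
if (matcher.find()) {
    System.out.println(matcher.group(1));
}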
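
To see the difference between greedy and lazy matching concretely, here is a small self-contained sketch. The class name GreedyVsLazy, the helper firstGroup, and the shortened input string are made up for illustration; only the two patterns come from the discussion above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Hypothetical helper: returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened, invented sample with the same shape as the data in the question.
        String input = "{\"hash\": \"00000e96\", \"type\": \"block_hash\"}";

        // Greedy: .* runs on to the LAST quotation mark in the input.
        System.out.println(firstGroup("\"hash\": \"(.*)\"", input));
        // prints: 00000e96", "type": "block_hash

        // Lazy: .*? stops at the FIRST quotation mark after the value begins.
        System.out.println(firstGroup("\"hash\": \"(.*?)\"", input));
        // prints: 00000e96
    }
}
```

The helper returns null instead of throwing when there is no match, which makes a failed match easy to spot while debugging.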

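Don't use regex. Parse the string as JSON and extract the value of the hash property from it.

I wanted to do that, but sadly at some point the document seems to stop being valid JSON, so I can't rely on its format.

In my updated code I still don't get a match.

@InDaPond I don't see anything wrong with the regex. Try debugging your code, e.g. by printing the value of iter.next().toString(). If you still can't solve it, try asking another question.

iter.next() returns something that contains the hash, which is why I'm so confused and started asking.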
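
One possible cause, judging only from the iter.next() output as pasted in the question: there the hash key appears to be followed directly by the colon and the value, with no space, while the pattern requires exactly one space after the colon. If that is what the real string looks like, allowing optional whitespace around the colon may be enough. The sketch below is a guess under that assumption; the class name HashExtractor, the method extractHash, and the shortened sample strings are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HashExtractor {
    // \s* accepts zero or more whitespace characters on either side of the colon,
    // so both "hash": "..." and "hash":"..." are matched.
    private static final Pattern HASH = Pattern.compile("\"hash\"\\s*:\\s*\"(.*?)\"");

    // Hypothetical helper: returns the hash value, or null when the key is not found.
    static String extractHash(String json) {
        Matcher m = HASH.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Invented, shortened samples: one compact, one with a space after each colon.
        String compact = "{\"type\":\"block_hash\",\"hash\":\"00000e96\"}";
        String spaced = "{\"hash\": \"00000e96\", \"type\": \"block_hash\"}";

        System.out.println(extractHash(compact)); // prints: 00000e96
        System.out.println(extractHash(spaced));  // prints: 00000e96

        // The pattern from the question needs the space, so it finds nothing in the compact form.
        boolean strictFound = Pattern.compile("\"hash\": \"(.*?)\"").matcher(compact).find();
        System.out.println(strictFound); // prints: false
    }
}
```

Checking the result of find() before calling group(1), as extractHash does, also avoids the IllegalStateException that group(1) throws when no match was found.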