Extracting a directory from a string in Java


java regex string


I need to extract the directory from strings like the following:

222.77.201.211 - - [20/Sep/2013:00:10:23 +0800] "GET /mapreduce-nextgen/hadoop-internals-mapreduce-reference/ HTTP/1.1" 200 28664 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"
220.181.89.164 - - [20/Sep/2013:00:10:25 +0800] "GET /mapreduce/hadoop-capacity-scheduler HTTP/1.1" 301 390 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
175.44.54.185 - - [20/Sep/2013:00:10:25 +0800] "GET /mapreduce-nextgen/apache-hadoop-2-0-3-published HTTP/1.1" 301 439 "http://dongxicheng.org/mapreduce-nextgen/apache-hadoop-2-0-3-published/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"
175.44.54.185 - - [20/Sep/2013:00:10:25 +0800] "GET /search-engine/scribe-intro/ HTTP/1.1" 200 21578 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"
112.111.174.38 - - [20/Sep/2013:00:10:30 +0800] "GET /structure/segment-tree HTTP/1.1" 301 414 "http://dongxicheng.org/structure/segment-tree/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"
112.111.174.38 - - [20/Sep/2013:00:10:30 +0800] "GET /structure/segment-tree HTTP/1.1" 301 414 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"
222.77.201.211 - - [20/Sep/2013:00:10:31 +0800] "GET /mapreduce-nextgen/apache-hadoop-2-0-3-published/ HTTP/1.1" 200 23438 "http://dongxicheng.org/mapreduce-nextgen/apache-hadoop-2-0-3-published/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"

The expected output would be:

/mapreduce-nextgen/hadoop-internals-mapreduce-reference/
/mapreduce/hadoop-capacity-scheduler
/mapreduce-nextgen/apache-hadoop-2-0-3-published

and so on.

I think a regular expression is probably needed. Thanks in advance.

If it is always between GET and HTTP, the simplest regular expression would be:

GET (.*?) HTTP

In Java, the code would look like this:

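// requires java.util.regex.Pattern and java.util.regex.Matcher
Pattern p = Pattern.compile("GET (.*?) HTTP");
Matcher m = p.matcher(string);
String directory = m.find() ? m.group(1) : null; // group(1) is the text between GET and HTTP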

Edit: don't forget to put a \ before every " in the string, otherwise it will be interpreted as the end of the string.

String str = "222.77.201.211 - - [20/Sep/2013:00:10:23 +0800] \"GET /mapreduce-nextgen/hadoop-internals-mapreduce-reference/ HTTP/1.1\" 200 28664 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)\"";

With the string above, the output would be

/mapreduce-nextgen/hadoop-internals-mapreduce-reference/
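Putting the pattern and the escaped string together, a minimal self-contained sketch might look like the following. The class name `LogPathExtractor` and the method `extract` are made up for illustration; only `Pattern`, `Matcher`, `find()` and `group(1)` come from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class LogPathExtractor {
    // The lazy group (.*?) captures everything between "GET " and the first " HTTP".
    private static final Pattern REQUEST_PATH = Pattern.compile("GET (.*?) HTTP");

    // Returns the captured path, or null when the line contains no GET request.
    static String extract(String line) {
        Matcher m = REQUEST_PATH.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "222.77.201.211 - - [20/Sep/2013:00:10:23 +0800] \"GET /mapreduce-nextgen/hadoop-internals-mapreduce-reference/ HTTP/1.1\" 200 28664 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)\"";
        System.out.println(extract(str));
    }
}
```

Note that `find()` has to be called before `group(1)`; calling `group` on a matcher that has not matched throws an `IllegalStateException`.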

OK, so the answer above may be more efficient and is probably better, but I got it done using .indexOf().

String toInspect = "112.111.186.210 - - [20/Sep/2013:00:10:22 +0800] \"GET /structure/segment-tree HTTP/1.1\" 301 414 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)\"";
String directory = StringUtils.substringBetween(toInspect ,"GET ", " HTTP");

The first line here is not how I did it when processing with Hadoop, but for brevity here it is:

Text value = new Text("112.111.186.210 - - [20/Sep/2013:00:10:22 +0800] \"GET /structure/segment-tree HTTP/1.1\" 301 414 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)\"");

     String line = value.toString();
     int idx = line.indexOf("GET ") + 4;       // skip past "GET " so the method name is not part of the result
     int idy = line.indexOf(" HTTP/1");
     ip.set(line.substring(idx, idy).trim());  // ip: presumably a Hadoop Text declared elsewhere
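For readers without Hadoop or Apache Commons on the classpath, the same idea can be sketched in plain Java. This is only an illustration under assumptions: the class `IndexOfExtractor` and the method `between` are hypothetical names, and the guard against missing markers is an addition that the snippets above do not have.

```java
// Hypothetical helper showing the indexOf approach with only java.lang.String.
class IndexOfExtractor {
    // Returns the text between "GET " and the following " HTTP",
    // or null if either marker is missing from the line.
    static String between(String line) {
        String startMarker = "GET ";
        String endMarker = " HTTP";
        int start = line.indexOf(startMarker);
        if (start < 0) {
            return null;
        }
        start += startMarker.length();             // move past "GET "
        int end = line.indexOf(endMarker, start);  // search only after the start marker
        if (end < 0) {
            return null;
        }
        return line.substring(start, end);
    }

    public static void main(String[] args) {
        String toInspect = "112.111.186.210 - - [20/Sep/2013:00:10:22 +0800] \"GET /structure/segment-tree HTTP/1.1\" 301 414 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)\"";
        System.out.println(between(toInspect));
    }
}
```

Without the `-1` checks, `substring` would throw a `StringIndexOutOfBoundsException` on any log line that is not a GET request.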

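Is the directory you're looking for always between "GET" and "HTTP"? Please add some more example strings.
Yep, it will always be in this format. :)
Note that StringUtils is part of Apache Commons (AFAIK), so you need that library.
How do I escape every line in the string? I'm reading each line from a file. Thanks.
This way: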

String parts[]=str.split（“\\”；
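On the escaping question: backslash escapes are only needed for string literals written in Java source. Lines read at run time already contain the " characters as they are, so nothing has to be escaped. A rough sketch, assuming the regex approach from above; it uses a `StringReader` with two shortened sample lines in place of a real file so that it runs on its own, and the class `LogFilePaths` and method `extractAll` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads a log line by line and collects the request paths.
class LogFilePaths {
    private static final Pattern REQUEST_PATH = Pattern.compile("GET (.*?) HTTP");

    // Pass a FileReader (or Files.newBufferedReader) here to read an actual log file.
    static List<String> extractAll(Reader source) {
        List<String> paths = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = REQUEST_PATH.matcher(line);  // no escaping of the line is needed
                if (m.find()) {
                    paths.add(m.group(1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return paths;
    }

    public static void main(String[] args) {
        // The \" escapes below exist only because this is a source-code literal.
        String log = "220.181.89.164 - - [20/Sep/2013:00:10:25 +0800] \"GET /mapreduce/hadoop-capacity-scheduler HTTP/1.1\" 301 390\n"
                   + "175.44.54.185 - - [20/Sep/2013:00:10:25 +0800] \"GET /search-engine/scribe-intro/ HTTP/1.1\" 200 21578\n";
        extractAll(new StringReader(log)).forEach(System.out::println);
    }
}
```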