
Java: how to filter IPs from a log file into another RDD


I am trying to extract the IP addresses from an access log file using a regex pattern, but I am not getting the correct output.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;

public class IPcount {
    public static void main(String[] args) {

        String IPADDRESS_PATTERN = "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
        Pattern pattern = Pattern.compile(IPADDRESS_PATTERN);
        // Matcher matcher = pattern.matcher(t);  // does not compile here: 't' only exists inside call() below

        JavaSparkContext sc = new JavaSparkContext("local", "IPcount");
        @SuppressWarnings({ "unused", "serial" })
        JavaRDD<String> lines = sc.textFile("/home/bhaumik/Documents/access_log", 5)
                .flatMap(new FlatMapFunction<String, String>() {

                    @Override
                    public Iterable<String> call(String t) throws Exception {
                        // TODO Auto-generated method stub
                        return null; // What should go here so that I get the IPs filtered out of the log file?
                    }
                });
    }
}
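As a quick check independent of Spark, the question's regex can be exercised on a single line with a plain `Matcher.find()` loop. This is a minimal sketch, not the asker's code: the class name `IpRegexDemo`, the helper `extractIps`, and the sample log line are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IpRegexDemo {
    // Same pattern as in the question: four dot-separated octets, each 0-255.
    static final Pattern IP_PATTERN = Pattern.compile(
            "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)");

    // Hypothetical helper: returns every substring of the line that matches, in order.
    static List<String> extractIps(String line) {
        List<String> ips = new ArrayList<>();
        Matcher matcher = IP_PATTERN.matcher(line);
        while (matcher.find()) {      // find() scans forward for the next match
            ips.add(matcher.group()); // group() is the text of the whole match
        }
        return ips;
    }

    public static void main(String[] args) {
        // Hypothetical access-log line in Apache Common Log Format.
        String line = "192.168.1.10 - - [10/Oct/2000:13:55:36 -0700] \"GET /index.html HTTP/1.0\" 200 2326";
        System.out.println(extractIps(line)); // [192.168.1.10]
    }
}
```

The key point is that the `Matcher` has to be created per line, from the line's text; it cannot be built in `main` before the lines exist.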

Here is a Java method that extracts the IPs from a JavaRDD<String>, assuming each line may contain zero, one, or several IPs:

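// Imports needed by the method below
import java.util.LinkedList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;

// static: an anonymous class created in an instance method can hold a reference to the
// enclosing object, which Spark would then try (and possibly fail) to serialize
public static JavaRDD<String> getIPs(JavaRDD<String> rdd) {
    final String IPADDRESS_PATTERN = "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
    // java.util.regex.Pattern is Serializable, so it can be shipped to the executors with the function
    final Pattern pattern = Pattern.compile(IPADDRESS_PATTERN);

    return rdd.flatMap(new FlatMapFunction<String, String>() {
        @Override
        public Iterable<String> call(String t) throws Exception {
            // Collect every IP found in this line (possibly none)
            final Matcher matcher = pattern.matcher(t);
            final LinkedList<String> matches = new LinkedList<>();
            while (matcher.find()) {
                matches.add(matcher.group());
            }
            // Spark 1.x signature; in Spark 2.x and later call() returns Iterator<String>,
            // so there you would return matches.iterator()
            return matches;
        }
    });
}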
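The per-line logic of `getIPs` can also be tried without a cluster. The sketch below is a stand-in, not Spark code: it assumes `java.util.stream.Stream.flatMap` on a plain `List<String>` behaves like `JavaRDD.flatMap` for this purpose, and the names `LocalFlatMapDemo`, `ipsInLine`, `UNBOUNDED` and `BOUNDED` are hypothetical. It also shows one caveat of the unanchored pattern: on `1.2.3.256` it reports `1.2.3.25`. Wrapping the pattern in `\b` word boundaries (a change that is not part of the answer above) rejects that case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LocalFlatMapDemo {
    private static final String OCTET = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

    // The answer's pattern, unchanged.
    static final Pattern UNBOUNDED = Pattern.compile("(?:" + OCTET + "\\.){3}" + OCTET);

    // Hypothetical variant: same pattern wrapped in \b so a match cannot start or end inside a run of digits.
    static final Pattern BOUNDED = Pattern.compile("\\b(?:" + OCTET + "\\.){3}" + OCTET + "\\b");

    // Same per-line logic as call() in the answer: zero, one, or many matches per line.
    static Stream<String> ipsInLine(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        Stream.Builder<String> out = Stream.builder();
        while (matcher.find()) {
            out.add(matcher.group());
        }
        return out.build();
    }

    // Local stand-in for rdd.flatMap(...): each line expands into 0..n IPs, results are concatenated.
    static List<String> getIPs(Pattern pattern, List<String> lines) {
        return lines.stream()
                .flatMap(line -> ipsInLine(pattern, line))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical log lines.
        List<String> lines = Arrays.asList(
                "10.0.0.1 - - \"GET / HTTP/1.1\" 200",
                "no address on this line",
                "1.2.3.256 bad last octet");
        System.out.println(getIPs(UNBOUNDED, lines)); // [10.0.0.1, 1.2.3.25]
        System.out.println(getIPs(BOUNDED, lines));   // [10.0.0.1]
    }
}
```

Because `\b` treats digits as word characters, the bounded pattern cannot match a prefix of `256`; it does still accept an address followed by a dot, so `1.2.3.4.5` yields `1.2.3.4`.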

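Thanks for your help and effort. @Tzachzohar, how can this be off-topic when someone was able to give the correct answer?? I explained it clearly; that is why people gave me an answer.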