How do I get English tweets from everyone using Apache Spark Streaming and the Java API?

Tags: java, apache-spark, twitter

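Hello, I'm new to Spark. For my university research I want to build a Spark project that collects and processes tweets from Twitter using the Spark Streaming module. I have one small problem: I don't know how to get only English tweets. Can anyone help? I tried to filter the data after it has been received, but I get a java.lang.NullPointerException on this line: `if (status.getPlace().getCountryCode().equals("(us)"))`. That is a poor solution anyway. Is it possible to filter the data before receiving it? Please help, I really have no idea right now. I'd be glad to get any hints.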

package TwitterAnalysis;

import org.apache.spark.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.twitter.*;
import twitter4j.GeoLocation;
import twitter4j.Status;



public class Twitter {

    private static void setTwitterOAuth() {
        System.setProperty("twitter4j.oauth.consumerKey", TwitterOAuthKey.consumerKey);
        System.setProperty("twitter4j.oauth.consumerSecret", TwitterOAuthKey.consumerSecret);
        System.setProperty("twitter4j.oauth.accessToken", TwitterOAuthKey.accessToken);
        System.setProperty("twitter4j.oauth.accessTokenSecret", TwitterOAuthKey.accessTokenSecret);
    }

    public static void main(String [] args) {


        setTwitterOAuth();


        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("SparkTwitter");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(1000));


        JavaReceiverInputDStream<Status> twitterStream = TwitterUtils.createStream(jssc);

        //filtering already received tweets
        JavaDStream<Status> englishTweets=twitterStream.filter(
                new Function <Status, Boolean>(){
                    public Boolean call (Status status){
                        if (status.getPlace().getCountryCode().equals("(us)")){
                            return true;
                        }else
                        {return false;}
                    }
                }
        );


         //Without filter: Output text of all tweets
        JavaDStream<String> statuses = englishTweets.map(
                new Function<Status, String>() {
                    public String call(Status status) { return status.getText(); }
                }
        );




        statuses.print();
        jssc.start();

    }
}
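One likely cause of the NullPointerException is that many tweets carry no place information, in which case `getPlace()` returns null and the chained `getCountryCode()` call fails. The sketch below shows a null-safe version of the same check. `Place` and `Status` here are hypothetical stand-ins reduced to the getters used in the question, not the real twitter4j types, and `tweet(...)` is a helper invented for the demo. Note also that a country code says where a tweet was posted, not which language it is written in.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PlaceFilterSketch {

    // Hypothetical stand-ins for twitter4j.Place and twitter4j.Status,
    // reduced to the getters that the question's filter touches.
    interface Place { String getCountryCode(); }
    interface Status { Place getPlace(); String getText(); }

    // Null-safe variant of the check from the question: a tweet without
    // place data is simply rejected instead of throwing.
    static boolean isFromCountry(Status status, String countryCode) {
        Place place = status.getPlace();
        return place != null && countryCode.equalsIgnoreCase(place.getCountryCode());
    }

    // Demo helper (assumption): builds a tweet; a null country code means "no place".
    static Status tweet(final String text, final String countryCode) {
        return new Status() {
            public Place getPlace() {
                if (countryCode == null) {
                    return null;
                }
                return () -> countryCode;
            }
            public String getText() { return text; }
        };
    }

    public static void main(String[] args) {
        List<Status> tweets = Arrays.asList(
                tweet("hello from the US", "US"),
                tweet("no place attached", null),
                tweet("hallo aus Deutschland", "DE"));

        // Keep only tweets whose place is in the US, then print their text.
        List<String> kept = tweets.stream()
                .filter(s -> isFromCountry(s, "us"))
                .map(Status::getText)
                .collect(Collectors.toList());
        System.out.println(kept);
    }
}
```

In the streaming job the same guard would go inside the `call` method of the filter function, before `getCountryCode()` is dereferenced.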

Here is the answer: I just created a new JavaDStream and filtered it using getLang(). The solution looks like this:

JavaDStream<Status> enTweetdDStream=twitterStream.filter((status) -> "en".equalsIgnoreCase(status.getLang()));

Is it possible to do it this way instead: a lang:en keyword, or lang:es for Spanish, lang:de for German, and so on?
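Whether the streaming endpoint accepts `lang:en` style keywords is not confirmed anywhere in this thread, so the following sketch does not rely on it. It only generalizes the getLang() filter from the answer to several language codes on the client side. `Status` is a hypothetical stand-in with just `getLang()`, and `hasLanguage` is a name invented for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class LanguageFilterSketch {

    // Hypothetical stand-in for twitter4j.Status, reduced to getLang().
    interface Status { String getLang(); }

    // True when the tweet's language code is one of the wanted codes.
    // The codes in 'wanted' are expected in lower case; a missing
    // language (null) never matches.
    static boolean hasLanguage(Status status, Set<String> wanted) {
        String lang = status.getLang();
        return lang != null && wanted.contains(lang.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> wanted = new HashSet<>(Arrays.asList("en", "es", "de"));

        // Sample tweets: three language codes and one tweet without a language.
        Status[] samples = { () -> "en", () -> "ES", () -> "fr", () -> null };
        for (Status s : samples) {
            System.out.println(s.getLang() + " -> " + hasLanguage(s, wanted));
        }
    }
}
```

Assuming the real `Status` type is used, the same predicate could be passed to the stream as `twitterStream.filter(status -> hasLanguage(status, wanted))`. This still filters after the tweets have been received, exactly like the answer above.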