
Unable to run JAR with Java - Spark Twitter streaming

Tags: java, maven, apache-spark, spark-streaming, twitter4j

I am running Spark 2.4.3 in standalone mode on Ubuntu and using Maven to build the JAR file. Below is the code I am trying to run; it is meant to stream data from Twitter. After starting Spark, the Spark master is at 127.0.1.1:7077. The Java version used is 1.8.

package SparkTwitter.SparkJavaTwitter;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.twitter.TwitterUtils;

import scala.Tuple2;
import twitter4j.Status;
import twitter4j.auth.Authorization;
import twitter4j.auth.OAuthAuthorization;
import twitter4j.conf.Configuration;
import twitter4j.conf.ConfigurationBuilder;

import com.google.common.collect.Iterables;


public class TwitterStream {

    public static void main(String[] args) {
        // Prepare the Spark configuration, setting the application name and a local[2] master (two local threads: one for the receiver, one for processing)
        final SparkConf sparkConf = new SparkConf().setAppName("Twitter Data Processing").setMaster("local[2]");
        // Create the streaming context from the Spark configuration, with a 10-second batch interval
        final JavaStreamingContext streamingContext = new JavaStreamingContext(sparkConf, new Duration(10000));

        // Prepare configuration for Twitter authentication and authorization
        final Configuration conf = new ConfigurationBuilder().setDebugEnabled(false)
                                        .setOAuthConsumerKey("consumer key")
                                        .setOAuthConsumerSecret("consumer key secret")
                                        .setOAuthAccessToken("access token")
                                        .setOAuthAccessTokenSecret("access token secret")
                                        .build();
        // Create Twitter authorization object by passing prepared configuration containing consumer and access keys and tokens
        final Authorization twitterAuth = new OAuthAuthorization(conf);
        // Create a data stream using streaming context and Twitter authorization
        final JavaReceiverInputDStream<Status> inputDStream = TwitterUtils.createStream(streamingContext, twitterAuth, new String[]{});
        // Create a new stream by filtering the non english tweets from earlier streams
        final JavaDStream<Status> enTweetsDStream = inputDStream.filter((status) -> "en".equalsIgnoreCase(status.getLang()));
        // Convert stream to pair stream with key as user screen name and value as tweet text
        final JavaPairDStream<String, String> userTweetsStream = 
                                enTweetsDStream.mapToPair(
                                    (status) -> new Tuple2<String, String>(status.getUser().getScreenName(), status.getText())
                                );

        // Group the tweets for each user
        final JavaPairDStream<String, Iterable<String>> tweetsReducedByUser = userTweetsStream.groupByKey();
        // Create a new pair stream by replacing iterable of tweets in older pair stream to number of tweets
        final JavaPairDStream<String, Integer> tweetsMappedByUser = tweetsReducedByUser.mapToPair(
                    userTweets -> new Tuple2<String, Integer>(userTweets._1, Iterables.size(userTweets._2))
                );
        // Iterate over the stream's RDDs and print each element on console
        tweetsMappedByUser.foreachRDD((VoidFunction<JavaPairRDD<String, Integer>>)pairRDD -> {
            pairRDD.foreach(new VoidFunction<Tuple2<String,Integer>>() {

                @Override
                public void call(Tuple2<String, Integer> t) throws Exception {
                    System.out.println(t._1() + "," + t._2());
                }

            });
        });
        // Triggers the start of processing. Nothing happens if streaming context is not started
        streamingContext.start();
        // Keep the application alive until the streaming context is terminated manually
        streamingContext.awaitTermination();

    }

}
Below is the output I get:

19/11/10 22:17:58 WARN Utils: Your hostname, hadoop-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
19/11/10 22:17:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/11/10 22:17:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Warning: Failed to load SparkTwitter.SparkJavaTwitter.TwitterStream: twitter4j/auth/Authorization
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

I ran a word-count program the same way and it worked fine. The JAR also builds successfully. Do I need to specify more parameters when running the JAR?
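The key line in that output is "Failed to load SparkTwitter.SparkJavaTwitter.TwitterStream: twitter4j/auth/Authorization", which means the twitter4j classes are not on the driver's classpath at runtime even though the application class itself was found. A minimal diagnostic sketch (the class and method names below are my own, not from the original post) for checking whether a given class is visible:

```java
// Hypothetical diagnostic helper: probe the classpath for a fully qualified
// class name without initializing the class. On the setup that produced the
// warning above, the twitter4j probe would return false.
public class ClasspathCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    public static boolean isOnClasspath(String fqcn) {
        try {
            Class.forName(fqcn, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isOnClasspath("java.lang.String"));
        System.out.println(isOnClasspath("twitter4j.auth.Authorization"));
    }
}
```

Running this inside the submitted JAR (or via spark-submit with the same options) shows directly whether the dependency made it onto the runtime classpath.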

I ran into a similar problem and found that you need to hand the dependency jars to spark-submit directly. What I do is point at the directory where the jars used to build the project are stored, using spark-submit's --jars option with a wildcard over that directory:

--jars <jar-directory>/*

Maybe it is not the best option, but it works.

Also, when you update dependency versions, note that the jars in that folder must be updated too.
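An alternative to passing --jars is to bundle the dependencies into the application jar itself (a fat/uber jar), so spark-submit needs no extra options. A sketch of a Maven shade-plugin configuration for the pom.xml (the plugin version is an assumption; Spark itself should stay in "provided" scope so it is not bundled):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Set the entry point so the shaded jar is directly runnable -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>SparkTwitter.SparkJavaTwitter.TwitterStream</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After `mvn package`, the shaded jar in target/ contains the twitter4j classes, so the "twitter4j/auth/Authorization" failure should no longer occur.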

Is it a fat jar?

@voldy Sorry, I'm not sure. I build the JAR in Maven as follows: right-click the project, then Run As -> Maven build... When I run the code inside Eclipse it works, but as soon as I try to run the jar on the command line it fails. Thanks for the suggestion. The jars for this project are added through pom.xml; they are not stored locally. However, since one specific class, twitter4j.conf.ConfigurationBuilder, was causing the problem, I tried downloading the same jar from Maven, specified its path in the --jars and --packages arguments, and re-ran it, but it did not work. Can we add a URL for the jars path?

I am fairly sure Spark downloads the required jars before running and then uses them locally. You can even inspect the cache folder (usually ~/.ivy2). So in effect the pom is a set of indirect URLs. Are you sure you have the correct version of Twitter4J?

You are right, I found the location of the jars. My mistake. I tried specifying only the twitter4j jar with --packages, but it did not work. I will check the jar versions again. However, when I run the program in Eclipse as a Java application it works fine; the problem only appears after packaging the program into a jar.

For some reason Spark does not recognize the jars inside the jar you submit. I believe the standard solution is to provide the extra jars as described above. Or perhaps there are options I am not aware of.
The spark-submit command I use:

./bin/spark-submit --class SparkTwitter.SparkJavaTwitter.TwitterStream /home/hadoop/eclipse-workspace/SparkJavaTwitter/target/SparkJavaTwitter-0.0.1-SNAPSHOT.jar
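Since the failure is Spark not finding the twitter4j classes at runtime, another option is to let spark-submit resolve the Twitter connector and its transitive dependencies (including twitter4j) from Maven coordinates via --packages. A hedged sketch of the invocation, reusing the command above (the Bahir coordinate assumes Spark 2.4.x with Scala 2.11; verify it matches your build):

```
./bin/spark-submit \
  --class SparkTwitter.SparkJavaTwitter.TwitterStream \
  --packages org.apache.bahir:spark-streaming-twitter_2.11:2.4.0 \
  /home/hadoop/eclipse-workspace/SparkJavaTwitter/target/SparkJavaTwitter-0.0.1-SNAPSHOT.jar
```

With --packages, Spark fetches the jars into the local ivy cache (~/.ivy2, as mentioned in the comments) and adds them to both the driver and executor classpaths, so they do not need to be bundled or listed with --jars.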