Apache Spark: error when trying to save a csv file in HDFS with Apache Spark


I am just trying to write a program that needs to save a csv file in HDFS. The code works fine inside Eclipse, but when I run the jar outside of Eclipse I get this error:

2014-10-14 12:41:31 INFO  SecurityManager:58 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aroman)
    Exception in thread "main" java.lang.ExceptionInInitializerError
    at com.tekcomms.c2d.utils.MyWatchService.saveIntoHdfs(MyWatchService.java:362)
    at com.tekcomms.c2d.utils.MyWatchService.processDataCastFile(MyWatchService.java:332)
    at com.tekcomms.c2d.utils.MyWatchService.processCreateEvent(MyWatchService.java:224)
    at com.tekcomms.c2d.utils.MyWatchService.watch(MyWatchService.java:180)
    at com.tekcomms.c2d.main.FeedAdaptor.main(FeedAdaptor.java:40)
    Caused by: com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
    at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:115)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:136)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:142)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:150)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:155)
    at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:197)
    at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:136)
    at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:470)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:104)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:152)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
    at com.tekcomms.c2d.utils.MySparkUtils.<clinit>(MySparkUtils.java:29)
    ... 5 more
This is the part responsible for writing to HDFS:

import java.util.Arrays;
import java.util.List;

import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class MySparkUtils {

    final static Logger LOGGER = Logger.getLogger(MySparkUtils.class);

    private static JavaSparkContext sc;

    static {
        SparkConf conf = new SparkConf().setAppName("MySparkUtils");
        String master = MyWatchService.getSPARK_MASTER();
        conf.setMaster(master);
        // this is horrible! how can I get rid of it?
        String[] jars = {"target/feed-adapter-0.0.1-SNAPSHOT.jar"};
        conf.setJars(jars);
        sc = new JavaSparkContext(conf);
        LOGGER.debug("spark context initialized!");
    }

    public static boolean saveWithinHDFS(String path, StringBuffer sb) {
        LOGGER.debug("Trying to save in HDFS. path: " + path);
        boolean isOk = false;

        // split the buffer into lines and distribute them as an RDD
        String[] aStrings = sb.toString().split("\n");
        List<String> jsonDatab = Arrays.asList(aStrings);

        JavaRDD<String> dataRDD = sc.parallelize(jsonDatab);
        dataRDD.saveAsTextFile(path);
        isOk = true;
        return isOk;
    }
}
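About the hard-coded jar path that the comment complains about: a minimal sketch of one way around it, assuming the jar location can be passed in from the outside (the app.jar property name is made up for this example). Alternatively, submitting the job with spark-submit ships the application jar for you, so setJars is not needed at all.

    // Hypothetical alternative to hard-coding target/feed-adapter-0.0.1-SNAPSHOT.jar:
    // read the jar location from a -Dapp.jar=... system property, falling back to the old value.
    String appJar = System.getProperty("app.jar", "target/feed-adapter-0.0.1-SNAPSHOT.jar");
    conf.setJars(new String[] { appJar });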

This is my pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.tekcomms.c2d</groupId>
<artifactId>feed-adapter</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>feed-adaptor</name>
<description>a poc about to scan every second a remote filesystem seeking new csv files from datacast, load the csv file into memory, scan every line of csv matching with a set of pattern rules (matching_phone, matching_mac) if found a match, i will create a string buffer with that previous info, if there is no match, i will create another string buffer with that discarded data. Finally i have to copy those files into HDFS.   </description>
<developers>
    <developer>
        <name>Alonso Isidoro Román</name>
        <email>XXX</email>
        <timezone>+1 Madrid</timezone>
        <organization>XXXX</organization>
        <url>about.me/alonso.isidoro.roman</url>
    </developer>
</developers>

<dependencies>
    <!-- StringUtils... -->
    <dependency>
        <groupId>commons-lang</groupId>
        <artifactId>commons-lang</artifactId>
        <version>2.6</version>
    </dependency>

    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
    <dependency> <!-- Spark dependency -->
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.0.0</version>
        <scope>compile</scope>
        <optional>false</optional>
    </dependency>
</dependencies>
<repositories>
    <repository>
        <id>Akka repository</id>
        <url>http://repo.akka.io/releases</url>
    </repository>

    <!-- >repository> <id>cloudera-repos</id> <name>Cloudera Repos</name> <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url> 
        </repository -->

    <!-- repository> <id>CLOUDERA</id> <url>https://repository.cloudera.com/artifactory/repo/org/apache/spark/spark-core_2.10/0.9.0-cdh5.0.0-beta-2/</url> 
        </repository> <repository> <id>cdh.repo</id> <url>https://repository.cloudera.com/artifactory/cloudera-repos</url> 
        <name>Cloudera Repositories</name> <snapshots> <enabled>false</enabled> </snapshots> 
        </repository> <repository> <id>cdh.snapshots.repo</id> <url>https://repository.cloudera.com/artifactory/libs-snapshot-local</url> 
        <name>Cloudera Snapshots Repository</name> <snapshots> <enabled>true</enabled> 
        </snapshots> <releases> <enabled>false</enabled> </releases> </repository> 
        <repository> <id>central</id> <url>http://repo1.maven.org/maven2/</url> <releases> 
        <enabled>true</enabled> </releases> <snapshots> <enabled>false</enabled> 
        </snapshots> </repository -->

    <repository>
        <id>cloudera-repos</id>
        <name>Cloudera Repos</name>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>

</repositories>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.3</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <transformers>
                            <transformer
                                implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>com.tekcomms.c2d.main.FeedAdaptor</mainClass>
                            </transformer>
                        </transformers>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
</project>

What am I doing wrong?

EDIT


In the end, the problem was tracking down the exact JARs for the HDFS cluster (I had the wrong versions!), and the other problem was a very strict umask on the HDFS side: because of permissions, my local user could not write to HDFS.
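For reference, a sketch of what the dependency side of the fix looks like, assuming the cluster runs a CDH build of Spark; the 1.0.0-cdh5.1.0 version string below is only an example, the real value has to match the cluster's own Spark build and is resolved from the Cloudera repository already declared in the pom:

    <!-- Sketch: the spark-core version must match the Spark build installed on the cluster. -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.0.0-cdh5.1.0</version> <!-- example CDH version, replace with the cluster's -->
    </dependency>

The permission problem was a cluster-side matter, not something in this code: the umask and the permissions on the target HDFS directory had to allow the submitting user to write (for example by relaxing fs.permissions.umask-mode or chmod-ing the target directory).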


Hi, did you try to build a fat jar for the feed, like in the example?
Thanks @dirceusemighini, in the end I solved it by using a different dependency. Thanks anyway.
@aironman, post this solution as an answer so it can help others.
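One more note for anyone who lands here with the same stack trace: a commonly reported cause of ConfigException$Missing: No configuration setting found for key 'akka.version' in a shaded jar is that the shade plugin overwrites Akka's reference.conf instead of merging it. That is not the fix that worked in this case (see the edit above), but the usual workaround is to append the reference.conf resources inside the existing maven-shade-plugin configuration, roughly like this:

    <!-- Sketch only: goes inside the existing <transformers> block of maven-shade-plugin. -->
    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
        <resource>reference.conf</resource>
    </transformer>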