Apache Spark: error when trying to save a CSV file into HDFS with Apache Spark
I wrote a program that needs to save a CSV file into HDFS. The code runs fine inside Eclipse, but when I execute the jar outside Eclipse I get this error:
2014-10-14 12:41:31 INFO SecurityManager:58 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aroman)
Exception in thread "main" java.lang.ExceptionInInitializerError
at com.tekcomms.c2d.utils.MyWatchService.saveIntoHdfs(MyWatchService.java:362)
at com.tekcomms.c2d.utils.MyWatchService.processDataCastFile(MyWatchService.java:332)
at com.tekcomms.c2d.utils.MyWatchService.processCreateEvent(MyWatchService.java:224)
at com.tekcomms.c2d.utils.MyWatchService.watch(MyWatchService.java:180)
at com.tekcomms.c2d.main.FeedAdaptor.main(FeedAdaptor.java:40)
Caused by: com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:115)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:136)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:142)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:150)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:155)
at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:197)
at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:136)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:470)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:104)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:152)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
at com.tekcomms.c2d.utils.MySparkUtils.<clinit>(MySparkUtils.java:29)
... 5 more
This is the part responsible for writing into HDFS:
import java.util.Arrays;
import java.util.List;

import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class MySparkUtils {

    final static Logger LOGGER = Logger.getLogger(MySparkUtils.class);

    private static JavaSparkContext sc;

    static {
        SparkConf conf = new SparkConf().setAppName("MySparkUtils");
        String master = MyWatchService.getSPARK_MASTER();
        conf.setMaster(master);
        // This is horrible! How can I get rid of it?
        String[] jars = {"target/feed-adapter-0.0.1-SNAPSHOT.jar"};
        conf.setJars(jars);
        sc = new JavaSparkContext(conf);
        LOGGER.debug("spark context initialized!");
    }

    public static boolean saveWithinHDFS(String path, StringBuffer sb) {
        LOGGER.debug("Trying to save in HDFS. path: " + path);
        boolean isOk = false;
        String[] aStrings = sb.toString().split("\n");
        List<String> jsonDatab = Arrays.asList(aStrings);
        JavaRDD<String> dataRDD = sc.parallelize(jsonDatab);
        dataRDD.saveAsTextFile(path);
        isOk = true;
        return isOk;
    }
}
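As an aside, the hard-coded `target/feed-adapter-0.0.1-SNAPSHOT.jar` path that the inline comment complains about can usually be avoided by asking the JVM where the current class was loaded from. A minimal sketch, assuming the class actually runs from the packaged jar rather than an exploded `classes/` directory (`JarLocator` is a hypothetical helper, not part of the original code):

```java
// Hypothetical helper: resolve the jar (or classes directory) that a
// given class was loaded from, so the result can be passed to
// SparkConf.setJars() instead of a hard-coded relative path.
public class JarLocator {

    public static String ownJarPath(Class<?> cls) {
        // getCodeSource() can be null for JDK bootstrap classes,
        // but is populated for application classes loaded from a jar.
        return cls.getProtectionDomain().getCodeSource().getLocation().getPath();
    }

    public static void main(String[] args) {
        // Prints the jar path when run from a jar, or the classes
        // directory when run from an IDE.
        System.out.println(ownJarPath(JarLocator.class));
    }
}
```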
And here is my pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.tekcomms.c2d</groupId>
<artifactId>feed-adapter</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>feed-adaptor</name>
<description>a poc about to scan every second a remote filesystem seeking new csv files from datacast, load the csv file into memory, scan every line of csv matching with a set of pattern rules (matching_phone, matching_mac) if found a match, i will create a string buffer with that previous info, if there is no match, i will create another string buffer with that discarded data. Finally i have to copy those files into HDFS. </description>
<developers>
<developer>
<name>Alonso Isidoro Román</name>
<email>XXX</email>
<timezone>+1 Madrid</timezone>
<organization>XXXX</organization>
<url>about.me/alonso.isidoro.roman</url>
</developer>
</developers>
<dependencies>
<!-- StringUtils... -->
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.6</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<dependency> <!-- Spark dependency -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.0.0</version>
<scope>compile</scope>
<optional>false</optional>
</dependency>
</dependencies>
<repositories>
<repository>
<id>Akka repository</id>
<url>http://repo.akka.io/releases</url>
</repository>
<!-- >repository> <id>cloudera-repos</id> <name>Cloudera Repos</name> <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository -->
<!-- repository> <id>CLOUDERA</id> <url>https://repository.cloudera.com/artifactory/repo/org/apache/spark/spark-core_2.10/0.9.0-cdh5.0.0-beta-2/</url>
</repository> <repository> <id>cdh.repo</id> <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
<name>Cloudera Repositories</name> <snapshots> <enabled>false</enabled> </snapshots>
</repository> <repository> <id>cdh.snapshots.repo</id> <url>https://repository.cloudera.com/artifactory/libs-snapshot-local</url>
<name>Cloudera Snapshots Repository</name> <snapshots> <enabled>true</enabled>
</snapshots> <releases> <enabled>false</enabled> </releases> </repository>
<repository> <id>central</id> <url>http://repo1.maven.org/maven2/</url> <releases>
<enabled>true</enabled> </releases> <snapshots> <enabled>false</enabled>
</snapshots> </repository -->
<repository>
<id>cloudera-repos</id>
<name>Cloudera Repos</name>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.tekcomms.c2d.main.FeedAdaptor</mainClass>
</transformer>
</transformers>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
What am I doing wrong?

EDIT

In the end, the problem was tracking down the exact JARs for the HDFS cluster: I had the wrong version! The other problem was a very restrictive umask on the HDFS side, so my local user could not write into HDFS because of permissions.

Comments:

"Hi, did you try to build a fat jar for the feed, as in the example?"
"Thanks @dirceusemighini, in the end I solved it using another dependency. Thanks anyway."
"@aironman Post this solution as an answer, so it can help others."
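Since the resolution was matching the Spark JARs to the cluster's version, the pom dependency would change along these lines (a sketch only: the `1.0.0-cdh5.1.0` version string is illustrative and must be replaced with the actual CDH release of the cluster; the artifact would come from the `cloudera-repos` repository already declared in the pom):

```xml
<dependency> <!-- Spark build matching the CDH cluster's version -->
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.0.0-cdh5.1.0</version>
    <scope>compile</scope>
</dependency>
```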