spark-submit java.lang.IllegalArgumentException: Can not create a Path from an empty string
When I run spark-submit, I get this error:

java.lang.IllegalArgumentException: Can not create a Path from an empty string

I am using Spark 2.4.7, Hadoop 3.3.0, IntelliJ IDEA, and JDK 8. At first I got a ClassNotFoundException, which I fixed; now I get this error. Is it caused by the dataset or by something else? (Link to the dataset.)

Error:
C:\spark\spark-2.4.7-bin-hadoop2.7\bin>spark-submit --class org.example.TopViewedCategories --master local C:\Users\Piyush\IdeaProjects\BDA\target\BDA-1.0-SNAPSHOT.jar
Started Processing
21/05/04 06:56:04 INFO SparkContext: Running Spark version 2.4.7
21/05/04 06:56:04 INFO SparkContext: Submitted application: YouTubeDM
21/05/04 06:56:04 INFO SecurityManager: Changing view acls to: Piyush
21/05/04 06:56:04 INFO SecurityManager: Changing modify acls to: Piyush
21/05/04 06:56:04 INFO SecurityManager: Changing view acls groups to:
21/05/04 06:56:04 INFO SecurityManager: Changing modify acls groups to:
21/05/04 06:56:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Piyush); groups with view permissions: Set(); users with modify permissions: Set(Piyush); groups with modify permissions: Set()
21/05/04 06:56:04 INFO Utils: Successfully started service 'sparkDriver' on port 63708.
21/05/04 06:56:04 INFO SparkEnv: Registering MapOutputTracker
21/05/04 06:56:04 INFO SparkEnv: Registering BlockManagerMaster
21/05/04 06:56:04 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/05/04 06:56:04 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/05/04 06:56:04 INFO DiskBlockManager: Created local directory at C:\Users\Piyush\AppData\Local\Temp\blockmgr-9f91b0fe-b655-422e-b0bf-38172b70dff0
21/05/04 06:56:05 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
21/05/04 06:56:05 INFO SparkEnv: Registering OutputCommitCoordinator
21/05/04 06:56:05 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/05/04 06:56:05 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://DESKTOP-IBFFKH9:4040
21/05/04 06:56:05 INFO SparkContext: Added JAR file:/C:/Users/Piyush/IdeaProjects/BDA/target/BDA-1.0-SNAPSHOT.jar at spark://DESKTOP-IBFFKH9:63708/jars/BDA-1.0-SNAPSHOT.jar with timestamp 1620091565160
21/05/04 06:56:05 INFO Executor: Starting executor ID driver on host localhost
21/05/04 06:56:05 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 63723.
21/05/04 06:56:05 INFO NettyBlockTransferService: Server created on DESKTOP-IBFFKH9:63723
21/05/04 06:56:05 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/05/04 06:56:05 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, DESKTOP-IBFFKH9, 63723, None)
21/05/04 06:56:05 INFO BlockManagerMasterEndpoint: Registering block manager DESKTOP-IBFFKH9:63723 with 366.3 MB RAM, BlockManagerId(driver, DESKTOP-IBFFKH9, 63723, None)
21/05/04 06:56:05 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, DESKTOP-IBFFKH9, 63723, None)
21/05/04 06:56:05 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, DESKTOP-IBFFKH9, 63723, None)
Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
at org.apache.hadoop.fs.Path.<init>(Path.java:183)
at org.apache.hadoop.fs.Path.getParent(Path.java:356)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:517)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:531)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:531)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:531)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:531)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:694)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:313)
at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:131)
at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:265)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupJob(HadoopMapReduceCommitProtocol.scala:162)
at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:74)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1096)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1094)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1067)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1032)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1032)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1032)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:958)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:958)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:958)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:957)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1544)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1523)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1523)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1523)
at org.apache.spark.api.java.JavaRDDLike$class.saveAsTextFile(JavaRDDLike.scala:550)
at org.apache.spark.api.java.AbstractJavaRDDLike.saveAsTextFile(JavaRDDLike.scala:45)
at org.example.TopViewedCategories.main(TopViewedCategories.java:46)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/05/04 06:56:06 ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\Piyush\AppData\Local\Temp\spark-2bac840b-8170-477d-a9ec-dd5f1f9283c2
java.io.IOException: Failed to delete: C:\Users\Piyush\AppData\Local\Temp\spark-2bac840b-8170-477d-a9ec-dd5f1f9283c2\userFiles-897873ea-324a-432c-85a1-786e5797243a\BDA-1.0-SNAPSHOT.jar
at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)
at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1062)
at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
21/05/04 06:56:06 ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\Piyush\AppData\Local\Temp\spark-2bac840b-8170-477d-a9ec-dd5f1f9283c2\userFiles-897873ea-324a-432c-85a1-786e5797243a
java.io.IOException: Failed to delete: C:\Users\Piyush\AppData\Local\Temp\spark-2bac840b-8170-477d-a9ec-dd5f1f9283c2\userFiles-897873ea-324a-432c-85a1-786e5797243a\BDA-1.0-SNAPSHOT.jar
at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)
at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1062)
at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
package org.example;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.List;

public class TopViewedCategories {
    public static void main(String[] args) throws Exception {
        long timeElapsed = System.currentTimeMillis();
        System.out.println("Started Processing");
        SparkConf conf = new SparkConf()
                .setMaster("local")
                .setAppName("YouTubeDM");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
        sc.setLogLevel("ERROR");
        JavaRDD<String> mRDD = sc.textFile("C:/Users/Piyush/Desktop/bda/INvideos"); // directory where the files are
        JavaPairRDD<Double, String> sortedRDD = mRDD
                // .filter(line -> line.split("\t").length > 6)
                .mapToPair(line -> {
                    String[] lineArr = line.split("\t");
                    String category = lineArr[5];
                    Double views = Double.parseDouble(lineArr[1]);
                    Tuple2<Double, Integer> viewsTuple = new Tuple2<>(views, 1);
                    return new Tuple2<>(category, viewsTuple);
                })
                .reduceByKey((x, y) -> new Tuple2<>(x._1 + y._1, x._2 + y._2))
                .mapToPair(x -> new Tuple2<>(x._1, (x._2._1 / x._2._2)))
                .mapToPair(Tuple2::swap)
                .sortByKey(false);
        // .take(10);
        long count = sortedRDD.count();
        List<Tuple2<Double, String>> topTenTuples = sortedRDD.take(10);
        JavaPairRDD<Double, String> topTenRdd = sc.parallelizePairs(topTenTuples);
        String output_dir = "C:output/spark/TopViewedCategories";
        // remove output directory if already there
        FileSystem fs = FileSystem.get(sc.hadoopConfiguration());
        fs.delete(new Path(output_dir), true); // delete dir, true for recursive
        topTenRdd.saveAsTextFile(output_dir);
        timeElapsed = System.currentTimeMillis() - timeElapsed;
        System.out.println("Done. Time taken (in seconds): " + timeElapsed / 1000f);
        System.out.println("Processed Records: " + count);
        sc.stop();
        sc.close();
    }
}
The culprit is this line:

String output_dir = "C:output/spark/TopViewedCategories";

There is no separator after the drive letter, so "C:output/..." is a drive-relative path rather than an absolute one. When saveAsTextFile asks Hadoop's RawLocalFileSystem to create the output directory, mkdirs walks up the parent chain via Path.getParent, which on a path like this eventually yields an empty string and throws:

java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
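The fix is to add the separator, i.e. "C:/output/spark/TopViewedCategories". As a minimal sketch (the class and helper name are hypothetical, and this check is illustrative, not Hadoop's actual validation), you can catch this kind of path before handing it to Hadoop:

```java
public class OutputPathCheck {
    // Hypothetical helper: flags Windows drive-relative paths like "C:output/..."
    // (a drive letter, a colon, then no slash), the shape that makes Hadoop's
    // Path.getParent chain bottom out at an empty string.
    static boolean isDriveRelative(String p) {
        return p.length() > 2
                && Character.isLetter(p.charAt(0))
                && p.charAt(1) == ':'
                && p.charAt(2) != '/' && p.charAt(2) != '\\';
    }

    public static void main(String[] args) {
        System.out.println(isDriveRelative("C:output/spark/TopViewedCategories"));  // true: broken
        System.out.println(isDriveRelative("C:/output/spark/TopViewedCategories")); // false: ok
    }
}
```

With the corrected string, both fs.delete(new Path(output_dir), true) and topTenRdd.saveAsTextFile(output_dir) should resolve a proper absolute local path.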