
spark-submit java.lang.IllegalArgumentException: Can not create a Path from an empty string

Tags: java, scala, apache-spark, hadoop, bigdata

When I run spark-submit I get this error: java.lang.IllegalArgumentException: Can not create a Path from an empty string. I am using Spark 2.4.7, Hadoop 3.3.0, IntelliJ IDEA, and JDK 8. At first I got a class-not-found error, which I fixed; now I get this one. Is it caused by the dataset or by something else? Link to the dataset

Error:

C:\spark\spark-2.4.7-bin-hadoop2.7\bin>spark-submit --class org.example.TopViewedCategories --master local C:\Users\Piyush\IdeaProjects\BDA\target\BDA-1.0-SNAPSHOT.jar
Started Processing
21/05/04 06:56:04 INFO SparkContext: Running Spark version 2.4.7
21/05/04 06:56:04 INFO SparkContext: Submitted application: YouTubeDM
21/05/04 06:56:04 INFO SecurityManager: Changing view acls to: Piyush
21/05/04 06:56:04 INFO SecurityManager: Changing modify acls to: Piyush
21/05/04 06:56:04 INFO SecurityManager: Changing view acls groups to:
21/05/04 06:56:04 INFO SecurityManager: Changing modify acls groups to:
21/05/04 06:56:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(Piyush); groups with view permissions: Set(); users  with modify permissions: Set(Piyush); groups with modify permissions: Set()
21/05/04 06:56:04 INFO Utils: Successfully started service 'sparkDriver' on port 63708.
21/05/04 06:56:04 INFO SparkEnv: Registering MapOutputTracker
21/05/04 06:56:04 INFO SparkEnv: Registering BlockManagerMaster
21/05/04 06:56:04 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/05/04 06:56:04 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/05/04 06:56:04 INFO DiskBlockManager: Created local directory at C:\Users\Piyush\AppData\Local\Temp\blockmgr-9f91b0fe-b655-422e-b0bf-38172b70dff0
21/05/04 06:56:05 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
21/05/04 06:56:05 INFO SparkEnv: Registering OutputCommitCoordinator
21/05/04 06:56:05 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/05/04 06:56:05 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://DESKTOP-IBFFKH9:4040
21/05/04 06:56:05 INFO SparkContext: Added JAR file:/C:/Users/Piyush/IdeaProjects/BDA/target/BDA-1.0-SNAPSHOT.jar at spark://DESKTOP-IBFFKH9:63708/jars/BDA-1.0-SNAPSHOT.jar with timestamp 1620091565160
21/05/04 06:56:05 INFO Executor: Starting executor ID driver on host localhost
21/05/04 06:56:05 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 63723.
21/05/04 06:56:05 INFO NettyBlockTransferService: Server created on DESKTOP-IBFFKH9:63723
21/05/04 06:56:05 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/05/04 06:56:05 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, DESKTOP-IBFFKH9, 63723, None)
21/05/04 06:56:05 INFO BlockManagerMasterEndpoint: Registering block manager DESKTOP-IBFFKH9:63723 with 366.3 MB RAM, BlockManagerId(driver, DESKTOP-IBFFKH9, 63723, None)
21/05/04 06:56:05 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, DESKTOP-IBFFKH9, 63723, None)
21/05/04 06:56:05 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, DESKTOP-IBFFKH9, 63723, None)
Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
        at org.apache.hadoop.fs.Path.<init>(Path.java:183)
        at org.apache.hadoop.fs.Path.getParent(Path.java:356)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:517)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:531)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:531)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:531)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:531)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
        at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:694)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:313)
        at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:131)
        at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:265)
        at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupJob(HadoopMapReduceCommitProtocol.scala:162)
        at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:74)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1096)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1094)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1067)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1032)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1032)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1032)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:958)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:958)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:958)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:957)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1544)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1523)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1523)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
        at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1523)
        at org.apache.spark.api.java.JavaRDDLike$class.saveAsTextFile(JavaRDDLike.scala:550)
        at org.apache.spark.api.java.AbstractJavaRDDLike.saveAsTextFile(JavaRDDLike.scala:45)
        at org.example.TopViewedCategories.main(TopViewedCategories.java:46)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/05/04 06:56:06 ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\Piyush\AppData\Local\Temp\spark-2bac840b-8170-477d-a9ec-dd5f1f9283c2
java.io.IOException: Failed to delete: C:\Users\Piyush\AppData\Local\Temp\spark-2bac840b-8170-477d-a9ec-dd5f1f9283c2\userFiles-897873ea-324a-432c-85a1-786e5797243a\BDA-1.0-SNAPSHOT.jar
        at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)
        at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
        at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
        at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
        at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
        at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
        at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91)
        at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1062)
        at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
        at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
21/05/04 06:56:06 ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\Piyush\AppData\Local\Temp\spark-2bac840b-8170-477d-a9ec-dd5f1f9283c2\userFiles-897873ea-324a-432c-85a1-786e5797243a
java.io.IOException: Failed to delete: C:\Users\Piyush\AppData\Local\Temp\spark-2bac840b-8170-477d-a9ec-dd5f1f9283c2\userFiles-897873ea-324a-432c-85a1-786e5797243a\BDA-1.0-SNAPSHOT.jar
        at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)
        at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
        at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
        at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
        at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91)
        at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1062)
        at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
        at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
package org.example;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.List;
public class TopViewedCategories {
    public static void main(String[] args) throws Exception {
        long timeElapsed = System.currentTimeMillis();
        System.out.println("Started Processing");
        SparkConf conf = new SparkConf()
                .setMaster("local")
                .setAppName("YouTubeDM");

        JavaSparkContext sc = new JavaSparkContext(conf);
        //Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
        sc.setLogLevel("ERROR");
        JavaRDD<String> mRDD = sc.textFile("C:/Users/Piyush/Desktop/bda/INvideos"); // directory where the files are

        JavaPairRDD<Double, String> sortedRDD = mRDD
                // .filter(line -> line.split("\t").length > 6)
                .mapToPair(line -> {
                    // tab-separated record: column 5 is the category, column 1 is the view count
                    String[] lineArr = line.split("\t");
                    String category = lineArr[5];
                    Double views = Double.parseDouble(lineArr[1]);
                    Tuple2<Double, Integer> viewsTuple = new Tuple2<>(views, 1);
                    return new Tuple2<>(category, viewsTuple);
                })
                // sum views and counts per category, then take the average
                .reduceByKey((x, y) -> new Tuple2<>(x._1 + y._1, x._2 + y._2))
                .mapToPair(x -> new Tuple2<>(x._1, x._2._1 / x._2._2))
                // swap to (avgViews, category) and sort descending
                .mapToPair(Tuple2::swap)
                .sortByKey(false);
        // .take(10);

        long count = sortedRDD.count();
        List<Tuple2<Double, String>> topTenTuples = sortedRDD.take(10);
        JavaPairRDD<Double, String> topTenRdd = sc.parallelizePairs(topTenTuples);

        String output_dir = "C:output/spark/TopViewedCategories";
        // remove output directory if already there
        FileSystem fs = FileSystem.get(sc.hadoopConfiguration());
        fs.delete(new Path(output_dir), true); // delete dir, true for recursive
        topTenRdd.saveAsTextFile(output_dir);

        timeElapsed = System.currentTimeMillis() - timeElapsed;
        System.out.println("Done. Time taken (in seconds): " + timeElapsed / 1000f);
        System.out.println("Processed Records: " + count);
        sc.stop();
        sc.close();
    }
}
The exception comes from this line:

String output_dir = "C:output/spark/TopViewedCategories";

The path is drive-relative: there is no separator after the drive letter. When Hadoop creates the output directory it walks up the parent chain (RawLocalFileSystem.mkdirs -> Path.getParent) and eventually tries to build a Path from an empty string, which is exactly the failure shown in the stack trace:

java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
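
A minimal sketch of the fix, assuming the intent is to write the result under C:\output on the local filesystem: make the output path absolute by adding the separator after the drive letter (a fully qualified file:/// URI also works). The rest of the job stays the same.

// Hypothetical corrected assignment (replaces the drive-relative "C:output/..."):
String output_dir = "C:/output/spark/TopViewedCategories";
// or, fully qualified:
// String output_dir = "file:///C:/output/spark/TopViewedCategories";

// Unchanged: clear any previous output, then save the RDD.
FileSystem fs = FileSystem.get(sc.hadoopConfiguration());
fs.delete(new Path(output_dir), true);
topTenRdd.saveAsTextFile(output_dir);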