
Windows 7: pyspark.sql.utils.IllegalArgumentException

Tags: windows, apache-spark, pyspark, pyspark-sql

I want to run PySpark from PyCharm. I have everything wired up and the environment variables set. Reading with sc.textFile works, but something goes wrong when I try to read a CSV file through pyspark.sql.

Here is the code:

import os
import sys
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession

# Path for spark source folder
os.environ['SPARK_HOME']="E:/spark-2.0.0-bin-hadoop2.7/spark-2.0.0-bin-hadoop2.7"
# Append pyspark to Python Path
sys.path.append("E:/spark-2.0.0-bin-hadoop2.7/spark-2.0.0-bin-hadoop2.7/python")
sys.path.append("E:/spark-2.0.0-bin-hadoop2.7/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1.zip")


# Build one SparkConf and reuse it for both the SparkContext and the SparkSession
conf = SparkConf().setMaster('local').setAppName('Simple App')
sc = SparkContext(conf=conf)
spark = SparkSession.builder.config(conf=conf).getOrCreate()


accounts_rdd = spark.read.csv('test.csv')
accounts_rdd.show()  # show() prints the rows itself and returns None, so no print is needed
Here is the error:

Traceback (most recent call last):
  File "C:/Users/bjlinmanna/PycharmProjects/untitled1/spark.py", line 25, in <module>
    accounts_rdd =  spark.read.csv('pmec_close_position_order.csv')
  File "E:\spark-2.0.0-bin-hadoop2.7\spark-2.0.0-bin-hadoop2.7\python\pyspark\sql\readwriter.py", line 363, in csv
    return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "E:\spark-2.0.0-bin-hadoop2.7\spark-2.0.0-bin-hadoop2.7\python\lib\py4j-0.10.1-src.zip\py4j\java_gateway.py", line 933, in __call__
  File "E:\spark-2.0.0-bin-hadoop2.7\spark-2.0.0-bin-hadoop2.7\python\pyspark\sql\utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'java.net.URISyntaxException: Relative path in absolute URI: file:C:/the/path/to/myfile/spark-warehouse'

When you set the config, pay attention to the '//' in the file path. I don't know why it doesn't work when I set 'file:C:/the/path/to/myfile'.

Perhaps this link is useful.

In short, there is a configuration option, spark.sql.warehouse.dir, for setting the warehouse folder. If you set the warehouse folder manually, the error message disappears.
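
If you build the session from a SparkConf, as the question's code does, the same option can be set there instead. A minimal sketch, assuming a local run; 'file:///C:/tmp/spark-warehouse' is a placeholder for any writable local directory:

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Placeholder warehouse path; any writable local directory should work
conf = SparkConf() \
    .setMaster('local') \
    .setAppName('Simple App') \
    .set('spark.sql.warehouse.dir', 'file:///C:/tmp/spark-warehouse')
spark = SparkSession.builder.config(conf=conf).getOrCreate()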


I ran into the same problem today. I had no issues on Ubuntu 16.04, but when I ran the same code on Windows 10, Spark raised the same error message as yours. Probably Spark cannot find, or fails to create, the warehouse folder on Windows.

Thanks! That helped a lot! My corrected code is shown below:
spark = SparkSession.builder \
    .master('local[*]') \
    .appName('My App') \
    .config('spark.sql.warehouse.dir', 'file:///C:/the/path/to/myfile') \
    .getOrCreate()


accounts_rdd = spark.read \
    .format('csv') \
    .option('header', 'true') \
    .load('file.csv')
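
Note that the warehouse URI needs three slashes: in a file: URI, the part after file:// is the host, so a Windows path needs its own leading slash, as in file:///C:/the/path/to/myfile. A form like file:C:/the/path/to/myfile is parsed as a relative path in an absolute URI, which is exactly what the URISyntaxException above complains about. As a side note, the header option can also be passed directly to the CSV reader, e.g. spark.read.csv('file.csv', header=True), which should be equivalent to the format/option/load chain.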