Windows win7 pyspark sql utils IllegalArgumentException
I want to run pyspark in PyCharm. I have wired everything up and set the environment variables. I can read a file with sc.textFile, but when I try to read a csv file through pyspark.sql, something goes wrong. Here is the code:
import os
import sys
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession
# Path for spark source folder
os.environ['SPARK_HOME']="E:/spark-2.0.0-bin-hadoop2.7/spark-2.0.0-bin-hadoop2.7"
# Append pyspark to Python Path
sys.path.append("E:/spark-2.0.0-bin-hadoop2.7/spark-2.0.0-bin-hadoop2.7/python")
sys.path.append("E:/spark-2.0.0-bin-hadoop2.7/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1.zip")
conf = SparkConf().setAppName('Simple App').setMaster('local')
sc = SparkContext(conf=conf)
spark = SparkSession.builder.config(conf=conf).getOrCreate()
accounts_rdd = spark.read.csv('test.csv')
accounts_rdd.show()  # show() already prints the DataFrame; no need to wrap it in print
Here is the error:
Traceback (most recent call last):
File "C:/Users/bjlinmanna/PycharmProjects/untitled1/spark.py", line 25, in <module>
accounts_rdd = spark.read.csv('pmec_close_position_order.csv')
File "E:\spark-2.0.0-bin-hadoop2.7\spark-2.0.0-bin-hadoop2.7\python\pyspark\sql\readwriter.py", line 363, in csv
return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
File "E:\spark-2.0.0-bin-hadoop2.7\spark-2.0.0-bin-hadoop2.7\python\lib\py4j-0.10.1-src.zip\py4j\java_gateway.py", line 933, in __call__
File "E:\spark-2.0.0-bin-hadoop2.7\spark-2.0.0-bin-hadoop2.7\python\pyspark\sql\utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'java.net.URISyntaxException: Relative path in absolute URI: file:C:/the/path/to/myfile/spark-warehouse'
When setting the config, pay attention to the slashes in the file path: 'file:C:/the/path/to/myfile' does not work (I don't know why), it has to be a proper absolute file URI like 'file:///C:/the/path/to/myfile'. Maybe this link is useful. In short, there is a configuration option,
spark.sql.warehouse.dir
which sets the warehouse folder. If you set the warehouse folder manually, the error message disappears.
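As a quick way to build such a file:/// URI from a plain Windows path, here is a minimal sketch using Python 3's standard pathlib (not part of the original answer; the path is a placeholder, and on Python 2 you would need the pathlib backport):

from pathlib import PureWindowsPath
# as_uri() turns an absolute Windows path into a proper file URI
warehouse_uri = PureWindowsPath('C:/the/path/to/myfile').as_uri()
print(warehouse_uri)  # file:///C:/the/path/to/myfile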
I ran into the same problem today. I have no issues on Ubuntu 16.04, but when I run the same code on Windows 10, Spark shows the same error message as yours. Perhaps Spark cannot find, or correctly create, the warehouse folder on Windows. Thanks! This helped me a lot! My working code is shown below:
# Setting spark.sql.warehouse.dir to an absolute file:/// URI avoids the
# "Relative path in absolute URI" error on Windows
spark = SparkSession.builder\
    .master('local[*]')\
    .appName('My App')\
    .config('spark.sql.warehouse.dir', 'file:///C:/the/path/to/myfile')\
    .getOrCreate()
accounts_rdd = spark.read\
    .format('csv')\
    .option('header', 'true')\
    .load('file.csv')
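To double-check that the setting took effect, you can read it back from the running session and then display the DataFrame (a small sanity check, not part of the original answer):

# spark.conf exposes the session's runtime configuration
print(spark.conf.get('spark.sql.warehouse.dir'))  # file:///C:/the/path/to/myfile
accounts_rdd.show()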