Python PySpark error: StructType can not accept object 0 in type <type 'int'>

python apache-spark pyspark

My data file holds graph edges. Each line has the format (src node, dest node). This is my schema definition:

 eschema = StructType([StructField("src", StringType(), True), StructField("dst", StringType(), True)])

I try to read each line, split it on the delimiter (","), and convert each element to int. But somehow this fails:

 lines = sc.textFile(filename)
 lines = lines.map(lambda l : map(int, l.split(delim)))
 lines = lines.map(lambda l : Row(l[0], l[1]))

When I run this, I get the error

StructType can not accept object 0 in type <type 'int'>

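I am using Python 2.7 and Spark > 2.0. After splitting the line, the elements are of type unicode rather than str; does that make any difference? How can I solve this? Any suggestion would help a lot. Thank you.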
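A StructType schema makes PySpark expect every record to be row-like (a tuple, list, Row or dict), and the message says it was handed the bare int 0 instead. There is also a second mismatch: eschema declares both fields as StringType while the code converts them to int. Under Python 2, StringType accepts unicode as well as str, so the unicode question is not what breaks here. The following is a minimal sketch, assuming every line holds exactly two integer node IDs; parse_edge and edges_dataframe are hypothetical names. It yields one tuple per line and declares the fields as LongType so the schema matches the values:

```python
def parse_edge(line, delim=","):
    """Turn one 'src<delim>dst' text line into a (src, dst) tuple of ints.

    Hypothetical helper: assumes every line holds exactly two integer IDs.
    Returning a tuple (not a bare int) gives Spark one row-like record per line.
    A list comprehension is used so this also works on Python 3, where map()
    returns an iterator that cannot be indexed.
    """
    fields = [f.strip() for f in line.split(delim)]
    if len(fields) != 2:
        raise ValueError("expected 2 fields in %r, got %d" % (line, len(fields)))
    return (int(fields[0]), int(fields[1]))


def edges_dataframe(sql_ctx, sc, filename, delim=","):
    """Build an edge DataFrame whose schema matches parse_edge's int output.

    sql_ctx can be a Spark 1.x SQLContext or a Spark 2.x SparkSession;
    both provide createDataFrame(rdd, schema).
    """
    # Imported here so parse_edge can be used (and tested) without Spark.
    from pyspark.sql.types import StructType, StructField, LongType

    # LongType, not StringType: the values produced by parse_edge are ints.
    schema = StructType([StructField("src", LongType(), True),
                         StructField("dst", LongType(), True)])
    rows = sc.textFile(filename).map(lambda l: parse_edge(l, delim))
    return sql_ctx.createDataFrame(rows, schema)
```

For example, edges_dataframe(spark, sc, filename, ",") on Spark 2.x. If string IDs are acceptable, the other option is to keep eschema unchanged and drop the int conversion, so that the values stay strings as declared.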

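If the delimiter is ",", then it is just a regular CSV file. Since you are using Spark > 2.0, you can use the modern DataFrame API, going through the Spark session (spark) instead of the Spark context (sc by convention):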

df = (spark.read.format("csv")
      .option("header", "true")  # only if the file has a header row; otherwise drop this line
      .schema(eschema)
      .load(filename))

Instead of providing the schema with

.schema(eschema)

you can also use

.option("inferSchema", "true")

to let Spark guess the file structure by looking at the data.

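Hey, thanks for your answer. But the problem is that some files are CSV and others use "" as the delimiter. Also, Spark > 2.0 is only on my local system; the remote cluster I use has Spark 1.6 and cannot be upgraded. These are the errors:

On the remote cloud (Python 2.7, Spark 1.6, Graphframes 0.1.0): Unexpected tuple 0 with StructType

On my PC (Python 2.7, Spark 2.4, Graphframes 0.7.0): StructType can not accept object 0 in type <type 'int'>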
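For the mixed-delimiter files on Spark 2.x, the built-in CSV reader takes the field delimiter through its sep parameter, so one function can serve both file kinds. Below is a minimal sketch; read_edges and EDGE_SCHEMA_DDL are hypothetical names, and it assumes integer node IDs, a single-character delimiter and Spark 2.3 or later (DDL-string schemas; on earlier 2.x releases pass a StructType instead). Spark 1.6 has no built-in CSV source (CSV support came from the external spark-csv package and was merged into Spark in 2.0), so on that cluster the usual route is to parse the lines from sc.textFile yourself and call sqlContext.createDataFrame(rdd, schema) with field types that match the parsed values.

```python
# Hypothetical DDL schema string; Spark 2.3+ accepts DDL strings for schema=.
# BIGINT corresponds to LongType, matching int node IDs.
EDGE_SCHEMA_DDL = "src BIGINT, dst BIGINT"


def read_edges(spark, filename, delim=",", header=False):
    """Read a delimited edge file with Spark 2.x's built-in CSV source.

    `sep` sets the single-character field delimiter, so the same function
    handles comma-separated files and files that use another delimiter.
    """
    return spark.read.csv(filename, schema=EDGE_SCHEMA_DDL, sep=delim, header=header)
```

For example, read_edges(spark, "edges.csv") for comma-separated files and read_edges(spark, "edges.txt", delim="\t") for tab-separated ones (both file names are illustrative).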