PySpark: Spark read CSV with schema
Tags: pyspark, apache-spark-sql

I am using the code below to import a file into a DataFrame. Although I have defined a schema, Spark is not using the schema I provided. Any insights?
schema = "row INT, name STRING, age INT, count INT"
df = spark.read.format('csv') \
    .options(schema=schema) \
    .options(delimiter=',') \
    .options(header='false') \
    .load('C:/SparkCourse/fakefriends.csv')
df.columns
['_c0', '_c1', '_c2', '_c3']
Please use this as the correct solution — the schema must be set with the `schema()` method on the reader, not passed as an option:
from pyspark.sql.session import SparkSession

spark = SparkSession.builder.getOrCreate()

schema = "row INT, name STRING, age INT, count INT"
spark.read.format("csv") \
    .schema(schema) \
    .options(delimiter=',') \
    .options(header=False) \
    .load('fakefriends.csv') \
    .show(truncate=False)
+---+----+---+-----+
|row|name|age|count|
+---+----+---+-----+
|1 |a |1 |2 |
|2 |b |2 |3 |
|3 |c |3 |4 |
+---+----+---+-----+
Comments:

"Hi @Nisha, this code throws an error because you used option instead of options. Please check. Thanks." — "Thanks for the correction. When I dropped it in here, I had missed that!" — "Thanks, this works. So, as I understand it, I need to set the schema with schema() rather than options()."