Python: converting an RDD to a DataFrame: AttributeError: 'RDD' object has no attribute 'toDF' using PySpark


I am trying to convert an RDD to a DataFrame using PySpark. Below is my code:

from pyspark import SparkConf, SparkContext
from pyspark.sql.functions import *
from pyspark.sql import SparkSession

conf = SparkConf().setMaster("local").setAppName("Dataframe_examples")
sc = SparkContext(conf=conf)

def parsedLine(line):
    fields = line.split(',')
    movieId = fields[0]
    movieName = fields[1]
    genres = fields[2]
    return movieId, movieName, genres

movies = sc.textFile("file:///home/ajit/ml-25m/movies.csv")
parsedLines = movies.map(parsedLine)
print(parsedLines.count())

dataFrame = parsedLines.toDF(["movieId"])
dataFrame.printSchema()
I am running this code with the PyCharm IDE.

I get the following error:

File "/home/ajit/PycharmProjects/pythonProject/Dataframe_examples.py", line 19, in <module>
    dataFrame = parsedLines.toDF(["movieId"])
AttributeError: 'PipelinedRDD' object has no attribute 'toDF'

Since I am new to this, can you tell me what I am missing?

Initialize a SparkSession by passing it your SparkContext. An RDD only gains a toDF method once a SparkSession (or SQLContext) exists, because toDF is attached to the RDD class when the session is created; without one, you get exactly the AttributeError above.

Example:

from pyspark import SparkConf, SparkContext
from pyspark.sql.functions import *
from pyspark.sql import SparkSession

conf = SparkConf().setMaster("local").setAppName("Dataframe_examples")
sc = SparkContext(conf=conf)

spark = SparkSession(sc)

def parsedLine(line):
    fields = line.split(',')
    movieId = fields[0]
    movieName = fields[1]
    genres = fields[2]
    return movieId, movieName, genres

movies = sc.textFile("file:///home/ajit/ml-25m/movies.csv")

# or, equivalently, obtain the context from the session
movies = spark.sparkContext.textFile("file:///home/ajit/ml-25m/movies.csv")

parsedLines = movies.map(parsedLine)
print(parsedLines.count())

dataFrame = parsedLines.toDF(["movieId", "movieName", "genres"])
dataFrame.printSchema()
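
With the session in place, a quick sanity check (my suggestion, not part of the original answer) is to display a few rows:

# Preview the first parsed movies; truncate=False keeps long genre strings intact
dataFrame.show(5, truncate=False)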

Alternatively, use the SparkSession to build the DataFrame from the RDD, as below:

movies = sc.textFile("file:///home/ajit/ml-25m/movies.csv")
parsedLines = movies.map(parsedLine)
print(parsedLines.count())

spark = SparkSession.builder.getOrCreate()
dataFrame = spark.createDataFrame(parsedLines, ["movieId", "movieName", "genres"])
dataFrame.printSchema()
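
If you would rather declare the column types explicitly instead of relying on inference, you can pass a StructType schema to createDataFrame. This is a sketch; the all-string types are an assumption based on parsedLine returning raw string fields:

from pyspark.sql.types import StructType, StructField, StringType

# parsedLine returns three raw strings, so model every field as StringType
schema = StructType([
    StructField("movieId", StringType(), True),
    StructField("movieName", StringType(), True),
    StructField("genres", StringType(), True),
])

dataFrame = spark.createDataFrame(parsedLines, schema)
dataFrame.printSchema()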
Or create the SparkSession first and take the SparkContext from it:

spark = SparkSession.builder.master("local").appName("Dataframe_examples").getOrCreate()
sc = spark.sparkContext
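
Putting that variant together end to end (a minimal sketch reusing the parsedLine function and file path from the question):

movies = sc.textFile("file:///home/ajit/ml-25m/movies.csv")
parsedLines = movies.map(parsedLine)

# toDF now works because the SparkSession created above patched it onto RDDs
dataFrame = parsedLines.toDF(["movieId", "movieName", "genres"])
dataFrame.printSchema()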