Apache Spark can't load a big file?

I want to load a large CSV file, so I tried PySpark, but the Jupyter notebook returns the following error:

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
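The message itself names the setting to raise: the notebook's IOPub output rate limit. If you genuinely want the notebook to stream that much output, a minimal sketch of raising it is below, either in jupyter_notebook_config.py or on the command line (the 1e10 value is just an arbitrarily large example, not a recommendation):

# jupyter_notebook_config.py -- raise the IOPub output rate limit
# named in the error above (default is 1e6 bytes/sec)
c.NotebookApp.iopub_data_rate_limit = 1e10

# equivalent command-line form:
#   jupyter notebook --NotebookApp.iopub_data_rate_limit=1e10

That only raises the ceiling, though; as the discussion further down shows, the underlying issue is collecting and printing the entire dataframe.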
Here is my code:

import findspark
findspark.init()

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

# Read the CSV, treating the first row as the header
df = spark.read.csv("Desktop/train/train.csv", header=True)

# Keep only the pickup time and coordinates
Pickup_locations = df.select("pickup_datetime", "Pickup_latitude",
                             "Pickup_longitude")

# Pull every row back to the driver and print it -- this is what
# triggers the IOPub data rate error on a large file
print(Pickup_locations.collect())
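The last line is what trips the limit: printing Pickup_locations.collect() pulls every row back to the driver and then streams all of it through the notebook's output channel (the comments below confirm this). A minimal sketch of inspecting the data without doing that, using only standard DataFrame methods:

# Print the schema and a handful of rows instead of everything
Pickup_locations.printSchema()
Pickup_locations.show(5)

# Or bring just a small slice back to the driver
print(Pickup_locations.limit(5).collect())

# Counting is safe as well: only a single number is returned
print(Pickup_locations.count())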

How big is this file? It has 1,048,576 rows; I downloaded it from Kaggle. Apparently the problem is in collect(): Jupyter cannot display all of the dataframe's data. When I change collect() to count(), I get a row count of 1,458,644.

There is no reason to use collect on raw, unprocessed data the way it is used here; and if you really need to collect everything, why use Spark in the first place?

In fact, my main problem is iterating over the dataframe to get the latitude and longitude of each row and then passing them to a folium map, so I was planning to do: for row in df.collect(): extract lat and lon.
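For that use case there is no need to collect the whole dataframe: the coordinates can be sampled (or streamed with toLocalIterator()) before being handed to folium. A minimal sketch under that assumption; the column names come from the question, while the sample fraction, the map's starting location, and the output file name are placeholders:

import folium

# Take a small random sample of coordinates instead of collect()-ing
# every row (the fraction is illustrative only)
coords = (df.select("Pickup_latitude", "Pickup_longitude")
            .dropna()
            .sample(fraction=0.001)
            .collect())

# Build the folium map and add one small marker per sampled row
# (starting location and zoom level are placeholders)
m = folium.Map(location=[40.75, -73.97], zoom_start=11)
for row in coords:
    folium.CircleMarker(
        location=[float(row["Pickup_latitude"]),
                  float(row["Pickup_longitude"])],
        radius=2).add_to(m)

m.save("pickups.html")  # hypothetical output file name

This keeps only a small, bounded amount of data on the driver, so neither the notebook's output limit nor driver memory becomes a problem.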