Python SparkContext parallelize splits in the wrong place
I downloaded a file and am now trying to write it to HDFS as a DataFrame:
import requests
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName('Write Data').setMaster('local')
sc = SparkContext(conf=conf)
file = requests.get('https://data.nasa.gov/resource/y77d-th95.csv')
data = sc.parallelize(file)
When I print the contents of the file, I see the following output:
print(file.text)
":@computed_region_cbhk_fwbd",":@computed_region_nnqa_25f4","fall","geolocation","geolocation_address","geolocation_city","geolocation_state","geolocation_zip","id","mass","name","nametype","recclass","reclat","reclong","year"
,,"Fell","POINT (6.08333 50.775)",,,,,"1","21","Aachen","Valid","L5","50.775000","6.083330","1880-01-01T00:00:00.000"
,,"Fell","POINT (10.23333 56.18333)",,,,,"2","720","Aarhus","Valid","H6","56.183330","10.233330","1951-01-01T00:00:00.000"
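This is exactly what I want to see. Now I am trying to get the first row from the RDD I created with data = sc.parallelize(file).
Why am I not getting the first line, as my first print led me to expect? It stops somewhere partway through, and I don't see the rest of the header.
It doesn't work because Response.__iter__ is not format aware. It just iterates over the response body in fixed-size chunks of bytes, without regard to line boundaries.
If you really need to read data like this, use text.splitlines: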
sc.parallelize(file.text.splitlines())
Or better:
import csv
import io
sc.parallelize(csv.reader(io.StringIO(file.text)))
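To see why the header gets cut off, here is a minimal sketch that needs neither Spark nor a network connection. The helper fixed_size_chunks is hypothetical and only imitates what iterating over a Response does; the 128-byte chunk size is, as far as I know, what requests uses in Response.__iter__, so treat it as an assumption. The header and first row are copied from the output in the question.

```python
def fixed_size_chunks(data: bytes, size: int = 128):
    """Yield consecutive slices of `size` bytes, ignoring line boundaries.

    Hypothetical stand-in for iterating over a requests Response.
    """
    for start in range(0, len(data), size):
        yield data[start:start + size]


# Header and first data row as printed in the question.
header = '":@computed_region_cbhk_fwbd",":@computed_region_nnqa_25f4","fall","geolocation","geolocation_address","geolocation_city","geolocation_state","geolocation_zip","id","mass","name","nametype","recclass","reclat","reclong","year"'
row = ',,"Fell","POINT (6.08333 50.775)",,,,,"1","21","Aachen","Valid","L5","50.775000","6.083330","1880-01-01T00:00:00.000"'
text = header + "\n" + row + "\n"

# Chunked iteration: the first element is a slice of the header, not a line.
chunks = list(fixed_size_chunks(text.encode("utf-8")))

# Line-based splitting: the first element is the complete header.
lines = text.splitlines()

print(chunks[0])
print(lines[0])
```

The first chunk ends partway through the header because the header is longer than the chunk size, which matches what the question describes. Passing lines (or the rows from csv.reader) to sc.parallelize gives one element per record instead.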
The answer is simple. To parallelize a Python object, you need to give Spark a list. In this case, you are giving it a Response:
>>> file = requests.get('https://data.nasa.gov/resource/y77d-th95.csv')
>>> file
<Response [200]>
When you have a file system like Hadoop, Hadoop will do the splitting for you and lay out the HDFS blocks so that they are split on line endings.
Hope this helps.
Cheers, Fokko
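For the original goal of writing the download to HDFS as a DataFrame, one possible approach is sketched below. The helper names parse_csv_text and write_csv_text_to_hdfs are hypothetical, the sample text is a shortened stand-in for file.text, and the function assumes you pass in an existing SparkSession. It has not been run against the NASA file, so take it as a starting point rather than a finished solution.

```python
import csv
import io


def parse_csv_text(text):
    """Split CSV text into a header list and a list of row tuples."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first record holds the column names
    rows = [tuple(row) for row in reader]
    return header, rows


def write_csv_text_to_hdfs(spark, text, path):
    """Build a DataFrame from CSV text and write it to `path`.

    Assumes `spark` is an existing SparkSession and `path` is a
    location you can write to; every column is kept as a string.
    """
    header, rows = parse_csv_text(text)
    df = spark.createDataFrame(rows, schema=header)
    df.write.mode("overwrite").csv(path, header=True)


# Shortened, made-up sample standing in for file.text.
sample = 'id,name\n"1","Aachen"\n"2","Aarhus"\n'
header, rows = parse_csv_text(sample)
print(header)
print(rows)
```

With a session available, the call would look like write_csv_text_to_hdfs(spark, file.text, 'hdfs:///tmp/meteorites'), where the path is only an example. Parsing on the driver is fine for a small download like this one; for large files it is usually better to put the file on HDFS first and let Spark read it from there.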