Replace double quotes with a space in Spark Python
I am trying to remove double quotes from a text file, such as:
in24.inernebr.com[01/Aug/1995:00:00:01]"GET/shutter/missions/sts-68/news/sts-68-mcc-05.txt"200 1839
uplherc.upl.com[01/Aug/1995:00:00:07]"GET/"304 0
uplherc.upl.com[01/Aug/1995:00:08]"GET/images/ksclogo medium.gif"304 0
uplherc.com[01/Aug/1995:00:00:08]"GET/images/MOSAIC small.gif"upllogherc.upl.com[01/Aug/1995:00:00:08]"GET/images/USA logosall.gif"304 0
ix-esc-ca2-07.ix.netcom.com[01/Aug/1995:00:09]"GET/images/launch logo.gif"200 1713
uplherc.upl.com[01/Aug/1995:00:10]"GET/images/WORLD logosall.gif"304 0
slppp6.interndd.net[01/Aug/1995:00:00:10]"GET/history/skylab/skylab/skylab.html"200 1687-WebA4y.com[01/Aug/1995:00:00:10]"GET/images/launchmedium.gif"200 11853
slppp6.internd.net[01/Aug/1995:00:00:11]"GET/history/skylab/skylab small.gif"200 9202
The code I am trying is:
    def process_row(row):
        row.replace('""', '')
        row.split('\t')

    nasa = nasa_raw.map(process_row)
    for row in nasa.take(10):
        print(row)
The result when running this code is:
None None None None None None None None None None
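What am I doing wrong?
Two things:
You are missing the return statement, and the replace call should pass a single double-quote character wrapped in single quotes ('"') instead of '""', which only matches two double quotes in a row.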
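    def process_row(row):
        # str.replace returns a new string, so the result has to be returned
        return row.replace('"', '')

    # Check the function in plain Python before handing it to Spark
    with open('filename') as file:
        for row in file:
            print(row)
            print(process_row(row))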
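Your function does not return anything, which is why every element comes back as None. It is worth testing code like this in plain Python first, before using it with Spark/RDDs.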
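As a rough sketch of that advice, the corrected function can be checked locally with the built-in map standing in for RDD.map. The two sample lines below are hypothetical, tab-separated stand-ins for the contents of nasa_raw (the question's split('\t') suggests tab-delimited fields, but the real file layout is an assumption here).

```python
def process_row(row):
    """Strip double quotes from one log line and split it into tab-separated fields."""
    # str.replace and str.split return new values; the original string is not modified.
    cleaned = row.replace('"', '')
    return cleaned.split('\t')


# Hypothetical sample lines standing in for the contents of nasa_raw.
sample_rows = [
    'host-a\t[01/Aug/1995:00:00:01]\t"GET /index.html"\t200\t1839',
    'host-b\t[01/Aug/1995:00:00:07]\t"GET /"\t304\t0',
]

# The built-in map plays the role of RDD.map for a local check.
processed = list(map(process_row, sample_rows))
for fields in processed:
    print(fields)

# The pattern '""' from the question only matches two consecutive double quotes,
# so a line containing only separated quotes comes back unchanged.
unchanged = sample_rows[1].replace('""', '')
print(unchanged == sample_rows[1])
```

Once the local output looks right, the same function can be passed to nasa_raw.map(process_row) as in the question. To replace each quote with a space instead of removing it, pass ' ' as the second argument to replace.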