Python: Spark and a split function inside a function with a condition check

I am using the following code:

def process_row(row):
    items = row.replace('"', '')
    items2 = items.split(' ')
    for x in items2:
        items2.append(x.replace('-', '0'))
    return [string(items[0]), string(items[1]), string(items[2]),
            string(items[3]), string(items[4]), int(items[5])]

nasa = (
    nasa_raw.map(process_row)
)

for row in nasa.take(5):
    print(row)
against this text file:

in24.inernebr.com[01/Aug/1995:00:00:01]"GET/shutter/missions/sts-68/news/sts-68-mcc-05.txt"200 1839
uplherc.upl.com[01/Aug/1995:00:00:07]"GET/"304 0
uplherc.upl.com[01/Aug/1995:00:08]"GET/images/ksclogo medium.gif"304 0
uplherc.com[01/Aug/1995:00:00:08]"GET/images/MOSAIC small.gif"
upllogherc.upl.com[01/Aug/1995:00:00:08]"GET/images/USA logosall.gif"304 0
ix-esc-ca2-07.ix.netcom.com[01/Aug/1995:00:09]"GET/images/launch logo.gif"200 1713
uplherc.upl.com[01/Aug/1995:00:10]"GET/images/WORLD logosall.gif"304 0
slppp6.interndd.net[01/Aug/1995:00:00:10]"GET/history/skylab/skylab/skylab.html"200 1687
-WebA4y.com[01/Aug/1995:00:00:10]"GET/images/launchmedium.gif"200 11853
slppp6.internd.net[01/Aug/1995:00:00:11]"GET/history/skylab/skylab small.gif"200 9202

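I can see that my replace call is working: the double quotes are removed. The split seems to fail, though, because the result should be one token per line, and that is not what I get.

What am I missing here?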
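A few things in the posted function stand out, and the sketch below shows one possible way to address them. The loop appends to items2 while iterating over it, so it never terminates. The return statement indexes items (a string, so items[0] is a single character) rather than the token list. And string() is not a Python builtin (str() is). The sketch assumes each line has exactly six space-separated fields after the quotes are stripped, and that the intent is to turn a lone "-" byte count into 0. The sample line is hypothetical, not taken from the file above.

```python
def process_row(row):
    # Strip the double quotes, then split on single spaces.
    tokens = row.replace('"', '').split(' ')
    # Build a NEW list instead of appending to the list being iterated.
    # Only a token that is exactly '-' becomes '0' (an assumption about
    # the intent), so hyphens inside host names or paths are left alone.
    cleaned = ['0' if tok == '-' else tok for tok in tokens]
    # Index the token list, not the original string, and use the
    # builtin str() rather than the undefined string().
    return [str(tok) for tok in cleaned[:5]] + [int(cleaned[5])]


# Hypothetical six-field line, used only for illustration.
sample = 'host.example.com 01/Aug/1995:00:00:01 "GET" /index.html 200 -'
print(process_row(sample))
```

With a function like this, nasa_raw.map(process_row) would apply it to every line of the RDD, as in the original code. Lines that do not have six fields would still raise an error and would need separate handling.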

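You can use regexp_replace to replace the last - in a line (the column argument is missing from the snippet as posted; <column> below is a placeholder for it):

df.select(F.regexp_replace(<column>, "-$", "0")).show(truncate=False)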
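To see what the "-$" pattern does without a Spark session, here is a minimal sketch using Python's re module. Spark's regexp_replace runs on the Java regex engine, but for a simple anchored pattern like this on single-line strings the result should be the same. The helper name zero_trailing_dash and the sample strings are made up for illustration.

```python
import re


def zero_trailing_dash(line):
    # "-$" matches a hyphen only at the very end of the string,
    # so hyphens elsewhere (host names, paths) are not touched.
    return re.sub(r"-$", "0", line)


print(zero_trailing_dash("ix-esc-ca2-07.ix.netcom.com 304 -"))
print(zero_trailing_dash("ix-esc-ca2-07.ix.netcom.com 200 1713"))
```

The first call changes only the final hyphen. The second leaves the line unchanged because it does not end in a hyphen.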