Python: How to convert pandas code that uses .str and .split to PySpark (tags: python, pandas, apache-spark, pyspark, apache-spark-sql)


I wrote the following code using pandas:

df['last_two'] = df['text'].str[-2:]
df['before_hyphen'] = df['text'].str.split('-').str[0]
df['new_text'] = df['before_hyphen'].astype(str) + "-" + df['last_two'].astype(str)
But when I run it on a Spark DataFrame, I get the following error:

TypeError: startPos and length must be the same type


I know I can convert the df to pandas, run the code, and then convert it back to a Spark df, but I'd like to know whether there is a better way. Thanks.

You can try the following string functions:

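import pyspark.sql.functions as F

df2 = df.withColumn(
    # Last two characters: a negative start position counts from the end
    'last_two', F.expr('substring(text, -2)')
).withColumn(
    # Everything before the first hyphen
    'before_hyphen', F.substring_index('text', '-', 1)
).withColumn(
    # Join the two parts with a hyphen
    'new_text', F.concat_ws('-', 'before_hyphen', 'last_two')
)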
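
To sanity-check the output of the Spark version, it can help to have a plain-Python reference for what the three columns should contain for a single value. The sketch below is only an illustration: the helper name `reference_row` and the sample strings are made up, and it assumes every `text` value is a non-null string (Spark handles nulls differently from Python).

```python
def reference_row(text: str) -> dict:
    """Plain-Python equivalent of the three derived columns for one value.

    Assumes `text` is a non-null string; null handling in Spark is not modeled.
    """
    last_two = text[-2:]                      # like substring(text, -2)
    before_hyphen = text.split("-", 1)[0]     # like substring_index(text, '-', 1)
    new_text = "-".join([before_hyphen, last_two])  # like concat_ws('-', ...)
    return {
        "last_two": last_two,
        "before_hyphen": before_hyphen,
        "new_text": new_text,
    }


# Hypothetical sample values, not taken from the question
samples = ["abc-def-12", "nohyphen"]
results = [reference_row(s) for s in samples]

for row in results:
    print(row)
```

Comparing a few rows of df2 against this reference should show whether the Spark expressions behave as the pandas code did, including the case where the text contains no hyphen at all.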