PySpark - how to left-pad a string column with '0's into another string column, based on a condition
I have a dataframe that looks like this:
|string_code|prefix_string_code|
|1234       |001234            |
|123        |000123            |
|56789      |056789            |
Basically, I want to prepend as many '0's as needed so that the column prefix_string_code always has length 6.

What I tried:
df.withColumn('prefix_string_code', when(length(col('string_code')) < 6, concat(lit('0' * (6 - length(col('string_code')))), col('string_code'))).otherwise(col('string_code')))
As you can see, the code would work if the pad count were a plain Python integer, but here it is a Column expression. How can I do this properly? Thanks.

In this case, you can use the lpad function:
>>> import pyspark.sql.functions as F
>>> from pyspark.sql import Row
>>> rdd = sc.parallelize([1234, 123, 56789, 1234567])
>>> data = rdd.map(lambda x: Row(x))
>>> df = spark.createDataFrame(data, ['string_code'])
>>> df.show()
+-----------+
|string_code|
+-----------+
| 1234|
| 123|
| 56789|
| 1234567|
+-----------+
>>> df.withColumn('prefix_string_code', F.when(F.length(df['string_code']) < 6, F.lpad(df['string_code'], 6, '0')).otherwise(df['string_code'])).show()
+-----------+------------------+
|string_code|prefix_string_code|
+-----------+------------------+
| 1234| 001234|
| 123| 000123|
| 56789| 056789|
| 1234567| 1234567|
+-----------+------------------+
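Note that lpad pads to exactly the target length and truncates strings that are already longer, which is why the when(...).otherwise(...) guard is needed to keep 1234567 intact. A plain-Python sketch of this behavior (my own mimic for illustration, not the Spark implementation):

```python
def lpad(s: str, n: int, pad: str) -> str:
    # Mimics Spark SQL lpad: left-pad to length n, but truncate
    # (keeping the leading characters) when s is longer than n.
    if len(s) >= n:
        return s[:n]
    return (pad * n)[:n - len(s)] + s

def prefix_string_code(s: str) -> str:
    # The when/otherwise pattern: only pad when shorter than 6,
    # otherwise pass the original value through untouched.
    return lpad(s, 6, '0') if len(s) < 6 else s

for s in ['1234', '123', '56789', '1234567']:
    print(prefix_string_code(s))
```

Without the guard, lpad alone would turn '1234567' into '123456', silently dropping the last digit.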
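As a side note (not from the original answer): when the codes are guaranteed to be numeric, an assumption here, printf-style zero-padding avoids the when/otherwise guard entirely, because %06d pads short values but never truncates long ones. In PySpark that would be F.format_string('%06d', df['string_code']); the underlying format semantics in plain Python:

```python
# printf-style zero-padding: pads to 6 digits, never truncates.
# (Assumes string_code is numeric; Spark's F.format_string('%06d', col)
#  applies the same format string to a numeric column.)
codes = [1234, 123, 56789, 1234567]
padded = ['%06d' % x for x in codes]
print(padded)  # → ['001234', '000123', '056789', '1234567']
```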