Python: how to do string conversion in PySpark?


I have data like the sample below, and I want to convert the low column to an integer number of seconds. For example, if the value is 01:23.0, I want it to become 1*60 + 23 = 83.

How can I do this? I tried a udf, but it raised a Py4JJavaError:

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType
import pandas as pd

df = sqlContext.createDataFrame([
    ('01:23.0', 'z', 'null'),
    ('01:23.0', 'z', 'null'),
    ('01:23.0', 'c', 'null'),
    ('null', 'null', 'null'),
    ('01:24.0', 'null', '4.0')],
    ('low', 'high', 'normal'))

def min2sec(v):
    if pd.notnull(v):
        return int(v[:2]) * 60 + int(v[3:5])

udf_min2sec = udf(min2sec, IntegerType())
df.withColumn('low', udf_min2sec(df['low'])).show()
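For what it's worth, the Py4JJavaError most likely comes from the rows that hold the literal string 'null' rather than a real NULL: pd.notnull('null') is True, so int('nu') is attempted and raises a ValueError inside the Spark worker. A minimal sketch of a conversion function that guards against that sentinel (plain Python here; wrapping it with udf(min2sec, IntegerType()) as above would then work):

```python
def min2sec(v):
    # Guard against both a real None and the literal string 'null'
    # that appears in the sample data.
    if v is None or v == 'null':
        return None
    # '01:23.0' -> 1 * 60 + 23 = 83
    return int(v[:2]) * 60 + int(v[3:5])

print(min2sec('01:23.0'))  # 83
print(min2sec('null'))     # None
```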

You don't need a udf; you can use built-in functions to get the expected output:

from pyspark.sql.functions import split, col

df.withColumn("test", split(col("low"),":").cast("array<int>")) \
  .withColumn("test", col("test")[0]*60 + col("test")[1]).show()
+-------+----+------+----+
|    low|high|normal|test|
+-------+----+------+----+
|01:23.0|   z|  null|  83|
|01:23.0|   z|  null|  83|
|01:23.0|   c|  null|  83|
|   null|null|  null|null|
|01:24.0|null|   4.0|  84|
+-------+----+------+----+
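To make the per-row arithmetic explicit, here is a plain-Python sketch of what the split-and-cast expression computes for a single value (this is not Spark code; it assumes, as the output above shows, that Spark's cast turns '23.0' into the integer 23 and turns 'null' into a NULL element):

```python
def to_seconds(low):
    # Mimic split(col('low'), ':').cast('array<int>'),
    # then a[0] * 60 + a[1], for one string value.
    if low is None:
        return None
    parts = low.split(':')            # '01:23.0' -> ['01', '23.0']
    nums = []
    for p in parts:
        try:
            nums.append(int(float(p)))  # '23.0' -> 23, matching the cast
        except ValueError:
            nums.append(None)           # 'null' -> NULL element
    if len(nums) < 2 or None in nums:
        return None                     # any NULL propagates to the result
    return nums[0] * 60 + nums[1]

print(to_seconds('01:23.0'))  # 83
print(to_seconds('null'))     # None
```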
