PySpark: specifying the object type of a variable

I read the following data from a JSON file in PySpark:

{"positionmessage":{"callsign": "PPH1", "name": "testschip-10", "mmsi": 100,"timestamplast": "2019-08-01T00:00:08Z"}}
{"positionmessage":{"callsign": "PPH2", "name": "testschip-11", "mmsi": 200,"timestamplast": "2019-08-01T00:00:01Z"}}

The code looks like this:

from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, StructField, StructType, StringType, IntegerType, DateType, FloatType, TimestampType

appName = "PySpark Example - JSON file to Spark Data Frame"
master = "local"
# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

schema = StructType([
    StructField("positionmessage",
    StructType([
    StructField('callsign', StringType(), True),
    StructField('name', StringType(), True),
    StructField('timestamplast', TimestampType(), True),    
    StructField('mmsi', IntegerType(), True)
    ]))])

file_name = "data.json"
df = spark.read.json(file_name).select("positionmessage.*")

import pyspark.sql.functions as f

# Keep the part after "-" (strips the prefix "testschip-"); name is still a string here
df = df.withColumn("name", f.split(df['name'], '-')[1])
df.show()

Now I want to remove the string "testschip-" from name. I do that with the split call at the end of the code above.

How do I now make name an integer?

Just cast it to int.