How to remove whitespace from column headers in PySpark, and how to convert a string date to datetime format

I'm new to PySpark. I'm trying to remove the whitespace from the column headers, and after that to convert a string date column to a datetime format, but the whitespace isn't being removed and the date doesn't get converted. Please help me with how to do this.

I tried this:

emp = spark.read.csv("Downloads/dataset2/employees.csv", header=True)
# strip the spaces out of every header name and rebuild the DataFrame
dd = list(map(lambda x: x.replace(" ", ""), emp.columns))
df = emp.toDF(*dd)


  +----------+---------+-----------+--------------------+---------------+--------------------+--------------------+--------------------+---------+-------+-----------+--------+----------------+----------+--------------------+--------------------+----------+--------------------+
|EmployeeID| LastName|  FirstName|               Title|TitleOfCourtesy|           BirthDate|            HireDate|             Address|     City| Region| PostalCode| Country|       HomePhone| Extension|               Photo|               Notes| ReportsTo|           PhotoPath|
+----------+---------+-----------+--------------------+---------------+--------------------+--------------------+--------------------+---------+-------+-----------+--------+----------------+----------+--------------------+--------------------+----------+--------------------+
|         1|'Davolio'|    'Nancy'|'Sales Representa...|          'Ms.'|'1948-12-08 00:00...|'1992-05-01 00:00...|'507 - 20th Ave. ...|'Seattle'|   'WA'|    '98122'|   'USA'|'(206) 555-9857'|    '5467'|'0x151C2F00020000...|'Education includ...|         2|'http://accweb/em...|
|         2| 'Fuller'|   'Andrew'|'Vice President S...|          'Dr.'|'1952-02-19 00:00...|'1992-08-14 00:00...|'908 W. Capital Way'| 'Tacoma'|   'WA'|    '98401'|   'USA'|'(206) 555-9482'|    '3457'|'0x151C2F00020000...|'Andrew received ...|      NULL|'http://accweb/em...|
+----------+---------+-----------+--------------------+---------------+--------------------+--------------------+--------------------+---------+-------+-----------+--------+----------------+----------+--------------------+--------------------+----------+--------------------+
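A quick way to see what the rename actually did is to compare the two header lists. This is a minimal sketch, assuming the emp and df DataFrames built above; toDF returns a new DataFrame, so emp keeps its original, space-prefixed names:

print(emp.columns)   # original headers, e.g. ' LastName' with a leading space
print(df.columns)    # renamed copy with the spaces removed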
After that I tried this, but it shows an error:

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *

emp.select("BirthDate").show()
Py4JJavaError: An error occurred while calling o197.select.
: org.apache.spark.sql.AnalysisException: cannot resolve '`BirthDate`' given input columns: [ PhotoPath, EmployeeID,  Photo,  City,  HomePhone,  ReportsTo,  PostalCode,  Title,  Address, Notes,  LastName,   FirstName,  HireDate,  Region,  Extension,  Country,  BirthDate, TitleOfCourtesy];;
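The error comes from selecting against emp rather than the renamed df: the input-column list in the message shows that emp's header is literally ' BirthDate' with a leading space, so the plain name cannot resolve. A minimal sketch of two ways around it, assuming the DataFrames above:

df.select("BirthDate").show(2)           # the renamed copy resolves the clean name
emp.select(emp[" BirthDate"]).show(2)    # or address the original column by its exact, space-prefixed name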
After that, I tried this:

df=emp.withColumn('BirthDate', from_unixtime(unix_timestamp('BirthDate','yyyy-mm-dd')))
But it shows null values:

df.select("BirthDate").show(4)
+---------+
|BirthDate|
+---------+
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
+---------+
Try this:

for each in df.columns:
    df = df.withColumnRenamed(each, each.strip())
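After the loop every header is trimmed in place, which a quick check confirms; a small sketch, assuming the df above:

assert all(c == c.strip() for c in df.columns)   # no header keeps leading/trailing whitespace
df.select("BirthDate").show(2)                   # now resolves without an error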
For the date and time:

df=emp.withColumn('BirthDate', from_unixtime(unix_timestamp('BirthDate','yyyy-mm-dd')))

Hi Rahul, I tried that but it still shows null values, using df=emp.withColumn('BirthDate', from_unixtime(unix_timestamp('BirthDate', 'yyyy-mm-dd HH:mm:ss')))
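The remaining nulls are consistent with two things visible in the sample rows: the CSV values carry literal single quotes (e.g. '1948-12-08 00:00...'), and in the pattern 'yyyy-mm-dd' the lowercase mm means minutes, not months (months are MM). Below is a sketch under those assumptions, stripping the quotes first and then parsing with to_timestamp; the trailing part of the timestamp is truncated in the pasted output, so the pattern may need adjusting to match the full value:

from pyspark.sql.functions import regexp_replace, to_timestamp

# Sketch: remove the surrounding quotes, then parse the cleaned string.
# Assumes the values look like 1948-12-08 00:00:00.000 once the quotes are gone.
cleaned = df.withColumn("BirthDate", regexp_replace("BirthDate", "'", ""))
cleaned = cleaned.withColumn("BirthDate", to_timestamp("BirthDate", "yyyy-MM-dd HH:mm:ss.SSS"))
cleaned.select("BirthDate").show(4, truncate=False)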