PySpark reading Parquet from HDFS and schema case issue


When I try to read a Parquet file from HDFS, the schema comes back with mixed-case column names. Can we convert them all to lowercase?

df = spark.read.parquet(hdfs_location)

df.printSchema()
root
|-- RecordType: string (nullable = true)
|-- InvestmtAccnt: string (nullable = true)
|-- InvestmentAccntId: string (nullable = true)
|-- FinanceSummaryID: string (nullable = true)
|-- BusinDate: string (nullable = true)

What I need is like below:


root
|-- recordtype: string (nullable = true)
|-- investmtaccnt: string (nullable = true)
|-- investmentaccntid: string (nullable = true)
|-- financesummaryid: string (nullable = true)
|-- busindate: string (nullable = true)

First read the Parquet file:

df = spark.read.parquet(hdfs_location)

Then use the .toDF function to create a DataFrame whose column names are all lowercase:

df = df.toDF(*[c.lower() for c in df.columns])
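The one-liner above can be wrapped in a small helper so it is reusable. A minimal sketch follows; the function name `lowercase_columns` is my own, and it relies only on the `columns` attribute and `toDF` method of a PySpark DataFrame, so it works unchanged on any DataFrame read from Parquet:

```python
# Hypothetical helper wrapping the answer's approach. It only uses the
# PySpark DataFrame API surface: df.columns (list of column names) and
# df.toDF(*names) (returns a new DataFrame with the given column names).
def lowercase_columns(df):
    """Return a new DataFrame with every column name lowercased."""
    return df.toDF(*[c.lower() for c in df.columns])

# Usage with a real PySpark DataFrame (hdfs_location as in the question):
#   df = spark.read.parquet(hdfs_location)
#   df = lowercase_columns(df)
#   df.printSchema()   # all column names now lowercase
```

Note that `toDF` renames columns positionally, so the data itself is untouched; only the schema's column names change, and `nullable` flags and types are preserved.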