Amazon Web Services: How do I configure an AWS Glue job to use the column types from the Glue data lake table definition?


Consider the following AWS Glue job code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Load the table from the Glue Data Catalog
medicare_dynamicframe = glueContext.create_dynamic_frame.from_catalog(
    database = "my_database",
    table_name = "my_table")
medicare_dynamicframe.printSchema()

job.commit()
It prints something like the following (note that price_key, in the second position, is string and not decimal):

root
|-- day_key: string
...
|-- price_key: string

Meanwhile, my_table in the data lake defines day_key as int (the first column) and price_key as decimal(25,0) (the second column).
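
For what it's worth, one way to double-check which types the catalog actually stores is to query it directly with boto3 (a minimal sketch; the database and table names are the ones used in the job above):

import boto3

# Ask the Glue Data Catalog for the table definition
glue = boto3.client("glue")
table = glue.get_table(DatabaseName = "my_database", Name = "my_table")

# Print each column's declared name and type, e.g. day_key -> int
for col in table["Table"]["StorageDescriptor"]["Columns"]:
    print(col["Name"], "->", col["Type"])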

Maybe I'm wrong, but from what I've read, AWS Glue uses the table and database only to obtain the S3 path of the data and completely ignores any type definitions. That may be fine for self-describing formats such as parquet, but it is not for csv.
How can I configure AWS Glue so that a DynamicFrame backed by CSV picks up its schema from the data lake table definition?

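Alternatively, the catalog could be bypassed for the read itself, imposing the schema manually with plain Spark and converting back to a DynamicFrame afterwards (again only a sketch; the S3 path here is hypothetical):

from pyspark.sql.types import StructType, StructField, IntegerType, DecimalType

# Explicit schema matching the catalog definition
schema = StructType([
    StructField("day_key", IntegerType()),
    StructField("price_key", DecimalType(25, 0)),
])

# Hypothetical S3 location of the CSV data behind my_table
df = spark.read.csv("s3://my-bucket/my-table/", schema = schema, header = True)

# Convert back to a DynamicFrame for the rest of the Glue job
typed_frame = DynamicFrame.fromDF(df, glueContext, "typed_frame")
typed_frame.printSchema()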