Amazon Web Services: How do I configure an AWS Glue job to use the column types from the Glue data lake table definition?


Consider the following AWS Glue job code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Load the table from the Glue Data Catalog
medicare_dynamicframe = glueContext.create_dynamic_frame.from_catalog(
    database = "my_database",
    table_name = "my_table")
medicare_dynamicframe.printSchema()

job.commit()
It prints something like the following (note that price_key, in the second position, is string and not decimal):

root
|-- day_key: string
...
|-- price_key: string

Meanwhile, my_table in the data lake defines day_key as int (the first column) and price_key as decimal(25,0) (the second column).
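
For what it's worth, one way to double-check which types the catalog actually stores is to query it directly with boto3 (a minimal sketch; the database and table names are the ones used in the job above):

import boto3

# Ask the Glue Data Catalog for the table definition
glue = boto3.client("glue")
table = glue.get_table(DatabaseName = "my_database", Name = "my_table")

# Print each column's declared name and type, e.g. day_key -> int
for col in table["Table"]["StorageDescriptor"]["Columns"]:
    print(col["Name"], "->", col["Type"])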

Maybe I'm wrong, but from what I've read, AWS Glue uses the table and database only to obtain the S3 path of the data and completely ignores any type definitions. That may be fine for self-describing formats such as parquet, but it is not for csv.
How can I configure AWS Glue so that a DynamicFrame backed by CSV picks up its schema from the data lake table definition?

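Alternatively, the catalog could be bypassed for the read itself, imposing the schema manually with plain Spark and converting back to a DynamicFrame afterwards (again only a sketch; the S3 path here is hypothetical):

from pyspark.sql.types import StructType, StructField, IntegerType, DecimalType

# Explicit schema matching the catalog definition
schema = StructType([
    StructField("day_key", IntegerType()),
    StructField("price_key", DecimalType(25, 0)),
])

# Hypothetical S3 location of the CSV data behind my_table
df = spark.read.csv("s3://my-bucket/my-table/", schema = schema, header = True)

# Convert back to a DynamicFrame for the rest of the Glue job
typed_frame = DynamicFrame.fromDF(df, glueContext, "typed_frame")
typed_frame.printSchema()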