Amazon Web Services: Export DynamoDB in us-east-2 to S3 with AWS Data Pipeline
I want to back up a DynamoDB table to S3 (for later import). The table lives in us-east-2, which is a region AWS Data Pipeline does not support. The AWS documentation seems to indicate that this shouldn't be a problem, but I can't get Data Pipeline to look for the table in us-east-2.

Below is the export of my pipeline definition. When I run it, I get a "ResourceNotFound" error while it looks up the DynamoDB table. If I temporarily create a table with the same name in us-west-2 (where the pipeline runs), the job works, but it pulls data from the us-west-2 table rather than the one in us-east-2. Is there a way to make this job read from the region specified in the configuration?
{
  "objects": [
    {
      "readThroughputPercent": "#{myDDBReadThroughputRatio}",
      "name": "DDBSourceTable",
      "id": "DDBSourceTable",
      "type": "DynamoDBDataNode",
      "region": "us-east-2",
      "tableName": "#{myDDBTableName}"
    },
    {
      "period": "6 Hours",
      "name": "Every 6 hours",
      "id": "DefaultSchedule",
      "type": "Schedule",
      "startAt": "FIRST_ACTIVATION_DATE_TIME"
    },
    {
      "bootstrapAction": "s3://us-west-2.elasticmapreduce/bootstrap-actions/configure-hadoop, --yarn-key-value,yarn.nodemanager.resource.memory-mb=11520,--yarn-key-value,yarn.scheduler.maximum-allocation-mb=11520,--yarn-key-value,yarn.scheduler.minimum-allocation-mb=1440,--yarn-key-value,yarn.app.mapreduce.am.resource.mb=2880,--mapred-key-value,mapreduce.map.memory.mb=5760,--mapred-key-value,mapreduce.map.java.opts=-Xmx4608M,--mapred-key-value,mapreduce.reduce.memory.mb=2880,--mapred-key-value,mapreduce.reduce.java.opts=-Xmx2304m,--mapred-key-value,mapreduce.map.speculative=false",
      "name": "EmrClusterForBackup",
      "coreInstanceCount": "1",
      "coreInstanceType": "m3.xlarge",
      "amiVersion": "3.9.0",
      "masterInstanceType": "m3.xlarge",
      "id": "EmrClusterForBackup",
      "region": "us-west-2",
      "type": "EmrCluster",
      "terminateAfter": "1 Hour"
    },
    {
      "directoryPath": "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}",
      "name": "S3BackupLocation",
      "id": "S3BackupLocation",
      "type": "S3DataNode"
    },
    {
      "output": {
        "ref": "S3BackupLocation"
      },
      "input": {
        "ref": "DDBSourceTable"
      },
      "maximumRetries": "2",
      "name": "TableBackupActivity",
      "step": "s3://dynamodb-emr-us-west-2/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}",
      "id": "TableBackupActivity",
      "runsOn": {
        "ref": "EmrClusterForBackup"
      },
      "type": "EmrActivity",
      "resizeClusterBeforeRunning": "true"
    },
    {
      "failureAndRerunMode": "CASCADE",
      "schedule": {
        "ref": "DefaultSchedule"
      },
      "resourceRole": "data_pipeline_etl_role",
      "pipelineLogUri": "s3://MY_S3_BUCKET/",
      "role": "data_pipeline_pipeline_role",
      "scheduleType": "cron",
      "name": "Default",
      "id": "Default"
    }
  ],
  "parameters": [
    {
      "description": "Output S3 folder",
      "id": "myOutputS3Loc",
      "type": "AWS::S3::ObjectKey"
    },
    {
      "description": "Source DynamoDB table name",
      "id": "myDDBTableName",
      "type": "String"
    },
    {
      "default": "0.25",
      "watermark": "Enter value between 0.1-1.0",
      "description": "DynamoDB read throughput ratio",
      "id": "myDDBReadThroughputRatio",
      "type": "Double"
    },
    {
      "default": "us-east-1",
      "watermark": "us-east-1",
      "description": "Region of the DynamoDB table",
      "id": "myDDBRegion",
      "type": "String"
    }
  ],
  "values": {
    "myDDBRegion": "us-east-2",
    "myDDBTableName": "prod--users",
    "myDDBReadThroughputRatio": "0.25",
    "myOutputS3Loc": "s3://MY_S3_BUCKET"
  }
}
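One way to confirm that the error is a region mismatch (this is my own diagnostic sketch, not part of the pipeline; `find_table_regions` and `boto3_describe` are hypothetical helper names, and the boto3 call assumes configured AWS credentials):

```python
def find_table_regions(table_name, regions, describe):
    """Return the regions in which describe(table_name, region) succeeds.

    `describe` must raise LookupError when the table is absent in a region.
    """
    found = []
    for region in regions:
        try:
            describe(table_name, region)
            found.append(region)
        except LookupError:
            pass
    return found


def boto3_describe(table_name, region):
    import boto3  # deferred so find_table_regions works without boto3 installed
    from botocore.exceptions import ClientError

    client = boto3.client("dynamodb", region_name=region)
    try:
        return client.describe_table(TableName=table_name)
    except ClientError as err:
        if err.response["Error"]["Code"] == "ResourceNotFoundException":
            raise LookupError(table_name) from err
        raise


# e.g. find_table_regions("prod--users", ["us-east-2", "us-west-2"], boto3_describe)
# should report only us-east-2, matching the ResourceNotFound behaviour described above.
```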
Is this a one-off, or something you want to do on an ongoing basis? Could you use DynamoDB global tables to replicate the table into a supported region, then delete the replica once the backup completes?

Global table replication itself is free; you only pay for the replica table's capacity while it is up and running.
That's an interesting workaround. I wonder whether what I'm trying to accomplish can be done in a less roundabout way. Note that you won't be able to export an existing table that already contains data this way, because a table must be empty before it can be configured as a global table.
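The replication step above can be sketched with boto3 (my own sketch; the answer itself shows no code). It uses the original `create_global_table` API, which is the version with the empty-table restriction mentioned above: each listed region must already contain an empty, identically named table with DynamoDB Streams enabled.

```python
def build_global_table_request(table_name, regions):
    """Parameters for dynamodb.create_global_table: one replication-group
    entry per region that should hold a replica."""
    return {
        "GlobalTableName": table_name,
        "ReplicationGroup": [{"RegionName": r} for r in regions],
    }


def replicate_table(table_name, regions):
    import boto3  # deferred so build_global_table_request works without boto3

    # create_global_table is a single control-plane call; each region in the
    # replication group must already have an empty table of the same name
    # with streams (NEW_AND_OLD_IMAGES) enabled.
    client = boto3.client("dynamodb", region_name=regions[0])
    client.create_global_table(**build_global_table_request(table_name, regions))


# e.g. replicate_table("prod--users", ["us-east-2", "us-west-2"]), then run the
# backup pipeline against the us-west-2 replica and delete it afterwards.
```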