Exporting data from S3 to DynamoDB via Hive


I have a comma-separated .csv file with no header row; its rows look like this:

,310795849829453824,AAAAAQ==,Z3JvdXAtY2hhdA==,,
,310795709316075520,AAAAAA==,,,
,310778976203182080,AAAAAQ==,Z3JvdXAtY2hhdA==,,
,310795895400566784,AAAAAA==,,,
,310791016598736896,AAAAAQ==,Z3JvdXAtY2hhdA==,,
The file is stored in S3, and I define an external table over it as follows:

create external table s3_chats(
  AVATAR BINARY, CHAT_ID BIGINT, CHAT_TYPE BINARY,
  NAME BINARY, VENUE_ID BINARY, VENUE_NAME BINARY)
row format delimited fields terminated by ','
location 's3://dynamocsv/export/chats/';
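(For context: the target table dynamo_chats is not shown in the question. With the EMR Hive-DynamoDB connector it would typically be declared with DynamoDBStorageHandler, roughly as below; the DynamoDB table name and attribute names are assumptions, not taken from the question.)

```sql
-- Hedged sketch: how dynamo_chats is usually mapped to a DynamoDB table.
-- "Chats" and the attribute names on the right-hand side of each mapping
-- pair are assumed, not given in the question.
CREATE EXTERNAL TABLE dynamo_chats(
  AVATAR BINARY, CHAT_ID BIGINT, CHAT_TYPE BINARY,
  NAME BINARY, VENUE_ID BINARY, VENUE_NAME BINARY)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name" = "Chats",
  "dynamodb.column.mapping" =
    "avatar:Avatar,chat_id:ChatId,chat_type:ChatType,name:Name,venue_id:VenueId,venue_name:VenueName"
);
```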
Everything works fine up to this point, but when I run

INSERT OVERWRITE TABLE dynamo_chats SELECT * FROM s3_chats
a DynamoDB error is raised:

-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"avatar":,"chat_id":310795849829453824,"chat_type":^@^@^@^A,"name":group-chat,"venue_id":,"venue_name":}
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"avatar":,"chat_id":310795849829453824,"chat_type":^@^@^@^A,"name":group-chat,"venue_id":,"venue_name":}
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
        ... 8 more
Caused by: java.lang.RuntimeException: com.amazonaws.AmazonServiceException: One or more parameter values were invalid: An AttributeValue may not contain a null or empty binary type. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: 1UQEO408OB918GASELHM5V2NB3VV4KQNSO5AEMVJF66Q9ASUAAJG)
        at org.apache.hadoop.dynamodb.DynamoDBFibonacciRetryer.handleException(DynamoDBFibonacciRetryer.java:107)
        at org.apache.hadoop.dynamodb.DynamoDBFibonacciRetryer.runWithRetry(DynamoDBFibonacciRetryer.java:83)
        at org.apache.hadoop.dynamodb.DynamoDBClient.writeBatch(DynamoDBClient.java:217)
        at org.apache.hadoop.dynamodb.DynamoDBClient.putBatch(DynamoDBClient.java:167)
        at org.apache.hadoop.dynamodb.write.AbstractDynamoDBRecordWriter.write(AbstractDynamoDBRecordWriter.java:92)
        at org.apache.hadoop.hive.dynamodb.write.HiveDynamoDBRecordWriter.write(HiveDynamoDBRecordWriter.java:29)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:649)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
        ... 9 more
This actually makes sense: attributes you do not want to write should simply be omitted from the request, rather than sent as empty values.

Is this a DynamoDBStorageHandler bug, or is there some workaround for writing optional fields through Hive?
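One commonly suggested workaround (a sketch, not verified against every connector version) is to convert empty binary values to NULL in the SELECT, since the DynamoDB writer generally omits NULL attributes from the put request instead of sending an invalid empty value:

```sql
-- Hedged sketch: map empty values to NULL so the DynamoDB writer
-- skips those attributes. Whether length() accepts BINARY and whether
-- NULLs are skipped depends on the Hive/connector versions in use.
INSERT OVERWRITE TABLE dynamo_chats
SELECT
  CASE WHEN length(avatar)     = 0 THEN NULL ELSE avatar     END,
  chat_id,  -- hash key: must never be NULL or empty
  CASE WHEN length(chat_type)  = 0 THEN NULL ELSE chat_type  END,
  CASE WHEN length(name)       = 0 THEN NULL ELSE name       END,
  CASE WHEN length(venue_id)   = 0 THEN NULL ELSE venue_id   END,
  CASE WHEN length(venue_name) = 0 THEN NULL ELSE venue_name END
FROM s3_chats;
```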

Which of AVATAR, CHAT_ID, CHAT_TYPE, NAME, VENUE_ID, VENUE_NAME is your primary key? Primary key values cannot be optional.

CHAT_ID is the hash ("primary") key in DynamoDB.