Sqoop import free-form query error

I noticed a weird issue with sqoop import. The data I am trying to import is stored in a MySQL database in the following format:

<a1, a2, a3, d1, a4, a5, a6, a7, a8>
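This works fine (when importing through a MySQL view). Since MySQL views are not as well optimized as one would hope, I wanted to use raw SQL to see whether it improves performance. To test this, I used a free-form query: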

sqoop import --connect [jdbc url] --username [user] --password [password] --query "SELECT t1.a1, t2.a2....... from table t1 INNER JOIN table t2 ON t1.t2_id = t2.id ............ WHERE <some condition> AND \$CONDITIONS" --target-dir my_dir --split-by a5 --mysql-delimiters --verbose --boundary-query 'SELECT min(a5), max(a5) from t5'
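With this query the import fails (see the `Value 'xxxxxx' can not be represented as java.sql.Timestamp` stack trace at the end of this post). As far as I can tell, sqoop tries to interpret the value of another column (a3) as a timestamp, and the conversion fails because the value is just a string, not a date type. I should also mention that some of our data is bad: some fields contain newlines and tabs that should not be there, although the date fields do hold valid values. I even tried stripping these characters with the MySQL REPLACE function, but that did not help.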

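Assuming the data is the same and the same SELECT statement is used in both cases, I would expect the results to be identical (that is, the same number of records returned by the SELECT are imported into HDFS).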


Has anyone seen this behavior before? Any ideas on how to resolve it?

I tried executing the same command with a different MySQL driver version. The same error occurred in all cases, but this time the message was a bit clearer:

13/10/21 22:19:18 INFO mapred.JobClient: Task Id : attempt_201309130032_0308_m_000000_0,   Status : FAILED
java.io.IOException: SQLException in nextKeyValue
    at   org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:265)
    at  org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at  org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.sql.SQLException: Cannot convert value '0000-00-00 00:00:00' from column 7 to TIMESTAMP.
    at com.mysql.jdbc.ResultSet.getTimestampFromBytes(ResultSet.java:6886)
    at com.mysql.jdbc.ResultSet.getTimestampInternal(ResultSet.java:6921)
    at com.mysql.jdbc.ResultSet.getTimestamp(ResultSet.java:6245)
    at org.apache.sqoop.lib.JdbcWritableBridge.readTimestamp(JdbcWritableBridge.java:111)
    at  com.cloudera.sqoop.lib.JdbcWritableBridge.readTimestamp(JdbcWritableBridge.java:83)
    at QueryResult.readFields(QueryResult.java:156)
    at org.apache.sqoop.mapreduce.db
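So the basic problem is that '0000-00-00 00:00:00' date values are stored in our database and cannot be handled by the driver (I tried several versions, and none of them worked). When using raw SQL with the free-form query option in sqoop, the driver tries to convert this date into a date object and fails, which produces the error above. Note that this does not happen when the same date values are pulled through a view; in that case the driver apparently does not try to convert the value into a date object. For whatever reason, the MySQL driver and server seem to be out of sync in how they handle invalid dates.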

From the MySQL documentation (the excerpt is quoted at the end of this post):


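Our legacy DB server has strict mode disabled, so whenever the legacy application tried to insert an invalid date (such as '2004-04-31'), it was converted to '0000-00-00', which the driver cannot handle in the raw SQL case above. Once these records were filtered out with a condition in the WHERE clause, the sqoop import worked as expected.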
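The WHERE-clause filter described above might look roughly like the following sketch. The table and column names (`t1`, `t1.d1`) and the helper `zero_date_filter` are assumptions for illustration only; the post does not say which column holds the zero dates or show the exact filter that was used.

```shell
#!/usr/bin/env bash
# Build a SQL condition that excludes MySQL "zero" datetimes for one column.
# The column name passed in is hypothetical; use the column that actually
# contains the '0000-00-00 00:00:00' values.
zero_date_filter() {
  printf "%s <> '0000-00-00 00:00:00'" "$1"
}

# Simplified, hypothetical query. The backslash keeps $CONDITIONS literal
# inside double quotes so that Sqoop, not the shell, substitutes it.
QUERY="SELECT t1.a1, t1.d1 FROM t1 WHERE $(zero_date_filter t1.d1) AND \$CONDITIONS"
echo "$QUERY"

# The resulting string would be passed to Sqoop as:
#   sqoop import --connect [jdbc url] ... --query "$QUERY" ...
```

Note that this drops the affected rows from the import entirely, so the number of records in HDFS will be lower than the row count of the unfiltered SELECT.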

You can use this JDBC URL in your sqoop command:

jdbc:mysql://yourserver:3306/yourdatabase?zeroDateTimeBehavior=convertToNull


This worked for me.
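As a minimal sketch of how that URL could be assembled and passed to Sqoop: the helper name `add_zero_date_param` is made up for this example, and the host and database names are the placeholders from the answer. The property value `convertToNull` is the spelling used with the older `com.mysql.jdbc` driver seen in the stack traces; newer Connector/J releases document it as `CONVERT_TO_NULL`, so check the driver version in use.

```shell
#!/usr/bin/env bash
# Append zeroDateTimeBehavior=convertToNull to a MySQL JDBC URL.
# Uses '?' when the URL has no query string yet and '&' otherwise.
add_zero_date_param() {
  local url="$1"
  local param="zeroDateTimeBehavior=convertToNull"
  case "$url" in
    *"$param"*) printf '%s\n' "$url" ;;            # already present: unchanged
    *\?*)       printf '%s\n' "${url}&${param}" ;; # URL already has parameters
    *)          printf '%s\n' "${url}?${param}" ;; # first parameter
  esac
}

# Placeholder host and database, as in the answer above.
JDBC_URL="$(add_zero_date_param 'jdbc:mysql://yourserver:3306/yourdatabase')"
echo "$JDBC_URL"

# Quote the URL when passing it on, so the shell does not interpret
# '?' or '&' (remaining options as in the original command):
#   sqoop import --connect "$JDBC_URL" --username [user] --password [password] ...
```

With this setting the driver returns NULL for zero dates instead of throwing, so the affected rows are still imported, with a null value in the date column.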

When you answer your own question, you should post it as an answer and then accept your own answer. That way your question will be marked as resolved.
I didn't know I could answer my own question. Thanks for the pointer.

Stack trace of the first failure (free-form query, before switching driver versions):
13/09/27 20:28:10 INFO mapred.JobClient: Task Id : attempt_201309130032_0122_m_000000_2,  Status : FAILED
java.io.IOException: SQLException in nextKeyValue
    at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:265)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at    org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.sql.SQLException: Value 'xxxxxx' can not be represented as java.sql.Timestamp
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1078)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:975)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:920)
    at com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1102)
    at com.mysql.jdbc.BufferRow.getTimestampFast(BufferRow.java:576)
    at com.mysql.jdbc.ResultSetImpl.getTimestampInternal(ResultSetImpl.java:6592)
    at com.mysql.jdbc.ResultSetImpl.getTimestamp(ResultSetImpl.java:6192)
    at org.apache.sqoop.lib.JdbcWritableBridge.readTimestamp(JdbcWritableBridge.java:111)
    at com.cloudera.sqoop.lib.JdbcWritableBridge.readTimestamp(JdbcWritableBridge.java:83)
    at QueryResult.readFields(QueryResult.java:156)
    at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:245)
    ... 11 more

MySQL documentation excerpt on invalid dates:
As of 5.0.2, the server requires that month and day values be legal, and not merely in the range 1 to 12 and 1 to 31, respectively. With strict mode disabled, invalid dates such as '2004-04-31' are converted to '0000-00-00' and a warning is generated. With strict mode enabled, invalid dates generate an error.