Apache Kafka: Confluent JDBC source connector throws NullPointerException in a transform
This is really strange. In my database, when I execute this SQL:
select count(*) from mySchema.myTable where some_col = '2'
The result is: 26000000
When the query in the connector config is set to this and I run the connector:
QUERY="select * from mySchema.myTable where some_col = '2' order by primary_key, sec_key limit 26000000"
the connector works fine and I can consume all the messages.
But when the query in the connector config is set to this and I run the connector:
QUERY="select * from mySchema.myTable where some_col = '2' order by primary_key, sec_key"
the connector gives me this exception:
[2019-12-23 22:51:16,671] ERROR WorkerSourceTask{id=HIVE_JDBC_BATCH_SOURCE-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:177)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50)
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:293)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:229)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at org.apache.kafka.connect.transforms.ValueToKey.applyWithSchema(ValueToKey.java:85)
at org.apache.kafka.connect.transforms.ValueToKey.apply(ValueToKey.java:65)
at org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 11 more
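The trace places the NullPointerException inside ValueToKey.applyWithSchema, that is, in the transform chain rather than in the query itself. One plausible way to hit it is a configured field name that does not exist in the record's value schema, so the schema lookup returns nothing and the next access fails. The sketch below is a simplified Python model of that failure mode, not the actual Java implementation; `Field`, `lookup_field` and `value_to_key` are hypothetical names introduced only for illustration.

```python
from typing import Dict, List, Optional


class Field:
    """Stand-in for a schema field; only the type name is modelled."""

    def __init__(self, name: str, type_name: str) -> None:
        self.name = name
        self.type_name = type_name


def lookup_field(schema: Dict[str, Field], name: str) -> Optional[Field]:
    # Returns None for an unknown field name instead of raising.
    return schema.get(name)


def value_to_key(
    schema: Dict[str, Field], value: Dict[str, str], fields: List[str]
) -> Dict[str, str]:
    """Build a key from the listed value fields (hypothetical model)."""
    key: Dict[str, str] = {}
    for name in fields:
        field = lookup_field(schema, name)
        # Deliberately no None check: an unknown name fails right here,
        # which is the Python analogue of a NullPointerException.
        _ = field.type_name
        key[name] = value[name]
    return key


# A record whose fields are NOT prefixed with the schema name.
SCHEMA = {
    "primary_key": Field("primary_key", "string"),
    "sec_key": Field("sec_key", "string"),
}
VALUE = {"primary_key": "2C58131FF9680D5632CB1FDC27675490", "sec_key": "3EE"}
```

Calling `value_to_key(SCHEMA, VALUE, ["primary_key"])` returns the key, while `value_to_key(SCHEMA, VALUE, ["mySchema.primary_key"])` raises `AttributeError` because the lookup returned `None`.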
Here is the configuration:
[2019-12-23 22:51:00,681] INFO ConnectorConfig values:
config.action.reload = restart
connector.class = io.confluent.connect.jdbc.JdbcSourceConnector
errors.log.enable = false
errors.log.include.messages = false
errors.retry.delay.max.ms = 60000
errors.retry.timeout = 0
errors.tolerance = none
header.converter = null
key.converter = class org.apache.kafka.connect.storage.StringConverter
name = HIVE_JDBC_BATCH_SOURCE
tasks.max = 8
transforms = [createKey, extractString]
value.converter = class org.apache.kafka.connect.json.JsonConverter
[2019-12-23 22:51:00,681] INFO EnrichedConnectorConfig values:
config.action.reload = restart
connector.class = io.confluent.connect.jdbc.JdbcSourceConnector
errors.log.enable = false
errors.log.include.messages = false
errors.retry.delay.max.ms = 60000
errors.retry.timeout = 0
errors.tolerance = none
header.converter = null
key.converter = class org.apache.kafka.connect.storage.StringConverter
name = HIVE_JDBC_BATCH_SOURCE
tasks.max = 8
transforms = [createKey, extractString]
transforms.createKey.fields = [mySchema.primary_key]
transforms.createKey.type = class org.apache.kafka.connect.transforms.ValueToKey
transforms.extractString.field = mySchema.primary_key
transforms.extractString.type = class org.apache.kafka.connect.transforms.ExtractField$Key
value.converter = class org.apache.kafka.connect.json.JsonConverter
[2019-12-23 22:51:00,686] INFO StringConverterConfig values:
converter.encoding = UTF8
converter.type = key
[2019-12-23 22:51:00,686] INFO JsonConverterConfig values:
converter.type = value
schemas.cache.size = 1000
schemas.enable = false
[2019-12-23 22:51:00,701] INFO ProducerConfig values:
acks = all
batch.size = 100000
bootstrap.servers = xxx
buffer.memory = 33554432
client.dns.lookup = default
client.id =
compression.type = none
connections.max.idle.ms = 540000
delivery.timeout.ms = 2147483647
enable.idempotence = false
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
linger.ms = 10
max.block.ms = 9223372036854775807
max.in.flight.requests.per.connection = 1
max.request.size = 10485760
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 310000
retries = 2147483647
retry.backoff.ms = 100
[2019-12-23 22:51:00,810] INFO JdbcSourceTaskConfig values:
batch.max.rows = 100
catalog.pattern = null
connection.attempts = 5
connection.backoff.ms = 60000
connection.password = null
connection.url = xxx
connection.user = null
db.timezone = UTC
dialect.name =
incrementing.column.name =
mode = bulk
numeric.mapping = null
numeric.precision.mapping = false
poll.interval.ms = 86400000
query = select * from mySchema.myTable where some_col = '2' order by primary_key, sec_key
quote.sql.identifiers = ALWAYS
schema.pattern = mySchema
table.blacklist = []
table.poll.interval.ms = 60000
table.types = [TABLE]
table.whitelist = []
tables = []
timestamp.column.name = []
timestamp.delay.interval.ms = 0
topic.prefix = my_topic
validate.non.null = false
Sample data in the DB table:
primary_key 2C58131FF9680D5632CB1FDC27675490
sec_key 3EE
year_cd 1
year_month 201911
content_txt 2016-10-072016-10-12MEMOREX1234500172409430291.52
Sample message produced by the connector:
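{"mySchema.primary_key":"2C58131FF9680D5632CB1FDC27675490","mySchema.sec_key":"3EE","mySchema.year_cd":"1","mySchema.year_month":"201911","mySchema.content_txt":"2016-10-072016-10-12MEMOREX1234500172409430291.52"}

I was able to find the bug (IMO, it is a bug in the JDBC source connector).
Sample message produced by the connector when the query has a LIMIT clause:
{"mySchema.primary_key":"2C58131FF9680D5632CB1FDC27675490","mySchema.sec_key":"3EE","mySchema.year_cd":"1","mySchema.year_month":"201911","mySchema.content_txt":"2016-10-072016-10-12MEMOREX1234500172409430291.52"}
Sample message produced by the connector when the query has no LIMIT clause:
{"primary_key":"2C58131FF9680D5632CB1FDC27675490","sec_key":"3EE","year_cd":"1","year_month":"201911","content_txt":"2016-10-072016-10-12MEMOREX1234500172409430291.52"}
With the setting as follows:
transforms.extractString.field=mySchema.primary_key
the connector throws the NullPointerException, so I changed the setting to:
transforms.extractString.field=primary_key
and it works like a charm.

The query isn't the problem. If you look at the stack trace, it's the transform. Do you really have dots in your record values? Can you show a sample record without the transforms?
I doubt it. Still, I'll post a sample record (without the transforms) when I get a chance.
There is no error with the LIMIT clause; I get the error when I remove the LIMIT clause.
I don't think you should be using LIMIT. batch.max.rows will take care of that.
OK, I know... but the connector fails without the LIMIT clause!!
I don't think the LIMIT would have any effect... try debugging...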
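
The fix in the answer comes down to the configured transform field name not matching the keys that actually appear in the record: with the LIMIT clause the keys carry the mySchema. prefix, without it they do not. A small pre-flight check against one sample message can catch this before the connector is started. The following is a minimal Python sketch under that assumption; `missing_fields` and `suggest_unqualified` are hypothetical helpers, not part of Kafka Connect or the JDBC connector.

```python
from typing import Dict, Iterable, List


def missing_fields(record: Dict[str, object], configured: Iterable[str]) -> List[str]:
    """Return the configured field names that are not keys of the record."""
    return [name for name in configured if name not in record]


def suggest_unqualified(record: Dict[str, object], name: str) -> str:
    """Suggest the part after the last dot if only that form is in the record."""
    short = name.rsplit(".", 1)[-1]
    if name not in record and short in record:
        return short
    return name


# Sample message as produced when the query has no LIMIT clause.
RECORD_NO_LIMIT = {
    "primary_key": "2C58131FF9680D5632CB1FDC27675490",
    "sec_key": "3EE",
    "year_cd": "1",
    "year_month": "201911",
    "content_txt": "2016-10-072016-10-12MEMOREX1234500172409430291.52",
}

# Field name taken from the failing connector configuration.
CONFIGURED = ["mySchema.primary_key"]
```

Here `missing_fields(RECORD_NO_LIMIT, CONFIGURED)` reports `mySchema.primary_key` as absent, and `suggest_unqualified(RECORD_NO_LIMIT, "mySchema.primary_key")` proposes `primary_key`, which matches the setting the answer ended up using.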