Docker Kafka Connect: multiple DB2 JDBC source connectors fail
I am trying to use Kafka Connect in a local Docker container (using the official Confluent image) to push DB2 data to a Kafka cluster on OpenShift (AWS). I am using the Confluent JDBC connector with the DB2 JDBC JAR. I have different connector configurations because I use the SMT "transforms.createKey" (to create my key) and the key columns in my tables have different names. Here are my steps:
- Create the config, offset, and status topics for Kafka Connect
- Start/create the Kafka Connect container (with environment variables, see below)
- Create the first JDBC connector via a POST call to my Connect container (config below)
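The third step above, creating a connector through the Connect REST API, can be sketched like this. The host/port and the payload fields are placeholders; the snippet only builds and serializes the request and does not assume a running cluster:

```python
import json
import urllib.request

def build_connector_request(connect_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Connect REST API."""
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical minimal connector payload; the full config is shown later in the post.
payload = {
    "name": "db2-jdbc-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "mode": "timestamp",
        "tasks.max": "1",
    },
}

req = build_connector_request("http://localhost:8083", payload)
# To actually create the connector: urllib.request.urlopen(req)
```

Keeping the request construction separate from sending it makes the payload easy to inspect before hitting the cluster.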
First attempt:
[2018-12-17 13:09:15,683] ERROR Invalid call to OffsetStorageWriter flush() while already flushing, the framework should not allow this (org.apache.kafka.connect.storage.OffsetStorageWriter)
[2018-12-17 13:09:15,684] ERROR WorkerSourceTask{id=db2-jdbc-source-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: OffsetStorageWriter is already flushing
at org.apache.kafka.connect.storage.OffsetStorageWriter.beginFlush(OffsetStorageWriter.java:110)
at org.apache.kafka.connect.runtime.WorkerSourceTask.commitOffsets(WorkerSourceTask.java:409)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:238)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2018-12-17 13:09:15,686] ERROR WorkerSourceTask{id=db2-jdbc-source-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
[2018-12-17 13:09:15,686] INFO [Producer clientId=producer-4] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2018-12-17 13:09:20,682] ERROR Graceful stop of task db2-jdbc-source-0 failed. (org.apache.kafka.connect.runtime.Worker)
[2018-12-17 13:09:20,682] INFO Finished stopping tasks in preparation for rebalance (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
Second attempt:
[2018-12-17 14:01:31,658] INFO Stopping task db2-jdbc-source-0 (org.apache.kafka.connect.runtime.Worker)
[2018-12-17 14:01:31,689] INFO Stopped connector db2-jdbc-source (org.apache.kafka.connect.runtime.Worker)
[2018-12-17 14:01:31,784] INFO WorkerSourceTask{id=db2-jdbc-source-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2018-12-17 14:01:31,784] INFO WorkerSourceTask{id=db2-jdbc-source-0} flushing 20450 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2018-12-17 14:01:36,733] ERROR Graceful stop of task db2-jdbc-source-0 failed. (org.apache.kafka.connect.runtime.Worker)
[2018-12-17 14:01:36,733] INFO Finished stopping tasks in preparation for rebalance (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
Kafka Connect Docker environment variables:
-e CONNECT_BOOTSTRAP_SERVERS=my_kafka_cluster:443 \
-e CONNECT_PRODUCER_BOOTSTRAP_SERVERS="my_kafka_cluster:443" \
-e CONNECT_REST_ADVERTISED_HOST_NAME="kafka-connect" \
-e CONNECT_REST_PORT=8083 \
-e CONNECT_GROUP_ID="kafka-connect-group" \
-e CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR=3 \
-e CONNECT_CONFIG_STORAGE_TOPIC="kafka-connect-config" \
-e CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR=3 \
-e CONNECT_OFFSET_STORAGE_TOPIC="kafka-connect-offset" \
-e CONNECT_OFFSET_FLUSH_INTERVAL_MS=15000 \
-e CONNECT_OFFSET_FLUSH_TIMEOUT_MS=60000 \
-e CONNECT_STATUS_STORAGE_REPLICATION_FACTOR=3 \
-e CONNECT_STATUS_STORAGE_TOPIC="kafka-connect-status" \
-e CONNECT_KEY_CONVERTER="io.confluent.connect.avro.AvroConverter" \
-e CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL=http://url_to_schemaregistry \
-e CONNECT_VALUE_CONVERTER="io.confluent.connect.avro.AvroConverter" \
-e CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL=http://url_to_schemaregistry \
-e CONNECT_INTERNAL_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_INTERNAL_KEY_CONVERTER_SCHEMAS_ENABLE="false" \
-e CONNECT_INTERNAL_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_INTERNAL_VALUE_CONVERTER_SCHEMAS_ENABLE="false" \
-e CONNECT_PLUGIN_PATH=/usr/share/java \
-e CONNECT_PRODUCER_BUFFER_MEMORY="8388608" \
-e CONNECT_SECURITY_PROTOCOL="SSL" \
-e CONNECT_PRODUCER_SECURITY_PROTOCOL="SSL" \
-e CONNECT_SSL_TRUSTSTORE_LOCATION="/usr/share/kafka.client.truststore.jks" \
-e CONNECT_PRODUCER_SSL_TRUSTSTORE_LOCATION="/usr/share/kafka.client.truststore.jks" \
-e CONNECT_SSL_TRUSTSTORE_PASSWORD="my_ts_pw" \
-e CONNECT_PRODUCER_SSL_TRUSTSTORE_PASSWORD="my_ts_pw" \
-e CONNECT_LOG4J_LOGGERS=org.apache.kafka.connect.runtime.rest=WARN,org.reflections=ERROR \
-e CONNECT_LOG4J_ROOT_LOGLEVEL=INFO \
-e HOSTNAME=kafka-connect \
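As an aside on how the variables above take effect: the Confluent Connect image derives each worker property from its CONNECT_-prefixed environment variable by stripping the prefix, lowercasing, and turning underscores into dots. A rough sketch of that naming convention (simplified; the real image has additional rules, e.g. for producer/consumer overrides):

```python
def env_to_property(env_name: str) -> str:
    """Translate a CONNECT_* env var name to a Connect worker property name.

    Simplified sketch of the Confluent image's convention: strip the
    CONNECT_ prefix, lowercase, and replace single underscores with dots.
    """
    assert env_name.startswith("CONNECT_")
    return env_name[len("CONNECT_"):].lower().replace("_", ".")

# e.g. CONNECT_OFFSET_FLUSH_TIMEOUT_MS -> offset.flush.timeout.ms
```

This is why, for example, CONNECT_OFFSET_FLUSH_INTERVAL_MS above ends up as the worker's offset.flush.interval.ms setting.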
JDBC connector config (only the table and key column differ between connectors; see the JSON at the end of this post):
Answer:

I finally solved the problem. I use the JDBC connector in timestamp mode rather than timestamp+incrementing, because I cannot (always) specify an incrementing column. I know this can cause problems: when several rows share the same timestamp, Connect has no way of knowing which of them have already been read.

Most of my rows share the same timestamp. When I added the second connector, the first connector's current timestamp was committed as its offset and Connect started rebalancing, so the information about which rows with that timestamp had already been read was lost. Once the connectors were up and running again, the first connector continued from the "next timestamp" and therefore only loaded the newest rows (a small fraction of the data).
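The failure mode described above can be illustrated with a toy model of timestamp-mode polling. This is a deliberate simplification of the connector's actual offset tracking: the model remembers only the last committed timestamp, so after a restart it queries for strictly newer rows, and any unread rows sharing the committed timestamp are silently skipped:

```python
from dataclasses import dataclass

@dataclass
class Row:
    id: int
    ts: int  # simplified epoch-ms timestamp

def poll(rows, last_ts, batch_max_rows):
    """Toy timestamp-mode poll: return strictly-newer rows, up to a batch limit,
    plus the new 'last committed timestamp'."""
    newer = sorted((r for r in rows if r.ts > last_ts), key=lambda r: (r.ts, r.id))
    batch = newer[:batch_max_rows]
    new_last_ts = batch[-1].ts if batch else last_ts
    return batch, new_last_ts

# Many rows share the same timestamp, as in the question.
table = [Row(i, 1000) for i in range(5)] + [Row(5, 2000)]

# First poll reads only part of the ts=1000 rows; then the offset is committed.
batch1, last_ts = poll(table, last_ts=0, batch_max_rows=3)

# After the rebalance/restart, polling resumes from the committed timestamp:
# the remaining ts=1000 rows (ids 3 and 4) are never read again.
batch2, last_ts = poll(table, last_ts=last_ts, batch_max_rows=3)
```

Running this, the second poll jumps straight to the ts=2000 row, which mirrors the "only the newest rows were loaded" behavior seen after the rebalance.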
My mistake was assuming that in this case the first connector would resume from the previous timestamp rather than continue with the "next timestamp". To me it would make more sense to risk duplicates than to risk losing data.

Comments:

Posting a new config does rebalance the Connect cluster, but it shouldn't "stop" any workers.

@cricket_007 Yes, that's what I thought. I do get a "Rebalance started" log entry, followed by one saying the first connector was stopped. Looking at the logs, the first connector also seems to start again, but it appears to "believe" there is no more work to do.

For anyone else with the same problem: I solved it by starting all Kafka Connect containers first and only then applying the configs. That way no offsets are committed during the rebalance, so no data is lost :)
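The workaround from the last comment, starting every Connect container before applying any configs, can be sketched as a readiness loop. Here `probe` stands in for an HTTP GET against each worker's REST endpoint; it is injected as a callable so the logic runs without a cluster:

```python
import time

def wait_until_ready(probe, attempts=30, delay_s=1.0):
    """Poll `probe()` (e.g. an HTTP GET against a worker's REST API)
    until it succeeds; return True on success, False if it never does."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay_s)
    return False

# Example: a fake probe that succeeds on the third call.
calls = {"n": 0}
def fake_probe():
    calls["n"] += 1
    return calls["n"] >= 3

ready = wait_until_ready(fake_probe, attempts=5, delay_s=0.0)
```

Only after every worker reports ready would the connector configs be POSTed, so the resulting rebalances happen before any offsets have been committed.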
{
"name": "db2-jdbc-source",
"config":
{
"mode":"timestamp",
"debug":"true",
"batch.max.rows":"50",
"poll.interval.ms":"10000",
"timestamp.delay.interval.ms":"60000",
"timestamp.column.name":"IBMSNAP_LOGMARKER",
"connector.class":"io.confluent.connect.jdbc.JdbcSourceConnector" ,
"connection.url":"jdbc:db2://myip:myport/mydb:currentSchema=myschema;",
"connection.password":"mypw",
"connection.user":"myuser",
"connection.backoff.ms":"60000",
"dialect.name": "Db2DatabaseDialect",
"table.types": "TABLE",
"table.poll.interval.ms":"60000",
"table.whitelist":"MYTABLE1",
"tasks.max":"1",
"topic.prefix":"db2_",
"key.converter":"io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url":"http://url_to_schemaregistry",
"value.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"http://url_to_schemaregistry",
"transforms":"createKey",
"transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields":"MYKEY1"
}
}
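Since the connector configs differ only in the table and key column, they can be generated from one shared template rather than maintained by hand. MYTABLE2 and MYKEY2 below are placeholders for the other tables mentioned in the question:

```python
import copy

# Shared fields, abridged from the full config above.
BASE_CONFIG = {
    "mode": "timestamp",
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "timestamp.column.name": "IBMSNAP_LOGMARKER",
    "tasks.max": "1",
    "topic.prefix": "db2_",
    "transforms": "createKey",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
}

def make_connector(name: str, table: str, key_column: str) -> dict:
    """Derive a per-table connector payload from the shared base config."""
    config = copy.deepcopy(BASE_CONFIG)
    config["table.whitelist"] = table
    config["transforms.createKey.fields"] = key_column
    return {"name": name, "config": config}

connectors = [
    make_connector("db2-jdbc-source-1", "MYTABLE1", "MYKEY1"),
    make_connector("db2-jdbc-source-2", "MYTABLE2", "MYKEY2"),  # placeholder names
]
```

Each resulting dict can then be POSTed to the Connect REST API once all workers are up.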