Streaming data to a Google BigQuery table: SocketTimeoutException, 502 Bad Gateway and 500 Internal Server Error warnings

We are using the Camel BigQuery API (version 2.20) to stream records from a message queue on an ActiveMQ server (version 5.14.3) into a Google BigQuery table.

We implemented and deployed the streaming mechanism as an XML route definition in the Spring framework, as follows:

<?xml version="1.0" encoding="UTF-8"?>
<beans
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.springframework.org/schema/beans"
    xmlns:beans="http://www.springframework.org/schema/beans"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans 
        ./spring-beans.xsd
        http://camel.apache.org/schema/spring
        ./camel-spring.xsd">

    <!--
    # ==========================================================================
    # ActiveMQ JMS Bean Definition
    # ==========================================================================
    -->
    <bean id="jms" class="org.apache.camel.component.jms.JmsComponent">
        <property name="connectionFactory">
            <bean class="org.apache.activemq.ActiveMQConnectionFactory">
                <property name="brokerURL" value="nio://192.168.10.10:61616?jms.useAsyncSend=true" />
                <property name="userName"  value="MyAmqUserName" />
                <property name="password"  value="MyAmqPassword" />
            </bean>
        </property>
    </bean>

    <!--
    # ==========================================================================
    # GoogleBigQueryComponent
    # https://github.com/apache/camel/tree/master/components/camel-google-bigquery
    # ==========================================================================
    -->
    <bean id="gcp" class="org.apache.camel.component.google.bigquery.GoogleBigQueryComponent">
        <property name="connectionFactory">
            <bean class="org.apache.camel.component.google.bigquery.GoogleBigQueryConnectionFactory">
                <property name="credentialsFileLocation" value="MyDir/MyGcpKeyFile.json" />
            </bean>
        </property>
    </bean>

    <!--
    # ==========================================================================
    # Main Context Bean Definition
    # ==========================================================================
    -->
    <camelContext id="camelContext" xmlns="http://camel.apache.org/schema/spring" >

        <!--
        ========================================================================
        https://camel.apache.org/maven/current/camel-core/apidocs/org/apache/camel/processor/RedeliveryPolicy.html
        ========================================================================
        -->
        <onException useOriginalMessage="true">
            <exception>com.google.api.client.googleapis.json.GoogleJsonResponseException</exception>
            <exception>java.net.SocketTimeoutException</exception>
            <exception>java.net.ConnectException</exception>
            <redeliveryPolicy
                backOffMultiplier="2"
                logHandled="false"
                logRetryAttempted="true"
                maximumRedeliveries="10"
                maximumRedeliveryDelay="60000"
                redeliveryDelay="1000"
                retriesExhaustedLogLevel="ERROR"
                retryAttemptedLogLevel="WARN"
                />
        </onException>

        <!--
        # ==================================================================
        # Message Route :
        # 1. consume messages from my AMQ queue
        # 2. write message to Google BigQuery table
        # see https://github.com/apache/camel/blob/master/components/camel-google-bigquery/src/main/docs/google-bigquery-component.adoc
        # ==================================================================
        -->
        <route>
            <from uri="jms:my.amq.queue.of.output.data.for.gcp?acknowledgementModeName=DUPS_OK_ACKNOWLEDGE&amp;concurrentConsumers=20" />
            <to uri="gcp:my_gcp_project:my_bq_data_set:my_bq_table" />
        </route>

    </camelContext>

</beans>
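
One way to judge the onException block above is to work out the retry schedule that its redeliveryPolicy implies. In Camel, backOffMultiplier takes effect only when useExponentialBackOff="true" is also set. With the attributes exactly as written, every retry therefore waits redeliveryDelay (1 s). The sketch below is a simplified model of how the delay grows in each mode, not Camel's own code: it ignores collision avoidance and delayPattern. RedeliverySchedule, delays and total are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the delays produced by the redeliveryPolicy in the route above.
// Not Camel's code: collision avoidance and delayPattern are ignored.
public class RedeliverySchedule {

    /** Returns the wait in ms before each redelivery attempt 1..maximumRedeliveries. */
    static List<Long> delays(long redeliveryDelay, double backOffMultiplier,
                             long maximumRedeliveryDelay, int maximumRedeliveries,
                             boolean useExponentialBackOff) {
        List<Long> result = new ArrayList<>();
        long previous = 0;
        for (int attempt = 1; attempt <= maximumRedeliveries; attempt++) {
            long next;
            if (previous == 0) {
                next = redeliveryDelay;                          // first retry
            } else if (useExponentialBackOff && backOffMultiplier > 1) {
                next = Math.round(backOffMultiplier * previous); // exponential growth
            } else {
                next = previous;                                 // fixed delay
            }
            if (maximumRedeliveryDelay > 0 && next > maximumRedeliveryDelay) {
                next = maximumRedeliveryDelay;                   // cap at maximumRedeliveryDelay
            }
            result.add(next);
            previous = next;
        }
        return result;
    }

    /** Sum of all waits, i.e. how long a message can be retried before giving up. */
    static long total(List<Long> delays) {
        long sum = 0;
        for (long d : delays) {
            sum += d;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Values taken from the redeliveryPolicy attributes in the route above.
        List<Long> exponential = delays(1000, 2, 60000, 10, true);
        List<Long> fixed = delays(1000, 2, 60000, 10, false);
        System.out.println("exponential: " + exponential + " total ms = " + total(exponential));
        System.out.println("fixed:       " + fixed + " total ms = " + total(fixed));
    }
}
```

With exponential back-off enabled, the ten retries wait 1, 2, 4, 8, 16 and 32 s, then 60 s four times, for 303 s in total. Without it, each retry waits 1 s and the total is only 10 s. That may give a struggling backend little time to recover.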
Questions

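  • Is my use of the onException element generally and syntactically correct (apart from fine-tuning the redeliveryPolicy attributes)? Or am I missing something?

  • The first warning message I'm interested in is "On delivery attempt: 0 caught: java.net.SocketTimeoutException". My log files contain no "On delivery attempt: 1", "On delivery attempt: 2", etc. Does this mean that the subsequent delivery attempt for the given message succeeded?

  • As far as streaming data to GCP is concerned, should I treat "SocketTimeoutException", "500 Internal Server Error" and "502 Bad Gateway" differently, or use the same onException + redelivery policy for all of them?

  • Are there other ways to improve the performance of the Camel/Google API approach to streaming data into GCP? Can the Camel/Google API batch messages to reduce the number of GCP insert operations? I am already using streaming deduplication (CamelGoogleBigQueryInsertId).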


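  • Disclaimer: I have no experience with the Camel BigQuery API. My answer is based on my observation and understanding of the BigQuery API.

  • From what I have observed, if no error log appears, the retry most likely succeeded.
  • Timeouts, 500 and 502 errors can be retried in the same way. At least, I don't know of a way to treat them differently.
  • Batching will definitely help, based on:
  • Maximum rows per request: 10,000 rows per request

    A maximum of 500 rows is recommended. Batching can increase performance and throughput to a point, but at the cost of per-request latency. Too few rows per request and the overhead of each request can make ingestion inefficient. Too many rows per request and the throughput may drop.

    A maximum of 500 rows per request is recommended, but experimentation with representative data (schema and data sizes) will help you determine the ideal batch size.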
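
To act on that recommendation, the route would need to group JMS messages before they reach the gcp endpoint, for example with Camel's Aggregator EIP and a completion size. Check the component documentation to confirm whether the camel-google-bigquery producer in version 2.20 accepts several rows in one exchange. The sketch below only illustrates the batching step in plain Java, assuming rows are represented as column-name-to-value maps. RowBatcher, partition and both constants are hypothetical names; the values 500 and 10,000 come from the limits quoted above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: splits buffered rows into streaming-insert batches.
// 500 follows the recommendation quoted above; 10,000 is the hard per-request limit.
public class RowBatcher {

    static final int MAX_ROWS_PER_REQUEST = 10_000;
    static final int RECOMMENDED_ROWS_PER_REQUEST = 500;

    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize < 1 || batchSize > MAX_ROWS_PER_REQUEST) {
            throw new IllegalArgumentException(
                "batchSize must be between 1 and " + MAX_ROWS_PER_REQUEST);
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += batchSize) {
            int to = Math.min(from + batchSize, rows.size());
            // Copy the slice so each batch does not depend on the source list.
            batches.add(new ArrayList<>(rows.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // 1,234 dummy rows, each a column-name -> value map.
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < 1234; i++) {
            rows.add(Collections.<String, Object>singletonMap("id", i));
        }
        for (List<Map<String, Object>> batch : partition(rows, RECOMMENDED_ROWS_PER_REQUEST)) {
            System.out.println("insert batch of " + batch.size() + " rows");
        }
    }
}
```

For 1,234 rows this prints three batches of 500, 500 and 234 rows. In this example, three insert requests would replace 1,234 single-row requests.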

    2019-10-21 15:33:13 | WARN  | DefaultErrorHandler | Failed delivery for (MessageId: XXX on ExchangeId: XXX). On delivery attempt: 0 caught: java.net.SocketTimeoutException: connect timed out
    
    2019-10-24 12:46:53 | WARN  | DefaultErrorHandler | Failed delivery for (MessageId: XXX on ExchangeId: XXX). On delivery attempt: 0 caught: com.google.api.client.googleapis.json.GoogleJsonResponseException: 502 Bad Gateway
    
    2019-10-25 12:33:33 | WARN  | DefaultErrorHandler | Failed delivery for (MessageId: XXX on ExchangeId: XXX). On delivery attempt: 0 caught: com.google.api.client.googleapis.json.GoogleJsonResponseException: 500 Internal Server Error
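
The three warnings above are all transient failures: a connect timeout, a 502 and a 500 from the service. That supports retrying them with the same onException and redelivery policy. A distinction that may matter more is between these errors and client errors. GoogleJsonResponseException is also thrown for 4xx responses, which the route above would retry as well, usually to no effect. The following is a minimal sketch of such a classification. It assumes that 5xx responses and network timeouts are transient, and that most 4xx responses are not (quota or rate-limit errors may deserve separate handling). RetryClassifier and its methods are hypothetical; in real code the status would come from GoogleJsonResponseException.getStatusCode().

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

// Hypothetical helper. Assumption: timeouts, refused connections and HTTP 5xx
// responses are transient, while most 4xx responses (for example a missing table
// or bad credentials) will fail again however often they are retried.
public class RetryClassifier {

    /** HTTP status codes treated as transient server-side failures. */
    static boolean isRetryableStatus(int httpStatus) {
        switch (httpStatus) {
            case 500: // Internal Server Error
            case 502: // Bad Gateway
            case 503: // Service Unavailable
            case 504: // Gateway Timeout
                return true;
            default:
                return false;
        }
    }

    /** Network-level failures that occur before any HTTP status is received. */
    static boolean isRetryableNetworkError(Throwable t) {
        return t instanceof SocketTimeoutException || t instanceof ConnectException;
    }

    public static void main(String[] args) {
        System.out.println("500 -> " + isRetryableStatus(500));
        System.out.println("502 -> " + isRetryableStatus(502));
        System.out.println("400 -> " + isRetryableStatus(400));
        System.out.println("SocketTimeoutException -> "
            + isRetryableNetworkError(new SocketTimeoutException("connect timed out")));
        System.out.println("IOException -> "
            + isRetryableNetworkError(new IOException("other")));
    }
}
```

This prints true for 500, 502 and the timeout, and false for 400 and the plain IOException. In a Camel route, a check like this could become an onWhen predicate on the onException clause. This sketch does not show that wiring.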