Snowflake Cloud Data Platform: improving a lookup join on IP networks

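I am trying to find the country name for a list of IPs. The problem is that the join takes a very long time to evaluate. This is the idea: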

WITH IP_GEO_CITY AS
(SELECT
    AS_INTEGER(PARSE_IP(A.NETWORK, 'INET'):ipv4_range_start) AS IP_START,
    AS_INTEGER(PARSE_IP(A.NETWORK, 'INET'):ipv4_range_end) AS IP_END,
    B.COUNTRY_NAME AS COUNTRY_NAME
 FROM
    GEOLITE_CITY_BLOCK_IPV4 AS A
 LEFT JOIN 
    GEOLITE_LOCATIONS AS B
    ON
        A.GEONAME_ID = B.GEONAME_ID
 ORDER BY
    IP_START ASC
)
SELECT 
    UNIQUE_IP_NUMBER,
    COUNTRY_NAME
 FROM 
    UNIQUE_IP_NUMBER AS A
 LEFT JOIN
    IP_GEO_CITY AS B
    ON
        A.UNIQUE_IP_NUMBER >= B.IP_START AND 
        A.UNIQUE_IP_NUMBER <= B.IP_END

I think the problem is in the join condition:

ON
        A.UNIQUE_IP_NUMBER >= B.IP_START AND 
        A.UNIQUE_IP_NUMBER <= B.IP_END

UNIQUE_IP_NUMBER is an ordinary IP address converted to an integer.
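
To make the integer representation concrete, here is a minimal Python sketch using the standard `ipaddress` module. The helper names `ip_to_int` and `network_bounds` are hypothetical and do not come from the question. The two bounds are meant to correspond to what IP_START and IP_END hold for a given NETWORK value, assuming they are the first and last address of the block.

```python
import ipaddress

def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))

def network_bounds(cidr: str) -> tuple[int, int]:
    """Return the (first, last) integer addresses of an IPv4 CIDR block."""
    net = ipaddress.IPv4Network(cidr)
    return int(net.network_address), int(net.broadcast_address)

# The range condition from the join, evaluated for a single IP and block.
start, end = network_bounds("1.0.2.0/23")
candidate = ip_to_int("1.0.3.7")
in_range = start <= candidate <= end
```

The join in the question evaluates this same comparison for every pairing of an IP with a block, which is why its cost grows with both tables.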

Snowflake tends to perform poorly on range scans like yours:

ON
    A.UNIQUE_IP_NUMBER >= B.IP_START AND 
    A.UNIQUE_IP_NUMBER <= B.IP_END

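We have even found >= & ...

How does the query perform with an inner join? Since you are asking for the values of COUNTRY_NAME, you should let Snowflake prune better by using an inner join, and fetch only the values that exist. That may help. Without seeing the data, the query profile, and the warehouse size, it is hard to pin down the root cause of the performance problem.

What if you flattened the table to one row per IP, so that it becomes an equality join? Snowflake probably uses delta encoding, which is very effective for sorted, consecutive 32-bit integers, and probably dictionary compression for the city, so the table might occupy only a few micro-partitions.

Take a look at this question, it is very similar. The answer there shows how to expand IP blocks into individual IPs:
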
NETWORK       GEONAME_ID  REGISTERED_COUNTRY_GEONAME_ID  REPRESENTED_COUNTRY_GEONAME_ID  IS_ANONYMOUS_PROXY  IS_SATELLITE_PROVIDER  ETL_ID                         ETL_TIMESTAMP                  FILENAME
1.0.0.0/24    2077456     2077456                                                        0                   0                      2019-10-25 00:00:00.000000000  2019-10-25 08:39:19.000000000  GeoLite2-Country-CSV_20191022/GeoLite2-Country-Blocks-IPv4.csv
1.0.1.0/24    1814991     1814991                                                        0                   0                      2019-10-25 00:00:00.000000000  2019-10-25 08:39:19.000000000  GeoLite2-Country-CSV_20191022/GeoLite2-Country-Blocks-IPv4.csv
1.0.2.0/23    1814991     1814991                                                        0                   0                      2019-10-25 00:00:00.000000000  2019-10-25 08:39:19.000000000  GeoLite2-Country-CSV_20191022/GeoLite2-Country-Blocks-IPv4.csv
1.0.4.0/22    2077456     2077456                                                        0                   0                      2019-10-25 00:00:00.000000000  2019-10-25 08:39:19.000000000  GeoLite2-Country-CSV_20191022/GeoLite2-Country-Blocks-IPv4.csv
1.0.8.0/21    1814991     1814991                                                        0                   0                      2019-10-25 00:00:00.000000000  2019-10-25 08:39:19.000000000  GeoLite2-Country-CSV_20191022/GeoLite2-Country-Blocks-IPv4.csv
1.0.16.0/20   1861060     1861060                                                        0                   0                      2019-10-25 00:00:00.000000000  2019-10-25 08:39:19.000000000  GeoLite2-Country-CSV_20191022/GeoLite2-Country-Blocks-IPv4.csv
1.0.32.0/19   1814991     1814991                                                        0                   0                      2019-10-25 00:00:00.000000000  2019-10-25 08:39:19.000000000  GeoLite2-Country-CSV_20191022/GeoLite2-Country-Blocks-IPv4.csv
1.0.64.0/18   1861060     1861060                                                        0                   0                      2019-10-25 00:00:00.000000000  2019-10-25 08:39:19.000000000  GeoLite2-Country-CSV_20191022/GeoLite2-Country-Blocks-IPv4.csv
1.0.128.0/17  1605651     1605651                                                        0                   0                      2019-10-25 00:00:00.000000000  2019-10-25 08:39:19.000000000  GeoLite2-Country-CSV_20191022/GeoLite2-Country-Blocks-IPv4.csv
1.1.0.0/24    1814991     1814991                                                        0                   0                      2019-10-25 00:00:00.000000000  2019-10-25 08:39:19.000000000  GeoLite2-Country-CSV_20191022/GeoLite2-Country-Blocks-IPv4.csv
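
The "one row per IP" idea from the comments can be sketched in Python as follows. This is only an illustration of the technique, not the linked answer's code: the function name `expand_blocks` is hypothetical, and the three blocks are copied from the sample rows above. Each block is expanded into its individual integer addresses, so the lookup becomes a plain equality match.

```python
import ipaddress

# A few (network, geoname_id) pairs copied from the sample rows above.
blocks = [
    ("1.0.0.0/24", 2077456),
    ("1.0.1.0/24", 1814991),
    ("1.0.2.0/23", 1814991),
]

def expand_blocks(rows):
    """Map every integer address inside each block to its geoname_id."""
    per_ip = {}
    for cidr, geoname_id in rows:
        net = ipaddress.IPv4Network(cidr)
        first = int(net.network_address)
        for value in range(first, first + net.num_addresses):
            per_ip[value] = geoname_id
    return per_ip

per_ip = expand_blocks(blocks)

# Equality lookup: no range comparison is needed any more.
found = per_ip.get(int(ipaddress.IPv4Address("1.0.1.200")))
missing = per_ip.get(int(ipaddress.IPv4Address("1.0.4.0")))
```

The trade-off is row count: a single /8 block expands to 16,777,216 rows, so whether this pays off depends on how well the expanded table compresses, as the comment above speculates.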
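
The original condition:

ON
    A.UNIQUE_IP_NUMBER >= B.IP_START AND 
    A.UNIQUE_IP_NUMBER <= B.IP_END

The same condition written with BETWEEN:

ON A.UNIQUE_IP_NUMBER BETWEEN B.IP_START AND B.IP_END

BETWEEN with an additional filter on the range bounds:

ON A.UNIQUE_IP_NUMBER BETWEEN B.IP_START AND B.IP_END 
   AND B.IP_START < B.IP_END
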
-- Equality-join alternative: pack (prefix length, network start) into one integer key.
WITH ip_geo_city AS (
    SELECT
        PARSE_IP(a.network, 'INET') AS ip,
        -- use the network start so the key matches BITAND(net_mask, ip) on the lookup side
        AS_INTEGER(ip:ipv4_range_start) AS ip_start,
        AS_INTEGER(ip:netmask_prefix_length) AS ip_netlenmask,
        BITOR(BITSHIFTLEFT(ip_netlenmask, 32), ip_start) AS lookup_key,
        b.country_name AS country_name
    FROM geolite_city_block_ipv4 AS a
    LEFT JOIN geolite_locations AS b
        ON a.geoname_id = b.geoname_id
    ORDER BY lookup_key
), ipv4_masks AS (
    -- rn runs 1..31, so net_len covers /31 down to /1
    SELECT ROW_NUMBER() OVER(ORDER BY TRUE) AS rn
        ,32-rn AS net_len
        -- two's complement: -b = BITNOT(b-1), so -BITSHIFTLEFT(1, rn) clears only the low rn bits
        ,BITAND(4294967295, -BITSHIFTLEFT(1, rn)) AS net_mask
    FROM table(generator(rowcount => 31))
), unique_ip_number_lookups AS (
    -- one candidate key per IP and prefix length
    SELECT a.unique_ip_number
        ,BITOR(BITSHIFTLEFT(m.net_len, 32), BITAND(m.net_mask, a.unique_ip_number)) AS lookup_key
    FROM unique_ip_number AS a
    CROSS JOIN ipv4_masks AS m
)
SELECT
    a.unique_ip_number,
    b.country_name
FROM
    unique_ip_number_lookups AS a
-- LEFT JOIN keeps every candidate key, so non-matching prefix lengths return a NULL country_name
LEFT JOIN ip_geo_city AS b
    ON a.lookup_key = b.lookup_key
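
The key arithmetic in the query above can be checked outside Snowflake with a small Python sketch. It is an illustration under stated assumptions, not the answer's implementation: the names `block_key`, `candidate_keys`, `countries`, and `lookup` are hypothetical, the two blocks come from the sample rows, and unlike the query the loop also generates the /32 candidate (zero host bits).

```python
import ipaddress

def block_key(cidr: str) -> int:
    """Pack (prefix length, network address) into a single integer key."""
    net = ipaddress.IPv4Network(cidr)
    return (net.prefixlen << 32) | int(net.network_address)

def candidate_keys(ip: str):
    """Yield one key per prefix length /32 down to /1 for the given IP."""
    value = int(ipaddress.IPv4Address(ip))
    for host_bits in range(0, 32):
        # Same arithmetic as BITAND(4294967295, -BITSHIFTLEFT(1, rn)).
        net_mask = 0xFFFFFFFF & -(1 << host_bits)
        yield ((32 - host_bits) << 32) | (value & net_mask)

# Hypothetical lookup table keyed by the packed value (geoname_id as payload).
countries = {
    block_key("1.0.1.0/24"): 1814991,
    block_key("1.0.4.0/22"): 2077456,
}

def lookup(ip: str):
    """Return the payload of the first matching block, or None."""
    for key in candidate_keys(ip):
        if key in countries:
            return countries[key]
    return None
```

Because the prefix length sits above bit 32, two blocks with the same network address but different lengths get different keys, which is what lets the range test be replaced by an equality test.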