Snowflake cloud data platform: data type error when loading Parquet data into a table using COPY with SELECT


I am trying to move Parquet data from an AWS S3 stage into a table in Snowflake and keep getting a data type error. Specifically, no matter how I adjust my columns, this error keeps appearing:

Numeric value '' is not recognized

The challenge is that I am moving 90+ columns, and when I hit the error Snowflake does not tell me which column or row is broken, so it is very hard to find and fix. I also cannot use the VALIDATE function to troubleshoot, because VALIDATE and VALIDATE_PIPE_LOAD do not support COPY with transform. When I isolate each individual column in a SELECT statement, everything outputs fine, so I cannot even find the problem cell itself.
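One way to hunt for the offending cell is to query the staged file directly and filter for values that exist but fail a safe cast. A minimal sketch, assuming the stage and format names from the question, with ADMTYPE picked as an example numeric column:

```sql
-- Query the staged Parquet file directly; TRY_TO_NUMBER returns NULL on
-- failure instead of erroring, so this surfaces only the rows whose value
-- is present but not parseable as a number.
SELECT METADATA$FILENAME AS file_name,
       METADATA$FILE_ROW_NUMBER AS row_num,
       $1:ADMTYPE::VARCHAR AS raw_value
FROM @s3_stage/tedi.parquet
  (file_format => MYPARQUETFORMAT)
WHERE $1:ADMTYPE IS NOT NULL
  AND TRY_TO_NUMBER($1:ADMTYPE::VARCHAR) IS NULL;
```

Repeating this per numeric column is tedious across 90+ columns, but it pinpoints the exact file and row number of a bad value without running the COPY at all.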

Is there a way to make Snowflake ignore the error and just move the data in, or to recast the columns wholesale as something else, without having to identify the needle-in-a-haystack cell? Some things I have tried that did not solve it include:

  • Removing the :: casts from the COPY statement entirely and trying to move everything in as a VARIANT, which gave me the error "Failed to cast variant value "" to FIXED"
  • Changing everything to VARCHAR
  • Changing the file format to CSV
  • Wrapping the numbers and decimals in TRY_TO_NUMBER()/TRY_TO_DECIMAL()
  • I also tried all of the suggestions listed here, but none of them worked -
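On the "just ignore the error" angle, COPY INTO does accept an ON_ERROR copy option. A hedged sketch using the stage and format names from the question; note, however, that errors raised by the :: casts inside the transformation SELECT may still abort the statement regardless of this option, so it is worth verifying against the Snowflake documentation for your case:

```sql
-- Attempt to skip rows that fail conversion instead of aborting the load.
COPY INTO mytable
FROM (SELECT $1:ADM::VARCHAR /* ...remaining columns as in the question... */
      FROM @s3_stage/tedi.parquet
        (file_format => MYPARQUETFORMAT))
ON_ERROR = CONTINUE;
```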
My code is below:


-- create the file format for parquet files

CREATE FILE FORMAT MYPARQUETFORMAT
  TYPE = PARQUET
  COMPRESSION = snappy;

-- create my table

create or replace table mytable (
 ADM VARCHAR,
ADMDATER DATE default null,
admit_dateR DATE default null,
ADMTYPE NUMBER(38,0) default null,
BEGDATER VARCHAR,
BILL DECIMAL default null,
CLMCNT1 VARCHAR,
CLMCNT2 VARCHAR,
CLMCNT3 VARCHAR,
DAYS NUMBER(38,0) default null,
DISPSTAT VARCHAR,
DRG VARCHAR,
DX1 VARCHAR,
DX2 VARCHAR,
DX3 VARCHAR,
DX4 VARCHAR,
DX5 VARCHAR,
DX6 VARCHAR,
DX7 VARCHAR,
DX8 VARCHAR,
DX9 VARCHAR,
DX10 VARCHAR,
DX11 VARCHAR,
DX12 VARCHAR,
DX13 VARCHAR,
DX14 VARCHAR,
DX15 VARCHAR,
DX16 VARCHAR,
DX17 VARCHAR,
DX18 VARCHAR,
DX19 VARCHAR,
DX20 VARCHAR,
DX21 VARCHAR,
DX22 VARCHAR,
DX23 VARCHAR,
DX24 VARCHAR,
DX25 VARCHAR,
INSTTYPE VARCHAR,
Intake_BENCAT VARCHAR,
MDC NUMBER(38,0) default null,
PAID DECIMAL,
POA2 VARCHAR,
POA3 VARCHAR,
POA4 VARCHAR,
POA5 VARCHAR,
POA6 VARCHAR,
POA7 VARCHAR,
POA8 VARCHAR,
POA9 VARCHAR,
POA10 VARCHAR,
POA11 VARCHAR,
POA12 VARCHAR,
POA13 VARCHAR,
POA14 VARCHAR,
POA15 VARCHAR,
POA16 VARCHAR,
POA17 VARCHAR,
POA18 VARCHAR,
POA19 VARCHAR,
POA20 VARCHAR,
POA21 VARCHAR,
POA22 VARCHAR,
POA23 VARCHAR,
POA24 VARCHAR,
POA25 VARCHAR,
RandomID INT,
RWP NUMBER(38,0) default null,
TOTDAYS NUMBER(38,0) default null,
PROC2 VARCHAR,
PROC3 VARCHAR,
PROC4 VARCHAR,
PROC5 VARCHAR,
PROC6 VARCHAR,
PROC7 VARCHAR,
PROC8 VARCHAR,
PROC9 VARCHAR,
PROC10 VARCHAR,
PROC11 VARCHAR,
PROC12 VARCHAR,
PROC13 VARCHAR,
PROC14 VARCHAR,
PROC15 VARCHAR,
PROC16 VARCHAR,
PROC17 VARCHAR,
PROC18 VARCHAR,
PROC19 VARCHAR,
PROC20 VARCHAR,
PROC21 VARCHAR,
PROC22 VARCHAR,
PROC23 VARCHAR,
PROC24 VARCHAR,
PROC25 VARCHAR
);

-- copy data using copy and select statements

COPY INTO mytable
FROM(SELECT
$1:ADM::VARCHAR,
$1:ADMDATER::DATE,
$1:admit_dateR::DATE,
$1:ADMTYPE::NUMBER(38,0),
$1:BEGDATER::VARCHAR,
$1:BILL::DECIMAL,
$1:CLMCNT1::VARCHAR,
$1:CLMCNT2::VARCHAR,
$1:CLMCNT3::VARCHAR,
$1:DAYS::NUMBER(38,0),
$1:DISPSTAT::VARCHAR,
$1:DRG::VARCHAR,
$1:DX1::VARCHAR,
$1:DX2::VARCHAR,
$1:DX3::VARCHAR,
$1:DX4::VARCHAR,
$1:DX5::VARCHAR,
$1:DX6::VARCHAR,
$1:DX7::VARCHAR,
$1:DX8::VARCHAR,
$1:DX9::VARCHAR,
$1:DX10::VARCHAR,
$1:DX11::VARCHAR,
$1:DX12::VARCHAR,
$1:DX13::VARCHAR,
$1:DX14::VARCHAR,
$1:DX15::VARCHAR,
$1:DX16::VARCHAR,
$1:DX17::VARCHAR,
$1:DX18::VARCHAR,
$1:DX19::VARCHAR,
$1:DX20::VARCHAR,
$1:DX21::VARCHAR,
$1:DX22::VARCHAR,
$1:DX23::VARCHAR,
$1:DX24::VARCHAR,
$1:DX25::VARCHAR,
$1:INSTTYPE::VARCHAR,
$1:Intake_BENCAT::VARCHAR,
$1:MDC::NUMBER(38,0),
$1:PAID::DECIMAL,
$1:POA10::VARCHAR,
$1:POA11::VARCHAR,
$1:POA12::VARCHAR,
$1:POA13::VARCHAR,
$1:POA14::VARCHAR,
$1:POA15::VARCHAR,
$1:POA16::VARCHAR,
$1:POA17::VARCHAR,
$1:POA18::VARCHAR,
$1:POA19::VARCHAR,
$1:POA2::VARCHAR,
$1:POA20::VARCHAR,
$1:POA21::VARCHAR,
$1:POA22::VARCHAR,
$1:POA23::VARCHAR,
$1:POA24::VARCHAR,
$1:POA25::VARCHAR,
$1:POA3::DATE,
$1:POA4::DATE,
$1:POA5::DATE,
$1:POA6::DATE,
$1:POA7::DATE,
$1:POA8::DATE,
$1:POA9::DATE,
$1:RandomID::INT,
$1:RWP::NUMBER(38,0),
$1:TOTDAYS::NUMBER(38,0),
$1:PROC2::VARCHAR,
$1:PROC3::VARCHAR,
$1:PROC4::VARCHAR,
$1:PROC5::VARCHAR,
$1:PROC6::VARCHAR,
$1:PROC7::VARCHAR,
$1:PROC8::VARCHAR,
$1:PROC9::VARCHAR,
$1:PROC10::VARCHAR,
$1:PROC11::VARCHAR,
$1:PROC12::VARCHAR,
$1:PROC13::VARCHAR,
$1:PROC14::VARCHAR,
$1:PROC15::VARCHAR,
$1:PROC16::VARCHAR,
$1:PROC17::VARCHAR,
$1:PROC18::VARCHAR,
$1:PROC19::VARCHAR,
$1:PROC20::VARCHAR,
$1:PROC21::VARCHAR,
$1:PROC22::VARCHAR,
$1:PROC23::VARCHAR,
$1:PROC24::VARCHAR,
$1:PROC25::VARCHAR
FROM @s3_stage/tedi.parquet
(file_format => MYPARQUETFORMAT));

Try the syntax below. I first cast the Parquet data to VARCHAR, then apply TRY_TO_NUMBER.

select metadata$filename as file_name
      ,$1:date_column::VARCHAR as date_column --format 20210203 
      ,$1:address1::VARCHAR as address1 --alphanumeric
      ,TRY_TO_NUMBER($1:address1::VARCHAR) as address2 --outputs NULL for alphanumeric rows and produces a result only for numeric rows; it should convert '' to NULL
      ,TO_NUMBER($1:date_column::VARCHAR) as add_date2 
from @public.stage_name/directory_path/file_name.parquet (file_format => public.parque_format) t;

Have you tried creating the table with all VARCHAR fields and then loading the data into those? That would at least show whether this is a casting issue. If it succeeds, you can then query the data inside Snowflake with explicit casts to see where your problem is.

@Rika did this solve the problem?

Yes! By creating the table with all VARCHAR columns, then loading and converting as needed, I was able to move all the data in. It is still hard to find the exact row that throws the error, but this way is much easier now. Thanks everyone!
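The workaround from the comments can be sketched as follows. This is a minimal example reusing the stage and format names from the question and showing only three representative columns; the landing table name varchar_landing is hypothetical:

```sql
-- 1. Land everything as VARCHAR so the COPY cannot fail on a cast.
CREATE OR REPLACE TABLE varchar_landing (
  ADM VARCHAR,
  ADMTYPE VARCHAR,   -- numeric data, landed as text for now
  ADMDATER VARCHAR   -- date data, landed as text for now
  -- ...remaining columns, all VARCHAR...
);

COPY INTO varchar_landing
FROM (SELECT $1:ADM::VARCHAR, $1:ADMTYPE::VARCHAR, $1:ADMDATER::VARCHAR
      FROM @s3_stage/tedi.parquet
        (file_format => MYPARQUETFORMAT));

-- 2. Cast downstream with TRY_* functions, which turn unparseable values
--    (including '') into NULL instead of raising an error.
SELECT ADM,
       TRY_TO_NUMBER(ADMTYPE) AS ADMTYPE,
       TRY_TO_DATE(ADMDATER)  AS ADMDATER
FROM varchar_landing;
```

Once the data is inside Snowflake as VARCHAR, rows where the TRY_* result is NULL but the raw column is not can be selected to locate exactly which values were breaking the original load.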