Google BigQuery: Decode Unicode to Hebrew using BigQuery

Using Google BigQuery Standard SQL, I want to decode the following Unicode string into Hebrew:

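&#x5D0;&#x5DC;&#x5D3;&#x5DF; &#x5EA;&#x5D7;&#x5D1;&#x5D5;&#x5E8;&#x5D4; &#x5DE;&#x5E1;&#x5DB;&#x5DE;&#x5EA; &#x5D0;&#x5EA; &#x5D4;&#x5E8;&#x5D1;&#x5E2;&#x5D5;&#x5DF; &#x5D4;&#x5E8;&#x5D0;&#x5E9;&#x5D5;&#x5DF; &#x5DC;&#x5E9;&#x5E0;&#x5EA; 2021 &#x5E2;&#x5DD; &#x5D4;&#x5DE;&#x5E9;&#x5DA; &#x5E9;&#x5D9;&#x5E4;&#x5D5;&#x5E8; &#x5DE;&#x5E9;&#x5DE;&#x5E2;&#x5D5;&#x5EA;&#x5D9; &#x5D1;&#x5E8;&#x5D5;&#x5D5;&#x5D7;&#x5D9;&#x5D5;&#x5EA; &#x5D4;&#x5E4;&#x5E2;&#x5D9;&#x5DC;&#x5D5;&#x5EA;
Note that ; separates each Hebrew character from the next, and a space separates one word from the next. There is also a number (2021) that should be carried into the decoded result as-is. The final result (in Hebrew) should be:
אלדן תחבורה מסכמת את הרבעון הראשון לשנת 2021 עם המשך שיפור משמעותי ברווחיות הפעילות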
Thanks, everyone.
Consider the solution below:
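-- Decode one space-delimited word; words without a leading '&#x' pass through unchanged.
create temp function decode(word string) as ((
  select if(starts_with(word, '&#x'),
    safe.code_points_to_string(array(
      -- '&#x5D0' becomes '0x5D0', which casts to int64 as a hex literal
      select cast(value as int64)
      from unnest(split(replace(word, '&#', '0'),';')) value
      -- the trailing ';' leaves an empty last element; skip it
      where not value = ''
    )),
    word)
));
-- Split the text on spaces, decode each word, and join the words back together.
select (
    select string_agg(decode(word), ' ')
    from unnest(split(txt, ' ')) word
  ) as Hebrew_txt
from `project.dataset.table`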

If applied to the sample data in your question, with the following CTE placed directly before the final select:
with `project.dataset.table` as (
  select '&#x5D0;&#x5DC;&#x5D3;&#x5DF; &#x5EA;&#x5D7;&#x5D1;&#x5D5;&#x5E8;&#x5D4; &#x5DE;&#x5E1;&#x5DB;&#x5DE;&#x5EA; &#x5D0;&#x5EA; &#x5D4;&#x5E8;&#x5D1;&#x5E2;&#x5D5;&#x5DF; &#x5D4;&#x5E8;&#x5D0;&#x5E9;&#x5D5;&#x5DF; &#x5DC;&#x5E9;&#x5E0;&#x5EA; 2021 &#x5E2;&#x5DD; &#x5D4;&#x5DE;&#x5E9;&#x5DA; &#x5E9;&#x5D9;&#x5E4;&#x5D5;&#x5E8; &#x5DE;&#x5E9;&#x5DE;&#x5E2;&#x5D5;&#x5EA;&#x5D9; &#x5D1;&#x5E8;&#x5D5;&#x5D5;&#x5D7;&#x5D9;&#x5D5;&#x5EA; &#x5D4;&#x5E4;&#x5E2;&#x5D9;&#x5DC;&#x5D5;&#x5EA;' txt
)         

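the output is a single row whose Hebrew_txt column contains:

אלדן תחבורה מסכמת את הרבעון הראשון לשנת 2021 עם המשך שיפור משמעותי ברווחיות הפעילות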
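
To see why the function works, it can help to trace the intermediate values for a single word. The query below is only an illustrative sketch: the CTE names (input_word, tokens, points) are made up for this example, and the `with offset` / `order by pos` pair is a defensive addition that is not in the function above, included on the assumption that an array subquery without ORDER BY does not guarantee element order.

```sql
-- Trace of the decoding steps for one encoded word (the first word of the sample).
with input_word as (
  select '&#x5D0;&#x5DC;&#x5D3;&#x5DF;' as word
),
tokens as (
  -- '&#x5D0;...' -> '0x5D0;...' -> ['0x5D0', '0x5DC', '0x5D3', '0x5DF', '']
  select word, split(replace(word, '&#', '0'), ';') as parts
  from input_word
),
points as (
  select word, parts,
    array(
      -- hex literals cast to int64: [1488, 1500, 1491, 1503]
      select cast(part as int64)
      from unnest(parts) part with offset pos
      where part != ''          -- drop the empty element left by the trailing ';'
      order by pos              -- keep the characters in their original order
    ) as code_points
  from tokens
)
select word, parts, code_points,
  code_points_to_string(code_points) as decoded   -- 'אלדן'
from points
```

Each column corresponds to one nested call inside decode, so a wrong result can be traced to the step that produced it.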

Below is the adjusted function, which handles words containing reserved/special characters such as , # $ @ * ( ) { } & ^ % ! ? . and so on:
create temp function decode(word string) as ((
  select if(starts_with(word, '&#x'),
    safe.code_points_to_string(array(
      -- tokens that are not numbers (e.g. ':') fall back to the code of their first character
      select ifnull(safe_cast(value as int64), ascii(value))
      from unnest(split(replace(word, '&#', '0'),';')) value
      where not value = ''
    )),
    word)
));
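
The adjusted function still has a few edge cases: ascii() returns the code of the first character only, so a token such as ?! keeps just the ?; a word that begins with a special character, such as (&#x5D0;, does not start with &#x and is returned undecoded; and digits that directly follow an entity are cast as a code point rather than kept as text. One possible way around this, sketched here as an assumption rather than as part of the original answer, is to tokenize the whole string into hex entities and single characters and decode only the entities. The name decode_entities and the regular expression are hypothetical choices for this sketch.

```sql
-- Hypothetical alternative: decode '&#x...;' entities anywhere in the text
-- and copy every other character through unchanged.
create temp function decode_entities(txt string) as ((
  select string_agg(
    if(starts_with(tok, '&#x'),
      -- '&#x5D0;' -> '5D0' -> '0x5D0' -> 1488 -> one-character string
      safe.code_points_to_string(
        [cast(concat('0x', substr(tok, 4, length(tok) - 4)) as int64)]),
      tok),
    '' order by pos)
  -- each token is either a complete hex entity or a single character
  from unnest(regexp_extract_all(txt, r'&#x[0-9A-Fa-f]+;|[\s\S]')) tok with offset pos
));

select decode_entities('(&#x5D0;&#x5DC;&#x5D3;&#x5DF;) 2021?!') as decoded
```

With this input the query should return the four decoded letters inside the parentheses, followed by the space, 2021 and ?! unchanged. Because the function works on the whole string, there is no need to split the text on spaces first. An empty input string yields NULL, since string_agg has nothing to aggregate.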

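Works great on my original example, Mikhail! But when I run the code on similar input that contains reserved/special characters (such as , # $ @ * ( ) { } & ^ % ! ? .), it fails with an error like "Bad int64 value:". Could you modify your code so that it works on input like this: שבחיםלסוקולובסקיהאריס;:ידענושתהיהמלחמה;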