Parsing web URLs with BigQuery SQL

sql url google-bigquery

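I need help parsing web URLs with BigQuery. I need to remove the string/text after the last forward slash "/" and return the URL. The length of the input URL can vary from record to record. If the input URL has no string/text after the domain, the URL should be returned as is.

Here are some examples

Input web URL

Expected output

I tried using the SPLIT function to turn the URL string into an array and ARRAY_LENGTH to get the size of the array. However, that does not cover all of the scenarios described above.

Please suggest how to solve this in BigQuery using Standard SQL.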
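For reference, the SPLIT / ARRAY_LENGTH idea mentioned in the question could look roughly like the sketch below. This is not the asker's original query, which is not shown; the CTE name sample and the aliases part, pos, and trimmed are made up for illustration, and the sketch assumes every URL starts with a scheme such as https://.

```sql
#standardSQL
-- Hypothetical sketch of the SPLIT / ARRAY_LENGTH approach; assumes a scheme like "https://".
WITH sample AS (
  SELECT 'https://www.stackoverflow.com' AS url UNION ALL
  SELECT 'https://www.stackoverflow.com/questions' UNION ALL
  SELECT 'https://www.stackoverflow.com/questions/ask'
)
SELECT
  url,
  IF(
    -- 'https:', '' and the host make three parts, so three or fewer means there is no path.
    ARRAY_LENGTH(SPLIT(url, '/')) <= 3,
    url,
    -- Otherwise drop the last part and glue the remaining parts back together.
    ARRAY_TO_STRING(
      ARRAY(
        SELECT part
        FROM UNNEST(SPLIT(url, '/')) AS part WITH OFFSET AS pos
        WHERE pos < ARRAY_LENGTH(SPLIT(url, '/')) - 1
        ORDER BY pos
      ),
      '/'
    )
  ) AS trimmed
FROM sample
```

The fixed threshold of three parts is what makes this approach fragile: it only holds when the URL has exactly one "//" separator in front of the host.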

I think a CASE expression helps fill in the gaps:

select (case when url like '%//%/%' then regexp_replace(url, '/[^/]+$', '')
             else url
        end)
from (select 'https://www.stackoverflow.com/questions/ask' as url union all
      select 'https://www.stackoverflow.com/questions' as url union all
      select 'https://www.stackoverflow.com' as url
      ) x;

Below is for BigQuery Standard SQL

#standardSQL
SELECT url, 
  REPLACE(REGEXP_REPLACE(REPLACE(url, '//', '\\'), r'/[^/]+$', ''), '\\', '//')
FROM `project.dataset.table`

You can test and play with the above using the sample data from your question, as in the example below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'https://www.stackoverflow.com' url UNION ALL
  SELECT 'https://www.stackoverflow.com/questions' UNION ALL
  SELECT 'https://www.stackoverflow.com/questions/ask' UNION ALL
  SELECT 'https://stackoverflow.com/questions/ask/some-text' 
)
SELECT url, 
  REPLACE(REGEXP_REPLACE(REPLACE(url, '//', '\\'), r'/[^/]+$', ''), '\\', '//') value
FROM `project.dataset.table`

Result

Row url                                                 value    
1   https://www.stackoverflow.com                       https://www.stackoverflow.com    
2   https://www.stackoverflow.com/questions             https://www.stackoverflow.com    
3   https://www.stackoverflow.com/questions/ask         https://www.stackoverflow.com/questions  
4   https://stackoverflow.com/questions/ask/some-text   https://stackoverflow.com/questions/ask

You can use a simple REGEXP_REPLACE on the last "/" and the string after it

SELECT REGEXP_REPLACE(url, r"([^/])/[^/]*$", "\\1")
FROM (SELECT 'https://www.stackoverflow.com/questions/ask' as url UNION ALL
  SELECT 'https://www.stackoverflow.com/questions' as url UNION ALL
  SELECT 'https://www.stackoverflow.com' as url
)

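Note: \\1 (the first capture group) stands for the character before the "/". We need to capture that character so that the pattern does not match the "//".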
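To see why the leading ([^/]) matters, the sketch below runs the pattern with and without it. The column aliases naive and guarded are illustrative only and are not part of the answer.

```sql
#standardSQL
-- Illustrative comparison; the aliases "naive" and "guarded" are assumptions.
SELECT
  url,
  -- Without the guard, the second slash of "//" can match when no path follows the host.
  REGEXP_REPLACE(url, r'/[^/]*$', '') AS naive,
  -- With the guard, the slash must follow a non-slash character, so "//" is left alone.
  REGEXP_REPLACE(url, r'([^/])/[^/]*$', r'\1') AS guarded
FROM UNNEST([
  'https://www.stackoverflow.com',
  'https://www.stackoverflow.com/questions'
]) AS url
```

For the bare-domain URL, naive comes out as https:/ while guarded leaves the URL unchanged. For the URL with a path, both columns return https://www.stackoverflow.com.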


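Here is a JavaScript UDF solution. Not because it is a better fit for this case, but because when things get really complicated it is always your last hope.

(Also, I would like to point out that a double slash can appear inside a URL; handling that may require extra logic in the JavaScript code.)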
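The UDF code from this answer is not preserved here. As a rough sketch of what such a function might look like, the example below uses a hypothetical function name, strip_last_segment, and a body written for illustration; it is not the answer author's code.

```sql
#standardSQL
-- Hypothetical UDF name and body; a sketch, not the original answer's code.
CREATE TEMP FUNCTION strip_last_segment(url STRING)
RETURNS STRING
LANGUAGE js AS r"""
  if (url === null) return null;
  // Find the first "//" so the scheme separator is never treated as a path slash.
  var schemeEnd = url.indexOf('//');
  var start = schemeEnd === -1 ? 0 : schemeEnd + 2;
  var lastSlash = url.lastIndexOf('/');
  // No slash after the host: return the URL unchanged.
  if (lastSlash < start) return url;
  // Otherwise cut off the last slash and everything after it.
  return url.substring(0, lastSlash);
""";

SELECT url, strip_last_segment(url) AS value
FROM UNNEST([
  'https://www.stackoverflow.com',
  'https://www.stackoverflow.com/questions',
  'https://www.stackoverflow.com/questions/ask'
]) AS url;
```

Note that this sketch also removes a bare trailing slash, which may or may not be what you want. Any extra rules, such as special handling of a double slash inside the path, would go into the JavaScript body.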

Bravo! I knew there was a better way but could not get it to work. Bravo!

Forgot to mention: instead of "\\1" you can use r"\1"

@MikhailBerlyant - thanks for your help

@KSAIKH - sure. Also consider the answers that helped :o)