Regex: cleaning a string in PostgreSQL

regex postgresql text


I have a column in a table that holds data about any updates related to changes at a company, in the following format:

#=============#==============#================#
| Company ID  |  updated_at  |   updates      |
#=============#==============#================#
| 101         | 2020-11-01   | name:          |
|             |              | -ABC           |
|             |              | -XYZ           |
|             |              | url:           |
|             |              | -www.abc.com   |
|             |              | -www.xyz.com   |
+-------------+--------------+----------------+
| 109         | 2020-10-20   | rating:        |
|             |              | -4.5           |
|             |              | -4.0           |
+-------------+--------------+----------------+

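As shown above, the updates column contains strings that include newlines and describe one or more updates. In the example above, this means that for company ID 101 the name changed from ABC to XYZ and the url changed from www.abc.com to www.xyz.com. For company ID 109, only the rating changed, from 4.5 to 4.0.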

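However, I want to split the updates column into three columns: one for what changed (url, name, etc.), a second for the old value, and a third for the new value. Something like this: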

#============#============#==============#================#
| Company ID |   Field    |  Old Value   |   New Value    |
#============#============#==============#================#
| 101        |   name     | ABC          | XYZ            |
+------------+------------+--------------+----------------+
| 101        |   url      | www.abc.com  | www.xyz.com    |
+------------+------------+--------------+----------------+
| 109        |   rating   | 4.5          | 4.0            |
+------------+------------+--------------+----------------+

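I am doing this in Postgres and know how to extract a substring based on a character, but this one is a bit more complicated for me because I need to extract several substrings from the same column for each row. Any help would be appreciated. Thanks.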

First, you can use regexp_split_to_table with a regex that contains a positive lookahead to get a version of the table in which each row contains exactly one update:

select companyID, 
       updated_at, 
       regexp_split_to_table(updates, '\n(?=\y.+:)') as updates 
  from old;

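This splits the updates column at every newline (\n) that is followed by a word and a colon (\y.+:). Here \y is a word boundary, so a newline in front of a line that starts with a hyphen is not a split point.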
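As a quick check of what that pattern does, the split can be tried on a literal string instead of the table. This is only a sketch: the sample value is copied from company 101 in the question, and the column alias single_update is made up for illustration.

```sql
-- Split one multi-update string into one row per update.
-- E'...' is used for the sample value so that \n becomes a real newline.
select regexp_split_to_table(
         E'name:\n-ABC\n-XYZ\nurl:\n-www.abc.com\n-www.xyz.com',
         '\n(?=\y.+:)'   -- newline followed by a word boundary, some text and a colon
       ) as single_update;
```

This should return two rows, one beginning with name: and one beginning with url:. The newlines in front of -ABC and -XYZ are kept inside each row, because there is no word boundary between a newline and a hyphen.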

From that, the desired table is much easier to build. For example, you can use split_part to split each update string into the three parts you need.
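To see the split_part step in isolation, here is a small sketch that runs on a single literal update string. The derived table t(u) is a hypothetical stand-in for one row produced by the split above, not something from the original table.

```sql
-- Take one single-update string apart with split_part.
-- t(u) is a made-up one-row table holding the sample string.
select split_part(u, ':', 1)     as field,      -- text before the first colon
       split_part(u, E'\n-', 2)  as old_value,  -- text after the first newline + hyphen
       split_part(u, E'\n-', 3)  as new_value   -- text after the second newline + hyphen
  from (values (E'name:\n-ABC\n-XYZ')) as t(u);
```

For this input the three columns come out as name, ABC and XYZ. The delimiter for the values is the newline together with the hyphen, so hyphens inside a value (for example in a hostname) do not cause extra splits.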

Combining this with the first step gives the full query:

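select companyID, 
       updated_at, 
       split_part(updates, E':', 1) as field,        -- text before the first colon
       split_part(updates, E'\n-', 2) as old_value,  -- first line that starts with "-"
       split_part(updates, E'\n-', 3) as new_value   -- second line that starts with "-"
  from (select companyID, 
               updated_at, 
               -- one output row per update
               regexp_split_to_table(updates, '\n(?=\y.+:)') as updates 
          from old
       ) as per_update  -- a subquery in FROM needs an alias before PostgreSQL 16
;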
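For anyone who wants to try the whole thing end to end, the following is a minimal sketch. The table definition is an assumption, since the question never shows the real DDL: the column types, the temp table, and the subquery alias per_update are all hypothetical, and the sample rows are copied from the question.

```sql
-- Hypothetical DDL: the question does not show the real table definition.
create temp table old (
    companyID  integer,
    updated_at date,
    updates    text
);

-- Sample rows from the question; E'...' turns \n into real newlines.
insert into old (companyID, updated_at, updates) values
    (101, date '2020-11-01', E'name:\n-ABC\n-XYZ\nurl:\n-www.abc.com\n-www.xyz.com'),
    (109, date '2020-10-20', E'rating:\n-4.5\n-4.0');

-- Split into one row per update, then into field / old value / new value.
select companyID,
       split_part(updates, ':', 1)     as field,
       split_part(updates, E'\n-', 2)  as old_value,
       split_part(updates, E'\n-', 3)  as new_value
  from (select companyID,
               regexp_split_to_table(updates, '\n(?=\y.+:)') as updates
          from old
       ) as per_update
 order by companyID, field;
```

With these two sample rows the query should produce three rows: name and url for company 101, and rating for company 109. It relies on every update having exactly two lines that start with a hyphen, as in the question's data.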

Here is a dbfiddle example:

More details / additional information:

Newlines in Postgres strings:
PostgreSQL regex word boundaries:
Splitting a string into new columns:

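Is this a single column value with newlines, or are these several table rows? Please describe the table in more detail.

@LaurenzAlbe You are right. It is a single row that contains all the updates that happened to a particular company on a particular date. I have updated the OP to give more context about the current table.

Can you provide the DDL for the table? The updates column in particular is interesting.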
