PostgreSQL: counting values in a string based on specific criteria?

My table has a criteria column, and each row contains text like the following:

Inclusion Criteria:

-  Female

-  > 40 years of age

-  Women who have first-degree relative suffered from breast cancer

-  Women who have first-degree relative suffered from ovarian cancer

-  Family history of male breast cancer

-  Family history of breast cancer (not necessarily first degree relatives) diagnosed before age of 40.

-  Family history of breast cancer (not necessarily first degree relatives) affecting 2 or more family members

-  Personal history of ovarian cancer

-  Personal history of premalignant conditions of breast and ovary

Exclusion Criteria:

     - Women with mammogram within one year
     -  adults aged 50-75

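I need to find the counts of inclusion and exclusion criteria in PostgreSQL. For example, the text above has 9 inclusion criteria and 2 exclusion criteria.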

Are you saying that all of the above appears in a single column?

If so, you can use regular expression pattern matching to search from the string "Inclusion Criteria:" to the string "Exclusion Criteria:" and count the lines in between.
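The idea above can be sketched in plain SQL, without a function. This is only an illustration under assumptions: the table name trials and its id column are hypothetical, the sample row is a shortened version of the text in the question, and every criterion is assumed to start on its own line with a dash followed by whitespace.

```sql
-- Hypothetical sample data; replace the CTE with your own table.
WITH trials(id, criteria) AS (
    VALUES (1, 'Inclusion Criteria:

-  Female

-  > 40 years of age

Exclusion Criteria:

     - Women with mammogram within one year')
)
SELECT id,
       -- Take the text between the two headings, then count one match per bullet line.
       (SELECT count(*)
          FROM regexp_matches(
                   substring(criteria FROM 'Inclusion Criteria:(.*)Exclusion Criteria:'),
                   '(^|\n)\s*-\s', 'g')) AS num_of_inclusions,
       -- Take the text after the second heading and count it the same way.
       (SELECT count(*)
          FROM regexp_matches(
                   substring(criteria FROM 'Exclusion Criteria:(.*)$'),
                   '(^|\n)\s*-\s', 'g')) AS num_of_exclusions
FROM trials;
```

For the shortened sample row this should give 2 inclusions and 1 exclusion. Because the pattern requires the dash to begin a line and be followed by whitespace, hyphens inside words such as "first-degree" or ranges such as "50-75" are not counted.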

Regular expressions will keep you sane here.

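You can use PL/pgSQL to create a function that does the parsing and splitting. Once you have it, you can call it with SELECT on a string or a cell, just like any other PostgreSQL function.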

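If you want to return both values (inclusion and exclusion) in a single operation, the simplest way is to create a table that defines their names and types, as follows: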

CREATE TABLE condition_counts (
  num_of_inclusions INTEGER,  -- counts are numbers, so declare them as integers
  num_of_exclusions INTEGER
);
You can then use it in the function definition, as follows:

CREATE OR REPLACE FUNCTION parse_conditions(conditions VARCHAR) RETURNS condition_counts AS $$
DECLARE
    condition_matches TEXT[];
    parsed_conditions condition_counts%ROWTYPE;
BEGIN
    -- Capture the text after each heading: [1] = inclusion list, [2] = exclusion list.
    -- If the headings are not found, the array is NULL and both counts come back NULL.
    condition_matches := regexp_matches(conditions,
        E'^Inclusion Criteria:\\s*(.*)\\s*Exclusion Criteria:\\s*(.*)$');
    -- Split each list on "newline, optional whitespace, dash" and count the pieces.
    SELECT array_length(regexp_split_to_array(condition_matches[1], E'\\n\\s*-\\s*'), 1),
           array_length(regexp_split_to_array(condition_matches[2], E'\\n\\s*-\\s*'), 1)
      INTO parsed_conditions.num_of_inclusions, parsed_conditions.num_of_exclusions;
    RETURN parsed_conditions;
END;
$$ LANGUAGE plpgsql;
Now you can call it on the sample string provided, as follows:

SELECT * FROM parse_conditions('Inclusion Criteria:

-  Female

-  > 40 years of age

-  Women who have first-degree relative suffered from breast cancer

-  Women who have first-degree relative suffered from ovarian cancer

-  Family history of male breast cancer

-  Family history of breast cancer (not necessarily first degree relatives) diagnosed before age of 40.

-  Family history of breast cancer (not necessarily first degree relatives) affecting 2 or more family members

-  Personal history of ovarian cancer

-  Personal history of premalignant conditions of breast and ovary

Exclusion Criteria:

     - Women with mammogram within one year
     -  adults aged 50-75');

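It returns the counts 9 and 2, as expected. You can also run SELECT parse_conditions(columnname) FROM tablename, along with the other usual combinations that work for any PostgreSQL function.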
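As a rough sketch of calling the function on a column, the query below expands the result into separate columns. It assumes the parse_conditions function defined above already exists; the table name trials and its columns id and criteria are hypothetical placeholders.

```sql
-- Assumes parse_conditions() from above exists; trials(id, criteria) is a hypothetical table.
SELECT t.id,
       c.num_of_inclusions,
       c.num_of_exclusions
FROM trials AS t
CROSS JOIN LATERAL parse_conditions(t.criteria) AS c;  -- one function call per row
```

Using LATERAL (available in PostgreSQL 9.3 and later) evaluates the function once per row, whereas writing (parse_conditions(criteria)).* in the select list can evaluate it once per output column.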

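So this is really a text-processing / pattern-matching / parsing problem rather than a database problem as such. In your example, is the whole text in a single row, or do the separate lines shown here represent separate rows?

@oto: The separate lines represent separate rows. I used array_length(string_to_array(substring(lower(criteria) from 'inclusion(.+)exclusion'), '-'), 1) - 1 as cnt; is there a better solution?

@user322101 - OK, do you have any other columns (e.g. id, timestamp, or something similar) that determine the order of these rows?

See @Feneric's comment above.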