In PostgreSQL, how do I count values in a string based on specific criteria?
I have a criteria column in my table, and each row contains text like this:
Inclusion Criteria:
- Female
- > 40 years of age
- Women who have first-degree relative suffered from breast cancer
- Women who have first-degree relative suffered from ovarian cancer
- Family history of male breast cancer
- Family history of breast cancer (not necessarily first degree relatives) diagnosed before age of 40.
- Family history of breast cancer (not necessarily first degree relatives) affecting 2 or more family members
- Personal history of ovarian cancer
- Personal history of premalignant conditions of breast and ovary
Exclusion Criteria:
- Women with mammogram within one year
- adults aged 50-75
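I need to find the counts of inclusion and exclusion criteria in PostgreSQL. For example, here there are 9 inclusion criteria and 2 exclusion criteria.
Are you saying that all of the above appears in a single column? If so, you can use regular-expression pattern matching to search from the string "Inclusion Criteria:" to the string "Exclusion Criteria:" and count the lines in between. A regular expression should do the job.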
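The regex idea from that comment can be sketched in plain SQL, without a stored procedure. This is only a minimal sketch under assumptions: the table name trials and the columns id and criteria are hypothetical, and every criterion is assumed to sit on its own line starting with a hyphen.

```sql
-- Hypothetical sample data; "trials", "id" and "criteria" are illustrative names.
WITH trials(id, criteria) AS (
    VALUES (1, E'Inclusion Criteria:\n- Female\n- > 40 years of age\nExclusion Criteria:\n- Women with mammogram within one year')
),
parts AS (
    SELECT id,
           -- Text between the two headings.
           split_part(split_part(criteria, 'Exclusion Criteria:', 1),
                      'Inclusion Criteria:', 2) AS inclusion_text,
           -- Text after the second heading.
           split_part(criteria, 'Exclusion Criteria:', 2) AS exclusion_text
    FROM trials
)
SELECT id,
       -- 'g' returns every match; 'n' makes ^ match at the start of each line,
       -- so only hyphens that begin a line are counted.
       (SELECT count(*) FROM regexp_matches(inclusion_text, '^\s*-', 'gn')) AS inclusion_count,
       (SELECT count(*) FROM regexp_matches(exclusion_text, '^\s*-', 'gn')) AS exclusion_count
FROM parts;
```

For this small sample the query should return 2 and 1. Anchoring the pattern to the start of a line keeps hyphens inside the text, such as the one in "first-degree", out of the count.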
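You can create a stored procedure in PL/pgSQL to do the parsing and separation. Once you have it, you can call it on a string or a cell via SELECT, just like any other PostgreSQL function.
If you want to return both values (inclusions and exclusions) in one operation, the simplest approach is to create a table that defines their names and types, like this: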
CREATE TABLE condition_counts (
    num_of_inclusions INTEGER,
    num_of_exclusions INTEGER
);
You can then use it in the stored procedure definition, like this:
CREATE OR REPLACE FUNCTION parse_conditions(conditions VARCHAR) RETURNS condition_counts AS $$
DECLARE
    condition_matches VARCHAR[];
    parsed_conditions condition_counts%ROWTYPE;
BEGIN
    -- Capture the text that follows each heading.
    -- Without the 'g' flag, regexp_matches returns at most one row.
    condition_matches := regexp_matches(conditions,
        E'^Inclusion Criteria:\\s*(.*)\\s*Exclusion Criteria:\\s*(.*)$');
    -- Split each captured block wherever a new line starts with "-" and count the pieces.
    SELECT array_length(regexp_split_to_array(condition_matches[1], E'\\n\\s*-\\s*'), 1),
           array_length(regexp_split_to_array(condition_matches[2], E'\\n\\s*-\\s*'), 1)
    INTO parsed_conditions.num_of_inclusions, parsed_conditions.num_of_exclusions;
    RETURN parsed_conditions;
END
$$ LANGUAGE plpgsql;
Now you can call it on the provided sample string, like this:
SELECT * FROM parse_conditions('Inclusion Criteria:
- Female
- > 40 years of age
- Women who have first-degree relative suffered from breast cancer
- Women who have first-degree relative suffered from ovarian cancer
- Family history of male breast cancer
- Family history of breast cancer (not necessarily first degree relatives) diagnosed before age of 40.
- Family history of breast cancer (not necessarily first degree relatives) affecting 2 or more family members
- Personal history of ovarian cancer
- Personal history of premalignant conditions of breast and ovary
Exclusion Criteria:
- Women with mammogram within one year
- adults aged 50-75');
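It returns the counts 9 and 2, as expected. You can also run SELECT parse_conditions(columnname) FROM tablename and the various other combinations that are normal for PostgreSQL functions.
So this is really a text-processing / pattern-matching / parsing problem rather than a database problem as such.
In your example, is the whole text in one row? Or do the separate lines here represent separate rows?
@oto: The separate lines represent separate rows. I used array_length(string_to_array(substring(lower(criteria) from 'inclusion(.+)exclusion'), '-'), 1) - 1 AS cnt; is there a better solution?
@user322101 - OK, do you have any other columns (e.g. id or timestamp or something similar) that determine the order of these rows? See @Feneric's comment above.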