SQL: How to query a set of data and count how many strings from a given list match


I built a simple question-and-answer system.

My database has three tables:

CREATE TABLE question (
  id         int,
  question   varchar(200),
  answer_id  int  /* foreign key mapping to answer.id */
);

CREATE TABLE answer (
  id      int,
  answer  varchar(500)
);

CREATE TABLE question_elements (
    id           int,
    seq          int,  /* position of the word within the question */
    question_id  int,  /* foreign key mapping to question.id */
    vocabulary   varchar(40)
);
Now I have a question:

What approach should a company adopt when its debt ratio is higher than 50% and wanna continue to get funding ?
So, in the question table, the record is:

question {
  id: 1,
  question:"What approach should a company adopt when its debt ratio is higher than 50% and wanna continue to get funding ?",
  answer_id:1
}
And in the question_elements table:

question_elements [
  {
    id: 1,
    seq: 1,
    question_id: 1,
    vocabulary: "what"
  },
  {
    id: 2,
    seq: 2,
    question_id: 1,
    vocabulary: "approach"
  },
  {
    id: 3,
    seq: 3,
    question_id: 1,
    vocabulary: "should"
  },
  {
    id: 4,
    seq: 4,
    question_id: 1,
    vocabulary: "a"
  },
  {
    id: 5,
    seq: 5,
    question_id: 1,
    vocabulary: "company"
  },
  {
    id: 6,
    seq: 6,
    question_id: 1,
    vocabulary: "adopt"
  },
  {
    id: 7,
    seq: 7,
    question_id: 1,
    vocabulary: "when"
  },
  ....
  ....
  {
    id: 19,
    seq: 19,
    question_id: 1,
    vocabulary: "get"
  },
  {
    id: 20,
    seq: 20,
    question_id: 1,
    vocabulary: "funding"
  }
]
Now, when a user enters:

What action does a company should do when it wanna get more funding with high debt ratio
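My idea is to split the sentence above into a list of strings and run a SQL query that, given that list, counts how many of the strings match rows in the question_elements table.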


What would the SQL statement be in PostgreSQL?

If I understood correctly, you need a statement like this:

WITH answer AS (
    SELECT 
        'What action does a company should do when it wanna get more funding' AS a
),
question AS (
    SELECT 'what' AS q
    UNION ALL SELECT 'should'
    UNION ALL SELECT 'a'
    UNION ALL SELECT 'company'
    UNION ALL SELECT 'do'
    UNION ALL SELECT 'when'
)
SELECT COUNT(result)
FROM (
    SELECT unnest(string_to_array(CAST(a AS VARCHAR),' ')) AS result
    FROM answer
) AS tbaux
WHERE result IN (select CAST(q AS VARCHAR) FROM question);
A case-insensitive version, with some explanation:

-- Uses the same WITH clause (answer, question) as the query above.
SELECT COUNT(result)                       -- count the rows that survive the WHERE filter
FROM (
    -- split the user input into one word per row, using ' ' as the delimiter
    SELECT unnest(string_to_array(CAST(a AS VARCHAR), ' ')) AS result
    FROM answer
) AS tbaux                                 -- alias for the subquery
-- upper() converts both sides to upper case, so words are matched regardless of case
WHERE upper(result) IN (SELECT upper(CAST(q AS VARCHAR)) FROM question);
This counts how many of the words entered by the user exist in the question table (in your case, question_elements).
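
The same counting idea can be pointed directly at the question_elements table from the question instead of at a hand-written word list. The following is only a sketch: it assumes the vocabulary column holds lower-case words, as in the sample rows, and that the user input is split on single spaces.

```sql
-- Sketch: count, per stored question, how many distinct words of the
-- user input appear in question_elements.
-- Assumption: vocabulary values are stored in lower case.
SELECT qe.question_id,
       count(DISTINCT qe.vocabulary) AS matches
FROM question_elements AS qe
WHERE qe.vocabulary = ANY (
          -- lower-case the input, then split it into a text[] of words
          string_to_array(
              lower('What action does a company should do when it wanna get more funding with high debt ratio'),
              ' ')
      )
GROUP BY qe.question_id
ORDER BY matches DESC;
```

count(DISTINCT ...) is used so that a word occurring several times in one question is counted only once; plain count(*) would count every matching row instead.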


The question_elements table is not necessary:

with ui(ui) as (
    values ('What action does a company should do when it wanna get more funding with high debt ratio')
)
select id, count(*) as matches, question
from
    (
        select id, question, regexp_split_to_table(question, '\s+') as word
        from question
    ) q
    inner join
    regexp_split_to_table((select ui from ui), '\s+') ui(word) using (word)
group by 1, 3
order by matches desc
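
Note that the join above compares words exactly as typed, so "What" and "what" would not match, and a word repeated in the input is counted once per repetition. A possible variant that lower-cases and de-duplicates the words on both sides is sketched below; the aliases qt, t and w are illustrative names, not part of the original answer.

```sql
WITH ui(word) AS (
    -- distinct lower-cased words of the user input
    SELECT DISTINCT lower(t.w)
    FROM regexp_split_to_table(
             'What action does a company should do when it wanna get more funding with high debt ratio',
             '\s+') AS t(w)
),
q AS (
    -- distinct lower-cased words of each stored question
    SELECT DISTINCT qt.id, qt.question, lower(t.w) AS word
    FROM question AS qt
    CROSS JOIN LATERAL regexp_split_to_table(qt.question, '\s+') AS t(w)
)
SELECT q.id, count(*) AS matches, q.question
FROM q
JOIN ui USING (word)
GROUP BY q.id, q.question
ORDER BY matches DESC;
```

Punctuation is still treated as part of a word here (for example, a trailing "?" becomes its own token when it is separated by a space), so stripping punctuation before the split may be worth considering.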

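Are you using a json field, or is that just how you are showing us the data?
It looks like you have two questions: one is how to perform the split, and the other is how to count the matches.
Could the output column you are looking for contain duplicates?
If I understood what you mean, please try my answer when you can.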