n-grams from text in PostgreSQL


sql postgresql


I want to create n-grams from a text column in PostgreSQL. I currently split the data (sentences) in the text column into an array:


select regexp_split_to_array(sentenceData, E'\\s+') from tableName;

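Once I have this array, how do I go about:

Creating a loop that finds the n-grams and writes each one to a row in another table?

Using unnest, I can get all the elements of all the arrays on separate rows, and maybe I could then work out a way to get n-grams from a single column, but I would lose the sentence boundaries, which I wish to preserve.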

Sample PostgreSQL code to reproduce the scenario above:

create table tableName(sentenceData  text);

INSERT INTO tableName(sentenceData) VALUES('This is a long sentence');

INSERT INTO tableName(sentenceData) VALUES('I am currently doing grammar, hitting this monster book btw!');

INSERT INTO tableName(sentenceData) VALUES('Just tonnes of grammar, problem is I bought it in TAIWAN, and so there aint any englihs, just chinese and japanese');

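-- One array of words per sentence
select regexp_split_to_array(sentenceData, E'\\s+') from tableName;

-- One word per row, across all sentences (sentence boundaries are lost)
select unnest(regexp_split_to_array(sentenceData, E'\\s+')) from tableName;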
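
One way to keep the sentence boundaries is to skip unnest entirely and slice each word array in place, so no loop is needed. The following is only a sketch under some assumptions: it builds word bigrams (n = 2), it works on a temporary copy of the table with an added `sentence_id` key that the original table does not have, and `sentence_ngrams` is a hypothetical name for the target table.

```sql
-- Temp copy of the question's table so the sketch runs on its own.
-- sentence_id is an added (hypothetical) key used to keep sentence boundaries.
CREATE TEMP TABLE tableName(
    sentence_id  serial PRIMARY KEY,
    sentenceData text
);
INSERT INTO tableName(sentenceData) VALUES
    ('This is a long sentence'),
    ('I am currently doing grammar, hitting this monster book btw!');

-- sentence_ngrams is a hypothetical target table: one row per n-gram.
CREATE TEMP TABLE sentence_ngrams AS
WITH tokens AS (
    SELECT sentence_id,
           regexp_split_to_array(sentenceData, E'\\s+') AS words
    FROM tableName
)
SELECT t.sentence_id,
       i AS start_pos,
       -- n = 2: take words i .. i + n - 1 of the same sentence
       array_to_string(t.words[i : i + 1], ' ') AS ngram
FROM tokens AS t
-- last valid start position is length - n + 1, so sentences shorter than n yield no rows
CROSS JOIN LATERAL generate_series(1, array_length(t.words, 1) - 1) AS i;

SELECT * FROM sentence_ngrams ORDER BY sentence_id, start_pos;
```

For the first sentence this produces 'This is', 'is a', 'a long' and 'long sentence'. Because each slice is taken from a single array, no n-gram spans two sentences. For a different n, the slice upper bound becomes `i + n - 1` and the series upper bound becomes `array_length(t.words, 1) - n + 1`.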

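Check out pg_trgm: "The pg_trgm module provides functions and operators for determining the similarity of text based on trigram matching, as well as index operator classes that support fast searching for similar strings."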
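
As a rough illustration of what the module offers, the sketch below assumes the pg_trgm extension is available on the server and that the current role is allowed to create it. Note that pg_trgm works with character trigrams inside each word, not with word-level n-grams, so it suits fuzzy string matching rather than splitting sentences into word sequences.

```sql
-- Requires the pg_trgm contrib module to be installed on the server.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- List the character trigrams pg_trgm extracts from a string.
SELECT show_trgm('grammar');

-- Similarity score between 0 and 1, based on shared trigrams.
SELECT similarity('grammar', 'grammer');
```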