PostgreSQL: full-text search: fastest way to find duplicate text rows?


：-）

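What is the fastest way to find duplicate text in a table, i.e. rows whose text column value appears at least twice in the whole table? The table contains more than 160 million rows.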

I have a table with the columns id, maintext, and maintext_token, where the last one was created with to_tsvector(maintext). I also created a GIN index on maintext_token, i.e. create index idx_maintext_token on tablename using gin(maintext_token).
Currently I am using the following approach, but it takes quite a long time:
I also tried the same thing, but used the maintext_token column for the comparison instead of maintext:
select maintext_token, count(maintext_token)
from ccnc
group by maintext_token
having count(maintext_token)>1
order by maintext_token;

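Both queries seem to run for a very long time, although I expected at least the second one to be much faster, since Postgres could use the index for the comparison.
Thanks in advance for any insights!
Cheers :)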
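You say you want to test for equality, so you probably want to hash the text and then search on the hash. You can do this with a hash index, or you can index a hash of the text. I recently got some help on a related question, which has the details and a comparison.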

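The hashing idea can be sketched roughly as follows. This is only an illustration under assumptions, not a tested solution for a 160-million-row table: it assumes the table is ccnc with the columns id and maintext from the question, it uses md5() as the hash, and the index name idx_ccnc_maintext_md5 is made up for the example. Note that a GIN index on a tsvector serves match operators such as @@, so it is not expected to help a GROUP BY. The expression index below mainly helps the equality join that fetches the duplicate rows, and the GROUP BY itself may still have to scan the whole table.

```sql
-- Assumption: table ccnc(id, maintext) as in the question.
-- idx_ccnc_maintext_md5 is a hypothetical index name.

-- Expression index on the MD5 digest of the text.
-- md5() is immutable, so it can be used in an index expression.
CREATE INDEX idx_ccnc_maintext_md5 ON ccnc (md5(maintext));

-- Group on the short fixed-width digest instead of the full text.
SELECT md5(maintext) AS maintext_md5, count(*) AS copies
FROM ccnc
WHERE maintext IS NOT NULL
GROUP BY md5(maintext)
HAVING count(*) > 1;

-- List the duplicated rows themselves by joining back on the digest.
SELECT c.id, c.maintext
FROM ccnc AS c
JOIN (
    SELECT md5(maintext) AS h
    FROM ccnc
    WHERE maintext IS NOT NULL
    GROUP BY md5(maintext)
    HAVING count(*) > 1
) AS d ON md5(c.maintext) = d.h
ORDER BY d.h, c.id;
```

Two different texts can in principle share an MD5 digest, so if exactness matters, the candidate rows can be compared on maintext as a final check.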