Iterating over the results of an SQLite query as input to a subsequent query

I have an SQLite table with the following fields, representing metadata extracted from individual files stored on disk. There is one record per file:

__path      denotes the full path and filename (in effect the PK)
__dirpath   denotes the directory path excluding the filename
__dirname   denotes the directory name in which the file is found
refid       denotes an attribute of interest, pulled from the underlying file on disk

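Files are grouped and stored by __dirname when they are created.

All files within a given __dirname should have the same refid, but refid is sometimes missing.

As a starting point, I would like to identify every __dirpath that contains inconsistent files.

My query for identifying the problematic folders is as follows: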

SELECT __dirpath
  FROM (
           SELECT DISTINCT __dirpath,
                           __dirname,
                           refid
             FROM source
       )
 GROUP BY __dirpath
HAVING count( * ) > 1
 ORDER BY __dirpath, __dirname;

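Is it possible to iterate over the results of this query and use each result as the input to another query, without having to use something like Python alongside SQLite? For example, to see the records belonging to a failing set: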

SELECT __dirpath, refid
  FROM source
 WHERE __dirpath = <nth result from aforementioned query>;

If you want all of the problematic rows, one option is:

select t.*
from (
    select t.*,
        min(refid) over(partition by __dirpath, __dirname) as min_refid,
        max(refid) over(partition by __dirpath, __dirname) as max_refid
    from mytable t
) t
where min_refid <> max_refid
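
One caveat with the window-function query above: min() and max() skip NULLs, so a folder whose only inconsistency is a missing refid is not reported. The following is a minimal sketch, not part of the original answer, that demonstrates this and tries a hypothetical variant that also compares count(refid) with count(*). The sample paths and refid values are made up, Python's sqlite3 module serves only as a runnable harness, and window functions are assumed to be available (SQLite 3.25 or later).

```python
import sqlite3

# In-memory database with made-up sample data (paths and refids are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable ("
    "__path text primary key, __dirpath text, __dirname text, refid text)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [
        ("/a/x/1.dat", "/a/x", "x", "R1"),  # consistent folder
        ("/a/x/2.dat", "/a/x", "x", "R1"),
        ("/a/y/1.dat", "/a/y", "y", "R2"),  # one refid missing
        ("/a/y/2.dat", "/a/y", "y", None),
        ("/a/z/1.dat", "/a/z", "z", "R3"),  # two different refids
        ("/a/z/2.dat", "/a/z", "z", "R4"),
    ],
)

# Shared inner query: per-folder min/max plus NULL-aware counts.
inner_sql = """
    select t.*,
        min(refid) over(partition by __dirpath, __dirname) as min_refid,
        max(refid) over(partition by __dirpath, __dirname) as max_refid,
        count(refid) over(partition by __dirpath, __dirname) as n_refid,
        count(*) over(partition by __dirpath, __dirname) as n_rows
    from mytable t
"""

# Condition from the query above: NULL refids are invisible to min/max.
minmax_only = [
    row[0]
    for row in conn.execute(
        f"select __path from ({inner_sql}) "
        "where min_refid <> max_refid order by __path"
    )
]

# Hypothetical variant: also flag folders where some, but not all, refids are NULL.
null_aware = [
    row[0]
    for row in conn.execute(
        f"select __path from ({inner_sql}) "
        "where min_refid <> max_refid or (n_refid > 0 and n_refid < n_rows) "
        "order by __path"
    )
]

print(minmax_only)
print(null_aware)
```

In this sample the min/max condition alone reports only the /a/z files, while the variant also reports the /a/y files, where one refid is NULL.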

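Thanks @GMB. I tried both approaches. The first executes relatively quickly but ignores records where refid is null. The second runs, but it was very slow on roughly 600k records; creating an index on __dirpath, __dirname, refid solved that. The real table has many more fields, so I replaced t.* with the specific fields I am looking for. Because a refid inconsistency can be caused by either a null or an alternate value, the second query produced the correct result, listing all affected records.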
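
The index mentioned in this comment could be created along the following lines. This is only a sketch: the index name idx_mytable_dir_refid is a hypothetical choice, the table here is empty, and the exact text of EXPLAIN QUERY PLAN differs between SQLite versions, so the plan is printed rather than assumed. Python's sqlite3 module is used purely as a harness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable ("
    "__path text primary key, __dirpath text, __dirname text, refid text)"
)

# Composite index over the three columns the queries compare on.
# The index name is a hypothetical choice for this sketch.
conn.execute(
    "create index idx_mytable_dir_refid "
    "on mytable (__dirpath, __dirname, refid)"
)

# List the indexes SQLite now knows about for this table.
index_names = [
    row[0]
    for row in conn.execute(
        "select name from sqlite_master "
        "where type = 'index' and tbl_name = 'mytable'"
    )
]

# Ask SQLite how it would run the EXISTS query; the output is version-dependent.
plan = conn.execute(
    """
    explain query plan
    select t.__path
    from mytable t
    where exists (
        select 1
        from mytable t1
        where t1.__dirpath = t.__dirpath
          and t1.__dirname = t.__dirname
          and t1.refid is not t.refid
    )
    """
).fetchall()

for step in plan:
    print(step)
```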

select t.*
from mytable t
where exists (
    select 1
    from mytable t1
    where 
        t1.__dirpath = t.__dirpath 
        and t1.__dirname = t.__dirname
        and t1.refid is not t.refid
)
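
As for the original question of feeding one query's results into another: in SQL this usually does not need an explicit loop, because the first query can be nested as a subquery with IN. The following is a minimal sketch, assuming the table is named source as in the question and using made-up sample rows. Python's sqlite3 module serves only as a runnable harness; the SQL statement itself can be run in any SQLite client.

```python
import sqlite3

# In-memory database with made-up sample data (paths and refids are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source ("
    "__path TEXT PRIMARY KEY, __dirpath TEXT, __dirname TEXT, refid TEXT)"
)
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?, ?)",
    [
        ("/a/x/1.dat", "/a/x", "x", "R1"),  # consistent folder
        ("/a/x/2.dat", "/a/x", "x", "R1"),
        ("/a/y/1.dat", "/a/y", "y", "R2"),  # one refid missing
        ("/a/y/2.dat", "/a/y", "y", None),
        ("/a/z/1.dat", "/a/z", "z", "R3"),  # two different refids
        ("/a/z/2.dat", "/a/z", "z", "R4"),
    ],
)

# The question's folder-level query is used as the IN subquery, so every
# failing __dirpath is matched in a single statement instead of a loop.
failing_rows_sql = """
SELECT __path, __dirpath, refid
  FROM source
 WHERE __dirpath IN (
           SELECT __dirpath
             FROM (
                      SELECT DISTINCT __dirpath,
                                      __dirname,
                                      refid
                        FROM source
                  )
            GROUP BY __dirpath
           HAVING count( * ) > 1
       )
 ORDER BY __path;
"""

failing_rows = conn.execute(failing_rows_sql).fetchall()
for row in failing_rows:
    print(row)
```

With this sample data the statement returns the four files under /a/y and /a/z, including the row whose refid is NULL, because DISTINCT treats NULL as its own value when counting variants per folder.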