k-nearest neighbors in MySQL

I have the following table in MySQL:

DATE   EDGE   VALUE
D      E1     X1
D      E2     Y1
D      E3     Z1

D1     E1     X2
D1     E2     Y2
D1     E3     Z2

D2     E1     X3
D2     E2     Y3
D2     E3     Z3
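Now I want to calculate the Euclidean distance from D to D1 and from D to D2:

Distance(D, D1) = SQRT((X1 - X2)^2 + (Y1 - Y2)^2 + (Z1 - Z2)^2)
Distance(D, D2) = SQRT((X1 - X3)^2 + (Y1 - Y3)^2 + (Z1 - Z3)^2)
... and so on.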

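From these distances I want to select the k nearest neighbors of D. Note that record D can have entries for any number of edges E1, E2, … En; in that case the other records D1, D2, D3, … will have the same number of edge entries.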
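For n edges E1 … En, the two formulas above generalize as follows. Here v_{X,E_j} is notation introduced only for this illustration: the VALUE stored for record X on edge E_j (so with n = 3, v_{D,E_1} = X1 and v_{D_1,E_1} = X2). The k nearest neighbors of D are then the k records D_i with the smallest d(D, D_i).

```latex
d(D, D_i) = \sqrt{\sum_{j=1}^{n} \left( v_{D,E_j} - v_{D_i,E_j} \right)^2}
```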

Please suggest a solution to this as a stored procedure in MySQL.

Thanks in advance.

@eggyal:

I tried to build a query similar to the one in your answer.

Query:

SELECT   b.id, SQRT(SUM(POW(a.score - b.score, 2))) score1
FROM     (SELECT * FROM data d1
          WHERE  d1.id = (SELECT MAX(t1.id) FROM Timestamp t1)
            AND  d1.edge_id IN (SELECT m1.src_edge FROM mapping m1
                                WHERE  m1.dst = (SELECT m2.src FROM mapping m2
                                                 WHERE  m2.src_edge = 2 LIMIT 1))) a
    JOIN (SELECT * FROM data d2
          WHERE  d2.id IN (SELECT t2.id FROM Timestamp t2
                           WHERE  DAYOFWEEK(NOW()) = DAYOFWEEK(t2.timestamp))
            AND  d2.edge_id IN (SELECT m3.src_edge FROM mapping m3
                                WHERE  m3.dst = (SELECT m4.src FROM mapping m4
                                                 WHERE  m4.src_edge = 2 LIMIT 1))) AS b
      ON b.id <> a.id AND b.edge_id = a.edge_id
GROUP BY b.id
ORDER BY score1
LIMIT    5;
Mapping table:

Timestamp table:

Maybe this can get you started:

CREATE TABLE nodes
(node_id CHAR(2) NOT NULL,   plane CHAR(1) NOT NULL, value INT NOT NULL, PRIMARY KEY(node_id,plane));

INSERT INTO nodes VALUES
('D','x',5),
('D','y',10),
('D','z',15),
('D1','x',20),
('D1','y',25),
('D1','z',30);

CREATE VIEW v_nodes AS 
SELECT node_id
    , MAX(CASE WHEN plane = 'x' THEN value END) x
    , MAX(CASE WHEN plane = 'y' THEN value END) y
    , MAX(CASE WHEN plane = 'z' THEN value END) z
 FROM nodes 
GROUP
   BY node_id;

SELECT ROUND(SQRT(  POW(d.x - d1.x, 2)
                  + POW(d.y - d1.y, 2)
                  + POW(d.z - d1.z, 2)), 2) AS delta
  FROM v_nodes d
 CROSS JOIN v_nodes d1
 WHERE d.node_id = 'D'
   AND d1.node_id = 'D1';

+-------+
| delta |
+-------+
| 25.98 |
+-------+   
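The query above compares one fixed pair. To turn it into a k-nearest-neighbor query, D can be joined against every other node and only the k smallest deltas kept. This is a sketch that assumes the nodes table and v_nodes view defined above; k = 5 is an arbitrary example value. With the sample data it returns a single row, D1 with a delta of 25.98.

```sql
-- Distance from node 'D' to every other node in v_nodes, nearest first.
-- Assumes the nodes table and v_nodes view from this answer; k = 5 is illustrative.
SELECT   d1.node_id,
         ROUND(SQRT(  POW(d.x - d1.x, 2)
                    + POW(d.y - d1.y, 2)
                    + POW(d.z - d1.z, 2)), 2) AS delta
FROM     v_nodes d
    JOIN v_nodes d1 ON d1.node_id <> d.node_id  -- never compare D with itself
WHERE    d.node_id = 'D'
ORDER BY delta
LIMIT    5;                                     -- keep the k nearest
```

The view hard-codes the x, y and z planes, so adding a plane means editing the view. A self-join on the edge/plane column avoids that limitation.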
As mentioned in the comments, why not simply perform a self-join and order by the computed distance?

SELECT   b.date
FROM     my_table a
    JOIN my_table b ON b.date <> a.date AND b.edge = a.edge
WHERE    a.date = ?
GROUP BY b.date
ORDER BY SUM(POW(a.value - b.value, 2))
LIMIT    ?
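The two ? placeholders are the record to compare against and k. Because the question asks for a stored procedure (for example, to run some pre-processing first), the same query can be wrapped in one that takes those two values as parameters. What follows is a minimal sketch under stated assumptions. The table my_table and its columns date, edge and value follow the question's layout, knn_for is a hypothetical procedure name, and the sample values are made up for illustration. Ordering by the sum of squares, as the query above does, gives the same ranking as ordering by the distance itself because SQRT is monotonic. The sketch still returns SQRT(...) so that the distance is visible.

```sql
-- Hypothetical table in the question's long format: one row per (record, edge).
CREATE TABLE my_table (
  `date`  VARCHAR(10) NOT NULL,
  `edge`  VARCHAR(10) NOT NULL,
  `value` DOUBLE      NOT NULL,
  PRIMARY KEY (`date`, `edge`)
);

-- Illustrative values only.
INSERT INTO my_table VALUES
  ('D',  'E1',  5), ('D',  'E2', 10), ('D',  'E3', 15),
  ('D1', 'E1', 20), ('D1', 'E2', 25), ('D1', 'E3', 30),
  ('D2', 'E1',  6), ('D2', 'E2', 11), ('D2', 'E3', 14);

DELIMITER //

-- knn_for (hypothetical name): the p_k records nearest to p_target,
-- using Euclidean distance over the edges both records have.
CREATE PROCEDURE knn_for(IN p_target VARCHAR(10), IN p_k INT)
BEGIN
  -- Any pre-processing would go here, before the query runs.
  SELECT   b.`date`,
           SQRT(SUM(POW(a.`value` - b.`value`, 2))) AS distance
  FROM     my_table a
      JOIN my_table b
        ON b.`date` <> a.`date`
       AND b.`edge`  = a.`edge`
  WHERE    a.`date` = p_target
  GROUP BY b.`date`
  ORDER BY distance
  LIMIT    p_k;  -- routine parameters are accepted in LIMIT inside stored programs
END //

DELIMITER ;

CALL knn_for('D', 2);  -- D2 (distance ~1.73) first, then D1 (~25.98)
```

DELIMITER is a command of the mysql command-line client. It keeps the semicolons inside the procedure body from ending the CREATE PROCEDURE statement, and other clients may not need it. Because the join matches rows on edge, the same procedure works unchanged for any number of edges E1 … En.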

This is the final solution:

SELECT   d2.id, SQRT(SUM(POW(d1.score - d2.score, 2))) score1
FROM     data d1
    JOIN data d2    ON d2.id <> d1.id AND d2.edge_id = d1.edge_id
    JOIN mapping m1 ON m1.src_edge = d1.edge_id
    JOIN mapping m2 ON m2.src = m1.dst
    -- latest timestamp id; the alias t1 is not visible inside its own derived table
    JOIN (SELECT MAX(id) AS id FROM Timestamp) t1 ON t1.id = d1.id
    JOIN Timestamp t2 ON t2.id = d2.id
WHERE    m2.src_edge = 2
     AND DAYOFWEEK(NOW()) = DAYOFWEEK(t2.timestamp)
GROUP BY d2.id
ORDER BY score1
LIMIT    5;
This solution was given by eggyal.

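I completely agree with you that I should learn JOINs. The solutions I came up with were the best I could manage, and I will definitely learn JOINs as soon as possible.
Thanks a lot for your help…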

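Why do you need a stored procedure? Why not simply perform a self-join and order by the computed distance?
@eggyal: Because I have to do some pre-processing before running this query. You could also suggest a solution that doesn't use a stored procedure.
I just did suggest a solution that doesn't use a stored procedure.
@eggyal: I want the exact SQL query that solves the problem above.
Consider providing proper DDL and/or an SQLFiddle, together with the desired result set.
Please see my answer below and suggest the required modifications.
@eggyal: Thank you for your solution. I will learn English as soon as I can.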