Jaccard in k-means clustering with Neo4j: in Cypher, how do you modify k-means to use the Jaccard distance Dj instead of the Euclidean distance?



Here the Jaccard distance is defined as Dj = 1 - |A ∩ B| / |A ∪ B|
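As a minimal sketch of this definition outside the database, the distance between two sets can be computed in a few lines of Python. The function name `jaccard_distance` and the convention for two empty sets are assumptions made for illustration; they do not come from the original post.

```python
def jaccard_distance(a, b):
    """Return Dj = 1 - |A ∩ B| / |A ∪ B| for two iterables treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are identical, so distance 0.
        return 0.0
    return 1.0 - len(a & b) / len(union)
```

For example, two genre sets that share two of three distinct genres have a distance of 1 - 2/3.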

Below is an example of how to compute the Jaccard distance with Cypher (from:


Once you have computed it, you can use it with your k-means algorithm. How are you running k-means? Also in Cypher?
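One caveat when plugging the Jaccard distance into k-means: the arithmetic mean of a group of sets is not defined, so the usual centroid update has no direct equivalent. A common substitute is k-medoids, where each cluster is represented by the member with the smallest total distance to the other members. The sketch below shows that substitution, not the poster's method. It assumes the genre sets have already been exported from Neo4j into a Python dict, and the names `k_medoids` and `movies` are hypothetical.

```python
import random


def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two sets."""
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty sets
    return 1.0 - len(a & b) / len(union)


def k_medoids(items, k, max_iter=100, seed=0):
    """Cluster `items` (dict: name -> set) into k groups using Jaccard distance.

    Returns a dict mapping each medoid name to the list of its member names.
    """
    rng = random.Random(seed)
    names = sorted(items)
    medoids = rng.sample(names, k)
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each medoid stays in its own cluster,
        # every other item joins the nearest medoid.
        clusters = {m: [] for m in medoids}
        for n in names:
            if n in clusters:
                clusters[n].append(n)
            else:
                nearest = min(
                    medoids,
                    key=lambda m: jaccard_distance(items[n], items[m]),
                )
                clusters[nearest].append(n)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = [
            min(
                members,
                key=lambda c: sum(
                    jaccard_distance(items[c], items[o]) for o in members
                ),
            )
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break  # medoids are stable, so the clustering has converged
        medoids = new_medoids
    return clusters


# Illustrative data only; in practice these sets would come from the graph.
movies = {
    "A": {"Action", "Sci-Fi"},
    "B": {"Action", "Sci-Fi", "Thriller"},
    "C": {"Romance", "Comedy"},
    "D": {"Romance", "Comedy", "Drama"},
}
result = k_medoids(movies, k=2)
```

With this toy data the two action movies and the two romance movies end up in separate clusters. The pairwise distances make the cost quadratic in cluster size, so for a large graph it may be worth precomputing similarities in the database instead.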

Take a look at this graph. Thanks a lot! Yes, this will eventually be a Neo4j query.
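The Cypher query from the question:

// Find movies that share at least one genre with "Inception" and count the shared genres.
MATCH (m:Movie {title: "Inception"})-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(other:Movie)
WITH m, other, COUNT(g) AS intersection
// Collect the genre names of each of the two movies.
MATCH (m)-[:IN_GENRE]->(mg:Genre)
WITH m, other, intersection, COLLECT(mg.name) AS s1
MATCH (other)-[:IN_GENRE]->(og:Genre)
WITH m, other, intersection, s1, COLLECT(og.name) AS s2
// Union = s1 plus the elements of s2 that are not already in s1.
// A list comprehension replaces filter(), which was removed in Neo4j 4.0.
WITH m, other, intersection, s1, s2, s1 + [x IN s2 WHERE NOT x IN s1] AS unionSet
// intersection / union is the Jaccard index (a similarity); the distance Dj is 1 minus this value.
RETURN m.title, other.title, s1, s2, (1.0 * intersection) / SIZE(unionSet) AS jaccard
ORDER BY jaccard DESC LIMIT 100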