Should DBSCAN and its index use the same distance function?

Tags: java, scala, data-mining, dbscan, elki

Is DBSCAN required to use the same distance function as its index? If not, in which cases would you want to use different distance functions?

The Scala code that creates the DBSCAN instance and the index:

import de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN
import de.lmu.ifi.dbs.elki.algorithm.clustering.gdbscan.parallel.ParallelGeneralizedDBSCAN
import de.lmu.ifi.dbs.elki.data.model.Model
import de.lmu.ifi.dbs.elki.data.{Clustering, DoubleVector, NumberVector}
import de.lmu.ifi.dbs.elki.database.{Database, StaticArrayDatabase}
import de.lmu.ifi.dbs.elki.datasource.ArrayAdapterDatabaseConnection
import de.lmu.ifi.dbs.elki.distance.distancefunction.NumberVectorDistanceFunction
import de.lmu.ifi.dbs.elki.distance.distancefunction.minkowski.SquaredEuclideanDistanceFunction
import de.lmu.ifi.dbs.elki.index.tree.metrical.covertree.SimplifiedCoverTree
import scala.collection.JavaConverters._ // provides asScala for Java collections

def createDatabase(data: Array[Array[Double]], distanceFunction: NumberVectorDistanceFunction[NumberVector]): Database = {
  val indexFactory = new SimplifiedCoverTree.Factory[NumberVector](distanceFunction, 1.3, 20)
  // Create a database
  val db = new StaticArrayDatabase(new ArrayAdapterDatabaseConnection(data), java.util.Arrays.asList(indexFactory))
  // Load the data into the database
  db.initialize()
  db
}

def dbscanClustering(data: Array[Array[Double]], distanceFunction: NumberVectorDistanceFunction[NumberVector]): Unit = {
  // Use the same `distanceFunction` for the database and DBSCAN <- is it required??
  val db = createDatabase(data, distanceFunction)
  val dbscan = new DBSCAN[DoubleVector](distanceFunction, 0.01, 20)
  val result: Clustering[Model] = dbscan.run(db)
  println(s"Number of clusters: ${result.getAllClusters.size()}")
  result.getAllClusters.asScala.zipWithIndex.foreach { case (cluster, idx) =>
    println(s"# $idx: ${cluster.getNameAutomatic}")
    println(s"Size: ${cluster.size()}")
    println(s"Model: ${cluster.getModel}")
  } // end of foreach
} // end of dbscanClustering

val inputData: Array[Array[Double]] = ???
// ELKI distance functions are Java classes; from Scala, pass the shared STATIC instance
dbscanClustering(inputData, SquaredEuclideanDistanceFunction.STATIC)

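An index can only be used to accelerate a query if it was built for the same distance function. Some indexes support several (but not arbitrary) distances; for example, the R*-tree can support all spatial distance functions (with varying success, though).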

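Obviously, if you build an index to accelerate cosine distance but then ask for Euclidean nearest neighbors, the index cannot and will not be used.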
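The matching rule described here can be illustrated with a small, library-independent sketch. This is not ELKI's implementation: `Distance`, `ToyIndex`, and `rangeQuery` are hypothetical names invented for this example, and the "index" does no real pruning. It only shows the dispatch: the index answers the query when its distance matches the requested one, and otherwise the query falls back to a linear scan.

```scala
object IndexDispatchSketch {
  type Point = Array[Double]

  // Hypothetical named distance, so that two distances can be compared by name.
  final case class Distance(name: String, f: (Point, Point) => Double)

  val euclidean: Distance = Distance("euclidean", (a, b) =>
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum))

  val manhattan: Distance = Distance("manhattan", (a, b) =>
    a.zip(b).map { case (x, y) => math.abs(x - y) }.sum)

  // Hypothetical index: it is built for exactly one distance function.
  final class ToyIndex(val data: Seq[Point], val distance: Distance) {
    // A real index would prune candidates here; this stand-in checks them all.
    def range(q: Point, eps: Double): Seq[Point] =
      data.filter(p => distance.f(q, p) <= eps)
  }

  // Returns the strategy that answered the query together with the neighbors.
  def rangeQuery(data: Seq[Point], index: Option[ToyIndex], query: Distance,
                 q: Point, eps: Double): (String, Seq[Point]) =
    index.filter(_.distance.name == query.name) match {
      case Some(idx) => ("index", idx.range(q, eps))
      case None      => ("linear scan", data.filter(p => query.f(q, p) <= eps))
    }
}
```

In this sketch, an index built with `euclidean` serves Euclidean queries, while a `manhattan` query on the same data ignores the index and still returns a correct result, only without any speedup.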

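You do not need to use an index, but without one the runtime will be O(n²). With an index it can be faster, depending on parameters, dimensionality, and so on; in the worst case, an index is just overhead.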
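To see where the O(n²) comes from, here is a minimal sketch in plain Scala, independent of ELKI. DBSCAN issues one epsilon-range query per point, and without an index each query compares the point against all n points. The names `LinearScanCost` and `neighborCounts` are assumptions made for this illustration, and Euclidean distance is chosen arbitrarily.

```scala
object LinearScanCost {
  type Point = Array[Double]

  def euclidean(a: Point, b: Point): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Runs one epsilon-range query per point with no index, as an unindexed
  // DBSCAN would. Returns the neighbor count of every point (the point itself
  // included) and the total number of distance computations performed.
  def neighborCounts(data: Vector[Point], eps: Double): (Vector[Int], Long) = {
    var computations = 0L
    val counts = data.map { q =>
      data.count { p =>
        computations += 1
        euclidean(q, p) <= eps
      }
    }
    (counts, computations)
  }
}
```

For n points the counter always ends at n · n, regardless of `eps`. An index that matches the distance function aims to skip most of those comparisons, which is where any speedup would come from.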