Java Duke deduplication engine: can't find exact records

Tags: java, duplicates, record-linkage, duke

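I am trying to create a configuration and a processor for Duke in order to find exact matches in a list of records. I created a processor based on ExactComparator, but it does not return the exact matches. Here is the setup for the processor, configuration, and listener: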

public void setup() {
  //setup
  List<Property> exactMatchProperties = new ArrayList<Property>();

  exactListener = new TestUtils.TestListener();

  ExactComparator exactComparator = new ExactComparator();

  //create properties (columns), with name, comparator, and low-high probabilities. ID property has no comparator or probabilities.

  exactMatchProperties.add(new PropertyImpl("ID"));
  exactMatchProperties.add(new PropertyImpl("NAME", exactComparator, 0.0, 1.0));
  exactMatchProperties.add(new PropertyImpl("EMAIL", exactComparator, 0.0, 1.0));

  //create new configuration implementation
  exactMatchConfig = new ConfigurationImpl();

  //add properties to config
  exactMatchConfig.setProperties(exactMatchProperties);
  exactMatchConfig.setThreshold(1.0);
  exactMatchConfig.setMaybeThreshold(0.0);

  //initialize the processor and add match listener
  exactMatchProcessor = new Processor(exactMatchConfig, true);
  exactMatchProcessor.addMatchListener(exactListener);
  }
Here is the function I want to test:

 public void testExactMatch() {
  Collection<Record> records = new ArrayList<Record>();
  Record rec1 = TestUtils.makeRecord(new String[] { "ID", "1", "NAME", "Jon", "EMAIL", "jon@doe.com" });
  Record rec2 = TestUtils.makeRecord(new String[] { "ID", "1", "NAME", "Jon", "EMAIL", "jon@doe.com" });

  records.add(rec1);
  records.add(rec2);

  exactMatchProcessor.deduplicate(records);
  System.out.println(exactListener.getMatches().size());
}
I am using the API. I have read the questions mentioned above, but they refer to the XML configuration, whereas I am testing in Java.

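Shouldn't getMatches() return a non-empty list? How can I get the list of duplicates that were found, or the opposite (a list of unique records with no duplicates)? Thanks.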
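For reference, the result I expect from an exact match can be sketched in plain Java without Duke. This is only a baseline to compare against, not Duke's API and not a fix for the configuration above: the class `ExactDedupSketch` and its helpers are hypothetical names, records are modeled as simple maps, and two records count as duplicates when NAME and EMAIL are identical (ID is ignored).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: a plain-Java baseline, independent of Duke.
final class ExactDedupSketch {

    // Builds the comparison key from NAME and EMAIL; ID is deliberately ignored.
    static String key(Map<String, String> record) {
        // The unit separator keeps ("ab", "c") distinct from ("a", "bc").
        return record.get("NAME") + '\u001F' + record.get("EMAIL");
    }

    // Groups records that agree exactly on the key, preserving input order.
    static Map<String, List<Map<String, String>>> groupByKey(List<Map<String, String>> records) {
        Map<String, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (Map<String, String> r : records) {
            groups.computeIfAbsent(key(r), k -> new ArrayList<>()).add(r);
        }
        return groups;
    }

    // Returns only the groups with more than one member, i.e. the duplicates.
    static List<List<Map<String, String>>> duplicateGroups(List<Map<String, String>> records) {
        List<List<Map<String, String>>> result = new ArrayList<>();
        for (List<Map<String, String>> group : groupByKey(records).values()) {
            if (group.size() > 1) {
                result.add(group);
            }
        }
        return result;
    }

    // Returns the first record of every group: the input with duplicates removed.
    static List<Map<String, String>> uniqueRecords(List<Map<String, String>> records) {
        List<Map<String, String>> result = new ArrayList<>();
        for (List<Map<String, String>> group : groupByKey(records).values()) {
            result.add(group.get(0));
        }
        return result;
    }

    // Convenience factory for a record with the three columns used in the question.
    static Map<String, String> record(String id, String name, String email) {
        Map<String, String> r = new LinkedHashMap<>();
        r.put("ID", id);
        r.put("NAME", name);
        r.put("EMAIL", email);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = new ArrayList<>();
        records.add(record("1", "Jon", "jon@doe.com"));
        records.add(record("2", "Jon", "jon@doe.com"));   // same NAME and EMAIL as the first
        records.add(record("3", "Jane", "jane@doe.com")); // illustrative extra record

        System.out.println(duplicateGroups(records).size()); // prints 1
        System.out.println(uniqueRecords(records).size());   // prints 2
    }
}
```

With the two "Jon" records from my test, this baseline reports one duplicate group, which is what I would expect the listener in Duke to report as well.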