What does the statistics database do in the Crawler4j open-source code?
I am trying to understand the Crawler4j open-source web crawler, and I have a question about it.

Question: what does the statistics database do in the Counters class? Please explain the following part of the code:
public Counters(Environment env, CrawlConfig config) throws DatabaseException {
    super(config);
    this.env = env;
    this.counterValues = new HashMap<String, Long>();
    /*
     * When crawling is set to be resumable, we have to keep the statistics
     * in a transactional database to make sure they are not lost if crawler
     * is crashed or terminated unexpectedly.
     */
    if (config.isResumableCrawling()) {
        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        dbConfig.setTransactional(true);
        dbConfig.setDeferredWrite(false);
        statisticsDB = env.openDatabase(null, "Statistics", dbConfig);
        OperationStatus result;
        DatabaseEntry key = new DatabaseEntry();
        DatabaseEntry value = new DatabaseEntry();
        Transaction tnx = env.beginTransaction(null, null);
        Cursor cursor = statisticsDB.openCursor(tnx, null);
        result = cursor.getFirst(key, value, null);
        while (result == OperationStatus.SUCCESS) {
            if (value.getData().length > 0) {
                String name = new String(key.getData());
                long counterValue = Util.byteArray2Long(value.getData());
                counterValues.put(name, counterValue);
            }
            result = cursor.getNext(key, value, null);
        }
        cursor.close();
        tnx.commit();
    }
}
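Please help me. I look forward to your reply.

Basically, Crawler4j restores its existing statistics by reading every key/value pair from the database into memory. In fact, the code is not quite right: a transaction is opened, but no modification is made to the database, so the lines that deal with tnx could be removed. Line-by-line comments: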
// Create a database configuration object
DatabaseConfig dbConfig = new DatabaseConfig();
// Set some parameters: allow creation, make the DB transactional, and don't use deferred write
dbConfig.setAllowCreate(true);
dbConfig.setTransactional(true);
dbConfig.setDeferredWrite(false);
// Open the database called "Statistics" with the configuration created above
statisticsDB = env.openDatabase(null, "Statistics", dbConfig);
OperationStatus result;
// Create new database entries to hold the current key and value
DatabaseEntry key = new DatabaseEntry();
DatabaseEntry value = new DatabaseEntry();
// Start a transaction
Transaction tnx = env.beginTransaction(null, null);
// Open a cursor on the DB
Cursor cursor = statisticsDB.openCursor(tnx, null);
// Position the cursor on the first key/value pair
result = cursor.getFirst(key, value, null);
// Loop for as long as the cursor finds another record
while (result == OperationStatus.SUCCESS) {
    // If the value at the current cursor position is not empty, read the name and the
    // value of the counter and add them to the counterValues HashMap
    if (value.getData().length > 0) {
        String name = new String(key.getData());
        long counterValue = Util.byteArray2Long(value.getData());
        counterValues.put(name, counterValue);
    }
    // Move the cursor to the next record
    result = cursor.getNext(key, value, null);
}
cursor.close();
// Commit the transaction; any changes would be applied to the DB (none were made here)
tnx.commit();
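
To make the loading loop easier to follow without a database at hand, here is a minimal, self-contained sketch of the same idea. It rests on several assumptions: a plain Map of byte arrays stands in for the "Statistics" database and its cursor, the class name CounterCodec and the method loadCounters are hypothetical, and the counters are assumed to be stored as 8 big-endian bytes. Crawler4j's own Util.byteArray2Long may use a different layout, so treat this as an illustration rather than the library's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of crawler4j.
class CounterCodec {

    // Encode a long as 8 big-endian bytes (assumed layout).
    static byte[] long2ByteArray(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decode 8 big-endian bytes back into a long.
    static long byteArray2Long(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Stand-in for the cursor loop: visit every stored record and
    // rebuild the in-memory counter map, skipping empty values.
    static Map<String, Long> loadCounters(Map<byte[], byte[]> records) {
        Map<String, Long> counterValues = new HashMap<>();
        for (Map.Entry<byte[], byte[]> record : records.entrySet()) {
            byte[] value = record.getValue();
            if (value.length > 0) {
                String name = new String(record.getKey(), StandardCharsets.UTF_8);
                counterValues.put(name, byteArray2Long(value));
            }
        }
        return counterValues;
    }

    public static void main(String[] args) {
        // Illustrative records; the counter names are examples only.
        Map<byte[], byte[]> records = new LinkedHashMap<>();
        records.put("Processed-Pages".getBytes(StandardCharsets.UTF_8), long2ByteArray(120L));
        records.put("Scheduled-Pages".getBytes(StandardCharsets.UTF_8), long2ByteArray(450L));
        records.put("Empty".getBytes(StandardCharsets.UTF_8), new byte[0]);
        System.out.println(loadCounters(records));
    }
}
```

The sketch only reads, which mirrors the point made above: restoring the counters modifies nothing, so the loading step itself has nothing to commit.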
I have also answered a similar question.

What is this SleepyCat you are talking about?
If it answered your question, please upvote/accept it.
@JulienS. It answered my question.