I'm trying to understand the Crawler4j open-source web crawler, but I have a few questions, listed below.

Questions:
1. As far as I understand, it saves the crawled URLs so that if the crawler crashes, it does not have to start again from the beginning. Could you please explain the above code line by line?
2. I haven't found any good link that explains SleepyCat, which Crawler4j uses to store intermediate information. Could you point me to some good resources where I can learn the basics of SleepyCat? (I don't understand the meaning of the transactions and cursors used in the above code.)
Please help. Looking forward to your kind reply.
Posted on 2013-06-07 08:54:07
Basically, Crawler4j loads its existing statistics by reading all the stored values from the database. The code is actually somewhat incorrect, because a transaction is opened even though no modification is ever made to the DB. The lines handling tnx could therefore be removed.

Commented line by line:
//Create a database configuration object
DatabaseConfig dbConfig = new DatabaseConfig();
//Set some parameters: allow creation, make the database transactional, and don't use deferred write
dbConfig.setAllowCreate(true);
dbConfig.setTransactional(true);
dbConfig.setDeferredWrite(false);
//Open the database called "Statistics" with the configuration created above
statisticsDB = env.openDatabase(null, "Statistics", dbConfig);
OperationStatus result;
//Create new database entry objects for the key and the value
DatabaseEntry key = new DatabaseEntry();
DatabaseEntry value = new DatabaseEntry();
//Start a transaction
Transaction tnx = env.beginTransaction(null, null);
//Open a cursor on the DB
Cursor cursor = statisticsDB.openCursor(tnx, null);
//Position the cursor at the first key/value pair
result = cursor.getFirst(key, value, null);
//While the last operation succeeded
while (result == OperationStatus.SUCCESS) {
    //If the value at the current cursor position is not empty, read the name and the value of the counter and add them to the HashMap counterValues
    if (value.getData().length > 0) {
        String name = new String(key.getData());
        long counterValue = Util.byteArray2Long(value.getData());
        counterValues.put(name, counterValue);
    }
    //Move the cursor to the next key/value pair
    result = cursor.getNext(key, value, null);
}
//Close the cursor
cursor.close();
//Commit the transaction; changes would be applied to the DB here
tnx.commit();

I also answered a similar question here. Regarding SleepyCat, is this what you mean?
https://stackoverflow.com/questions/16608790
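The `Util.byteArray2Long` call in the loop above converts the raw byte payload that Berkeley DB stores back into a `long` counter. Here is a minimal, self-contained sketch of such a helper pair, assuming the counter is serialized as an 8-byte big-endian array; the class and method names mirror the snippet, but the exact encoding is an assumption, not necessarily Crawler4j's real implementation:

```java
import java.nio.ByteBuffer;

public class Util {
    // Assumption: counters are stored as 8 big-endian bytes,
    // i.e. the layout produced by ByteBuffer.putLong.
    public static long byteArray2Long(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Inverse helper: serialize a long into the same 8-byte layout,
    // suitable for storing in a DatabaseEntry value.
    public static byte[] long2ByteArray(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }
}
```

With helpers like these, a counter round-trips through the database unchanged: `byteArray2Long(long2ByteArray(n))` returns `n`.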