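This is my second post about using Weka (the first post is here). I successfully used TextDirectoryLoader to feed Weka both the training data and the sample test data, and that works well. Now I want to move this to production, where the data to classify is retrieved from a MySQL table. This is what I have: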
TextDirectoryLoader loader = new TextDirectoryLoader();
loader.setDirectory(new File("c:/Users/Yehia A.Salam/Desktop/dd/training-data"));
Instances dataRaw = loader.getDataSet();
StringToWordVector filter = new StringToWordVector();
filter.setInputFormat(dataRaw);
Instances dataTraining = Filter.useFilter(dataRaw, filter);
// Create test data instances [this works, but the sample data now needs to come from the db instead, see below]
//loader.setDirectory(new File("c:/Users/Yehia A.Salam/Desktop/dd/test-data"));
//dataRaw = loader.getDataSet();
//Instances dataTest = Filter.useFilter(dataRaw, filter);
InstanceQuery query = new InstanceQuery();
query.setUsername("myusername");
query.setPassword("mypassword");
String sql = "SELECT d.desc FROM deals d WHERE d.j48 = 1";
query.setQuery(sql);
Instances dataTest = Filter.useFilter(query.retrieveInstances(), filter);
// Classify
J48 model = new J48();
model.buildClassifier(dataTraining);
for (int i = 0; i < dataTest.numInstances(); i++) {
    dataTest.instance(i).setClassMissing();
    double cls = model.classifyInstance(dataTest.instance(i));
    dataTest.instance(i).setClassValue(cls);
    System.out.println(cls + " -> " + dataTest.instance(i).classAttribute().value((int) cls));
}
Unfortunately, this does not work; Weka stops unexpectedly at this line:
Instances dataTest = Filter.useFilter(query.retrieveInstances(), filter);
So I guess my question is how to change this part
// Create test data instances [this works, but the sample data now needs to come from the db instead, see below]
//loader.setDirectory(new File("c:/Users/Yehia A.Salam/Desktop/dd/test-data"));
//dataRaw = loader.getDataSet();
//Instances dataTest = Filter.useFilter(dataRaw, filter);
into a version based on SQL data:
InstanceQuery query = new InstanceQuery();
query.setUsername("myusername");
query.setPassword("mypassword");
String sql = "SELECT d.desc FROM deals d WHERE d.j48 = 1";
query.setQuery(sql);
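Instances dataTest = Filter.useFilter(query.retrieveInstances(), filter);
Note that there is nothing wrong with the database connection; I do get the correct number of instances.
Thanks for your help, I am very close.
Posted on 2013-03-27 15:27:03
Your code uses the TextDirectoryLoader class, which is based on Arff Files from Text Collections. According to its help file: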
"Loads all text files in a directory and
uses the subdirectory names as class labels.
The content of the text files will be stored in a String attribute,
the filename can be stored as well."
See the code below:
double[] newInst = new double[2];
newInst[0] = (double)data.attribute(0).addStringValue(files[i]);
....
newInst[1] = (double)data.attribute(1).addStringValue(txtStr.toString());
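data.add(new Instance(1.0, newInst));
As you can see, this code needs 2 attribute values for each instance it adds to the dataset. But your SQL provides only one attribute:
String sql = "SELECT d.desc FROM deals d WHERE d.j48 = 1";
This may be why the problem (java.lang.ArrayIndexOutOfBoundsException) occurs at the newInst[1] part of the code: Weka cannot find the second attribute.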
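The mismatch described above can be modelled in plain Java, without Weka. This is only a rough sketch of the idea: the class name LayoutSketch, the method padToTrainingLayout, and the use of NaN as a "missing" marker are assumptions made for illustration, and in real Weka code the equivalent step would be carried out on Instances objects rather than on bare arrays.

```java
import java.util.Arrays;

// Plain-Java model of the attribute-count mismatch. Nothing here is Weka API;
// the class and method names are hypothetical.
class LayoutSketch {

    // Copies the values a query returned into a row that has as many slots as
    // the training data has attributes. Slots the query did not fill are set
    // to NaN, used here as a "missing" marker (an assumption of this sketch).
    static double[] padToTrainingLayout(double[] queryValues, int trainingAttributeCount) {
        if (queryValues.length > trainingAttributeCount) {
            throw new IllegalArgumentException(
                "query returned more columns than the training data has attributes");
        }
        double[] row = new double[trainingAttributeCount];
        Arrays.fill(row, Double.NaN);
        System.arraycopy(queryValues, 0, row, 0, queryValues.length);
        return row;
    }

    public static void main(String[] args) {
        double[] fromSql = {0.0};   // one column, like SELECT d.desc
        int trainingAttributes = 2; // text attribute plus class attribute

        // Reading slot 1 of fromSql directly would throw
        // ArrayIndexOutOfBoundsException; the padded row has both slots.
        double[] row = padToTrainingLayout(fromSql, trainingAttributes);
        System.out.println(Arrays.toString(row));
    }
}
```

The point of the sketch is that the test rows have to be brought into the same layout as the training rows (text plus class, with the class left unknown) before they are handed to the filter that was set up on the training data.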
Posted on 2013-07-18 20:11:01
I am a beginner myself, but in case it is useful: did you know there is a DatabaseLoader class and a DatabaseConverter interface?
https://stackoverflow.com/questions/15466202