Q: What is the concept behind ARFF, and how do I read a Weka ARFF file in Java?

Stack Overflow user

Asked 2015-04-01 08:24:23

Answers: 2 · Views: 6.9K · Followers: 0 · Votes: 2

Why would anyone use ARFF? Please give some sample code that reads an ARFF file and uses it in Java.

I found the following snippet on the Weka site:

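import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

BufferedReader reader =
    new BufferedReader(new FileReader("/some/where/file.arff"));
ArffReader arff = new ArffReader(reader);
Instances data = arff.getData();
data.setClassIndex(data.numAttributes() - 1);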

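What comes after that? Can someone explain what is going on above? How do I access my data from the file? The Weka site mentions two different ways of reading, batch and incremental. What is the difference between them?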

java

weka

arff

2 Answers

Stack Overflow user

Answered 2015-04-01 22:16:55

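Well, people usually use ARFF because it is a very simple file format. It is basically a CSV file with a header that describes the data, and it is the usual way to save and read data with Weka.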
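The following sketch makes the "CSV with a header" description concrete. It uses only the standard library to split a simple ARFF text into its header (the relation name and the `@attribute` lines) and its comma-separated `@data` rows. It is illustrative only and does not reflect how Weka's `ArffReader` is implemented. The class name `TinyArff` is hypothetical, and the sketch assumes the data has no quoted values, sparse rows, or relational attributes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal ARFF splitter (NOT Weka's parser).
// Assumes unquoted values, one data row per line, no sparse or relational data.
class TinyArff {
    String relation;
    final List<String> attributeNames = new ArrayList<>();
    final List<String> attributeTypes = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static TinyArff parse(String text) throws IOException {
        TinyArff arff = new TinyArff();
        boolean inData = false;
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("%")) continue; // skip blank lines and % comments
                String lower = line.toLowerCase();
                if (!inData && lower.startsWith("@relation")) {
                    arff.relation = line.substring("@relation".length()).trim();
                } else if (!inData && lower.startsWith("@attribute")) {
                    // "@attribute <name> <type>": first token is the name, the rest is the type
                    String[] parts = line.substring("@attribute".length()).trim().split("\\s+", 2);
                    arff.attributeNames.add(parts[0]);
                    arff.attributeTypes.add(parts.length > 1 ? parts[1] : "");
                } else if (lower.startsWith("@data")) {
                    inData = true;
                } else if (inData) {
                    arff.rows.add(line.split(",")); // below @data the file is plain CSV
                }
            }
        }
        return arff;
    }

    public static void main(String[] args) throws IOException {
        String text = String.join("\n",
            "@relation weather",
            "@attribute temperature numeric",
            "@attribute humidity numeric",
            "@attribute play {yes,no}",
            "@data",
            "85,85,no",
            "70,96,yes");
        TinyArff arff = parse(text);
        // Mirrors data.setClassIndex(data.numAttributes() - 1): the last attribute is the class
        int classIndex = arff.attributeNames.size() - 1;
        System.out.println("relation: " + arff.relation);
        System.out.println("class attribute: " + arff.attributeNames.get(classIndex));
        for (String[] row : arff.rows) {
            System.out.println("temperature=" + row[0] + " class=" + row[classIndex]);
        }
    }
}
```

The header tells a reader the type of each column (numeric, or nominal with its allowed values), so the data rows themselves do not need to carry any type information.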

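The sample code for reading an ARFF file is exactly the code you provided. To work with the loaded instances, you use your `data` object. For example, to print them: `System.out.println(data);`. You can find plenty of examples of how to use the data (classification, clustering, etc.) here.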

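Here is what your code does. It opens the ARFF file with a standard `BufferedReader`. It then creates an `ArffReader` instance (`arff`), which reads all of the data from that reader. The `getData` method returns that data as an `Instances` object (called `data`). Finally, it sets which attribute is the class, in this case the last attribute in the ARFF file.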

If you want to iterate over the `Instances` object and retrieve each instance:

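for (int i = 0; i < data.numInstances(); i++) {
    Instance instance = data.instance(i); // weka.core.Instance
    // stringValue only works for nominal/string attributes; use instance.value(0) for numeric ones
    System.out.println(instance.stringValue(0)); // get attribute 0 as a String
}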

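You also asked about batch versus incremental reading of ARFF files. Batch mode reads the whole ARFF file at once. Incremental mode lets you read the file one instance (row) at a time and add each instance yourself.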

Code for incremental mode:

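import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

BufferedReader reader =
    new BufferedReader(new FileReader("/some/where/file.arff"));
ArffReader arff = new ArffReader(reader, 1000); // read only the header; 1000 = initial capacity
Instances data = arff.getStructure();            // the header, with no instances yet
data.setClassIndex(data.numAttributes() - 1);
Instance inst;
while ((inst = arff.readInstance(data)) != null) { // one instance per call, null at end of file
    data.add(inst);
}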
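The difference between the two modes is easier to see without Weka in the way. The stdlib-only sketch below mimics the incremental pattern: it consumes the header once, then pulls rows one at a time until it gets `null`, in the same shape as `getStructure()` followed by `readInstance(data)`. The class `IncrementalRows` and its `readRow()` method are hypothetical and are not Weka's implementation. The sketch assumes simple unquoted, comma-separated rows.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical sketch of incremental reading (NOT Weka's implementation):
// the header is consumed once in the constructor, then rows are returned one by one.
class IncrementalRows {
    private final BufferedReader reader;
    int numAttributes = 0;

    IncrementalRows(BufferedReader reader) throws IOException {
        this.reader = reader;
        String line;
        // Read only the header: count @attribute lines and stop right after @data
        while ((line = reader.readLine()) != null) {
            String lower = line.trim().toLowerCase();
            if (lower.startsWith("@attribute")) numAttributes++;
            if (lower.startsWith("@data")) return;
        }
    }

    /** Returns the next data row split on commas, or null when the input is exhausted. */
    String[] readRow() throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty() && !line.startsWith("%")) return line.split(",");
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        String text = "@relation r\n@attribute x numeric\n@attribute y numeric\n@data\n1,10\n2,20\n3,30\n";
        IncrementalRows rows = new IncrementalRows(new BufferedReader(new StringReader(text)));
        double sum = 0;
        int count = 0;
        String[] row;
        // Only the current row is held in memory; a running sum replaces a stored dataset
        while ((row = rows.readRow()) != null) {
            sum += Double.parseDouble(row[1]);
            count++;
        }
        System.out.println("attributes: " + rows.numAttributes);
        System.out.println("mean of y: " + (sum / count));
    }
}
```

Note that the Weka loop above still calls `data.add(inst)`, so every instance ends up in memory anyway. The memory benefit of incremental reading only appears if you process each `inst` as it arrives instead of storing it, for example by passing it to a classifier that implements Weka's `UpdateableClassifier` interface.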

Votes: 1

Stack Overflow user

Answered 2017-04-12 16:34:00

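I found it hard to build a reader because there are few examples and the javadoc is very ambiguous. Here is a small piece of code I wrote to read relational attributes made of numeric values. It is tested and works!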

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.Arrays;
import weka.core.Attribute;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

BufferedReader reader = new BufferedReader(new FileReader(new File(path)));
ArffReader arff = new ArffReader(reader, 1000);
Instances data = arff.getStructure();
data.setClassIndex(0);

Instance inst;
while ((inst = arff.readInstance(data)) != null) {
    // the first attribute is skipped because it is the index
    for (int index = 1; index < inst.numAttributes(); index++) {
        Attribute att = inst.attribute(index);
        switch (att.type()) {
        case Attribute.NUMERIC:
            System.out.println(inst.value(index));
            break;
        case Attribute.STRING:
            System.out.println(inst.stringValue(index));
            break;
        case Attribute.RELATIONAL: {
            Instances header = att.relation();            // sub-attributes of this relation
            Instances bag = inst.relationalValue(index);  // the value stored in this cell
            Instance row = bag.instance(0);               // assumes one row per bag, as in the data below
            // test whether two relations are nested or not
            if (header.numAttributes() > 0 && header.attribute(0).isRelationValued()) {
                // case of an array of numeric arrays
                double[][] seq = new double[header.numAttributes()][];
                for (int i = 0; i < header.numAttributes(); i++) {
                    Instance q = row.relationalValue(i).instance(0);
                    seq[i] = new double[q.numAttributes()];
                    for (int j = 0; j < q.numAttributes(); j++) {
                        seq[i][j] = q.value(j);
                    }
                }
                System.out.println(Arrays.deepToString(seq));
            } else {
                // case with only a single numeric array
                double[] seq = new double[header.numAttributes()];
                for (int i = 0; i < header.numAttributes(); i++) {
                    seq[i] = row.value(i);
                }
                System.out.println(Arrays.toString(seq));
            }
            break;
        }
        default:
            break;
        }
    }

    System.out.println("index is : " + ((int) inst.value(0)));
}

Here is what the data looks like. Each element consists of an index and a pair of numeric triplets:

@relation 'name of relation'

@attribute index numeric
@attribute attr1 relational
@attribute attr1.0 relational
@attribute attr1.0.0 numeric
@attribute attr1.0.1 numeric
@attribute attr1.0.2 numeric
@end attr1.0
@attribute attr1.1 relational
@attribute attr1.1.0 numeric
@attribute attr1.1.1 numeric
@attribute attr1.1.2 numeric
@end attr1.1
@end attr1

@data
0,'\'23,25,48\',\'12,0,21\''
115260,'\'34,44,72\',\'15,8,32\''
230520,'\'175,247,244\',\'107,185,239\''
345780,'\'396,269,218\',\'414,276,228\''
461040,'\'197,38,42\',\'227,40,43\''
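Each relational cell in the `@data` section is a single-quoted string whose inner values are themselves escaped, quoted lists. The stdlib-only sketch below decodes one such cell into a `double[][]`. It is a hypothetical helper for understanding or debugging the file by hand, not part of Weka (Weka's `ArffReader` does this parsing for you). It assumes exactly the shape shown above: an outer quoted value, inner values wrapped in `\'...\'`, and comma-separated numbers with no embedded newlines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (NOT part of Weka): decodes one nested relational cell such as
// '\'23,25,48\',\'12,0,21\'' into {{23,25,48},{12,0,21}}. Assumes that exact shape.
class RelationalCell {
    private static final Pattern INNER = Pattern.compile("'([^']*)'");

    static double[][] parse(String cell) {
        String s = cell.trim();
        if (s.length() < 2 || s.charAt(0) != '\'' || s.charAt(s.length() - 1) != '\'') {
            throw new IllegalArgumentException("expected a single-quoted value: " + cell);
        }
        // Drop the outer quotes, then turn each escaped \' back into a plain quote
        String inner = s.substring(1, s.length() - 1).replace("\\'", "'");
        List<double[]> result = new ArrayList<>();
        Matcher m = INNER.matcher(inner);
        while (m.find()) {
            // Each quoted group is one inner relation: a comma-separated numeric tuple
            String[] parts = m.group(1).split(",");
            double[] tuple = new double[parts.length];
            for (int i = 0; i < parts.length; i++) {
                tuple[i] = Double.parseDouble(parts[i].trim());
            }
            result.add(tuple);
        }
        return result.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        String line = "0,'\\'23,25,48\\',\\'12,0,21\\''"; // first data line from the file above
        // Split only on the first comma: the index, then the whole relational cell
        String[] fields = line.split(",", 2);
        int index = Integer.parseInt(fields[0]);
        double[][] seq = parse(fields[1]);
        System.out.println("index is : " + index);
        System.out.println(Arrays.deepToString(seq));
    }
}
```

The expected output is `index is : 0` followed by `[[23.0, 25.0, 48.0], [12.0, 0.0, 21.0]]`. This is the same pair of triplets that the Weka-based reader above builds through `relationalValue`.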

Hope this helps someone.

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/29380820
