Weka ARFF generation
Stack Overflow user
Asked on 2012-11-14 04:41:24

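I am trying to generate an .arff file from a CSV data file I have. I am completely new to Weka and only started using it a day ago. As a first step, I am trying a simple Twitter sentiment analysis. I have generated the training data as CSV. The contents of the CSV file look like this: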

tweet,affinScore,polarity
 ATAUTHORcfoblog is giving away a $25 Amex gift card (enter to win over $600 in prizes!) http://t.co/JD8EP14c ,4,4
"American Express has always been my dark horse acquirer of  ATAUTHORFoursquare. Bundle in Square-like payments & its a lite-retailer platform, no? ",0,1
African-American Demos Express Ethnic Identity Differently http://t.co/gInv4bKj via  ATAUTHORmediapost ,0,3
Google ???????? Visa ? American Express  http://t.co/eEZTSiHY ,0,4
Secrets to Success from Small-Business Owners : Lifestyle :: American Express OPEN Forum http://t.co/b85F8JX0 via  ATAUTHOROpenForum ,2,1
RT  ATAUTHORhunterwalk: American Express has always been my dark horse acquirer of  ATAUTHORFoursquare. Bundle in Square-like payments & its a lite ... ,0,1
Winning Surveys $1500 american express Huggies Sweeps http://t.co/WoaTFowp ,4,1
I root for Square mostly because a small business that takes Square is also one that takes American Express. ,0,1
I dont know how bitch be acting American Express but they cards be saying DEBIT ON IT HAVE A ?? PLEASE!!! ,-5,2
Uh oh... RT  ATAUTHORBlackArrowBella: I dont know how bitch be acting American Express but they cards be saying DEBIT ON IT HAVE A ?? PLEASE!!! ,-5,2
Just got another credit card. A Blue Sky card with American Express. Its gonna help pay for the honeymoon!  ATAUTHORAmericanExpress ,-1,1
Follow  ATAUTHORShaveMagazine and ReTweet this msg to be entered to #Win an American Express Gift card. Winners contacted bi-weekly by direct msg! ,2,4
American Express Gold zakelijk aanvragen: http://t.co/xheZwmbt ,0,3
RT  ATAUTHORhunterwalk: American Express has always been my dark horse acquirer of  ATAUTHORFoursquare. Bundle in Square-like payments & its a lite ... ,0,1

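Here the first attribute is the actual tweet, the second is the affinity score (affinScore), and the third is the actual classification class (1 = positive, 2 = negative, 3 = neutral, 4 = spam).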

Now I am trying to generate the .arff format from it using this code:

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

import java.io.File;

public class CSV2Arff {
  /**
   * takes 2 arguments:
   * - CSV input file
   * - ARFF output file
   */
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.out.println("\nUsage: CSV2Arff <input.csv> <output.arff>\n");
      System.exit(1);
    }

    // load CSV
    CSVLoader loader = new CSVLoader();
    loader.setSource(new File(args[0]));
    Instances data = loader.getDataSet();

    // save ARFF
    ArffSaver saver = new ArffSaver();
    saver.setInstances(data);
    saver.setFile(new File(args[1]));
    saver.setDestination(new File(args[1]));
    saver.writeBatch();
  }
}

This produces an .arff file that looks like this:

@relation file

@attribute tweet {_ATAUTHORcfoblog_is_giving_away_a_$25_Amex_gift_card_(enter_to_win_over_$600_in_prizes!)_http://t.co/JD8EP14c_,'American_Express_has_always_been_my_dark_horse_acquirer_of__ATAUTHORFoursquare._Bundle_in_Square-like_payments_&_its_a_lite-retailer_platform,_no?_',African-American_Demos_Express_Ethnic_Identity_Differently_http://t.co/gInv4bKj_via__ATAUTHORmediapost_,Google_????????_Visa_?_American_Express__http://t.co/eEZTSiHY_,Secrets_to_Success_from_Small-Business_Owners_:_Lifestyle_::_American_Express_OPEN_Forum_http://t.co/b85F8JX0_via__ATAUTHOROpenForum_,RT__ATAUTHORhunterwalk:_American_Express_has_always_been_my_dark_horse_acquirer_of__ATAUTHORFoursquare._Bundle_in_Square-like_payments_&_its_a_lite_..._

@data
_ATAUTHORcfoblog_is_giving_away_a_$25_Amex_gift_card_(enter_to_win_over_$600_in_prizes!)_http://t.co/JD8EP14c_,4,4
'American_Express_has_always_been_my_dark_horse_acquirer_of__ATAUTHORFoursquare._Bundle_in_Square-like_payments_&_its_a_lite-retailer_platform,_no?_',0,1
African-American_Demos_Express_Ethnic_Identity_Differently_http://t.co/gInv4bKj_via__ATAUTHORmediapost_,0,3
Google_????????_Visa_?_American_Express__http://t.co/eEZTSiHY_,0,4
Secrets_to_Success_from_Small-Business_Owners_:_Lifestyle_::_American_Express_OPEN_Forum_http://t.co/b85F8JX0_via__ATAUTHOROpenForum_,2,1
RT__ATAUTHORhunterwalk:_American_Express_has_always_been_my_dark_horse_acquirer_of__ATAUTHORFoursquare._Bundle_in_Square-like_payments_&_its_a_lite_..._,0,1

I am new to Weka, but from what I have read, I suspect this ARFF is not correct. Can anyone comment on this?

Also, if it is wrong, can someone tell me exactly where I went wrong?

2 Answers

Stack Overflow user

Accepted answer

Posted on 2012-11-14 15:24:43

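Make sure the type of the tweet attribute is set to string (arbitrary text) rather than nominal (categorical), which appears to be the default. A nominal attribute does not scale well here because it puts a copy of every tweet into the type definition.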
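To make the difference concrete, here is a minimal sketch of what the output could look like with tweet declared as a string attribute. It deliberately does not use Weka's API; it writes the ARFF text by hand with plain Java so the target format is visible. The class name TweetArffSketch, the relation name tweets, and the choice to declare polarity as nominal {1,2,3,4} are assumptions for illustration, based on the class labels listed in the question. The row splitter assumes the last two commas in a row always delimit affinScore and polarity, which holds for the sample data but is not a general CSV parser.

```java
import java.util.Arrays;
import java.util.List;

/** Hand-rolled CSV-to-ARFF sketch that declares the tweet column as a string attribute. */
final class TweetArffSketch {

    /** Quotes a value for ARFF: wrap in single quotes, escaping backslashes and single quotes. */
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Splits one CSV row into {tweet, affinScore, polarity}; the last two commas delimit the trailing fields. */
    static String[] splitRow(String row) {
        int last = row.lastIndexOf(',');
        int secondLast = row.lastIndexOf(',', last - 1);
        String tweet = row.substring(0, secondLast);
        // Strip the optional double quotes that CSV puts around fields containing commas.
        if (tweet.length() >= 2 && tweet.startsWith("\"") && tweet.endsWith("\"")) {
            tweet = tweet.substring(1, tweet.length() - 1).replace("\"\"", "\"");
        }
        return new String[] {
            tweet,
            row.substring(secondLast + 1, last).trim(),
            row.substring(last + 1).trim()
        };
    }

    /** Builds the ARFF text from CSV data rows (header row excluded). */
    static String toArff(List<String> rows) {
        StringBuilder out = new StringBuilder();
        out.append("@relation tweets\n\n");
        out.append("@attribute tweet string\n");         // free text, not an enumerated set of values
        out.append("@attribute affinScore numeric\n");
        out.append("@attribute polarity {1,2,3,4}\n\n"); // assumption: class labels as a nominal set
        out.append("@data\n");
        for (String row : rows) {
            String[] f = splitRow(row);
            out.append(quote(f[0])).append(',').append(f[1]).append(',').append(f[2]).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "Winning Surveys $1500 american express Huggies Sweeps http://t.co/WoaTFowp ,4,1",
            "\"Bundle in Square-like payments & its a lite-retailer platform, no? \",0,1");
        System.out.print(toArff(rows));
    }
}
```

With this layout the header stays three lines long no matter how many tweets are in the data section, which is the scaling point made above.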

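Note that for actual analysis of the tweet content, you will probably need to preprocess it further. You most likely want a sparse vector representation of the text rather than one long string.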
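As a rough illustration of what a sparse vector representation means, the sketch below turns each tweet into a bag-of-words map from term index to count, storing only the terms that actually occur. It is plain Java with a deliberately crude tokenizer, not Weka code; the class and method names (SparseTweetVectors, vectorize, toSparseRow) are hypothetical. Within Weka, the StringToWordVector filter is the usual tool for this kind of conversion, and it would normally be preferred over hand-written code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Bag-of-words sketch: maps each tweet to a sparse {termIndex -> count} vector. */
final class SparseTweetVectors {

    /** Vocabulary in first-seen order: term -> column index. */
    final Map<String, Integer> vocabulary = new LinkedHashMap<>();

    /** Lowercases and splits on runs of non-alphanumeric characters (a crude tokenizer). */
    static List<String> tokenize(String tweet) {
        List<String> tokens = new ArrayList<>();
        for (String t : tweet.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns the sparse vector for one tweet, growing the vocabulary as new terms appear. */
    Map<Integer, Integer> vectorize(String tweet) {
        Map<Integer, Integer> vector = new TreeMap<>(); // keeps column indices in ascending order
        for (String token : tokenize(tweet)) {
            Integer index = vocabulary.get(token);
            if (index == null) {
                index = vocabulary.size();
                vocabulary.put(token, index);
            }
            vector.merge(index, 1, Integer::sum);
        }
        return vector;
    }

    /** Formats a vector as "{index value,index value,...}", listing only non-zero entries. */
    static String toSparseRow(Map<Integer, Integer> vector) {
        StringBuilder out = new StringBuilder("{");
        for (Map.Entry<Integer, Integer> e : vector.entrySet()) {
            if (out.length() > 1) {
                out.append(',');
            }
            out.append(e.getKey()).append(' ').append(e.getValue());
        }
        return out.append('}').toString();
    }

    public static void main(String[] args) {
        SparseTweetVectors vectors = new SparseTweetVectors();
        System.out.println(toSparseRow(vectors.vectorize("American Express has American")));
        System.out.println(toSparseRow(vectors.vectorize("Express card")));
        System.out.println(vectors.vocabulary.keySet());
    }
}
```

Each tweet only lists the columns for words it contains, so the representation stays small even when the vocabulary grows to thousands of terms.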


Stack Overflow user

Posted on 2012-11-15 22:34:40

If you are using the UI mentioned earlier, you can load the CSV file directly into Weka.

If you only want to generate an ARFF file from a CSV file, you can do the following. This is taken from the CSV2Arff tool, which is part of Weka.

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;
import java.io.File;

public class CSV2Arff {
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.out.println("\nUsage: CSV2Arff <input.csv> <output.arff>\n");
      System.exit(1);
    }

    // load CSV
    CSVLoader loader = new CSVLoader();
    loader.setSource(new File(args[0]));
    Instances data = loader.getDataSet();

    // save ARFF
    ArffSaver saver = new ArffSaver();
    saver.setInstances(data);
    saver.setFile(new File(args[1]));
    saver.setDestination(new File(args[1]));
    saver.writeBatch();
  }
}
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/13368424
