Mallet: OutOfMemoryError: Java heap space

Stack Overflow user
Asked on 2017-06-22 11:20:12

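When I train on my data in Mallet, processing stops with an OutOfMemoryError. The MEMORY attribute in bin/mallet is already set to 3GB, and the training file output.mallet is only 31MB. I have tried reducing the size of the training data, but it still throws the same error: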

a161115@a161115-Inspiron-3250:~/dev/test_models/Mallet$ bin/mallet train-classifier --input output.mallet --trainer NaiveBayes --training-portion 0.0001 --num-trials 10
Training portion = 1.0E-4
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.9999

-------------------- Trial 0  --------------------

Trial 0 Training NaiveBayesTrainer with 7 instances
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at cc.mallet.types.Multinomial$Estimator.setAlphabet(Multinomial.java:309)
        at cc.mallet.classify.NaiveBayesTrainer.setup(NaiveBayesTrainer.java:251)
        at cc.mallet.classify.NaiveBayesTrainer.trainIncremental(NaiveBayesTrainer.java:200)
        at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:193)
        at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:59)
        at cc.mallet.classify.tui.Vectors2Classify.main(Vectors2Classify.java:415)

I would appreciate any help or insight into this problem.

EDIT: This is my bin/mallet file.

#!/bin/bash


malletdir=`dirname $0`
malletdir=`dirname $malletdir`

cp=$malletdir/class:$malletdir/lib/mallet-deps.jar:$CLASSPATH
#echo $cp

MEMORY=10g

CMD=$1
shift

help()
{
cat <<EOF
Mallet 2.0 commands: 

  import-dir         load the contents of a directory into mallet instances (one per file)
  import-file        load a single file into mallet instances (one per line)
  import-svmlight    load SVMLight format data files into Mallet instances
  info               get information about Mallet instances
  train-classifier   train a classifier from Mallet data files
  classify-dir       classify data from a single file with a saved classifier
  classify-file      classify the contents of a directory with a saved classifier
  classify-svmlight  classify data from a single file in SVMLight format
  train-topics       train a topic model from Mallet data files
  infer-topics       use a trained topic model to infer topics for new documents
  evaluate-topics    estimate the probability of new documents under a trained model
  prune              remove features based on frequency or information gain
  split              divide data into testing, training, and validation portions
  bulk-load          for big input files, efficiently prune vocabulary and import docs

Include --help with any option for more information
EOF
}

CLASS=

case $CMD in
        import-dir) CLASS=cc.mallet.classify.tui.Text2Vectors;;
        import-file) CLASS=cc.mallet.classify.tui.Csv2Vectors;;
        import-svmlight) CLASS=cc.mallet.classify.tui.SvmLight2Vectors;;
        info) CLASS=cc.mallet.classify.tui.Vectors2Info;;
        train-classifier) CLASS=cc.mallet.classify.tui.Vectors2Classify;;
        classify-dir) CLASS=cc.mallet.classify.tui.Text2Classify;;
        classify-file) CLASS=cc.mallet.classify.tui.Csv2Classify;;
        classify-svmlight) CLASS=cc.mallet.classify.tui.SvmLight2Classify;;
        train-topics) CLASS=cc.mallet.topics.tui.TopicTrainer;;
        infer-topics) CLASS=cc.mallet.topics.tui.InferTopics;;
        evaluate-topics) CLASS=cc.mallet.topics.tui.EvaluateTopics;;
        prune) CLASS=cc.mallet.classify.tui.Vectors2Vectors;;
        split) CLASS=cc.mallet.classify.tui.Vectors2Vectors;;
        bulk-load) CLASS=cc.mallet.util.BulkLoader;;
        run) CLASS=$1; shift;;
        *) echo "Unrecognized command: $CMD"; help; exit 1;;
esac

java -Xmx$MEMORY -ea -Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -classpath "$cp" $CLASS "$@"
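
Since the script hands MEMORY to the JVM through -Xmx, one quick sanity check is to ask a JVM started with the same flag how much heap it actually received. The following is a minimal sketch; HeapCheck is a hypothetical helper class written for this check and is not part of Mallet.

```java
// Hypothetical helper (not part of Mallet): reports the heap limit
// the running JVM actually received.
class HeapCheck {

    /** Converts a byte count to whole mebibytes (rounds down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Maximum heap the JVM will attempt to use; this roughly tracks -Xmx. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + toMiB(maxHeapBytes()) + " MiB");
    }
}
```

After compiling with javac, running it as `java -Xmx10g HeapCheck` mirrors the flag the script builds from MEMORY=10g. Alternatively, if the directory containing the compiled class is on CLASSPATH, the script's own `run` command (`bin/mallet run HeapCheck`) should start it with exactly the options Mallet uses. The reported value is usually somewhat below the -Xmx setting, which is expected.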

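It is worth mentioning that my original training file has 60,000 items. When I reduce the number of items (to 20,000 instances), training runs normally but uses about 10GB of RAM.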
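
The trace above fails inside Multinomial$Estimator.setAlphabet, called from NaiveBayesTrainer.setup, even with only 7 training instances, so the memory may depend more on the size of the feature alphabet and the number of labels than on the number of instances used for training. The sketch below is a back-of-the-envelope estimate under a loud assumption that the question does not confirm: that the trainer keeps one dense array of doubles, sized to the feature alphabet, for every label. The class name and the default numbers are hypothetical placeholders; the real counts would have to come from the data (the `info` command listed in the script may help).

```java
// Rough estimate only. ASSUMPTION (not confirmed by the question):
// one dense double[] of alphabet size is allocated per label.
class NaiveBayesMemoryEstimate {

    static final long BYTES_PER_DOUBLE = 8L;

    /** Estimated bytes for numLabels dense count arrays of numFeatures doubles each. */
    static long estimateBytes(long numLabels, long numFeatures) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(Math.multiplyExact(numLabels, numFeatures), BYTES_PER_DOUBLE);
    }

    /** Converts bytes to gibibytes for display. */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the real label and feature counts as arguments.
        long numLabels = args.length > 0 ? Long.parseLong(args[0]) : 10L;
        long numFeatures = args.length > 1 ? Long.parseLong(args[1]) : 1_000_000L;
        long bytes = estimateBytes(numLabels, numFeatures);
        System.out.printf("labels=%d features=%d -> about %.2f GiB%n",
                numLabels, numFeatures, toGiB(bytes));
    }
}
```

If the estimate for the real counts comes out near or above the configured heap, an unexpectedly large number of labels (for example, a label column that is unique per line) or a very large vocabulary would be worth ruling out; the `prune` command listed in the script is one way to reduce the feature count.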

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/44689581
