首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >Apache MetaModel -糟糕的性能查询电子表格

Apache MetaModel -糟糕的性能查询电子表格
EN

Stack Overflow用户
提问于 2016-06-29 17:02:02
回答 1查看 679关注 0票数 0

我需要查询Java中的电子表格文件。我在用Apache MetaModel

我用maven导入它

代码语言:javascript
复制
<dependency>
    <groupId>org.apache.metamodel</groupId>
    <artifactId>MetaModel-excel</artifactId>
    <version>4.5.2</version>
</dependency>

一切都很好,但是当next()指令应该返回false时,需要几秒钟时间,为什么?

代码语言:javascript
复制
import org.apache.metamodel.DataContext;
import org.apache.metamodel.excel.ExcelDataContext;
import org.apache.metamodel.schema.Schema;
import org.apache.metamodel.schema.Column; 
import org.apache.metamodel.schema.Table; 
import org.apache.metamodel.query.Query;
import org.apache.metamodel.query.OperatorType; 
import org.apache.metamodel.data.DataSet; 
import org.apache.metamodel.data.Row; 
import org.apache.metamodel.MetaModelException;


public class SpreadsheetReader {

    private File spreadsheet;


    public SpreadsheetReader(String spreadsheetLocation, String spreadsheetName){

        this.spreadsheet = new File( spreadsheetLocation + spreadsheetName );

        if( !"OK".equals( checkSpreadSheet() ) ){
            throw new IllegalStateException("Error in spreadsheet. Cause: "+spreadsheetStatus);
        }

    }


    /** query the excel spreadsheet for the given  ID
    */
    public List<String> query( String givenProgId ){

        List<String> linksArray = new ArrayList<String>();
        int rowCount = 0;

        ExcelConfiguration conf = new ExcelConfiguration( 1, true, true ); // columnNameLineNumber, skipEmptyLines, sEColumns
        DataContext dataContext = new ExcelDataContext( this.spreadsheet, conf );

        System.out.println( "PROFILING >>> "+(new org.joda.time.DateTime())+" START-1" ); // ## 

        Schema schema = dataContext.getDefaultSchema();

        System.out.println( "PROFILING >>> "+(new org.joda.time.DateTime())+" STOP-1" ); // ## 
       // Takes 2 seconds. Will be moved into constructor.

        Table table = schema.getTables()[0];

        Column idsColumn = table.getColumnByName("ProgID");
        Column titlesColumn = table.getColumnByName("Titles");

        Query query = new Query().select(titlesColumn)
                                 .from(table)
                                 .where(idsColumn, OperatorType.EQUALS_TO, givenProgId);

        try( DataSet dataSet = dataContext.executeQuery(query) ){ // try-with-resource, no need to close dataset

            while (dataSet.next()) {

                // the rows are read quite quickly, problem will be when next() is false

                ++rowCount;

                Row currentRow = dataSet.getRow();
                String currentTitle = (String)currentRow.getValue(0);

                linksArray.add( "my-service/titles/"+currentTitle );

                System.out.println( "PROFILING >>> "+(new org.joda.time.DateTime())+" START-2" ); // @@@@@@@
            }
            System.out.println( "PROFILING >>> "+(new org.joda.time.DateTime())+" STOP-2" ); // @@@@@@@ 
            // TAKES ABOUT 6 SECONDS - (Excel file has just 14.779 rows and 114 columns)

        }catch(MetaModelException xx){
            //logger
            throw xx;
        }

        return linksArray;
    }
}}

UPDATE:使用只有3个条目的电子表格文档进行更多的分析()

现在的代码是:

代码语言:javascript
复制
try( DataSet dataSet = this.dataContext.executeQuery(query) ){


    // FIRST NEXT() with result => quite fast

    System.out.println( "\n PROFILING >>> "+(new org.joda.time.DateTime())+" START a\n" );
    System.out.println( "\n 88888 NEXT >>> "+(dataSet.next())+" <<<< \n" ); 
    Row currentRow = dataSet.getRow();
    String currentTitle = (String)currentRow.getValue(0);
    System.out.println( "\n READ: "+(new org.joda.time.DateTime())+" >>> "+currentTitle+" \n" );

    System.out.println( "\n PROFILING >>> "+(new org.joda.time.DateTime())+" STOP a\n" );


    // SECOND AND LAST NEXT() => very SLOW

    System.out.println( "\n PROFILING >>> "+(new org.joda.time.DateTime())+" START b\n" );
    System.out.println( "\n 88888 NEXT >>> "+(dataSet.next())+" <<<< \n" );
    System.out.println( "\n PROFILING >>> "+(new org.joda.time.DateTime())+" STOP b\n" );

}

电子表格是预加载在类构造函数中的。

后续一系列相同查询中的最后一个日志(有时间)如下:

代码语言:javascript
复制
Jun 30, 2016 10:59:38 AM log my-project.logging.ITVLogger
INFO: CODE00012 - Query on spreadsheet started for ID 123456



PROFILING >>> 2016-06-30T10:59:38.651+01:00 START a

10:59:38.652 [main] INFO  o.a.m.d.RowPublisherDataSet - 
Starting separate thread for publishing action: org.apache.metamodel.excel.XlsxRowPublisherAction@4977e527

88888 NEXT >>> true <<<< 

READ: 2016-06-30T10:59:39.756+01:00 >>> A_TITLE

PROFILING >>> 2016-06-30T10:59:39.756+01:00 STOP a



PROFILING >>> 2016-06-30T10:59:39.756+01:00 START b

88888 NEXT >>> false <<<< 

PROFILING >>> 2016-06-30T10:59:44.735+01:00 STOP b

因此,回顾一下,检索结果需要大约1秒的时间,最后一次执行next()需要4~6秒。

EN

回答 1

Stack Overflow用户

发布于 2016-06-29 19:35:25

excel DataContext实现在后台对压缩的.xlsx文件进行展开和解析。这意味着DataContext.executeQuery(...)方法返回得很快,但是在它必须等待数据在内存中可用之后才会调用DataSet.next()。我认为您无法避免这种情况,这只是Excel文件需要处理的相当复杂的结果。

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/38105932

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档