文章/答案/技术大牛

发布

社区首页 >问答首页 >Apache MetaModel -糟糕的性能查询电子表格

问Apache MetaModel -糟糕的性能查询电子表格
EN

Stack Overflow用户

提问于 2016-06-29 17:02:02

回答 1查看 679关注 0票数 0

我需要查询Java中的电子表格文件。我在用Apache MetaModel。

我用maven导入它

<dependency>
    <groupId>org.apache.metamodel</groupId>
    <artifactId>MetaModel-excel</artifactId>
    <version>4.5.2</version>
</dependency>

一切都很好，但是当next()指令应该返回false时，需要几秒钟时间，为什么？

import org.apache.metamodel.DataContext;
import org.apache.metamodel.excel.ExcelDataContext;
import org.apache.metamodel.schema.Schema;
import org.apache.metamodel.schema.Column; 
import org.apache.metamodel.schema.Table; 
import org.apache.metamodel.query.Query;
import org.apache.metamodel.query.OperatorType; 
import org.apache.metamodel.data.DataSet; 
import org.apache.metamodel.data.Row; 
import org.apache.metamodel.MetaModelException;


public class SpreadsheetReader {

    private File spreadsheet;


    public SpreadsheetReader(String spreadsheetLocation, String spreadsheetName){

        this.spreadsheet = new File( spreadsheetLocation + spreadsheetName );

        if( !"OK".equals( checkSpreadSheet() ) ){
            throw new IllegalStateException("Error in spreadsheet. Cause: "+spreadsheetStatus);
        }

    }


    /** query the excel spreadsheet for the given  ID
    */
    public List<String> query( String givenProgId ){

        List<String> linksArray = new ArrayList<String>();
        int rowCount = 0;

        ExcelConfiguration conf = new ExcelConfiguration( 1, true, true ); // columnNameLineNumber, skipEmptyLines, sEColumns
        DataContext dataContext = new ExcelDataContext( this.spreadsheet, conf );

        System.out.println( "PROFILING >>> "+(new org.joda.time.DateTime())+" START-1" ); // ## 

        Schema schema = dataContext.getDefaultSchema();

        System.out.println( "PROFILING >>> "+(new org.joda.time.DateTime())+" STOP-1" ); // ## 
       // Takes 2 seconds. Will be moved into constructor.

        Table table = schema.getTables()[0];

        Column idsColumn = table.getColumnByName("ProgID");
        Column titlesColumn = table.getColumnByName("Titles");

        Query query = new Query().select(titlesColumn)
                                 .from(table)
                                 .where(idsColumn, OperatorType.EQUALS_TO, givenProgId);

        try( DataSet dataSet = dataContext.executeQuery(query) ){ // try-with-resource, no need to close dataset

            while (dataSet.next()) {

                // the rows are read quite quickly, problem will be when next() is false

                ++rowCount;

                Row currentRow = dataSet.getRow();
                String currentTitle = (String)currentRow.getValue(0);

                linksArray.add( "my-service/titles/"+currentTitle );

                System.out.println( "PROFILING >>> "+(new org.joda.time.DateTime())+" START-2" ); // @@@@@@@
            }
            System.out.println( "PROFILING >>> "+(new org.joda.time.DateTime())+" STOP-2" ); // @@@@@@@ 
            // TAKES ABOUT 6 SECONDS - (Excel file has just 14.779 rows and 114 columns)

        }catch(MetaModelException xx){
            //logger
            throw xx;
        }

        return linksArray;
    }
}}

UPDATE：使用只有3个条目的电子表格文档进行更多的分析()

现在的代码是：

try( DataSet dataSet = this.dataContext.executeQuery(query) ){


    // FIRST NEXT() with result => quite fast

    System.out.println( "\n PROFILING >>> "+(new org.joda.time.DateTime())+" START a\n" );
    System.out.println( "\n 88888 NEXT >>> "+(dataSet.next())+" <<<< \n" ); 
    Row currentRow = dataSet.getRow();
    String currentTitle = (String)currentRow.getValue(0);
    System.out.println( "\n READ: "+(new org.joda.time.DateTime())+" >>> "+currentTitle+" \n" );

    System.out.println( "\n PROFILING >>> "+(new org.joda.time.DateTime())+" STOP a\n" );


    // SECOND AND LAST NEXT() => very SLOW

    System.out.println( "\n PROFILING >>> "+(new org.joda.time.DateTime())+" START b\n" );
    System.out.println( "\n 88888 NEXT >>> "+(dataSet.next())+" <<<< \n" );
    System.out.println( "\n PROFILING >>> "+(new org.joda.time.DateTime())+" STOP b\n" );

}

电子表格是预加载在类构造函数中的。

后续一系列相同查询中的最后一个日志(有时间)如下：

Jun 30, 2016 10:59:38 AM log my-project.logging.ITVLogger
INFO: CODE00012 - Query on spreadsheet started for ID 123456



PROFILING >>> 2016-06-30T10:59:38.651+01:00 START a

10:59:38.652 [main] INFO  o.a.m.d.RowPublisherDataSet - 
Starting separate thread for publishing action: org.apache.metamodel.excel.XlsxRowPublisherAction@4977e527

88888 NEXT >>> true <<<< 

READ: 2016-06-30T10:59:39.756+01:00 >>> A_TITLE

PROFILING >>> 2016-06-30T10:59:39.756+01:00 STOP a



PROFILING >>> 2016-06-30T10:59:39.756+01:00 START b

88888 NEXT >>> false <<<< 

PROFILING >>> 2016-06-30T10:59:44.735+01:00 STOP b

因此，回顾一下，检索结果需要大约1秒的时间，最后一次执行next()需要4~6秒。

apache-metamodel

java

spreadsheet

回答 1

Stack Overflow用户

发布于 2016-06-29 19:35:25

excel DataContext实现在后台对压缩的.xlsx文件进行展开和解析。这意味着DataContext.executeQuery(...)方法返回得很快，但是在它必须等待数据在内存中可用之后才会调用DataSet.next()。我认为您无法避免这种情况，这只是Excel文件需要处理的相当复杂的结果。

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/38105932

复制

相似问题

问Apache MetaModel -糟糕的性能查询电子表格
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Apache MetaModel -糟糕的性能查询电子表格EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Apache MetaModel -糟糕的性能查询电子表格
EN