我需要查询Java中的电子表格文件。我在用Apache MetaModel。
我用maven导入它
<dependency>
<groupId>org.apache.metamodel</groupId>
<artifactId>MetaModel-excel</artifactId>
<version>4.5.2</version>
</dependency>一切都很好,但是当next()指令应该返回false时,需要几秒钟时间,为什么?
import org.apache.metamodel.DataContext;
import org.apache.metamodel.excel.ExcelDataContext;
import org.apache.metamodel.schema.Schema;
import org.apache.metamodel.schema.Column;
import org.apache.metamodel.schema.Table;
import org.apache.metamodel.query.Query;
import org.apache.metamodel.query.OperatorType;
import org.apache.metamodel.data.DataSet;
import org.apache.metamodel.data.Row;
import org.apache.metamodel.MetaModelException;
public class SpreadsheetReader {
private File spreadsheet;
public SpreadsheetReader(String spreadsheetLocation, String spreadsheetName){
this.spreadsheet = new File( spreadsheetLocation + spreadsheetName );
if( !"OK".equals( checkSpreadSheet() ) ){
throw new IllegalStateException("Error in spreadsheet. Cause: "+spreadsheetStatus);
}
}
/** query the excel spreadsheet for the given ID
*/
public List<String> query( String givenProgId ){
List<String> linksArray = new ArrayList<String>();
int rowCount = 0;
ExcelConfiguration conf = new ExcelConfiguration( 1, true, true ); // columnNameLineNumber, skipEmptyLines, sEColumns
DataContext dataContext = new ExcelDataContext( this.spreadsheet, conf );
System.out.println( "PROFILING >>> "+(new org.joda.time.DateTime())+" START-1" ); // ##
Schema schema = dataContext.getDefaultSchema();
System.out.println( "PROFILING >>> "+(new org.joda.time.DateTime())+" STOP-1" ); // ##
// Takes 2 seconds. Will be moved into constructor.
Table table = schema.getTables()[0];
Column idsColumn = table.getColumnByName("ProgID");
Column titlesColumn = table.getColumnByName("Titles");
Query query = new Query().select(titlesColumn)
.from(table)
.where(idsColumn, OperatorType.EQUALS_TO, givenProgId);
try( DataSet dataSet = dataContext.executeQuery(query) ){ // try-with-resource, no need to close dataset
while (dataSet.next()) {
// the rows are read quite quickly, problem will be when next() is false
++rowCount;
Row currentRow = dataSet.getRow();
String currentTitle = (String)currentRow.getValue(0);
linksArray.add( "my-service/titles/"+currentTitle );
System.out.println( "PROFILING >>> "+(new org.joda.time.DateTime())+" START-2" ); // @@@@@@@
}
System.out.println( "PROFILING >>> "+(new org.joda.time.DateTime())+" STOP-2" ); // @@@@@@@
// TAKES ABOUT 6 SECONDS - (Excel file has just 14.779 rows and 114 columns)
}catch(MetaModelException xx){
//logger
throw xx;
}
return linksArray;
}
}}UPDATE:使用只有3个条目的电子表格文档进行更多的分析()
现在的代码是:
try( DataSet dataSet = this.dataContext.executeQuery(query) ){
// FIRST NEXT() with result => quite fast
System.out.println( "\n PROFILING >>> "+(new org.joda.time.DateTime())+" START a\n" );
System.out.println( "\n 88888 NEXT >>> "+(dataSet.next())+" <<<< \n" );
Row currentRow = dataSet.getRow();
String currentTitle = (String)currentRow.getValue(0);
System.out.println( "\n READ: "+(new org.joda.time.DateTime())+" >>> "+currentTitle+" \n" );
System.out.println( "\n PROFILING >>> "+(new org.joda.time.DateTime())+" STOP a\n" );
// SECOND AND LAST NEXT() => very SLOW
System.out.println( "\n PROFILING >>> "+(new org.joda.time.DateTime())+" START b\n" );
System.out.println( "\n 88888 NEXT >>> "+(dataSet.next())+" <<<< \n" );
System.out.println( "\n PROFILING >>> "+(new org.joda.time.DateTime())+" STOP b\n" );
}电子表格是预加载在类构造函数中的。
后续一系列相同查询中的最后一个日志(有时间)如下:
Jun 30, 2016 10:59:38 AM log my-project.logging.ITVLogger
INFO: CODE00012 - Query on spreadsheet started for ID 123456
PROFILING >>> 2016-06-30T10:59:38.651+01:00 START a
10:59:38.652 [main] INFO o.a.m.d.RowPublisherDataSet -
Starting separate thread for publishing action: org.apache.metamodel.excel.XlsxRowPublisherAction@4977e527
88888 NEXT >>> true <<<<
READ: 2016-06-30T10:59:39.756+01:00 >>> A_TITLE
PROFILING >>> 2016-06-30T10:59:39.756+01:00 STOP a
PROFILING >>> 2016-06-30T10:59:39.756+01:00 START b
88888 NEXT >>> false <<<<
PROFILING >>> 2016-06-30T10:59:44.735+01:00 STOP b因此,回顾一下,检索结果需要大约1秒的时间,最后一次执行next()需要4~6秒。
发布于 2016-06-29 19:35:25
excel DataContext实现在后台对压缩的.xlsx文件进行展开和解析。这意味着DataContext.executeQuery(...)方法返回得很快,但是在它必须等待数据在内存中可用之后才会调用DataSet.next()。我认为您无法避免这种情况,这只是Excel文件需要处理的相当复杂的结果。
https://stackoverflow.com/questions/38105932
复制相似问题