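I need to parse a large CSV file (2 GB). The values must be validated, rows containing "bad" fields must be dropped, and a new file containing only the valid rows should be written out.
For this I chose the uniVocity parsers library. Please help me understand whether this library is suitable for the task and which approach I should use.
Posted on 2015-12-15 08:40:37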
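I'm the author of this library, so let me try to help:
The faster approach to read, validate, and write is to use a RowProcessor that holds a CsvWriter and decides whether to write or skip each row. I think the following code will help:
Define the output: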
private CsvWriter createCsvWriter(File output, String encoding) {
    CsvWriterSettings settings = new CsvWriterSettings();
    // configure the writer ...
    try {
        return new CsvWriter(new OutputStreamWriter(new FileOutputStream(output), encoding), settings);
    } catch (IOException e) {
        throw new IllegalArgumentException("Error writing to " + output.getAbsolutePath(), e);
    }
}
Redirect the input:
// This creates a row processor for our parser. It validates each row and sends the valid ones to the CSV writer.
private RowProcessor createRowProcessor(File output, String encoding) {
    final CsvWriter writer = createCsvWriter(output, encoding);
    return new AbstractRowProcessor() {
        @Override
        public void rowProcessed(String[] row, ParsingContext context) {
            if (shouldWriteRow(row)) {
                writer.writeRow(row);
            } else {
                // skip row
            }
        }

        private boolean shouldWriteRow(String[] row) {
            // your validation here
            return true;
        }

        @Override
        public void processEnded(ParsingContext context) {
            // close the writer once the parser has finished with the input
            writer.close();
        }
    };
}
Configure the parser:
public void readAndWrite(File input, File output, String encoding) {
    CsvParserSettings settings = new CsvParserSettings();
    // configure the parser here
    // tells the parser to send each row to the custom processor, which validates the rows and redirects the valid ones to the CsvWriter
    settings.setRowProcessor(createRowProcessor(output, encoding));
    CsvParser parser = new CsvParser(settings);
    try {
        parser.parse(new InputStreamReader(new FileInputStream(input), encoding));
    } catch (IOException e) {
        throw new IllegalStateException("Unable to open input file " + input.getAbsolutePath(), e);
    }
}
For better performance, you can also wrap the row processor in a ConcurrentRowProcessor.
settings.setRowProcessor(new ConcurrentRowProcessor(createRowProcessor(output, encoding)));
This way, rows are written in a separate thread.
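The shouldWriteRow placeholder in the row processor is where the actual validation goes. The question does not say what makes a field "bad", so the following is only a minimal sketch under assumed rules: a hypothetical three-column layout (a positive integer id, a non-blank name, a decimal amount). The class name RowValidationSketch, the column layout, and the helper methods are all illustrative assumptions, not part of the library. Null checks are included because a parser may hand back null for an empty field.

```java
import java.math.BigDecimal;

// Illustrative only: the rules and the three-column layout are assumptions,
// since the original question does not define what a "bad" field is.
class RowValidationSketch {

    // Assumed layout: [0] id (positive integer), [1] name (non-blank), [2] amount (decimal)
    static final int EXPECTED_COLUMNS = 3;

    // Returns true only when every field of the row passes its check.
    static boolean shouldWriteRow(String[] row) {
        if (row == null || row.length != EXPECTED_COLUMNS) {
            return false;
        }
        return isPositiveInteger(row[0]) && isNotBlank(row[1]) && isDecimal(row[2]);
    }

    static boolean isNotBlank(String value) {
        return value != null && !value.trim().isEmpty();
    }

    static boolean isPositiveInteger(String value) {
        if (value == null) {
            return false;
        }
        try {
            return Long.parseLong(value.trim()) > 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static boolean isDecimal(String value) {
        if (value == null) {
            return false;
        }
        try {
            new BigDecimal(value.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A well-formed row and a row with a non-numeric amount
        System.out.println(shouldWriteRow(new String[]{"1", "Alice", "10.5"}));
        System.out.println(shouldWriteRow(new String[]{"2", "Bob", "abc"}));
    }
}
```

Because the check is a plain function of the String[] row, it can be unit-tested on its own and then called from rowProcessed without touching the parser or writer setup.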
https://stackoverflow.com/questions/34269510