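I'm using Scanner to read 2 text files (which may contain duplicates) and write their lines into ArrayLists. I then compare the two ArrayLists to find the differences. When I print the output I can see what differs, but I can't tell which record came from which file (i.e. the text file name).

Contents of text1.txt: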
TIMESTAMP,FE,TDI,20190703113119,20190601000000,20190701000000,
TIMESTAMP,FE,KYMI,20190703113130,20190601000000,20190701000000,
TIMESTAMP,FE,UMRI,20190703113154,20190601000000,20190701000000,
TIMESTAMP,FE,MLI,20190703113211,20190601000000,20190701000000,
TIMESTAMP,FE,WOLI,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,VEM,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,ZER,20190703113154,20190601000000,20190701000000,

Contents of text2.txt:
TIMESTAMP,FE,TDL,20190703113119,20190601000000,20190701000000,
TIMESTAMP,FE,KYMA,20190703113130,20190601000000,20190701000000,
TIMESTAMP,FE,UMRC,20190703113154,20190601000000,20190701000000,
TIMESTAMP,FE,MLW,20190703113211,20190601000000,20190701000000,
TIMESTAMP,FE,WOLF,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,VEM,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,ZER,20190703113154,20190601000000,20190701000000,

Code:
// try-with-resources closes each Scanner once its file has been read
try (Scanner prodScanner = new Scanner(prodFile)) {
    while (prodScanner.hasNextLine()) {
        String currentRecord = prodScanner.nextLine().trim();
        if (currentRecord.length() > 0) {   // skip blank lines
            prodRecordsFromStatement.add(currentRecord);
        }
    }
}
try (Scanner nonProdScanner = new Scanner(nonProdFile)) {
    while (nonProdScanner.hasNextLine()) {
        String currentRecord = nonProdScanner.nextLine().trim();
        if (currentRecord.length() > 0) {   // skip blank lines
            nonProdRecordsFromStatement.add(currentRecord);
        }
    }
}
// disjunction = symmetric difference of the two lists
Collection<String> result = new ArrayList<>(CollectionUtils.disjunction(prodRecordsFromStatement, nonProdRecordsFromStatement));
List<String> resultList = new ArrayList<>(result);
Collections.sort(resultList);

Actual result:
TIMESTAMP,FE,KYMA,20190703113130,20190601000000,20190701000000,
TIMESTAMP,FE,KYMI,20190703113130,20190601000000,20190701000000,
TIMESTAMP,FE,MLI,20190703113211,20190601000000,20190701000000,
TIMESTAMP,FE,MLW,20190703113211,20190601000000,20190701000000,
TIMESTAMP,FE,TDI,20190703113119,20190601000000,20190701000000,
TIMESTAMP,FE,TDL,20190703113119,20190601000000,20190701000000,
TIMESTAMP,FE,UMRC,20190703113154,20190601000000,20190701000000,
TIMESTAMP,FE,UMRI,20190703113154,20190601000000,20190701000000,
TIMESTAMP,FE,WOLF,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,WOLI,20190703113221,20190601000000,20190701000000,

Expected result: I'd like the name of the file/list to be shown, so the output is easier to understand:
text2.txt,TIMESTAMP,FE,KYMA,20190703113130,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,KYMI,20190703113130,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,MLI,20190703113211,20190601000000,20190701000000,
text2.txt,TIMESTAMP,FE,MLW,20190703113211,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,TDI,20190703113119,20190601000000,20190701000000,
text2.txt,TIMESTAMP,FE,TDL,20190703113119,20190601000000,20190701000000,
text2.txt,TIMESTAMP,FE,UMRC,20190703113154,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,UMRI,20190703113154,20190601000000,20190701000000,
text2.txt,TIMESTAMP,FE,WOLF,20190703113221,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,WOLI,20190703113221,20190601000000,20190701000000,

Posted on 2019-08-08 06:43:37
Iterate over resultList and check whether the current item is also in prodRecordsFromStatement.
If it is, it came from file 1; otherwise it came from file 2.
Posted on 2019-08-08 06:53:47
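A minimal sketch of that membership check in plain Java. The class name `LabelByMembership`, the `label` helper and the sample data are hypothetical; `resultList` is assumed to hold the sorted output of `CollectionUtils.disjunction` from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LabelByMembership {

    // Prefixes every record of an already computed difference with its source file:
    // a record found in file 1's list is attributed to file 1, anything else to file 2.
    static List<String> label(List<String> resultList, List<String> file1Records,
                              String file1Name, String file2Name) {
        // A HashSet gives constant-time lookups instead of scanning the list each time.
        Set<String> inFile1 = new HashSet<>(file1Records);
        List<String> labelled = new ArrayList<>();
        for (String record : resultList) {
            String source = inFile1.contains(record) ? file1Name : file2Name;
            labelled.add(source + "," + record);
        }
        return labelled;
    }

    public static void main(String[] args) {
        List<String> text1 = Arrays.asList(
                "TIMESTAMP,FE,TDI,20190703113119,20190601000000,20190701000000,",
                "TIMESTAMP,FE,VEM,20190703113221,20190601000000,20190701000000,");
        // Stands in for the sorted output of CollectionUtils.disjunction(...)
        List<String> resultList = Arrays.asList(
                "TIMESTAMP,FE,TDI,20190703113119,20190601000000,20190701000000,",
                "TIMESTAMP,FE,TDL,20190703113119,20190601000000,20190701000000,");
        for (String line : label(resultList, text1, "text1.txt", "text2.txt")) {
            System.out.println(line);
        }
    }
}
```

This prints the TDI line prefixed with `text1.txt` and the TDL line prefixed with `text2.txt`. One caveat: because the files may contain duplicates, a record that occurs in both files but a different number of times can survive the disjunction, and this check will always attribute it to file 1.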
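How performant does your solution need to be? If performance isn't critical and your lists aren't long, you can call subtract once in each direction instead of disjunction, which gives you a separate result for each file.
For example: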
Collection<String> resultProdRecords = new ArrayList<>(CollectionUtils.subtract(prodRecordsFromStatement, nonProdRecordsFromStatement));
Collection<String> resultNonProdRecords = new ArrayList<>(CollectionUtils.subtract(nonProdRecordsFromStatement, prodRecordsFromStatement));

resultProdRecords will contain every line in prodRecordsFromStatement that is not in nonProdRecordsFromStatement.
resultNonProdRecords will contain every line in nonProdRecordsFromStatement that is not in prodRecordsFromStatement.
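A sketch of how the two subtract results could be merged into the labelled output the question asks for. To keep it compilable without Apache Commons Collections, the hypothetical `subtract` helper below is a plain-Java stand-in for `CollectionUtils.subtract(a, b)`, whose documented result keeps each element with its count in `a` minus its count in `b`, floored at zero. The class name, `labelledDiff` and the sample data are also illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LabelBySubtract {

    // Plain-Java stand-in for CollectionUtils.subtract(a, b): every occurrence in b
    // removes one matching occurrence from a copy of a, so an element that occurs
    // more often in a than in b survives with the surplus count.
    static List<String> subtract(List<String> a, List<String> b) {
        List<String> remaining = new ArrayList<>(a);
        for (String element : b) {
            remaining.remove(element); // removes the first match, if any
        }
        return remaining;
    }

    // Computes both one-sided differences, prefixes each record with the name of the
    // file it is unique to, and sorts by the record itself (not by the prefix).
    static List<String> labelledDiff(List<String> file1, String name1,
                                     List<String> file2, String name2) {
        List<String[]> rows = new ArrayList<>();
        for (String r : subtract(file1, file2)) rows.add(new String[] {name1, r});
        for (String r : subtract(file2, file1)) rows.add(new String[] {name2, r});
        rows.sort(Comparator.comparing((String[] row) -> row[1]));
        List<String> out = new ArrayList<>();
        for (String[] row : rows) out.add(row[0] + "," + row[1]);
        return out;
    }

    public static void main(String[] args) {
        List<String> text1 = Arrays.asList(
                "TIMESTAMP,FE,TDI,20190703113119,20190601000000,20190701000000,",
                "TIMESTAMP,FE,KYMI,20190703113130,20190601000000,20190701000000,",
                "TIMESTAMP,FE,VEM,20190703113221,20190601000000,20190701000000,");
        List<String> text2 = Arrays.asList(
                "TIMESTAMP,FE,TDL,20190703113119,20190601000000,20190701000000,",
                "TIMESTAMP,FE,KYMA,20190703113130,20190601000000,20190701000000,",
                "TIMESTAMP,FE,VEM,20190703113221,20190601000000,20190701000000,");
        labelledDiff(text1, "text1.txt", text2, "text2.txt").forEach(System.out::println);
    }
}
```

With these sample records it prints the KYMA, KYMI, TDI and TDL lines, each prefixed with its file name, in the same order as the expected result in the question; VEM is dropped because it appears once in each file. Sorting on the record rather than on the labelled string is what keeps the two files interleaved instead of grouped by file name.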
https://stackoverflow.com/questions/57402645