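Apache Hudi writes out each Parquet file with a name like the following:
0743209d-51cb-4233-a7cd-5bb712fba1ff-0_21-64-5300_20211117172738.parquet
I'm trying to understand what each part of the file name represents. Here is my current understanding, but I'd appreciate it if anyone who knows could confirm or clarify it: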
0743209d-51cb-4233-a7cd-5bb712fba1ff = file group/file name
-0 = file chunk
20211117172738 = timestamp of the batch
I'm not sure what the following part represents:
21-64-5300 = ?
Posted on 2021-11-18 23:12:36
Here is what I found:
Hudi file name format: 0743209d-51cb-4233-a7cd-5bb712fba1ff-0_21-64-5300_20211117172738.parquet
The first part is a unique identifier of the file group.
Next is the write token.
Then comes the commit time.
The write token is there to help detect failed Spark writes.
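To make the three-part layout concrete, here is a minimal Java sketch that splits a file name on its underscores. The class and method names (`HudiFileNameParts`, `parse`, `commitTime`) are hypothetical, and the sketch assumes the `<fileId>_<writeToken>_<instantTime>.<ext>` layout described above with a second-precision `yyyyMMddHHmmss` instant time, as in this example. Splitting this way puts the trailing `-0` inside the file ID rather than in a separate field.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: splits <fileId>_<writeToken>_<instantTime>.<ext> into its fields.
final class HudiFileNameParts {
    final String fileId;
    final String writeToken;
    final String instantTime;
    final String extension;

    private HudiFileNameParts(String fileId, String writeToken, String instantTime, String extension) {
        this.fileId = fileId;
        this.writeToken = writeToken;
        this.instantTime = instantTime;
        this.extension = extension;
    }

    static HudiFileNameParts parse(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) throw new IllegalArgumentException("no extension: " + fileName);
        // The file ID uses '-' but no '_', so the underscores separate exactly three fields.
        String[] fields = fileName.substring(0, dot).split("_");
        if (fields.length != 3) throw new IllegalArgumentException("unexpected name: " + fileName);
        return new HudiFileNameParts(fields[0], fields[1], fields[2], fileName.substring(dot));
    }

    // ASSUMPTION: second-precision instant time (yyyyMMddHHmmss), as in this example.
    LocalDateTime commitTime() {
        return LocalDateTime.parse(instantTime, DateTimeFormatter.ofPattern("yyyyMMddHHmmss"));
    }

    public static void main(String[] args) {
        HudiFileNameParts p = parse("0743209d-51cb-4233-a7cd-5bb712fba1ff-0_21-64-5300_20211117172738.parquet");
        System.out.println(p.fileId);       // 0743209d-51cb-4233-a7cd-5bb712fba1ff-0
        System.out.println(p.writeToken);   // 21-64-5300
        System.out.println(p.instantTime);  // 20211117172738
        System.out.println(p.commitTime()); // 2021-11-17T17:27:38
    }
}
```

For the example name, the commit time `20211117172738` reads as 2021-11-17 17:27:38, which matches the questioner's reading of it as the batch timestamp.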
```java
// Joins the parts as <fileId>_<writeToken>_<instantTime><fileExtension>.
public static String makeDataFileName(String instantTime, String writeToken, String fileId, String fileExtension) {
    return String.format("%s_%s_%s%s", fileId, writeToken, instantTime, fileExtension);
}
```
https://stackoverflow.com/questions/70010724
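As a quick check, the sketch below feeds the parts from the question back into `makeDataFileName` and gets the original name. It also shows that `fileExtension` must carry its own leading dot, since the format string adds none. The class name `MakeDataFileNameDemo` is hypothetical. Splitting the write token into a Spark task partition id, stage id, and task attempt id is an assumption about how Hudi builds the token; the answer only says the token helps detect write failures.

```java
// Hypothetical demo class; makeDataFileName is copied from the answer.
final class MakeDataFileNameDemo {
    // Joins the parts as <fileId>_<writeToken>_<instantTime><fileExtension>.
    static String makeDataFileName(String instantTime, String writeToken, String fileId, String fileExtension) {
        return String.format("%s_%s_%s%s", fileId, writeToken, instantTime, fileExtension);
    }

    public static void main(String[] args) {
        // fileExtension includes the '.', because the format string does not insert one.
        String name = makeDataFileName("20211117172738", "21-64-5300",
                "0743209d-51cb-4233-a7cd-5bb712fba1ff-0", ".parquet");
        System.out.println(name); // 0743209d-51cb-4233-a7cd-5bb712fba1ff-0_21-64-5300_20211117172738.parquet

        // ASSUMPTION: the token is <taskPartitionId>-<stageId>-<taskAttemptId> of the Spark task that wrote the file.
        String[] token = "21-64-5300".split("-");
        System.out.printf("taskPartitionId=%s stageId=%s taskAttemptId=%s%n", token[0], token[1], token[2]);
    }
}
```

Note the argument order: `makeDataFileName` takes the instant time first, but the file ID comes first in the resulting name.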