I am trying to create a tf.data.Dataset from a generator I wrote, following this helpful answer: Split .tfrecords file into many .tfrecords files

Generator code
def get_examples_generator(num_variants, vcf_reader):
    def generator():
        counter = 0
        for vcf_read in vcf_reader:
            is_vcf_ok = ...  # checking whether this "vcf" example is ok
            if is_vcf_ok and counter < num_variants:
                counter += 1
                # features extraction ...
                # we create an example
                example = make_example(img=img, label=label)  # returns a SerializedExample
                yield example
    return generator

TFRecordWriter usage code
def write_sharded_tfrecords(filename, path, vcf_reader,
                            num_variants,
                            shard_len):
    assert Path(path).exists(), "path does not exist"
    generator = get_examples_generator(num_variants=num_variants,
                                       vcf_reader=vcf_reader)
    dataset = tf.data.Dataset.from_generator(generator,
                                             output_types=tf.string,
                                             output_shapes=())
    num_shards = int(np.ceil(num_variants / shard_len))
    formatter = lambda batch_idx: f'{path}/{filename}-{batch_idx:05d}-of-' \
                                  f'{num_shards:05d}.tfrecord'
    # inspired by https://stackoverflow.com/questions/54519309/split-tfrecords-file-into-many-tfrecords-files
    for i in range(num_shards):
        shard_path = formatter(i)
        writer = tf.data.experimental.TFRecordWriter(shard_path)
        shard = dataset.shard(num_shards, index=i)
        writer.write(shard)

This should be a straightforward use of the TFRecord writer. However, it does not write any files at all. Does anyone know why this doesn't work?
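For context, the make_example helper referenced in the question is not shown. A minimal sketch of what it might look like, assuming img is a float NumPy array and label is an integer (the feature keys "img" and "label" here are hypothetical, not from the question):

```python
import numpy as np
import tensorflow as tf


def make_example(img, label):
    """Serialize one (img, label) pair into a tf.train.Example byte string."""
    feature = {
        # Store the raw image bytes; the shape must be fixed and known
        # at parse time (or stored as an extra feature).
        "img": tf.train.Feature(
            bytes_list=tf.train.BytesList(
                value=[img.astype(np.float32).tobytes()])),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[int(label)])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()


serialized = make_example(np.zeros((2, 2)), 1)
print(type(serialized))  # <class 'bytes'>
```

Yielding such serialized byte strings is what makes output_types=tf.string, output_shapes=() the right signature for from_generator above.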
Posted on 2020-11-04 18:15:50

In my function I call the writer with tf.io.TFRecordWriter. Try changing your writer and see whether it works:
writer = tf.io.TFRecordWriter
...

As a further reference, this answer was helpful to me:
https://stackoverflow.com/questions/64578310
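One known pitfall with the original code is that in graph (non-eager) mode, tf.data.experimental.TFRecordWriter.write returns an op that only writes when it is actually run. A minimal sketch of the answer's suggestion, iterating the sharded dataset eagerly and writing each serialized example with tf.io.TFRecordWriter (the write_sharded helper and the "data-*" filename pattern are illustrative, not from the question):

```python
import os
import tempfile

import tensorflow as tf


def write_sharded(dataset, path, num_shards):
    """Split a dataset of serialized tf.train.Example strings into shard files."""
    for i in range(num_shards):
        shard_path = os.path.join(
            path, f"data-{i:05d}-of-{num_shards:05d}.tfrecord")
        shard = dataset.shard(num_shards, index=i)
        # tf.io.TFRecordWriter writes one record per call, so we iterate
        # the shard eagerly and write each example's bytes.
        with tf.io.TFRecordWriter(shard_path) as writer:
            for serialized in shard:
                writer.write(serialized.numpy())


# Tiny demo: serialize a few dummy examples and shard them into two files.
examples = [
    tf.train.Example(features=tf.train.Features(feature={
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[k]))
    })).SerializeToString()
    for k in range(4)
]
dataset = tf.data.Dataset.from_tensor_slices(examples)
out_dir = tempfile.mkdtemp()
write_sharded(dataset, out_dir, num_shards=2)
print(sorted(os.listdir(out_dir)))
```

The same loop works on a dataset built with from_generator, since both yield scalar tf.string tensors.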