Distributed training


(chesterkuo) #1

Hi there

I have finished distributed training across multiple PCs with multiple workers, and each worker has its own checkpoint folder. What I'm not sure about is how to export the model for inference in the cluster case.

Should I check each worker's checkpoint folder, see which one has the latest checkpoint file, and export the model from that one? Any suggestions?
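
To make the "check each worker" idea concrete, here is a rough sketch of scanning several checkpoint folders and picking the one with the highest training step. It assumes TensorFlow-style file names such as `model.ckpt-1234.index`. That naming, the function name `latest_checkpoint`, and the folder paths are all assumptions for illustration and are not taken from any particular training script.

```python
import re
from pathlib import Path

# Assumption: checkpoints are saved as "<name>-<step>.index" (TensorFlow-style).
STEP_PATTERN = re.compile(r"-(\d+)\.index$")


def latest_checkpoint(worker_dirs):
    """Return (step, checkpoint_prefix) for the highest step found, or None."""
    best = None
    for directory in worker_dirs:
        for index_file in Path(directory).glob("*.index"):
            match = STEP_PATTERN.search(index_file.name)
            if match is None:
                continue  # not a step-numbered checkpoint file
            step = int(match.group(1))
            if best is None or step > best[0]:
                # Drop the ".index" suffix to get the checkpoint prefix.
                prefix = str(index_file)[: -len(".index")]
                best = (step, prefix)
    return best


# Hypothetical usage: the folder names below are placeholders.
# result = latest_checkpoint(["worker0/checkpoints", "worker1/checkpoints"])
```

This only finds the newest checkpoint by step number. Whether that checkpoint is the right one to export depends on how the workers share and save variables, which is the part I am unsure about.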