I expected the training process to run with DeepSpeed enabled just as it did when DeepSpeed wasn't used.