RemoteBatchSubmit on AWS: restarting after failed batch job

Posted 4 years ago

I have been using RemoteBatchSubmit on AWS. Occasionally I have to kill a job using RemoteBatchJobAbort. I have found that whenever I do this, I can't get AWS to accept a follow-up job; the new job just sits in the Runnable status. To overcome the problem I've had to delete the batch stack and generate a new one using the Wolfram template. Maybe I haven't waited long enough before submitting a new job after aborting a calculation? Is there some way to manually reset things on AWS to prepare it for a new job, or do I just need to be more patient and wait longer before submitting my next project?

POSTED BY: John Snyder

Hi John, there shouldn't be any manual action needed between submitting jobs.

How long have you tried waiting with a job in the Runnable status? I've observed that the AWS Batch scheduler (on Amazon's side) can be a bit unpredictable in its latency. I've seen jobs occasionally take up to 20-30 minutes to transition from Runnable to Starting. If you haven't tried waiting that long already, I suggest doing so once to see if that's the problem.

What are the vCPU counts of the two jobs? If the second's is greater than the first's, AWS Batch may need to launch a new instance for the second job instead of reusing the first job's instance. If the sum of the vCPU counts of the two instances is greater than either the stack's vCPU limit or your account quota, AWS Batch will have to terminate the first instance before the second can be launched. This can delay things, as AWS Batch tends to wait 10 minutes or so after a job ends before it terminates the host instance.
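To make the reasoning above concrete, here is a rough model of it in Python. It is only a sketch under a simplifying assumption: each job gets an instance with exactly as many vCPUs as the job requests (in practice AWS Batch may choose a larger instance type). The function name and its return strings are hypothetical and not part of any AWS or Wolfram API.

```python
def second_job_outlook(first_vcpus: int, second_vcpus: int,
                       stack_max_vcpus: int, account_quota: int) -> str:
    """Rough guess at how AWS Batch will place a second job.

    Simplifying assumption (not AWS behavior in general): each instance
    has exactly as many vCPUs as the job running on it.
    """
    if second_vcpus <= first_vcpus:
        # The first job's instance is big enough, so it may be reused.
        return "may reuse the first instance"
    # Whichever limit is tighter is the one that actually applies.
    ceiling = min(stack_max_vcpus, account_quota)
    if first_vcpus + second_vcpus > ceiling:
        # Both instances can't run at once, so the first one has to be
        # terminated (often ~10 minutes after its job ends) first.
        return "waits for the first instance to terminate"
    return "new instance can launch alongside the first"


print(second_job_outlook(16, 32, 40, 64))
# prints: waits for the first instance to terminate
```

In the example, 16 + 32 = 48 vCPUs exceeds the 40-vCPU stack limit, so the second job would stay Runnable until the first instance is shut down.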

After terminating a job and submitting a new one, you could check the old job's status to confirm that it has actually transitioned to Failed. This has to happen before the new job can be scheduled to an instance. You can also use the AWS Batch console to view the status and other properties of jobs and job queues.
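If you'd rather check from a script than from the console, a minimal sketch using boto3 (the AWS SDK for Python) might look like this. It relies on the AWS Batch client's `describe_jobs` call; the helper names, the 30-second polling interval, and the 30-minute timeout (chosen to match the worst-case delay mentioned earlier) are my own assumptions. The client is passed in as a parameter so the logic can be tried out without AWS credentials.

```python
import time

# AWS Batch statuses after which a job will not change any further.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def job_status(batch_client, job_id: str) -> str:
    """Return the current AWS Batch status string for one job."""
    response = batch_client.describe_jobs(jobs=[job_id])
    jobs = response["jobs"]
    if not jobs:
        raise ValueError(f"no AWS Batch job found with ID {job_id}")
    return jobs[0]["status"]


def wait_until_done(batch_client, job_id: str,
                    poll_seconds: float = 30.0,
                    timeout_seconds: float = 30 * 60,
                    sleep=time.sleep) -> str:
    """Poll until the job is SUCCEEDED or FAILED; raise TimeoutError otherwise.

    Hypothetical helper: the interval and timeout are assumptions.
    """
    waited = 0.0
    while True:
        status = job_status(batch_client, job_id)
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job {job_id} still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with real credentials (the job ID is a placeholder):
#   import boto3
#   client = boto3.client("batch")
#   print(wait_until_done(client, "<aborted-job-id>"))
```

Once the aborted job reports FAILED, the new job becomes eligible for scheduling; if it is still stuck in RUNNABLE well after that, the vCPU limits discussed above are the next thing to check.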

POSTED BY: Jesse Friedman
