
RemoteBatchSubmit to AWS: "The specified job definition ... does not exist"

Posted 3 years ago

EDIT: Solved. Switching locations on AWS, then creating a new BatchComputeEnvironment, solved the issue.

Following the workflow https://reference.wolfram.com/language/workflow/SetUpTheAWSBatchComputationProvider.html, I have a working BatchComputeEnvironment, and have authenticated the AWS service connection using my local credentials files. But when I run a RemoteBatchSubmit command of the form

RemoteBatchSubmit[RemoteBatchSubmissionEnvironment["AWSBatch", <|"JobQueue" -> "...", 
  "JobDefinition" -> "...", "IOBucket" -> "..."|>], 2 + 2]

it gives the error:

RemoteBatchSubmit::awsbatch-missingdefinition: The specified job definition arn:aws:batch:us-east-1:[...] does not exist.

Other AWS calling functionality is working fine, such as:

ServiceConnect["AWS"]["GetService", 
  "Name" -> "Translate"]["TranslateText", "Text" -> "今日は良い一日だった", 
 "SourceLanguageCode" -> "auto", "TargetLanguageCode" -> "en"]

I've triple-checked the details are correct, and match what AWS shows me and what the workflow told me to do. And I can even submit jobs manually using the same job definition at console.aws.amazon.com/batch. Hence I have no clue about the error. Essentially, Mathematica is failing to recognise the existence of a job definition which exists. Any ideas?

POSTED BY: Daniel Martin
8 Replies
Posted 3 years ago

Thanks Jesse. Indeed. Glad to document my stumble in order to flesh out your UX.

POSTED BY: Daniel Martin

Hi Daniel, it sounds like you got this working by creating a submission environment CloudFormation stack in a different region. For future reference, the region configured in the AWS service connection on your local machine must match the region in which the CloudFormation stack was created. If you're using credential files, that is the region set in the ~/.aws/config file, as described in this workflow. If no region is specified in ~/.aws/config, the default is us-east-1.
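As a rough way to check for this kind of mismatch, the region embedded in the job definition ARN can be compared with the region in the local config file. The following is only a sketch: the helper names arnRegion and configRegion are hypothetical, the ARN in the usage line is a made-up placeholder, and the parser simply takes the first region entry it finds without distinguishing between profiles.

```wolfram
(* Hypothetical helper: extract the region field (4th colon-separated part) of an ARN. *)
arnRegion[arn_String] := Module[{parts = StringSplit[arn, ":"]},
  If[Length[parts] >= 4 && First[parts] === "arn",
   parts[[4]],
   Missing["NotAnARN"]]]

(* Hypothetical helper: first "region = ..." entry in the text of an AWS config file.
   Falls back to us-east-1, the default mentioned above, when no entry is present.
   Assumption: profiles are not distinguished; the first match wins. *)
configRegion[configText_String] := Module[
  {matches = StringCases[configText,
     RegularExpression["(?m)^\\s*region\\s*=\\s*(\\S+)"] -> "$1"]},
  If[matches === {}, "us-east-1", First[matches]]]

(* Usage: compare a placeholder ARN against the local config file, if one exists. *)
configFile = FileNameJoin[{$HomeDirectory, ".aws", "config"}];
configured = If[FileExistsQ[configFile],
   configRegion[Import[configFile, "Text"]],
   "us-east-1"];
arnRegion["arn:aws:batch:us-east-1:123456789012:job-definition/Example:1"] === configured
```

If the comparison gives False, the stack and the service connection are probably pointing at different regions.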

I will make a note to improve the error message you encountered. It might be clearer if it said something like "The specified job definition arn:aws:batch:us-east-1:[...] does not exist in the XX-YYY-Z region."

POSTED BY: Jesse Friedman

What follows is a rather long account of getting a problem solved. Sorry for the long set of replies. Hopefully, someone who encounters the same problem will find this useful:

ServiceExecute::aws-requesterror: The DescribeJobDefinitions request failed with the following client error: no AWS credentials could be found using the specified authentication configuration.

I just set up an AWS account. I've run "aws configure" (on a Mac, and not as sudo). The contents of .aws/* look fine (the region configuration has two lines: [default] and eu-1). I've tried the workflows Set Up the AWS Batch Computation Provider and Authenticate with Amazon Web Services.
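One way to start narrowing down a "no AWS credentials could be found" error is to confirm which profiles the local credentials file actually defines. This is a minimal sketch, assuming the standard AWS CLI location ~/.aws/credentials; the helper name iniSections is hypothetical, and it returns only the section names, never the keys themselves.

```wolfram
(* Hypothetical helper: list the [section] names found in INI-style text. *)
iniSections[text_String] := StringCases[text,
  RegularExpression["(?m)^\\s*\\[([^\\]]+)\\]"] -> "$1"]

(* Assumed standard AWS CLI location of the credentials file. *)
credFile = FileNameJoin[{$HomeDirectory, ".aws", "credentials"}];

(* Returns the list of profile names, or a Missing[...] marker if the file is absent. *)
If[FileExistsQ[credFile],
 iniSections[Import[credFile, "Text"]],
 Missing["NoCredentialsFile"]]
```

If the result does not include a profile named default, or the file is missing, that is a reasonable first place to look.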

I am trying to hunt down the problem, but not even sure where to start looking. If you could just point me in the right direction....

POSTED BY: W. Craig Carter

I was able to get AWS to work: 1) I deleted all SSH key pairs and created new ones; 2) I re-ran aws configure; 3) I created a new service connection with aws = ServiceConnect["AWS", "New"].

But, following the instructions in the workflow:

ec2 = aws["GetService", "Name" -> "EC2"]

and

ec2["DescribeInstanceTypes", "InstanceTypes" -> {"c5.12xlarge"}]

came back with

The DescribeInstanceTypes request failed with the following error from the server: "You are not authorized to perform this operation."

I'll update again if I make any progress.

POSTED BY: W. Craig Carter

I was able to fix the last problem by creating a new AWS group, adding myself as a user to that group, and giving the group Administration privileges. The AWS interface isn't very intuitive.

However, following the next workflow for a remote batch submit:

env = RemoteBatchSubmissionEnvironment[(*...copied from AWS...*)]

came back with

ServiceConnect::multser: One service was chosen from multiple AWS services.

and

job = RemoteBatchSubmit[env, 2 + 2]

came back with

ServiceExecute::aws-requesterror: The DescribeJobDefinitions request failed with the following client error: no AWS credentials could be found using the specified authentication configuration.

So, it looks like I will need to figure out how to delete multiple services. I'll report back if successful.

POSTED BY: W. Craig Carter

Using "ServiceObject" -> "New" inside the association of the second argument of RemoteBatchSubmissionEnvironment:

RemoteBatchSubmissionEnvironment["AWSBatch", <|
  "JobQueue" -> "xxx",
  "JobDefinition" -> "xxx-us-east-xxxWolframJobDefinition-xxx",
  "IOBucket" -> "xxx",
  "ServiceObject" -> "New"|>]

works.

However,

job = RemoteBatchSubmit[env, 2 + 2]

fails with

RemoteBatchSubmit::awsbatch-missingdefinition: The specified job definition xxx:us-east-xxx does not exist.

Progress. Will report back.

POSTED BY: W. Craig Carter

I created a new stack on AWS. It looks like the template inserts "us-east" no matter what I do. Ah, there is a pull-down menu on the AWS dashboard that asks you to specify your region:

[Image: pull-down tab at right]

However, trying to create a batch with the template fails with an error:

ROLLBACK_COMPLETE

I'm stuck now. Not sure what to do next.

POSTED BY: W. Craig Carter
Posted 3 years ago

Switching locations on AWS solved the issue.

POSTED BY: Daniel Martin