Thanks for this helpful discussion.
I could use some advice on how to use AWS from Wolfram more effectively. I've managed, with some struggle, to create the necessary permissions, and when I use the (updated) CloudFormation link referenced in the documentation to obtain a vanilla environment, I can do modest computations just fine. That is, I was delighted when RemoteBatchSubmit[env, blahblah] returned an EvaluationResult in reasonable time. But ...
I now need to do some neural network training and evaluation that essentially requires a GPU. My strategy was therefore to put just "p3" in the Available Instance Types field of the CloudFormation form, because p3 instances have GPUs attached whereas some of the other default instance types do not. I also set the Default GPU Environment field in the CloudFormation form to 1. Here's the RemoteBatchSubmissionEnvironment that was returned to me:
env2 = RemoteBatchSubmissionEnvironment["AWSBatch", <|
  "JobQueue" -> "arn:aws:batch:us-east-1:347566773302:job-queue/WolframJobQueue-d3B4hun0Mh0A8IKa",
  "JobDefinition" -> "arn:aws:batch:us-east-1:347566773302:job-definition/WolframJobDefinition-cf703d5831a3a0c:1",
  "IOBucket" -> "gpuneeded-wolframjobdatabucket-1nr93xjmeu4g0"|>]
I then did what others have done to assess use of AWS. I ran the example in the Wolfram Documentation:
job = RemoteBatchSubmit[env2,
  nt = NetTrain[NetModel["LeNet"], "MNIST", TargetDevice -> "GPU"],
  RemoteProviderSettings -> <|"GPUCount" -> 1|>]
It's been about half an hour now, and perhaps I am just too impatient, but I am still getting the sad "runnable" when I evaluate the following code:
job["JobStatus"]
There is no indication from AWS as to when (if ever) my submission will actually evaluate. So, the basic question is: am I doing something wrong?
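In case it's useful, here is how I have been waiting on the job: a simple polling sketch, where I'm assuming the status eventually moves through the usual AWS Batch states to "Succeeded" or "Failed", and that "EvaluationResult" is the right property name per the RemoteBatchSubmit documentation.

(* check the status once a minute until the job finishes, one way or the other *)
While[! MemberQ[{"Succeeded", "Failed"}, job["JobStatus"]], Pause[60]];
(* on success, this should return the trained net assigned to nt *)
job["EvaluationResult"]

So far, of course, the loop never exits, because the status never leaves "runnable".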
Things I may have screwed up:
- Do I need a paid account? Would a paid account help accelerate the process?
- Did I make a mistake by limiting myself to p3 machines in the CloudFormation template? What would be better?
- Other
A further note: I think there is a challenge for people (like me) who use Wolfram precisely because they are NOT computer scientists but find Wolfram both extraordinarily easy to use and exceedingly well documented (with an emphasis on examples). Occasionally, though, we need to leave the friendly, well-documented Wolfram Universe and move to other terrains. And often, I find, the documentation there is either poor or makes a huge number of unfounded assumptions about the user's knowledge and vocabulary: often non-conceptual recipes that fail to generalize, or verbal descriptions without any examples. While this may be fine when, for example, computer scientists use AWS, it is a real challenge when a person who has lived in the Wolfram Universe needs to go outside it. So, I very much appreciate Jesse Friedman's efforts to start bridging the gap.
Oh, and one more thing. I wonder if some amount of the desire to use AWS stems from the fact that -- still -- Mac users cannot (to my knowledge) easily use a GPU to perform operations like NetTrain. I have been told for years that this is some limitation due to MXNet and that it might (or might not) go away. My feeling is that in 2022, Wolfram needs to figure out a way to unleash GPU performance for its many users doing Machine Learning on a Mac. (Or, if it can already be done, let users know how to do it.)
All help appreciated!