Using VCPUCount in AWS RemoteBatchSubmit

Posted 4 years ago

I've been using RemoteBatchSubmit on AWS and I am very happy with this new feature. What I don't understand is why my jobs seem to get stuck in the Runnable state when I request VCPUCount -> 96. I suspect the default account limits don't allow a vCPU count this high. Do I need to request a higher vCPU limit from AWS?

I notice that in the AWS console, on the EC2 Dashboard under Limits, you can request higher vCPU limits. I don't understand all the different limit types well enough to know which, if any, I should ask to have raised. For example, on page 5 of the Dashboard, under "Running On-Demand All Standard (A, C, D, H, I, M, R, T, Z) Instances", I can request a higher vCPU limit. Is this what I need to do, or is there some other reason high-vCPU-count jobs get stuck in Runnable? In general, how can I get VCPUCount -> 96 jobs to run on AWS?

POSTED BY: John Snyder
4 Replies

A single batch job (submitted with RemoteBatchSubmit) runs on a single compute instance, so it's limited to the vCPU count of the largest available instance type. For the c5, r5, and m5 families, this is currently the 24xlarge-size instance types with 96 vCPUs.* To use more vCPUs concurrently, you can submit multiple single batch jobs to run at the same time. If the "Maximum vCPUs" template parameter is >= 192 and your account vCPU limit is high enough, then you can submit two jobs with "VCPUCount" -> 96 and they will run concurrently on two 24xlarge-size instances.
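
To make the two-concurrent-jobs case concrete, here is a minimal sketch. The queue ARN, job definition ARN, and bucket name are placeholders for values from your own AWS Batch stack, and the LinearSolve call is just a stand-in workload:

env = RemoteBatchSubmissionEnvironment["AWSBatch", <|
    "JobQueue" -> "arn:aws:batch:us-east-1:123456789012:job-queue/WolframJobQueue", (* placeholder *)
    "JobDefinition" -> "arn:aws:batch:us-east-1:123456789012:job-definition/WolframJobDefinition:1", (* placeholder *)
    "IOBucket" -> "my-wolfram-io-bucket" (* placeholder *)
  |>];

(* Submit two independent jobs; each should run on its own 96-vCPU instance,
   provided the compute environment's "Maximum vCPUs" and the account vCPU
   limit are both at least 192 *)
jobs = Table[
   RemoteBatchSubmit[env,
     LinearSolve[RandomReal[1, {5000, 5000}], RandomReal[1, 5000]],
     RemoteProviderSettings -> <|"VCPUCount" -> 96|>],
   2];

(* After the jobs finish, download their results *)
results = #["EvaluationResult"] & /@ jobs;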

Array batch jobs (submitted with RemoteBatchMapSubmit), on the other hand, can take advantage of multiple running instances simultaneously by splitting a computation into several independent "child" jobs, in a similar manner to how ParallelMap distributes a series of computations across multiple processor cores. We've recently been running some research experiments with array batch jobs using around 900 active cores. Like ParallelMap, RemoteBatchMapSubmit effectively requires you to structure your program as a single, large Map operation.
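
As a sketch of that array-job pattern (reusing the placeholder env from above, with a made-up timing workload), RemoteBatchMapSubmit takes a function and a list of inputs, and each input becomes an independent child job that can land on its own instance:

(* Each input element is evaluated as a separate child job *)
job = RemoteBatchMapSubmit[env,
   Function[n, First@AbsoluteTiming@MatrixPower[RandomReal[1, {n, n}], 10]],
   {1000, 2000, 3000, 4000},
   RemoteProviderSettings -> <|"VCPUCount" -> 8|>];

(* Once the array job completes, collect the children's results *)
job["EvaluationResults"]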

* The x1 and x1e families support up to 128 vCPUs, but these are specialty instance types with very large amounts of memory and so have a much higher per-vCPU cost than the more general-purpose c5, r5, and m5 families. You can see the full list of instance types (not all of which are usable with AWS Batch) in the Instance Types section of the EC2 console or on the EC2 pricing page.

POSTED BY: Jesse Friedman

This is my first try at running code on AWS.
I have a job that executes locally with the following code:

deviceS = Flatten@ParallelTable[dection@deviceS[[i]], {i, 1, Length@deviceS}];

I rewrote it in the following way, hoping to execute it on AWS:

deviceS=RemoteBatchMapSubmit[env, 
  Flatten@ParallelTable[dection@deviceS[[i]], {i, 1,  Length@deviceS}], 
  RemoteProviderSettings -> <|"VCPUCount" -> 8, 
    "Memory" -> Quantity[32, "Gibibytes"]|>, 
  LicensingSettings -> <|Method -> "OnDemand"|>];

There is no response for a long time. Can you help me see what is wrong? Also, my original data (deviceS) is a huge file (I save it as an .mx file). Would it be better to upload it to AWS first?

POSTED BY: Tsai Ming-Chou