Using VCPUCount in AWS RemoteBatchSubmit

Posted 4 years ago

I've been using RemoteBatchSubmit on AWS and I am very happy with this new feature. What I don't understand is why my jobs seem to get stuck in the Runnable state when I request VCPUCount -> 96. I suspect the default account limits don't allow a vCPU count this high. Do I need to request a higher vCPU limit from AWS?

I notice that in the AWS console, on the EC2 Dashboard under Limits, you can request higher vCPU limits. I don't understand all the different limit types well enough to know which, if any, I should ask to have raised. For example, on page 5 of the Dashboard, under "Running On-Demand All Standard (A, C, D, H, I, M, R, T, Z) Instances", I can request a higher vCPU limit. Is this what I need to do, or is there some other reason high-vCPU-count jobs get stuck in Runnable? In general, how can I get VCPUCount -> 96 jobs to run on AWS?

POSTED BY: John Snyder
4 Replies

A single batch job (submitted with RemoteBatchSubmit) runs on a single compute instance, so it's limited to the vCPU count of the largest available instance type. For the c5, r5, and m5 families, this is currently the 24xlarge-size instance types with 96 vCPUs.* To use more vCPUs concurrently, you can submit multiple single batch jobs to run at the same time. If the "Maximum vCPUs" template parameter is >= 192 and your account vCPU limit is high enough, then you can submit two jobs with "VCPUCount" -> 96 and they will run concurrently on two 24xlarge-size instances.
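
To make the two-concurrent-jobs case concrete, here is a minimal sketch. The queue ARN, job definition ARN, and bucket name are placeholders for values from your own AWS Batch stack, and the LinearSolve call is just a stand-in workload:

env = RemoteBatchSubmissionEnvironment["AWSBatch", <|
    "JobQueue" -> "arn:aws:batch:us-east-1:123456789012:job-queue/WolframJobQueue", (* placeholder *)
    "JobDefinition" -> "arn:aws:batch:us-east-1:123456789012:job-definition/WolframJobDefinition:1", (* placeholder *)
    "IOBucket" -> "my-wolfram-io-bucket" (* placeholder *)
  |>];

(* Submit two independent jobs; each should run on its own 96-vCPU instance,
   provided the compute environment's "Maximum vCPUs" and the account vCPU
   limit are both at least 192 *)
jobs = Table[
   RemoteBatchSubmit[env,
     LinearSolve[RandomReal[1, {5000, 5000}], RandomReal[1, 5000]],
     RemoteProviderSettings -> <|"VCPUCount" -> 96|>],
   2];

(* After the jobs finish, download their results *)
results = #["EvaluationResult"] & /@ jobs;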

Array batch jobs (submitted with RemoteBatchMapSubmit), on the other hand, can take advantage of multiple running instances simultaneously by splitting a computation into several independent "child" jobs, in a similar manner to how ParallelMap distributes a series of computations across multiple processor cores. We've recently been running some research experiments with array batch jobs using around 900 active cores. Like ParallelMap, RemoteBatchMapSubmit effectively requires you to structure your program as a single, large Map operation.
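
As a sketch of that array-job pattern (reusing the placeholder env from above, with a made-up timing workload), RemoteBatchMapSubmit takes a function and a list of inputs, and each input becomes an independent child job that can land on its own instance:

(* Each input element is evaluated as a separate child job *)
job = RemoteBatchMapSubmit[env,
   Function[n, First@AbsoluteTiming@MatrixPower[RandomReal[1, {n, n}], 10]],
   {1000, 2000, 3000, 4000},
   RemoteProviderSettings -> <|"VCPUCount" -> 8|>];

(* Once the array job completes, collect the children's results *)
job["EvaluationResults"]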

* The x1 and x1e families support up to 128 vCPUs, but these are specialty instance types with very large amounts of memory and so have a much higher per-vCPU cost than the more general-purpose c5, r5, and m5 families. You can see the full list of instance types (not all of which are usable with AWS Batch) in the Instance Types section of the EC2 console or on the EC2 pricing page.

POSTED BY: Jesse Friedman

This is my first try at running code on AWS.
I have a job that executes locally with the following code:

deviceS = Flatten@ParallelTable[dection@deviceS[[i]], {i, 1, Length@deviceS}];

I rewrote it in the following way, hoping to execute it on AWS:

deviceS=RemoteBatchMapSubmit[env, 
  Flatten@ParallelTable[dection@deviceS[[i]], {i, 1,  Length@deviceS}], 
  RemoteProviderSettings -> <|"VCPUCount" -> 8, 
    "Memory" -> Quantity[32, "Gibibytes"]|>, 
  LicensingSettings -> <|Method -> "OnDemand"|>];

There is no response for a long time. Can you help me see what is wrong? Also, my original data (deviceS) is a huge file (I save it as an .mx file). Would it be better to upload it to AWS first?

POSTED BY: Tsai Ming-Chou