Using VCPUCount in AWS RemoteBatchSubmit

Posted 4 years ago

I've been using RemoteBatchSubmit on AWS and I am very happy with this new feature. What I don't understand is why my jobs seem to get stuck in the Runnable state when I request VCPUCount -> 96. I suspect the default AWS permissions don't allow a vCPU count this high. Do I need to request a higher vCPU limit from AWS?

I notice that in the AWS console, on the EC2 Dashboard under Limits, you can request higher vCPU numbers. I don't understand all these different job types well enough to know which, if any, I should request a higher limit for. For example, on page 5 of the Dashboard, under "Running On-Demand All Standard (A, C, D, H, I, M, R, T, Z) Instances", I can request a higher vCPU limit. Is this what I need to do, or is there some other reason high-vCPU-count jobs seem to stick in Runnable? In general, how can I get VCPUCount -> 96 jobs to run on AWS?

POSTED BY: John Snyder
4 Replies

Hi John, I think you're likely on the right track looking at EC2 quotas. Assuming you left the "Available instance types" setting in the CloudFormation template at the default value "c5, m5, r5, p3", the "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) Instances" quota you found is indeed what will limit the number of concurrent instances (measured in terms of vCPUs) that can run out of the c5, m5, and r5 instance type families. (p3 is for GPU computation and has its own quota.) If that quota setting is below 96, you won't be able to start a 96-core instance ([c5,m5,r5].24xlarge types), so your "VCPUCount" -> 96 jobs won't get launched.
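For reference, here is a minimal sketch of the kind of submission that runs into this quota. The environment keys and values below are placeholders standing in for your stack's outputs, and the computation is just an illustration:

```wolfram
(* construct a submission environment from the batch stack outputs;
   the "..." values are placeholders for your own stack's resources *)
env = RemoteBatchSubmissionEnvironment["AWSBatch",
   <|"JobQueue" -> "...", "JobDefinition" -> "...", "IOBucket" -> "..."|>];

(* a 96-vCPU job will only leave the RUNNABLE state once the account's
   Standard-instance vCPU quota is at least 96 *)
job = RemoteBatchSubmit[env,
   LinearSolve[RandomReal[1, {2000, 2000}], RandomReal[1, 2000]],
   RemoteProviderSettings -> <|"VCPUCount" -> 96|>];
```

You can then poll `job["JobStatus"]` to watch the job move from RUNNABLE to RUNNING once capacity is available.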

You can request a quota increase in the AWS console on the page for that quota (direct link). In my experience AWS processes quota increase requests very quickly, often within minutes; I believe the process is partially automated.
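If you prefer the command line, the same request can be made through the Service Quotas API. Note that the quota code below is my assumption for the "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances" quota; verify it against the list command first:

```shell
# List the EC2 quotas whose names mention "Standard" to confirm the quota code
# (L-1216C47A is assumed here; check the output before submitting the request)
aws service-quotas list-service-quotas --service-code ec2 \
    --query "Quotas[?contains(QuotaName, 'Standard')].[QuotaCode,QuotaName,Value]"

# Request an increase of that vCPU quota to 96
aws service-quotas request-service-quota-increase \
    --service-code ec2 \
    --quota-code L-1216C47A \
    --desired-value 96
```

This requires AWS CLI credentials with Service Quotas permissions, so it is shown as a sketch rather than something to run verbatim.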

Let me know if this doesn't solve your problem.

POSTED BY: Jesse Friedman
Posted 4 years ago

Thanks Jesse. I submitted a request and Amazon increased my maximum vCPUs to 164. Now I can run jobs in the Wolfram batch stack using 96 vCPUs. But this has raised yet another question: can I use all 164 vCPUs Amazon has allotted to me? I tried rebuilding the batch stack, setting the maximum number of vCPUs in the Wolfram template to 164. Unfortunately, I found that a test job using 128 vCPUs would not move out of Runnable status even after an hour, so I killed the job. Is it possible for me to use more than 96 vCPUs, or does something in Wolfram's setup prevent the use of more than 96 vCPUs in any event? If it is possible, how do I set up the batch stack to allow it? Thanks!

POSTED BY: John Snyder
POSTED BY: Jesse Friedman

This is my first attempt at running code on AWS.
I have a job that executes locally with the following code:

deviceS = Flatten@ParallelTable[dection@deviceS[[i]], {i, 1, Length@deviceS}];

I rewrote it in the following way, hoping to execute it on AWS:

deviceS=RemoteBatchMapSubmit[env, 
  Flatten@ParallelTable[dection@deviceS[[i]], {i, 1,  Length@deviceS}], 
  RemoteProviderSettings -> <|"VCPUCount" -> 8, 
    "Memory" -> Quantity[32, "Gibibytes"]|>, 
  LicensingSettings -> <|Method -> "OnDemand"|>];

There was no response for a long time. Can you help me see what is wrong? Also, my original data (deviceS) is a huge file (I save it as an .mx file). Would it be better to upload it to AWS first?

POSTED BY: Tsai Ming-Chou