Hi John, I think you're likely on the right track looking at EC2 quotas. Assuming you left the "Available instance types" setting in the CloudFormation template at the default value "c5, m5, r5, p3", the "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) Instances" quota you found is indeed what will limit the number of concurrent instances (measured in terms of vCPUs) that can run out of the c5, m5, and r5 instance type families. (p3 is for GPU computation and has its own quota.) If that quota setting is below 96, you won't be able to start a 96-core instance ([c5,m5,r5].24xlarge types), so your "VCPUCount" -> 96 jobs won't get launched.
You can request a quota increase in the AWS console on the page for that quota (direct link). In my experience AWS processes quota increase requests very quickly, often within minutes - I think the process is partially automated.
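If you'd rather check the quota from a script than from the console, something like the sketch below might help. It is only an illustration, not part of the CloudFormation template: the function names are mine, it assumes boto3 is installed and AWS credentials are configured, and the quota code L-1216C47A is what I believe identifies the standard On-Demand vCPU quota, so please confirm it on the quota's page first.

```python
# Quota code for "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances".
# Assumption: confirm this code on the quota's page in the Service Quotas console.
STANDARD_ON_DEMAND_QUOTA_CODE = "L-1216C47A"


def fetch_standard_vcpu_quota(region_name):
    """Return the applied vCPU quota for standard On-Demand instances in a region."""
    # Imported here so the helper below can be used without boto3 installed.
    import boto3  # third-party: pip install boto3

    client = boto3.client("service-quotas", region_name=region_name)
    response = client.get_service_quota(
        ServiceCode="ec2",
        QuotaCode=STANDARD_ON_DEMAND_QUOTA_CODE,
    )
    return response["Quota"]["Value"]


def can_launch(quota_vcpus, running_vcpus, requested_vcpus):
    """Return True if an instance with requested_vcpus still fits under the quota."""
    return running_vcpus + requested_vcpus <= quota_vcpus
```

For example, can_launch(fetch_standard_vcpu_quota("us-east-1"), 0, 96) tells you whether a single 96-vCPU instance would fit when nothing else is running (the region name is just an example, use your own). The same boto3 client also has a request_service_quota_increase call if you prefer to script the increase request instead of using the console.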
Let me know if this doesn't solve your problem.