Submitting jobs to EC2 through RemoteBatchSubmit works very well. However, I need to access a lot of data for training deep nets. How can I access this data efficiently?
- I can't send the data through the InputFile option, because it is too big.
- Access must be fast from the EC2 instances.
- Data must be shared across instances, and is read-only.
I think EFS would be great, but I don't see a way to mount it automatically on instances, or to mount the file system from inside the notebook.
Any help or pointers would be appreciated!
This AWS support article describes the configuration necessary to mount an EFS file system in AWS Batch job containers. I found it easiest to make a modified version of the RemoteBatchSubmit submission environment CloudFormation template, which I've uploaded here. (You can use this link to open the template in the CloudFormation console.) This template has an additional parameter field at the top for entering an EFS file system ID.
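For reference, the mechanism the template uses is the EFS volume support in AWS Batch job definitions: the container properties declare an EFS-backed volume and a mount point. A minimal sketch of the relevant JSON (the file system ID `fs-12345678` is a placeholder; the template fills in the ID you enter in the parameter field):

```json
{
  "volumes": [
    {
      "name": "efs",
      "efsVolumeConfiguration": { "fileSystemId": "fs-12345678" }
    }
  ],
  "mountPoints": [
    {
      "sourceVolume": "efs",
      "containerPath": "/mnt/efs"
    }
  ]
}
```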
After creating a stack from the template, there's one more manual step required. We have to modify the EFS file system mount targets' security group to allow inbound NFS connections from the compute instance security group created by the CloudFormation template. Take note of the mount targets' security group ID in the EFS console:
Also take note of the instance security group ID (WolframEC2SecurityGroup) created by the CloudFormation stack:
Find the EFS mount targets' security group in the VPC console and edit its inbound rules:
Add an "NFS"-type rule with the instance security group as its source:
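If you prefer the command line, the same inbound rule can be added with the AWS CLI. This is a sketch: both security group IDs below are placeholders for the two IDs noted above, and NFS uses TCP port 2049.

```shell
# Allow inbound NFS (TCP 2049) to the EFS mount targets' security group
# from the WolframEC2SecurityGroup created by the CloudFormation stack.
# Replace both group IDs with the ones from your account.
aws ec2 authorize-security-group-ingress \
    --group-id sg-EFS-MOUNT-TARGETS-PLACEHOLDER \
    --protocol tcp \
    --port 2049 \
    --source-group sg-WOLFRAM-INSTANCES-PLACEHOLDER
```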
Your jobs submitted to this environment should now be able to access the mounted EFS volume under /mnt/efs:
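As a quick check, a submitted job can simply list the mounted directory. A minimal sketch, assuming `env` is the submission environment created from the stack (the job and queue names are whatever your stack produced):

```mathematica
env = RemoteBatchSubmissionEnvironment["AWSBatch"];
job = RemoteBatchSubmit[env, FileNames[All, "/mnt/efs"]];
(* once the job has finished: *)
job["EvaluationResult"]
```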
Let me know if this works for you or if you encounter any problems.
Thank you very much for this very complete answer.
It works perfectly!
I also appreciate that this EFS file system can store output from Mathematica when it's too big to be convenient for a regular job["EvaluationResult"].
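For instance, a training job can write its result directly to the shared volume instead of returning it through the job object. A sketch, assuming `env` is the submission environment and `net`/`trainingData` are defined in the session:

```mathematica
RemoteBatchSubmit[env,
  Module[{trained},
    trained = NetTrain[net, trainingData];
    (* write the trained net to the shared EFS volume as WXF *)
    Export["/mnt/efs/results/trained.wxf", trained]
  ]
]
```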
This makes RemoteBatchSubmit extremely useful, in my opinion. I will use it to speed up research, but also in a course I am building with the intent of teaching undergraduate/pre-university students how to use neural nets for digital art and creative applications. I expected problems when students with basic laptops would try to "scale up" their networks for larger images and media collections. This solution is very easy for the students to use and makes it possible to avoid the Python/Colab pipeline.
Glad that worked for you, Sebastien! The course you're working on sounds like a really interesting application of RemoteBatchSubmit; I hope you'll consider sharing your results on Community or elsewhere once it comes to fruition.
Let me know if you encounter any issues.