1.1 MCMSEM in the cloud
Getting all requirements for MCMSEM to work properly on a cloud GPU can be tricky. This page is an attempt to make the process easier, but we cannot guarantee that it will always work 100%. Linux experience will definitely help.
Typically, cloud providers bill per second. This means that every second with an active cloud instance costs money, which includes not only the time spent actually running MCMSEM but also the time spent installing everything. Be prepared for this.
When your analysis is done, you are shutting down for the day, or you won't be using your VM for a while for whatever reason, make sure to stop it from your VM instances console (https://console.cloud.google.com/compute/instances), otherwise you will pay for server time you do not use.
If you are done with your analyses and won't be needing your VM again for a longer period, make sure to terminate it completely, and check that all associated disks are removed (as these are billed separately). If you are planning on using MCMSEM again in the future, I recommend making a custom machine image of the VM first, as this saves you from having to set up the VM again, and storing a machine image is significantly cheaper than storing a disk.
In short, this will cost money, and we are not responsible for any expenses incurred as a result of running MCMSEM, following this manual, or anything else.
There are a lot of options for cloud service providers, and the VM types they offer are subject to change. Of course, you should primarily select a cloud provider based on your budget and/or institutional requirements. If you have a choice between multiple providers, find one that offers single-VM instances with 1 GPU with as much VRAM as possible (or as much as you can afford). Additionally, be on the lookout for cloud providers that offer pre-built machine images with the correct CUDA and cuDNN versions.
At the time of writing, Google Cloud is the recommended provider: they have servers in multiple regions (USA/EU/Asia), they offer an instance type with a single GPU with 40GB of GPU memory (`a2-highgpu-1g`) and may soon have 80GB GPUs available (`a2-ultragpu-1g`), and they have a machine image that fits the current version of R torch. As a bonus, Google Cloud is relatively easy to set up and use compared to some others.
- First, create or set up a Google Cloud account with billing information
- Create a Google Cloud project
  - This should pop up once you visit https://console.cloud.google.com, or you can find projects in the top left-hand corner and create one there
- Next, request a quota increase from Google for the desired node types in the desired region (this may take a few days)
  - You will get a notification linking you to the correct page when you try to create a node outside your quota
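- If you have the gcloud CLI installed locally, you can also check your current GPU quota for a region from the command line. A minimal sketch; the region name is just an example, replace it with the region you plan to use:

  ```sh
  # Print the region's quotas and show the GPU-related entries
  gcloud compute regions describe us-central1 | grep -i -B1 -A1 gpu
  ```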
- Once this is approved (or to get to the request page), go to your VM instances: https://console.cloud.google.com/compute/instances
- Click `CREATE INSTANCE`
- In Machine configuration, select `GPU`, and select the preferred GPU and machine type.
  - I used and recommend NVIDIA A100 40GB (`a2-highgpu-1g`)
  - Important: Multiple GPUs will not benefit you, don't pay for stuff you won't use!
- Under Boot Disk, select `CHANGE`
  - Under Operating System, choose `Deep Learning on Linux`
  - Then, under Version, choose `Debian 10 based Deep Learning VM with M97`
    - At the time of writing this is the preferred image, but this is likely to change in the future. The point is that you choose an image with CUDA 11.3 and cuDNN 8.4 (or whatever R torch requires when you are reading this). This saves you from having to install CUDA and cuDNN yourself, saving you a lot of time, money and headaches.
- Make sure to set the disk size to an appropriate amount given your data; as an estimate, data size + 50GB should be fine
- Create the server by clicking `CREATE`
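- Alternatively, if you have the gcloud CLI installed locally, roughly the same VM can be created from the command line. This is only a sketch: the instance name, zone, disk size and especially the image family are assumptions, so first list the available Deep Learning VM images and pick a Debian 10 / CUDA 11.3 based one:

  ```sh
  # List the Deep Learning VM images offered by Google
  gcloud compute images list --project deeplearning-platform-release --no-standard-images

  # Create the VM; name, zone, image family and disk size are examples, adjust them to your needs
  gcloud compute instances create mcmsem-vm \
    --zone=us-central1-a \
    --machine-type=a2-highgpu-1g \
    --image-project=deeplearning-platform-release \
    --image-family=common-cu113 \
    --boot-disk-size=200GB \
    --maintenance-policy=TERMINATE \
    --metadata=install-nvidia-driver=True
  ```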
- In the list of your VM instances, wait until your VM is running.
- Once your VM is up and running, connect to it via the SSH button
- This connects you to your VM via SSH in your browser and has Google manage all required keys. If you really want to set this up yourself and not connect from the browser, see https://cloud.google.com/compute/docs/instances/connecting-advanced
- Once connected, you can start uploading your data (this may be slow depending on traffic) and begin the installation procedure below.
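- If you prefer to work from your own terminal rather than the browser window, the gcloud CLI can manage the SSH keys for you and can also be used to upload data. A sketch; replace `mcmsem-vm`, the zone and the file name with your own instance name, zone and data file:

  ```sh
  # SSH into the VM from your own terminal (gcloud takes care of the keys)
  gcloud compute ssh mcmsem-vm --zone=us-central1-a

  # Upload a local data file to your home directory on the VM
  gcloud compute scp ./mydata.csv mcmsem-vm:~/ --zone=us-central1-a
  ```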
- First, install R 4.2.1 (or a later version). R's own manual (https://cran.r-project.org/bin/linux/debian/#debian-buster-stable) is, at the time of writing, a bit vague on how to do this, but the general idea is:
- Import the CRAN key fingerprint:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-key '95C0FAF38DB3CCAD0C080A7BDC78B2DDEABC47B7'
- Add `deb http://cloud.r-project.org/bin/linux/debian buster-cran40/` to /etc/apt/sources.list (note the buster suite: the Deep Learning VM image is Debian 10, i.e. buster, which matches the -t buster-cran40 flag below)
- Update apt and install R from the CRAN repository:
sudo apt update
sudo apt install -t buster-cran40 r-base
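- To verify that the CRAN build of R was installed rather than the older r-base that ships with Debian 10, a quick check:

  ```sh
  # Should report R version 4.2.x (or later)
  R --version

  # Shows which repository the installed r-base package came from
  apt policy r-base
  ```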
- Install system dependencies:
sudo apt install libxml2-dev libharfbuzz-dev libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev r-base-dev build-essential libssl-dev libcurl4-openssl-dev libcairo2-dev libgit2-dev
- Install the required R packages. Start R as root (sudo R) so the packages are installed system-wide; the first time torch is loaded it may prompt you to download additional libraries, accept this:
sudo R
install.packages("devtools")
install.packages("torch")
library(torch)
- Check that CUDA works; cuda_is_available() should return TRUE and the tensor should be created without errors:
cuda_is_available()
a <- torch_tensor(1, device=torch_device('cuda'))
- Odds are some of these steps will not work on the first go; your best sources of information to fix this are the usual suspects: Google and Stack Overflow.
- Once that is done, run `library(devtools)` and install the latest version of MCMSEM from its GitHub repository with `install_github()`.
- Then you are good to go. I recommend making a machine image of your VM at this point, so you will not have to repeat this setup should you need it again in the future.
- The cost of a machine image is significantly lower than the cost of an active disk, so if you are done with your current analysis but plan to use MCMSEM again in the future, save a machine image before you terminate your VM and remove the disk.
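- The stopping, imaging and terminating described above can also be done with the gcloud CLI. A sketch; `mcmsem-vm`, the image name and the zone are examples, use your own instance name and zone:

  ```sh
  # Stop the VM when you are done for now (you keep paying for the disk, but not for the running VM)
  gcloud compute instances stop mcmsem-vm --zone=us-central1-a

  # Save a machine image of the VM so the whole setup can be recreated later
  gcloud compute machine-images create mcmsem-image \
    --source-instance=mcmsem-vm \
    --source-instance-zone=us-central1-a

  # Terminate the VM once the machine image exists (by default its boot disk is deleted with it)
  gcloud compute instances delete mcmsem-vm --zone=us-central1-a
  ```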