CUDA on WSL2 for Deep Learning - First Impressions and Benchmarks

TLDR: It's a little slower than native Ubuntu, but the future is bright!

Not going to lie, Microsoft has been doing some good things in the software development community. I love coding in Visual Studio Code, and ONNX has been great if you want to optimize your deep learning models for production. WSL2 giving you access to an entire Linux kernel is exactly what I've been wanting, but the lack of CUDA support meant it was a non-starter for me as an A.I. Engineer.

As an A.I. Engineer and a Content Creator, I need Windows for the tools to create my content, but I need Linux to easily run and train my A.I. software projects. I have a dual boot setup on my machine, but it's not ideal. I hate that if I'm training a model I can't access my creation tools. So I have to work synchronously instead of asynchronously.

I know most deep learning libraries support Windows, but getting things working, especially open source A.I. software, was always a headache. I know I can use something like QEMU to run Windows software on Linux, but that requires me to dedicate an entire GPU to the VM, so my Linux instance loses access to it.

Here come Microsoft and Nvidia with CUDA WSL2 support! The promise of all Linux tools running natively on Windows would be a dream for my workflow. I immediately jumped on it when they released a preview version. In this post, I will write about my first impressions as well as some benchmarks!

Setting up CUDA on WSL2

Setting up CUDA on WSL2 was super easy for me. Nvidia has really good docs explaining the steps you need to take. I was pleasantly surprised that I did not run into a single error! I can't remember the last time that happened when setting up software that's still in beta.

Setting up my Machine Learning Tools

As an A.I. Engineer who spends a lot of time doing deep learning, there are a few tools that I need to make my development experience much better.

  1. Docker with CUDA support (nvidia-docker)
  2. PyTorch as my deep learning framework of choice
  3. Horovod for distributed training

I had not tried WSL2 before, since I dual boot into Linux, so I was pleasantly surprised that I could download and install my tools just like on a normal Ubuntu machine. I had no issues installing any of these packages. It was the Ubuntu experience you would expect.
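If you want a quick sanity check that these tools are visible from inside WSL2, here is a minimal sketch. The function name `environment_report` is my own invention for illustration, and it only checks whether the packages and the `docker` command can be found; it does not verify that nvidia-docker or Horovod are configured correctly.

```python
import importlib.util
import shutil


def environment_report():
    """Collect a few facts about the local deep learning setup."""
    report = {
        # True if a `docker` executable is found on PATH
        "docker_on_path": shutil.which("docker") is not None,
        # True if the packages can be located without importing them
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "horovod_installed": importlib.util.find_spec("horovod") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch
            # Asks PyTorch whether it can see a usable CUDA device
            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken install should not crash the report
            report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```

If `cuda_available` comes back False while PyTorch is installed, the GPU is not being exposed to the Linux side and training will fall back to the CPU.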

Training Models

OK now, this is the real test. First I will talk about my experience, and then I'll present some benchmarks to compare CUDA on WSL2 and bare-metal Linux.

I think a common workflow when training deep learning models, regardless of whether you have your own hardware or train in the cloud, is to keep your data on a separate disk from the operating system. WSL2 automatically detects and mounts any disk that Windows 10 recognizes, so that was cool, but I ran into issues with file permissions on my mounted data drive.

NOTE: The issue only occurs on mounted drives. Everything works fine if you keep your files within the WSL2 file system.

My training script would error out on random data files because their permissions were restricted. So I read the WSL docs, which state that...

When accessing Windows files from WSL the file permissions are either calculated from Windows permissions, or are read from metadata that has been added to the file by WSL. This metadata is not enabled by default.

OK, so WSL2 calculates the file permissions, and I guess it sometimes gets them wrong. I just need to enable this metadata thingie to get it working, right? Well, sorta... I enabled the metadata and then ran chmod -R 777 on all of my data files as a quick and dirty way to open up the permissions so I could continue training. Well, it worked for a bit... then it broke again with the same permissions error! I looked at the permissions and they had somehow reverted to restricted access. The kicker is that if I checked the file permissions multiple times, they would flip back to my chmod permissions. So the permissions were changing at random, and the only way I could tell was by checking them with ls -l.

I discovered a weird WSL2 quantum phenomenon that I'll coin... Schrödinger's file permissions. The issue stopped when I used chmod 700 to give full read, write, and execute permissions to only the owning WSL2 user instead of to everybody and their mom. This somehow fixed the Schrödinger's file permissions issue, so I just went on with life.
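Checking permissions by hand with ls -l gets old fast, so here is a rough sketch of how the same check could be scripted before kicking off a training run. The helper name `find_unreadable` and the example path are hypothetical, and this only reports what the current user can read at the moment it runs; it does not explain or fix the flip-flopping itself.

```python
import os
import stat


def find_unreadable(root):
    """Return (path, mode_string) for every file under root the current user cannot read."""
    blocked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # os.access asks the OS whether this process may read the file
            if not os.access(path, os.R_OK):
                # stat.filemode renders the mode like ls -l does, e.g. "-rwx------"
                mode = stat.filemode(os.lstat(path).st_mode)
                blocked.append((path, mode))
    return blocked


if __name__ == "__main__":
    # Hypothetical mount point for a Windows data drive; replace with your own
    for path, mode in find_unreadable("/mnt/d/datasets"):
        print(mode, path)
```

Running it a few times in a row would also show whether the set of blocked files changes between checks.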

I started training with no issues after that! Everything looked good, the model loss was going down and nothing looked out of the ordinary. I decided to do some benchmarking to compare deep learning training performance of Ubuntu vs WSL2 Ubuntu vs Windows 10.

Benchmarks - Ubuntu vs. WSL2 vs. Windows 10

To benchmark, I used the MNIST script from the PyTorch examples repo. I modified the script to make the network much bigger so the results are more representative of larger models.

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Three 3x3 convolutions, widened to 128 channels each
        self.conv1 = nn.Conv2d(1, 128, 3, 1)
        self.conv2 = nn.Conv2d(128, 128, 3, 1)
        self.conv3 = nn.Conv2d(128, 128, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        # 15488 = 128 channels * 11 * 11 after the convolutions and pooling
        self.fc1 = nn.Linear(15488, 15488//2)
        self.fc2 = nn.Linear(15488//2, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.conv3(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        # Log-probabilities over the 10 digit classes
        output = F.log_softmax(x, dim=1)
        return output
Network used to benchmark
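The 15488 in the first fully connected layer follows from the layer sizes above and the 28x28 MNIST input. Here is a small sketch of that arithmetic in plain Python; the helper names `conv_out` and `flattened_features` are mine, not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size=28, channels=128):
    """Number of features reaching fc1 for a square input image."""
    size = input_size
    for _ in range(3):  # conv1, conv2, conv3: 3x3 kernels, stride 1, no padding
        size = conv_out(size, kernel=3)
    size = conv_out(size, kernel=2, stride=2)  # max_pool2d(x, 2)
    return channels * size * size


print(flattened_features())  # 15488
```

The spatial size goes 28, 26, 24, 22, then 11 after pooling, and 128 * 11 * 11 = 15488.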

My machine specs are...

  • Intel i9 10920X - 12 cores, 24 threads
  • Nvidia RTX 2080 Ti - 11GB VRAM

I ran 3 tests on...

  • Ubuntu 20.04
  • WSL2 Ubuntu 20.04
  • Windows 10
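To compare environments fairly, each run should be timed the same way. The sketch below shows one way to measure wall-clock training time; it is not the exact code I used, and `time_training`, `train_one_epoch`, and `sync` are illustrative names. CUDA kernels run asynchronously, so on a GPU run you would pass something like torch.cuda.synchronize as `sync` so that queued work is counted.

```python
import time


def time_training(train_one_epoch, epochs, sync=None):
    """Run train_one_epoch(epoch) for `epochs` epochs and return elapsed seconds.

    `sync` is an optional callable invoked before each clock reading, so that
    any work still queued on the device is finished before the time is taken.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    for epoch in range(1, epochs + 1):
        train_one_epoch(epoch)
    if sync is not None:
        sync()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in for a real epoch of training, just to show the call
    elapsed = time_training(lambda epoch: sum(range(100_000)), epochs=3)
    print(f"{elapsed:.3f} s")
```

Using the same harness in every environment keeps data loading, warm-up, and evaluation inside or outside the timed region consistently.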

I used a batch size of 512, trained for 14 epochs, and ran in FP32 precision. Below are the results…

So the results aren't too bad, honestly! WSL2 with CUDA support takes 18% longer than native Ubuntu to train an MNIST model on my Nvidia RTX 2080 Ti. CUDA support on WSL2 is still in early preview, and I'm hopeful that the engineers and researchers over at Microsoft and Nvidia will eventually get it close to native Ubuntu performance.
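Note that "18% longer" is not the same as "18% less throughput". A quick sketch of the conversion, using only the 18% figure above (the function name `relative_throughput` is mine):

```python
def relative_throughput(extra_time_fraction):
    """Throughput relative to baseline when a run takes (1 + fraction) times as long."""
    return 1.0 / (1.0 + extra_time_fraction)


wsl2_vs_native = relative_throughput(0.18)
print(f"{wsl2_vs_native:.1%}")  # 84.7%
```

So for the same amount of work, WSL2 gets through roughly 85% as many samples per second as native Ubuntu in this test.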

For some people, taking 18% longer to train models may be a non-starter, but for me, I can take the small performance hit if it means I can train deep learning models asynchronously while using my Windows-compatible software tools to create content. I'm going to stick with Windows 10 and WSL2 as my daily driver for a while and see how it goes!