July 29, 2019

36 Ways PyTorch Lightning Can Supercharge Your AI Research

William Falcon
Come at me AGI

AGI is not going to solve itself (deep down you know we’re the AGI of another AI 🤯).

But let’s say it did…

Imagine flipping open the lid to your laptop to find an algorithm like this written for you.

def AGI(data):
    data = clean(data)
    agi = magic(data)
    return agi

Ummm Ok 🤔. Let’s see where this goes. You convince your research group you need to experiment with this for a bit.

But obvs this won’t run as written. First, we need a training loop:

for epoch in range(10):
    for batch in data:
        agi = AGI(batch)
        agi.backward()
        ...

Ok, now we’re kind of training. But we still need to add a validation loop…

def validate(dataset):
    # more magic
    ...

Dope. But LOL AGI on a CPU?

You wish.

Let’s run this on multiple GPUs… But wait, you’ve also read that 16-bit precision can speed up your training. OMG, but there are like 3 ways of doing GPU distributed training.
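
To make the pain concrete, here’s a sketch of just one of those approaches in raw PyTorch (nn.DataParallel; DistributedDataParallel within and across nodes are the others). The tiny model is a hypothetical stand-in for your AGI network, and this still says nothing about 16-bit, checkpointing, or clusters:

import torch
from torch import nn

model = nn.Linear(784, 10)  # hypothetical stand-in for your AGI network
if torch.cuda.device_count() > 1:
    # DataParallel replicates the model and splits each batch across GPUs
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")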

So you spend the next week coding this up. But it’s still slow, so you decide to use a compute cluster. Aaaand now things are getting a bit more complicated.

[Image: Sad times]

Meanwhile, your AGI has a bug, but you’re unsure whether it’s in your GPU distribution code, your data loading, or any of the million other things you could have coded wrong.

You decide you don’t quite want to deal with all the training details, so you try Keras, but it doesn’t give you enough control over the training loop to implement the AGI function. Fast.ai is also out of the question because this isn’t an off-the-shelf algorithm.

Well, that sucks, now you have to code this all up yourself…

Nope.

PyTorch Lightning

[Image: How you feel when running a single model on 200 GPUs]

PyTorch Lightning has all of this already coded for you, including tests to guarantee that there are no bugs in that part of the program.

This means you can focus on the core of your research and stop worrying about all the tedious engineering details, which might be fun to deal with if you didn’t have a research idea demanding your attention.

Here’s a clear graphic showing what’s automated for you. The gray parts are automated and controlled via Trainer flags. You define the blue parts, and they can be as complex as you want, using whatever underlying models you like (your own, pretrained models, fast.ai architectures, etc…).

[Image: You own the blue. Lightning owns the rest.]

LightningModule

The core of Lightning is two things: a LightningModule and a Trainer. The LightningModule is where you’ll spend 90% of your time.
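
Here’s a minimal sketch of what a LightningModule might look like. The toy classifier is just a stand-in for the AGI, and the method names follow recent Lightning releases, so double-check them against the version you’re on:

import torch
from torch import nn
from torch.nn import functional as F
import pytorch_lightning as pl

class AGISystem(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # any architecture you want: your own, something pretrained, fast.ai, etc.
        self.net = nn.Sequential(
            nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10)
        )

    def forward(self, x):
        return self.net(x.view(x.size(0), -1))

    def training_step(self, batch, batch_idx):
        # this is the body of the training loop below
        x, y = batch
        return F.cross_entropy(self(x), y)

    def validation_step(self, batch, batch_idx):
        # same idea for the validation loop below
        x, y = batch
        return F.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)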

Notice you’re defining what happens in the training loop:

for epoch in range(10):
    for batch in data:
        # training_step above is what happens here
        # lightning handles the rest (backward, gradient clipping, etc...)

And the same thing for validation:

for val_batch in val_data:
    # validation_step above is what happens here
    # torch.no_grad, model.eval(), etc... all handled for you automatically

These two functions can get as complicated as you need. In fact, you can define a full transformer, a seq2seq model, or a fairseq model inside them.

Trainer

The Trainer handles all the core logic for the things you don’t want to code yourself but need guarantees were done correctly and with the latest best practices.

By setting just a few flags, you can train your AGI on a CPU, multiple GPUs, or multiple nodes on a cluster. Not just that, but you can also enable gradient clipping, accumulated gradients, 16-bit precision, auto-cluster saving, hyperparameter snapshots, Tensorboard visualization, etc…
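
As a sketch, that looks something like this (flag names have shifted across Lightning versions, so treat these as illustrative and check the docs for your release):

from pytorch_lightning import Trainer

trainer = Trainer(
    gpus=8,                     # train across 8 GPUs...
    num_nodes=4,                # ...on 4 machines in a cluster
    precision=16,               # 16-bit precision
    gradient_clip_val=0.5,      # gradient clipping
    accumulate_grad_batches=4,  # accumulated gradients
)
# assumes the AGISystem sketched earlier also defines its dataloaders
trainer.fit(AGISystem())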

You get the idea.

Not only do you get the latest and greatest tricks for training ML/AI systems, but you are guaranteed they work and are properly tested.

This means you only have to worry about getting your part — the new algorithm — correct. How you load the data and what you do in the core part of the training is up to you.

So, what about the 36 ways PyTorch Lightning can help? There are about 36 things you’d normally implement on your own, any of which might have bugs. Lightning implements them AND tests them, so you don’t have to!

Check out the full list here

Congratulations. You can use all that free time you just got back to hack on that side project you’re working on (probably a chatbot for your puppy, or Uber for yoga pants).

Written by

William Falcon

PyTorch Lightning Creator • Co-founder and CEO, Grid AI • PhD student, AI (NYU, Facebook AI Research).