January 14, 2021

Neptune-PyTorch Lightning Integration

Neptune AI Team

What will you get with this integration?

PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research. With Neptune integration you can:

Note

This integration is tested with pytorch-lightning==1.0.7 and neptune-client==0.4.132.

Where to start?

To get started with this integration, follow the Quickstart below. You can also skip the basics and take a look at the advanced options.

If you want to try things out and focus only on the code, you can either:

  1. Open the Colab notebook (badge link below) with the quickstart code and run it as the anonymous user “neptuner”; no setup is required,
  2. View the quickstart code as a plain Python script on GitHub.

You can also check this public project with example experiments: PyTorch Lightning integration.

Quickstart

This quickstart will show you how to log PyTorch Lightning experiments to Neptune using NeptuneLogger (part of the pytorch-lightning library).

As a result you will have an experiment logged to Neptune. It will have train loss and epoch (visualized as charts), parameters, hardware utilization charts and experiment metadata.

           

Run in Google Colab | View source on GitHub | See example in Neptune

Before you start

Make sure you have Python 3.x and the following libraries installed:

You also need minimal familiarity with PyTorch Lightning. Have a look at the “Lightning in 2 steps” guide to get started.

Step 1: Import Libraries

Import necessary libraries.

import os

import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms

import pytorch_lightning as pl

Notice pytorch_lightning at the bottom.

Step 2: Define Hyper-Parameters

Define a Python dictionary with the hyper-parameters for model training.

PARAMS = {'max_epochs': 3,
         'learning_rate': 0.005,
         'batch_size': 32}

This dictionary will later be passed to the Neptune logger (step 4 shows how), so that the hyper-parameters appear in the experiment's Parameters tab.

Step 3: Define LightningModule and DataLoader

Implement a minimal example of a pl.LightningModule and a simple DataLoader.

# pl.LightningModule
class LitModel(pl.LightningModule):
   def __init__(self):
       super().__init__()
       self.l1 = torch.nn.Linear(28 * 28, 10)

   def forward(self, x):
       return torch.relu(self.l1(x.view(x.size(0), -1)))

   def training_step(self, batch, batch_idx):
       x, y = batch
       y_hat = self(x)
       loss = F.cross_entropy(y_hat, y)
       self.log('train_loss', loss)
       return loss

   def configure_optimizers(self):
       return torch.optim.Adam(self.parameters(), lr=PARAMS['learning_rate'])

# DataLoader
train_loader = DataLoader(MNIST(os.getcwd(), download=True, transform=transforms.ToTensor()),
                         batch_size=PARAMS['batch_size'])

A few explanations:

self.log('train_loss', loss)

This loss will be logged to Neptune during training as a train_loss. You will see it in the Experiment’s Charts tab (as “train_loss” chart) and Logs tab (as raw numeric values).

Step 4: Create NeptuneLogger

Instantiate NeptuneLogger with the necessary parameters.

from pytorch_lightning.loggers.neptune import NeptuneLogger

neptune_logger = NeptuneLogger(
   api_key="ANONYMOUS",
   project_name="shared/pytorch-lightning-integration",
   params=PARAMS)

NeptuneLogger is an object that integrates Neptune with PyTorch Lightning, allowing you to track experiments. It is part of the Lightning library. This minimal example uses the public user “neptuner”, whose public token is “ANONYMOUS”.

Tip

You can also use your own API token. Read more about how to set the Neptune API token securely.
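
One common way to keep a token out of source code is to read it from an environment variable. The sketch below assumes the variable is named NEPTUNE_API_TOKEN and falls back to the public “ANONYMOUS” token when it is not set. resolve_api_key is a hypothetical helper written for this illustration; it is not part of Neptune or Lightning.

```python
import os


def resolve_api_key(env=None):
    """Return the Neptune token from the environment, or the public anonymous token."""
    env = os.environ if env is None else env
    # Assumption: the personal token is stored under NEPTUNE_API_TOKEN.
    return env.get("NEPTUNE_API_TOKEN", "ANONYMOUS")


api_key = resolve_api_key()
# The value can then be passed on, e.g. NeptuneLogger(api_key=api_key, ...)
```

With this approach the same script runs anonymously on a fresh machine and under your own account once the variable is exported.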

Step 5: Pass NeptuneLogger to the Trainer

Pass the instantiated NeptuneLogger to the pl.Trainer.

trainer = pl.Trainer(max_epochs=PARAMS['max_epochs'],
                    logger=neptune_logger)

Simply pass neptune_logger to the Trainer so that Lightning uses this logger. Note that max_epochs comes from the PARAMS dictionary.

Step 6: Run experiment

Fit the model to the data.

model = LitModel()

trainer.fit(model, train_loader)

At this point you are all set to fit the model. The Neptune logger will collect the metrics and show them in the UI.

Explore Results

You just learned how to start logging PyTorch Lightning experiments to Neptune using the Neptune logger, which is part of the Lightning library.

The training above is logged to Neptune in near real time. Click the link printed to the console or go here to explore an experiment similar to yours. In particular, check:

  1. metrics,
  2. logged parameters,
  3. hardware usage statistics,
  4. metadata information including git summary info.

Check this experiment here or view the quickstart code as a plain Python script on GitHub.

           


PyTorchLightning neptune.ai integration

Advanced options

To learn more about the advanced options that the Neptune logger offers, follow the sections below; each one describes a single feature.

If you want to try things out and focus only on the code, you can either:

  1. Open the Colab notebook (badge link below) and run the advanced example as the “neptuner” user; no setup is required,
  2. View the advanced example code as a plain Python script on GitHub.

You can also check this public project with example experiments: PyTorch Lightning integration.

           

Run in Google Colab | View source on GitHub | See example in Neptune

Before you start

In addition to the contents of the “Before you start” section in Quickstart, you also need to have scikit-learn and scikit-plot installed.

pip install scikit-learn==0.23.2 scikit-plot==0.3.7

Check the scikit-learn installation guide or the scikit-plot GitHub project for more info.


Advanced NeptuneLogger options

Create NeptuneLogger with advanced parameters.

from pytorch_lightning.loggers.neptune import NeptuneLogger

ALL_PARAMS = {...}

neptune_logger = NeptuneLogger(
   api_key="ANONYMOUS",
   project_name="shared/pytorch-lightning-integration",
   close_after_fit=False,
   experiment_name="train-on-MNIST",
   params=ALL_PARAMS,
   tags=['1.x', 'advanced'],
)

Besides the required api_key and project_name, NeptuneLogger accepts other options, such as the close_after_fit, experiment_name, params and tags arguments shown above.

Tip

Use neptune_logger.experiment.ABC to call the methods you would use when working with the Neptune client, for example the log_image, log_artifact, log_text and set_property calls that appear in the sections below.

Check more methods here: experiment methods.

Log loss during train, validation and test

In the pl.LightningModule, implement loss logging for the train, validation and test steps.

class LitModel(pl.LightningModule):
   (...)

   def training_step(self, batch, batch_idx):
       (...)
       loss = ...
       self.log('train_loss', loss, prog_bar=False)

   def validation_step(self, batch, batch_idx):
       (...)
       loss = ...
       self.log('val_loss', loss, prog_bar=False)

   def test_step(self, batch, batch_idx):
       (...)
       loss = ...
       self.log('test_loss', loss, prog_bar=False)

Loss values will be tracked in Neptune automatically.

Tip

The Trainer parameter log_every_n_steps controls how often values are logged. Keep it relatively high, say above 100, for longer experiments.
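
To get a feel for the effect of this parameter, the sketch below estimates how many values a single step-level metric produces over a run. It assumes one value is recorded every log_every_n_steps training steps; logged_points is a hypothetical helper, and the numbers plugged in are the MNIST training set size (60,000 images) and the quickstart settings.

```python
import math


def logged_points(n_samples, batch_size, epochs, log_every_n_steps):
    """Estimate how many values one training metric produces over a run."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    # Assumption: one value is recorded every `log_every_n_steps` training steps.
    return total_steps // log_every_n_steps


# MNIST training set with the quickstart settings: batch size 32, 3 epochs.
for n in (1, 50, 100):
    print(n, logged_points(60_000, 32, 3, n))
```

The count falls roughly in proportion to the interval, which matters most when many metrics are tracked at once.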

PyTorch Lightning train and validation loss

Log accuracy score after train, validation and test epoch

In the pl.LightningModule, implement the accuracy score and log it.

from sklearn.metrics import accuracy_score

class LitModel(pl.LightningModule):
   (...)

   def training_epoch_end(self, outputs):
       for output in outputs:
           (...)
       acc = accuracy_score(y_true, y_pred)
       self.log('train_acc', acc)

   def validation_epoch_end(self, outputs):
       for output in outputs:
           (...)
       acc = accuracy_score(y_true, y_pred)
       self.log('val_acc', acc)

   def test_epoch_end(self, outputs):
       for output in outputs:
           (...)
       acc = accuracy_score(y_true, y_pred)
       self.log('test_acc', acc)

The accuracy score will be calculated and logged after every train, validation and test epoch.
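
The loop bodies above are elided, so here is a minimal, library-free sketch of what an epoch-end hook has to do: pool the labels and predictions returned by every step, then compute a single accuracy value. It assumes each step returned a dict with the hypothetical keys "y_true" and "y_pred", and it computes accuracy by hand as a stand-in for accuracy_score.

```python
def epoch_accuracy(outputs):
    """Pool per-batch labels and predictions, then compute accuracy over the whole epoch."""
    y_true, y_pred = [], []
    for output in outputs:
        # Assumption: each step returned a dict with these two (hypothetical) keys.
        y_true.extend(output["y_true"])
        y_pred.extend(output["y_pred"])
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up step outputs: two batches of unequal size.
outputs = [
    {"y_true": [3, 1, 4], "y_pred": [3, 1, 0]},
    {"y_true": [1, 5], "y_pred": [1, 5]},
]
acc = epoch_accuracy(outputs)
```

Pooling before dividing keeps a smaller final batch from being over-weighted, which would happen if per-batch accuracies were averaged instead.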

PyTorch Lightning train and validation acc

Tip

You can find the full implementation of all metrics logging on GitHub, in the Google Colab notebook, or in the example Neptune project.

Log learning rate changes

Implement a learning rate monitor as a Callback.

from pytorch_lightning.callbacks import LearningRateMonitor
from torch.optim.lr_scheduler import LambdaLR

# Add scheduler to the optimizer
class LitModel(pl.LightningModule):
   (...)

   def configure_optimizers(self):
       optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
       scheduler = LambdaLR(optimizer, lambda epoch: self.decay_factor ** epoch)
       return [optimizer], [scheduler]

# Instantiate LearningRateMonitor Callback
lr_logger = LearningRateMonitor(logging_interval='epoch')

# Pass lr_logger to the pl.Trainer as callback
trainer = pl.Trainer(logger=neptune_logger,
                    callbacks=[lr_logger])

The learning rate scheduler is defined in configure_optimizers. It changes the lr value after each epoch, and these values are tracked in Neptune automatically.
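
LambdaLR multiplies the optimizer's initial learning rate by the value the lambda returns for the current epoch, so the schedule above is a plain exponential decay. The sketch below reproduces that arithmetic without PyTorch; scheduled_lr is a hypothetical helper, and the decay factor of 0.5 is chosen only for illustration.

```python
def scheduled_lr(base_lr, decay_factor, epoch):
    """Learning rate produced at `epoch` by the lambda `decay_factor ** epoch`."""
    return base_lr * decay_factor ** epoch


# Hypothetical values: base rate from the quickstart, decay factor picked for illustration.
schedule = [scheduled_lr(0.005, 0.5, epoch) for epoch in range(3)]
print(schedule)
```

These are the kind of per-epoch values the learning rate chart in Neptune would show for such a configuration.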

PyTorch Lightning lr-Adam chart

Log misclassified images for the test set

In the pl.LightningModule, implement the logic for identifying and logging misclassified images.

import numpy as np

class LitModel(pl.LightningModule):
   (...)

   def test_step(self, batch, batch_idx):
       x, y = batch
       (...)
       y_true = ...
       y_pred = ...
       for j in np.where(np.not_equal(y_true, y_pred))[0]:
           img = np.squeeze(x[j].cpu().detach().numpy())
           img[img < 0] = 0
           img = (img / img.max()) * 256
           neptune_logger.experiment.log_image(
               'test_misclassified_images',
               img,
               description='y_pred={}, y_true={}'.format(y_pred[j], y_true[j]))
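
The two data-handling steps in this snippet, picking the misclassified indices and rescaling each image, can be sketched in plain Python. Both helpers below are hypothetical and operate on flat lists rather than tensors. The guard for an all-zero image is an addition that the snippet above does not have.

```python
def misclassified_indices(y_true, y_pred):
    """Positions where the prediction differs from the label."""
    return [j for j, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]


def rescale_pixels(pixels, scale=256):
    """Clamp negatives to zero, then scale so the brightest pixel equals `scale`."""
    clipped = [max(p, 0.0) for p in pixels]
    peak = max(clipped, default=0.0)
    if peak == 0:
        # Assumption: an all-black image is returned as-is instead of dividing by zero.
        return clipped
    return [p / peak * scale for p in clipped]


wrong = misclassified_indices([1, 2, 3], [1, 0, 3])
scaled = rescale_pixels([-1.0, 0.5, 1.0])
```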

PyTorch Lightning misclassified images

Log gradient norm

Set pl.Trainer to log the gradient norm.

trainer = pl.Trainer(logger=neptune_logger,
                    track_grad_norm=2)

Neptune will visualize the gradient norm automatically.

Tip

When you use track_grad_norm, it is recommended to also set log_every_n_steps to a value above 100 to avoid logging a large amount of data.
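
Setting track_grad_norm=2 asks for the 2-norm of the gradients. As a rough illustration of the quantity being charted, the sketch below computes a p-norm per parameter and a total over all parameters. The parameter names and gradient values are made up, and the way the total is aggregated here (the norm of the per-parameter norms) is an assumption about the bookkeeping, not a statement of Lightning's implementation.

```python
def p_norm(values, p=2):
    """p-norm of a flat sequence of gradient entries."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


# Hypothetical gradients for two parameter tensors, flattened to plain lists.
grads = {"l1.weight": [3.0, 4.0], "l1.bias": [12.0]}
per_param = {name: p_norm(g) for name, g in grads.items()}
# Assumption: the total is the norm of the per-parameter norms; for p=2 this
# equals the norm of all gradient entries taken together.
total = p_norm(list(per_param.values()))
```

Because a value is produced per parameter as well as in total, the number of logged series grows with model size, which is why a larger logging interval helps.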


Log model checkpoints

Use ModelCheckpoint to save checkpoints during training, then log the saved checkpoints to Neptune.

from pytorch_lightning.callbacks import ModelCheckpoint

# Instantiate ModelCheckpoint
model_checkpoint = ModelCheckpoint(filepath='my_model/checkpoints/{epoch:02d}-{val_loss:.2f}',
                                  save_weights_only=True,
                                  save_top_k=3,
                                  monitor='val_loss',
                                  period=1)

# Pass it to the pl.Trainer
trainer = pl.Trainer(logger=neptune_logger,
                    checkpoint_callback=model_checkpoint)

# Log model checkpoints to Neptune (run this after training has finished)
for k in model_checkpoint.best_k_models.keys():
   model_name = 'checkpoints/' + k.split('/')[-1]
   neptune_logger.experiment.log_artifact(k, model_name)

# Log score of the best model checkpoint.
neptune_logger.experiment.set_property('best_model_score', model_checkpoint.best_model_score.tolist())
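
The upload loop maps each local checkpoint path to a destination name under checkpoints/. The sketch below shows that mapping on a hypothetical stand-in for model_checkpoint.best_k_models; the file names and scores are invented for illustration, and os.path.basename is used in place of the manual split.

```python
import os

# Hypothetical stand-in for model_checkpoint.best_k_models: checkpoint path -> monitored score.
best_k_models = {
    "my_model/checkpoints/epoch=01-val_loss=0.42.ckpt": 0.42,
    "my_model/checkpoints/epoch=02-val_loss=0.31.ckpt": 0.31,
}


def artifact_name(path):
    """Destination name in Neptune: keep only the file name, under 'checkpoints/'."""
    return "checkpoints/" + os.path.basename(path)


destinations = {path: artifact_name(path) for path in best_k_models}
# With monitor='val_loss', lower is better, so the best checkpoint has the smallest score.
best_path = min(best_k_models, key=best_k_models.get)
```

Keeping only the file name means the artifact layout in Neptune does not depend on where checkpoints were stored locally.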

PyTorch Lightning model checkpoint

Tip

You can find the full example implementation on GitHub, in the Google Colab notebook, or in the example Neptune project.

Log confusion matrix

Log the confusion matrix after testing.

import matplotlib.pyplot as plt
import numpy as np
from scikitplot.metrics import plot_confusion_matrix

model.freeze()
# dm is assumed to be the data module that provides the test DataLoader
test_data = dm.test_dataloader()
y_true = np.array([])
y_pred = np.array([])

for i, (x, y) in enumerate(test_data):
   y = y.cpu().detach().numpy()
   y_hat = model.forward(x).argmax(axis=1).cpu().detach().numpy()

   y_true = np.append(y_true, y)
   y_pred = np.append(y_pred, y_hat)

fig, ax = plt.subplots(figsize=(16, 12))
plot_confusion_matrix(y_true, y_pred, ax=ax)
neptune_logger.experiment.log_image('confusion_matrix', fig)
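
For readers who want to see what the plot is built from, here is a minimal, library-free sketch of the counting behind a confusion matrix. confusion_counts is a hypothetical helper written for this illustration; it is not the scikit-plot implementation, and the labels are made up.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, n_classes):
    """Return an n_classes x n_classes table; rows are true labels, columns are predictions."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in range(n_classes)] for t in range(n_classes)]


# Tiny made-up example with three classes.
matrix = confusion_counts([0, 1, 2, 2, 1], [0, 2, 2, 2, 1], n_classes=3)
```

Correct predictions land on the diagonal, so off-diagonal cells show which classes the model confuses with each other.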

PyTorch Lightning confusion metrics

Log auxiliary info

Log the model summary and the number of GPUs used in the experiment.

# Log model summary, one line of text at a time
for chunk in str(model).split('\n'):
   neptune_logger.experiment.log_text('model_summary', chunk)

# Log number of GPU units used
neptune_logger.experiment.set_property('num_gpus', trainer.num_gpus)


Stop Neptune logger (Notebooks only)

Close the Neptune logger and experiment once everything is logged.

neptune_logger.experiment.stop()

NeptuneLogger was created with close_after_fit=False, so the Neptune experiment has to be closed explicitly at the end. This applies only to notebooks; in scripts the logger is closed automatically when the script finishes.
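
In a notebook, an exception in a late cell can leave the experiment open. One possible pattern, sketched below, wraps the post-training logging in a context manager that always calls stop(). stopped_afterwards and FakeExperiment are hypothetical names; the fake class only stands in for neptune_logger.experiment so that the sketch runs without a Neptune connection.

```python
from contextlib import contextmanager


@contextmanager
def stopped_afterwards(experiment):
    """Yield the experiment and call its stop() method on exit, even if an error occurs."""
    try:
        yield experiment
    finally:
        experiment.stop()


class FakeExperiment:
    """Stand-in used only to demonstrate the helper without a Neptune connection."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


fake = FakeExperiment()
with stopped_afterwards(fake):
    pass  # extra logging calls would go here
```

In real use, the object passed in would be neptune_logger.experiment, and the body of the with block would hold the logging calls from the sections above.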

Explore Results

You just learned how to log PyTorch Lightning experiments to Neptune using the Neptune logger, which is part of the Lightning library.

The training above is logged to Neptune in near real time. Click the link printed to the console, or open the example charts, to explore an experiment similar to yours.

In particular check:

Check this experiment (charts) or view the code snippets above as a plain Python script on GitHub.

           


How to ask for help?

Please visit the Getting help page. Everything regarding support is there.
