Classifier radtorch.classifier

`ImageClassifier`

Image classifier class. It acts as a wrapper that trains a selected model (either a pytorch neural network or an sklearn classifier) on a dataset, which can be either a radtorch ImageDataset or VolumeDataset.

Optionally, specific train and validation pytorch dataloaders may be supplied manually instead of radtorch dataset objects.

Training a PyTorch Neural Network

If the model to train is a pytorch neural network, in addition to the model, ImageClassifier expects a pytorch criterion/loss function, a pytorch optimizer and an optional pytorch scheduler.

Training an sklearn classifier

If the model to be trained is an sklearn classifier, ImageClassifier first performs feature extraction and then trains the model on the extracted features. Accordingly, ImageClassifier expects a model architecture for the feature extraction process.

Creating multiple classifier objects using the same model/neural network object

To ensure consistent results, a new instance of the pytorch model/neural network object MUST be instantiated for every classifier object.

For example, do this:

M = radtorch.model.Model(model_arch='vgg16', in_channels=1, out_classes=2)
clf = radtorch.classifier.ImageClassifier(model=M, dataset=ds)
clf.fit(epochs=3)

M = radtorch.model.Model(model_arch='vgg16', in_channels=1, out_classes=2)
clf2 = radtorch.classifier.ImageClassifier(model=M, dataset=ds)
clf2.fit(epochs=3)

and do NOT do this:

M = radtorch.model.Model(model_arch='vgg16', in_channels=1, out_classes=2)

clf = radtorch.classifier.ImageClassifier(model=M, dataset=ds)
clf.fit(epochs=3)

clf2 = radtorch.classifier.ImageClassifier(model=M, dataset=ds)
clf2.fit(epochs=3)
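
The likely reason for this rule is that training updates the model object in place, so a second classifier built on the same object starts from already trained weights. The toy sketch below illustrates the effect in plain Python; `ToyModel` and `ToyClassifier` are hypothetical stand-ins and are not part of radtorch.

```python
class ToyModel:
    """Hypothetical stand-in for a neural network: one trainable weight."""

    def __init__(self):
        self.weight = 0.0  # every fresh model starts from the same value


class ToyClassifier:
    """Hypothetical stand-in for a classifier that trains the model it is given."""

    def __init__(self, model):
        self.model = model  # keeps a reference to the model, not a copy

    def fit(self, epochs):
        start = self.model.weight
        for _ in range(epochs):
            self.model.weight += 1.0  # "training" mutates the model in place
        return start  # the weight this run started from


# Shared model: the second run starts where the first one stopped.
shared = ToyModel()
first_start = ToyClassifier(shared).fit(epochs=3)
second_start = ToyClassifier(shared).fit(epochs=3)

# Fresh model per classifier: both runs start from the same state.
fresh_first = ToyClassifier(ToyModel()).fit(epochs=3)
fresh_second = ToyClassifier(ToyModel()).fit(epochs=3)
```

With the shared model the two runs are not comparable, because the second one continues from the first run's end state.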

Parameters:

Name	Type	Description	Default
`name`	`str`	Name to be given to the Image Classifier. If none is provided, the current date and time are used to create a generic classifier name.	`None`
`model`	`pytorch neural network or sklearn classifier`	Model to be trained.	`None`
`dataset`	`ImageDataset or VolumeDataset`	`ImageDataset` or `VolumeDataset` to be used for training.	`None`
`dataloader_train`	`pytorch dataloader`	Optional pytorch DataLoader for training.	`None`
`dataloader_valid`	`pytorch dataloader`	Optional pytorch DataLoader for validation.	`None`
`device`	`str`	Device to be used for training.	`'auto'`
`feature_extractor_arch`	`str`	Architecture of the model to be used for feature extraction when training an sklearn classifier. See [https://pytorch.org/vision/0.8/models.html#classification](https://pytorch.org/vision/0.8/models.html#classification)	`'vgg16'`
`criterion`	`pytorch loss function`	Loss function to be used during training a pytorch neural network.	`None`
`optimizer`	`pytorch optimizer`	Optimizer to be used during training a pytorch neural network.	`None`
`scheduler`	`pytorch scheduler`	Scheduler to be used during training a pytorch neural network.	`None`
`scheduler_metric`	`str`	When using the ReduceLROnPlateau pytorch scheduler, a target loss or accuracy must be provided to monitor. Options: 'train_loss', 'train_accuracy', 'valid_loss', 'valid_accuracy'.	`None`
`use_checkpoint`	`bool / str`	Path (str) to a saved checkpoint to continue training. If a checkpoint is used, training resumes from the saved checkpoint up to the new/specified epoch number.	`False`
`random_seed`	`int`	Random seed (default=100)	`0`
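
When a ReduceLROnPlateau scheduler is monitored through one of the metric options listed in the table, the scheduler's own `mode` argument ('min' or 'max' in `torch.optim.lr_scheduler.ReduceLROnPlateau`) should agree with that metric: losses are expected to decrease, accuracies to increase. A minimal sketch of that pairing follows; `plateau_mode` is a hypothetical helper and not part of radtorch.

```python
VALID_METRICS = ("train_loss", "train_accuracy", "valid_loss", "valid_accuracy")


def plateau_mode(scheduler_metric):
    """Return the ReduceLROnPlateau `mode` that suits the monitored metric.

    Hypothetical helper for illustration: losses should go down ('min'),
    accuracies should go up ('max').
    """
    if scheduler_metric not in VALID_METRICS:
        raise ValueError(
            f"scheduler_metric must be one of {VALID_METRICS}, got {scheduler_metric!r}"
        )
    return "min" if scheduler_metric.endswith("_loss") else "max"


loss_mode = plateau_mode("valid_loss")
accuracy_mode = plateau_mode("valid_accuracy")
```

The returned string could then be passed as `mode=` when constructing the scheduler that is handed to ImageClassifier.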

Using manual pytorch dataloaders

If manually created dataloaders are used, set dataset to None.

Selecting device for training

Auto mode detects whether a GPU is available and, if so, uses it for training.
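
A common way to implement this kind of detection with PyTorch is to check `torch.cuda.is_available()`. The sketch below shows the idea; `resolve_device` is a hypothetical helper, and it is an assumption that radtorch resolves `'auto'` in a similar way.

```python
def resolve_device(device="auto"):
    """Map 'auto' to 'cuda' when a GPU is usable, otherwise to 'cpu'.

    Hypothetical helper for illustration; any other value is returned unchanged.
    """
    if device != "auto":
        return device
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


chosen = resolve_device("auto")   # 'cuda' or 'cpu', depending on the machine
forced = resolve_device("cpu")    # explicit choices are passed through
```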

Attributes:

Name	Type	Description
`type`	`str`	Type of the classifier model to be trained.
`train_losses`	`list`	List of train losses recorded. Length = Number of epochs.
`valid_losses`	`list`	List of validation losses recorded. Length = Number of epochs.
`train_acc`	`list`	List of train accuracies recorded. Length = Number of epochs.
`valid_acc`	`list`	List of validation accuracies recorded. Length = Number of epochs.
`valid_loss_min`	`float`	Minimum Validation Loss to save checkpoint.
`best_model`	`pytorch neural network or sklearn classifier`	Best trained model with the lowest `Validation Loss` in the case of pytorch neural networks, or the trained classifier for sklearn classifiers.
`train_logs`	`pandas dataframe`	Table/Dataframe with all train/validation losses.

Methods

`fit(self, **kwargs)`

Trains the ImageClassifier object.

Training a Model

All of the following arguments, except auto_save_ckpt and random_seed, apply only when training a pytorch neural network model. Training an sklearn classifier does not need any other arguments.

Parameters:

Name	Type	Description	Default
`epochs`	`int`	Number of training epochs.	`20`
`valid`	`bool`	True to perform validation after each train step. False to only train on the training dataset without validation.	`True`
`print_every`	`int`	Number of epochs after which results are printed.	`1`
`target_valid_loss`	`float / str`	Minimum validation loss that triggers automatic saving of the trained model. If 'lowest' is used, then at every epoch where the validation loss is less than the current minimum, the new best model is saved in the checkpoint. Also accepts a manually specified float minimum loss.	`'lowest'`
`auto_save_ckpt`	`bool`	Automatically save checkpoints. If True, a checkpoint file is saved. Please see below.	`False`
`random_seed`	`int`	Random seed.	`100`
`verbose`	`int`	Verbose level during training. Options: 0, 1, 2.	`2`
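
The `target_valid_loss` behaviour can be pictured as a running minimum that each epoch's validation loss is compared against. The sketch below is an illustration under stated assumptions, not radtorch's code: it assumes that 'lowest' starts the running minimum at infinity, that a float seeds it with the given value, and that a save happens only on a strict improvement. `epochs_to_checkpoint` is a hypothetical name.

```python
def epochs_to_checkpoint(valid_losses, target_valid_loss="lowest"):
    """Return the 1-based epochs at which a new best model would be saved.

    Assumed rule (illustration only): a save is triggered whenever the
    validation loss drops strictly below the running minimum, which starts
    at infinity for 'lowest' or at the given float otherwise.
    """
    if target_valid_loss == "lowest":
        running_min = float("inf")
    else:
        running_min = float(target_valid_loss)

    saved = []
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < running_min:
            running_min = loss  # this epoch becomes the new best
            saved.append(epoch)
    return saved


losses = [0.90, 0.70, 0.75, 0.60, 0.65]
lowest_epochs = epochs_to_checkpoint(losses)        # improvements at epochs 1, 2 and 4
target_epochs = epochs_to_checkpoint(losses, 0.80)  # first save once the loss is below 0.80
```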

Using auto_save_ckpt

If auto_save_ckpt is True, a new checkpoint is saved whenever the training target is achieved.

The checkpoint file name = ImageClassifier.name+'_epoch_'+str(current_epoch)+'.checkpoint'

e.g. If the checkpoint is saved at epoch 10 for an ImageClassifier named clf, the checkpoint file will be named: clf_epoch_10.checkpoint
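
The naming pattern from the example above (clf at epoch 10) can be reproduced with a one-line helper. `checkpoint_filename` is a hypothetical name used only for illustration.

```python
def checkpoint_filename(classifier_name, current_epoch):
    """Build a checkpoint file name following the <name>_epoch_<n>.checkpoint pattern."""
    return f"{classifier_name}_epoch_{current_epoch}.checkpoint"


example_name = checkpoint_filename("clf", 10)
```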

Resuming training using a saved checkpoint file

When using a saved checkpoint to resume training, a new instance of the Model/Pytorch Model and ImageClassifier should be instantiated.

For example:

# Initial Training

M = radtorch.model.Model(model_arch='vgg16', in_channels=1, out_classes=2)
clf = radtorch.classifier.ImageClassifier(M, dataset)
clf.fit(auto_save_ckpt=True, epochs=5, verbose=3) # Saves the best checkpoint automatically

# Resume Training

M = radtorch.model.Model(model_arch='vgg16', in_channels=1, out_classes=2)
clf2 = radtorch.classifier.ImageClassifier(M, dataset, use_checkpoint='saved_ckpt.checkpoint')
clf2.fit(auto_save_ckpt=False, epochs=5, verbose=3)

Checkpoint Files

A checkpoint file is a dictionary of:

timestamp: Timestamp when saving the checkpoint.
type: ImageClassifier type.
classifier: ImageClassifier object.
epochs: Total epochs specified on initial training.
current_epoch: Current epoch when checkpoint was saved.
optimizer_state_dict: Current state of Optimizer.
train_losses: List of train losses recorded
valid_losses: List of validation losses recorded
valid_loss_min: Min Validation loss - See above.
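
To make the layout concrete, the sketch below builds a dictionary with the keys listed above and round-trips it through `pickle`. All values are placeholders, and `pickle` is only a stand-in: how radtorch actually serializes its checkpoint files is not shown in this page.

```python
import io
import pickle
from datetime import datetime

# Placeholder values only; the keys follow the list above.
checkpoint = {
    "timestamp": datetime.now().isoformat(),
    "type": "placeholder-type",        # stands in for the ImageClassifier type
    "classifier": None,                # stands in for the ImageClassifier object
    "epochs": 5,                       # total epochs requested initially
    "current_epoch": 3,                # epoch at which this checkpoint was saved
    "optimizer_state_dict": {},        # stands in for the optimizer state
    "train_losses": [0.90, 0.70, 0.60],
    "valid_losses": [1.00, 0.80, 0.75],
    "valid_loss_min": 0.75,
}

# Round-trip through an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
pickle.dump(checkpoint, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

restored_keys = sorted(restored)
```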

`info(self)`

Displays all information about the ImageClassifier object.