Classifier radtorch.classifier
ImageClassifier
Class for an image classifier. This class acts as a wrapper to train a selected model (either a PyTorch neural network or an sklearn classifier) using a dataset, which can be either a radtorch ImageDataset or VolumeDataset.
Optionally, specific training and validation PyTorch dataloaders may be specified manually instead of using radtorch dataset objects.
Training a PyTorch Neural Network
If the model to be trained is a PyTorch neural network, ImageClassifier expects, in addition to the model, a PyTorch criterion (loss function), a PyTorch optimizer and an optional PyTorch scheduler.
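To make the roles of these three objects concrete, here is a minimal pure-Python sketch of a training loop on a single weight. The names mse_loss, ToySGD and ToyStepLR are hypothetical stand-ins for what PyTorch provides as, for example, torch.nn.MSELoss, torch.optim.SGD and torch.optim.lr_scheduler.StepLR; this is not how ImageClassifier is implemented internally.

```python
def mse_loss(preds, targets):
    """Criterion: mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


class ToySGD:
    """Optimizer (hypothetical stand-in): plain gradient descent on one scalar weight."""

    def __init__(self, lr):
        self.lr = lr

    def step(self, weight, grad):
        return weight - self.lr * grad


class ToyStepLR:
    """Scheduler (hypothetical stand-in): multiply the lr by gamma every step_size epochs."""

    def __init__(self, optimizer, step_size, gamma):
        self.optimizer = optimizer
        self.step_size = step_size
        self.gamma = gamma
        self.epoch = 0

    def step(self):
        self.epoch += 1
        if self.epoch % self.step_size == 0:
            self.optimizer.lr *= self.gamma


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # toy data following y = 2x
w = 0.0
optimizer = ToySGD(lr=0.05)
scheduler = ToyStepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(50):
    preds = [w * x for x in xs]
    # Gradient of the MSE criterion with respect to w
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    w = optimizer.step(w, grad)  # the optimizer updates the weight
    scheduler.step()             # the scheduler adjusts the learning rate once per epoch

print(f"w = {w:.4f}, loss = {mse_loss([w * x for x in xs], ys):.2e}")
```

The criterion measures the error, the optimizer turns its gradient into a weight update, and the scheduler only changes how large those updates are over time, which is why it is optional.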
Training an sklearn classifier
If the model to be trained is an sklearn classifier, ImageClassifier first performs feature extraction and then trains the model on the extracted features. Accordingly, ImageClassifier expects a model architecture for the feature extraction step (see feature_extractor_arch below).
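The two-stage process described above can be sketched in plain Python. Here extract_features is a hypothetical stand-in for a CNN backbone such as 'vgg16', and ToyNearestCentroid is a toy classifier with an sklearn-like fit/predict interface; neither is part of radtorch.

```python
from statistics import mean


def extract_features(image):
    """Hypothetical stand-in for a CNN backbone: maps a 2D image (list of rows)
    to a small feature vector [mean intensity, intensity range]."""
    pixels = [p for row in image for p in row]
    return [mean(pixels), max(pixels) - min(pixels)]


class ToyNearestCentroid:
    """Toy classifier with an sklearn-like fit/predict interface."""

    def fit(self, X, y):
        self.centroids_ = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids_[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [min(self.centroids_, key=lambda c: dist2(x, self.centroids_[c])) for x in X]


images = [[[0, 0], [0, 1]], [[1, 0], [0, 0]], [[9, 8], [9, 9]], [[8, 9], [9, 8]]]
labels = ["normal", "normal", "abnormal", "abnormal"]

# Stage 1: feature extraction
features = [extract_features(img) for img in images]
# Stage 2: train the classifier on the extracted features
clf = ToyNearestCentroid().fit(features, labels)
print(clf.predict([extract_features([[9, 9], [8, 9]])]))  # ['abnormal']
```

Keeping the two stages separate means the classifier never sees raw pixels, only the fixed-length vectors produced by the feature extractor.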
Creating multiple classifier objects using the same model/neural network object
To ensure consistent results, a new instance of the PyTorch model/neural network object MUST be instantiated for every classifier object.
For example, do this:
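```python
import radtorch.classifier
import radtorch.model

# ds: a radtorch ImageDataset created beforehand
M = radtorch.model.Model(model_arch='vgg16', in_channels=1, out_classes=2)
clf = radtorch.classifier.ImageClassifier(model=M, dataset=ds)
clf.fit(epochs=3)
# New model instance for the second classifier
M = radtorch.model.Model(model_arch='vgg16', in_channels=1, out_classes=2)
clf2 = radtorch.classifier.ImageClassifier(model=M, dataset=ds)
clf2.fit(epochs=3)
```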
and do NOT do this:
```python
M = radtorch.model.Model(model_arch='vgg16', in_channels=1, out_classes=2)
clf = radtorch.classifier.ImageClassifier(model=M, dataset=ds)
clf.fit(epochs=3)
# Wrong: the same M object is reused by a second classifier
clf2 = radtorch.classifier.ImageClassifier(model=M, dataset=ds)
clf2.fit(epochs=3)
```
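A plausible reason for this rule is that a wrapper keeps a reference to the model object and training updates its weights in place (as PyTorch optimizers do with module parameters), so a second classifier would start from already-trained weights. The sketch below illustrates that effect with hypothetical ToyModel and ToyClassifier classes; it assumes, rather than documents, radtorch's internal behaviour.

```python
class ToyModel:
    """Hypothetical model whose weight is changed in place by training."""

    def __init__(self):
        self.weight = 0.0


class ToyClassifier:
    """Hypothetical wrapper that trains the model object it is given."""

    def __init__(self, model):
        self.model = model  # stores a reference, not a copy

    def fit(self, epochs):
        for _ in range(epochs):
            self.model.weight += 1.0  # stand-in for one epoch of weight updates
        return self.model.weight


# Correct: a fresh model per classifier, so both runs start from the same state
print(ToyClassifier(ToyModel()).fit(3), ToyClassifier(ToyModel()).fit(3))  # 3.0 3.0

# Incorrect: a shared model, so the second run continues from the first one's weights
m = ToyModel()
print(ToyClassifier(m).fit(3), ToyClassifier(m).fit(3))  # 3.0 6.0
```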
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Name to be given to the ImageClassifier. If none is provided, the current date and time will be used to create a generic classifier name. | None |
model | PyTorch neural network or sklearn classifier | Model to be trained. | None |
dataset | ImageDataset or VolumeDataset | radtorch dataset to be used for training. Set to None when using manual dataloaders. | None |
dataloader_train | PyTorch DataLoader | Optional training PyTorch DataLoader. | None |
dataloader_valid | PyTorch DataLoader | Optional validation PyTorch DataLoader. | None |
device | str | Device to be used for training. | 'auto' |
feature_extractor_arch | str | Architecture of the model to be used for feature extraction when training an sklearn classifier. See [https://pytorch.org/vision/0.8/models.html#classification](https://pytorch.org/vision/0.8/models.html#classification). | 'vgg16' |
criterion | PyTorch loss function | Loss function to be used when training a PyTorch neural network. | None |
optimizer | PyTorch optimizer | Optimizer to be used when training a PyTorch neural network. | None |
scheduler | PyTorch scheduler | Learning rate scheduler to be used when training a PyTorch neural network. | None |
scheduler_metric | str | When using the PyTorch ReduceLROnPlateau scheduler, the loss or accuracy to monitor must be provided. Options: 'train_loss', 'train_accuracy', 'valid_loss', 'valid_accuracy'. | None |
use_checkpoint | bool / str | Path (str) to a saved checkpoint from which to continue training. If a checkpoint is used, training resumes from the saved checkpoint up to the new/specified epoch number. | False |
random_seed | int | Random seed. | 0 |
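PyTorch's ReduceLROnPlateau takes a mode argument: 'min' when the monitored value should decrease (a loss) and 'max' when it should increase (an accuracy). The sketch below maps the four metric options listed above to that direction and applies a simplified plateau check. The history layout and the should_reduce_lr helper are hypothetical, and the check omits ReduceLROnPlateau's threshold and cooldown logic.

```python
METRIC_MODES = {
    "train_loss": "min",
    "valid_loss": "min",
    "train_accuracy": "max",
    "valid_accuracy": "max",
}


def should_reduce_lr(history, metric, patience=2):
    """Return True if `metric` has not improved during the last `patience` epochs.

    history: dict mapping metric names to per-epoch value lists (hypothetical layout).
    """
    if metric not in METRIC_MODES:
        raise ValueError(f"Unknown metric: {metric!r}")
    values = history[metric]
    if len(values) <= patience:
        return False  # not enough epochs recorded yet
    if METRIC_MODES[metric] == "min":
        improved = min(values[-patience:]) < min(values[:-patience])
    else:
        improved = max(values[-patience:]) > max(values[:-patience])
    return not improved


history = {
    "valid_loss": [0.90, 0.70, 0.71, 0.72],
    "valid_accuracy": [0.60, 0.70, 0.75, 0.80],
}
print(should_reduce_lr(history, "valid_loss"))      # True: no lower loss in the last 2 epochs
print(should_reduce_lr(history, "valid_accuracy"))  # False: accuracy is still rising
```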
Using manual PyTorch dataloaders
If manually created dataloaders are used, set dataset to None.
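The rule above can be expressed as a small input check. resolve_data_source is a hypothetical helper, not a radtorch function; it assumes that a training dataloader is the minimum needed when no dataset is given, since the validation dataloader is listed as optional.

```python
def resolve_data_source(dataset=None, dataloader_train=None, dataloader_valid=None):
    """Hypothetical check: use either a radtorch dataset or manual dataloaders, not both."""
    using_loaders = dataloader_train is not None or dataloader_valid is not None
    if dataset is not None and using_loaders:
        raise ValueError("Set dataset=None when passing manual dataloaders.")
    if dataset is None and dataloader_train is None:
        raise ValueError("Provide either a dataset or at least dataloader_train.")
    return "dataloaders" if using_loaders else "dataset"


print(resolve_data_source(dataset="ds"))                            # dataset
print(resolve_data_source(dataloader_train=[1], dataloader_valid=[2]))  # dataloaders
```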
Selecting device for training
'auto' mode automatically detects whether a GPU is available and, if so, uses it for training.
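A minimal sketch of how an 'auto' setting can be resolved. resolve_device is hypothetical; in real PyTorch code the cuda_available flag would typically come from torch.cuda.is_available(), which is passed in here so the sketch runs without PyTorch installed.

```python
def resolve_device(device="auto", cuda_available=False):
    """Hypothetical 'auto' device selection.

    In PyTorch code, cuda_available would usually be torch.cuda.is_available().
    """
    if device == "auto":
        return "cuda" if cuda_available else "cpu"
    return device  # an explicitly requested device is used as given


print(resolve_device("auto", cuda_available=True))   # cuda
print(resolve_device("auto", cuda_available=False))  # cpu
print(resolve_device("cpu", cuda_available=True))    # cpu
```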
Attributes:
Name | Type | Description |
---|---|---|
type | str | Type of the classifier model to be trained. |
train_losses | list | List of recorded training losses. Length = number of epochs. |
valid_losses | list | List of recorded validation losses. Length = number of epochs. |
train_acc | list | List of recorded training accuracies. Length = number of epochs. |
valid_acc | list | List of recorded validation accuracies. Length = number of epochs. |
valid_loss_min | float | Minimum validation loss, used to decide when to save a checkpoint. |
best_model | PyTorch neural network or sklearn classifier | Best trained model, i.e. the one with the lowest validation loss. |
train_logs | pandas DataFrame | Table with all training/validation losses. |
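The per-epoch lists above can be combined into one row per epoch, which is the kind of table train_logs holds. The loss and accuracy values below are made-up illustration data, and the sketch assumes best_model corresponds to the epoch with the lowest validation loss. With pandas installed, pandas.DataFrame(train_logs) would turn these rows into a DataFrame.

```python
# Made-up per-epoch history, laid out like the attributes above
train_losses = [0.92, 0.61, 0.48, 0.41]
valid_losses = [0.88, 0.66, 0.59, 0.63]
train_acc = [0.55, 0.71, 0.80, 0.84]
valid_acc = [0.57, 0.69, 0.74, 0.72]

# One row per epoch, similar in spirit to the train_logs table
train_logs = [
    {"epoch": i + 1, "train_loss": tl, "valid_loss": vl, "train_acc": ta, "valid_acc": va}
    for i, (tl, vl, ta, va) in enumerate(zip(train_losses, valid_losses, train_acc, valid_acc))
]

# Epoch with the lowest validation loss (assumed to match best_model)
best = min(train_logs, key=lambda row: row["valid_loss"])
print(best["epoch"], best["valid_loss"])  # 3 0.59
```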
Methods
fit(self, **kwargs)
Trains the ImageClassifier object.
Training a Model
All of the following arguments, except auto_save_ckpt and random_seed, apply only when training a PyTorch neural network model. Training an sklearn classifier does not require any arguments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
epochs | int | Number of training epochs. | 20 |
valid | bool | True to perform validation after each training step. False to train on the training dataset only, without validation. | True |
print_every | int | Number of epochs after which to print results. | 1 |
target_valid_loss | float / str | Validation loss target for automatically saving the trained model. If 'lowest', after every epoch the model is saved to a checkpoint as the new best model whenever the validation loss is lower than the minimum recorded so far. A manually specified float minimum loss is also accepted. | 'lowest' |
auto_save_ckpt | bool | Automatically save checkpoints. If True, a checkpoint file is saved (see "Using auto_save_ckpt" below). | False |
random_seed | int | Random seed. | 100 |
verbose | int | Verbosity level during training. Options: 0, 1, 2. | 2 |
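One plausible reading of target_valid_loss is sketched below: with 'lowest', a checkpoint is saved whenever an epoch beats the best validation loss so far; with a float, whenever the validation loss falls below that fixed value. should_save_checkpoint is a hypothetical helper, and the float behaviour is an assumption rather than confirmed library behaviour.

```python
def should_save_checkpoint(valid_loss, valid_loss_min, target_valid_loss="lowest"):
    """Hypothetical save decision for one epoch.

    'lowest': save when valid_loss beats the best validation loss so far.
    float: save when valid_loss is below that fixed value (assumed semantics).
    """
    if target_valid_loss == "lowest":
        return valid_loss < valid_loss_min
    return valid_loss < float(target_valid_loss)


valid_loss_min = float("inf")
saved_epochs = []
for epoch, loss in enumerate([0.80, 0.65, 0.70, 0.55], start=1):
    if should_save_checkpoint(loss, valid_loss_min):
        saved_epochs.append(epoch)
    valid_loss_min = min(valid_loss_min, loss)  # track the lowest loss seen so far

print(saved_epochs)  # [1, 2, 4]
```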
Using auto_save_ckpt
If auto_save_ckpt is True, a new checkpoint is saved whenever the training target is achieved.
The checkpoint file name is ImageClassifier.name + '_epoch_' + str(current_epoch) + '.checkpoint'.
For example, if the checkpoint is saved at epoch 10 for an ImageClassifier named clf, the checkpoint file will be named clf_epoch_10.checkpoint.
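The naming pattern, following the clf_epoch_10 example above, can be reproduced and parsed back, for instance to pick the most recent checkpoint before resuming. Both helpers are hypothetical and not part of radtorch.

```python
import re


def checkpoint_filename(classifier_name, current_epoch):
    """Build a file name following the pattern <name>_epoch_<epoch>.checkpoint."""
    return f"{classifier_name}_epoch_{current_epoch}.checkpoint"


def latest_checkpoint(filenames, classifier_name):
    """Hypothetical helper: return the file with the highest epoch for this classifier."""
    pattern = re.compile(rf"^{re.escape(classifier_name)}_epoch_(\d+)\.checkpoint$")
    matches = [(int(m.group(1)), f) for f in filenames if (m := pattern.match(f))]
    return max(matches)[1] if matches else None


files = [checkpoint_filename("clf", e) for e in (2, 10, 7)]
print(files[1])                         # clf_epoch_10.checkpoint
print(latest_checkpoint(files, "clf"))  # clf_epoch_10.checkpoint
```

Comparing the parsed epoch as an integer avoids the string-ordering trap where "clf_epoch_7" would sort after "clf_epoch_10".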
Resuming training using a saved checkpoint file
When using a saved checkpoint to resume training, new instances of the Model/PyTorch model and of ImageClassifier should be created.
For example:
```python
import radtorch.classifier
import radtorch.model

# Initial training (dataset: a radtorch ImageDataset created beforehand)
M = radtorch.model.Model(model_arch='vgg16', in_channels=1, out_classes=2)
clf = radtorch.classifier.ImageClassifier(model=M, dataset=dataset)
clf.fit(auto_save_ckpt=True, epochs=5, verbose=3)  # Saves the best checkpoint automatically

# Resume training with a new model and a new classifier
M = radtorch.model.Model(model_arch='vgg16', in_channels=1, out_classes=2)
clf2 = radtorch.classifier.ImageClassifier(model=M, dataset=dataset, use_checkpoint='saved_ckpt.checkpoint')
clf2.fit(auto_save_ckpt=False, epochs=5, verbose=3)
```
Checkpoint Files
A checkpoint file is a dictionary of:
- timestamp: Timestamp when the checkpoint was saved.
- type: ImageClassifier type.
- classifier: ImageClassifier object.
- epochs: Total epochs specified in the initial training.
- current_epoch: Epoch at which the checkpoint was saved.
- optimizer_state_dict: Current state of the optimizer.
- train_losses: List of recorded training losses.
- valid_losses: List of recorded validation losses.
- valid_loss_min: Minimum validation loss (see above).
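The checkpoint layout listed above can be illustrated as a plain dictionary written to disk and read back. The values are placeholders, and the sketch uses the standard pickle module as a stand-in; it does not claim how radtorch actually serialises checkpoints (torch.save itself uses pickle by default).

```python
import os
import pickle
import tempfile
from datetime import datetime

# Checkpoint with the keys listed above; all values are placeholders
checkpoint = {
    "timestamp": datetime.now().isoformat(),
    "type": "placeholder-type",    # ImageClassifier type
    "classifier": None,            # would hold the ImageClassifier object
    "epochs": 5,
    "current_epoch": 3,
    "optimizer_state_dict": {},    # would hold optimizer.state_dict()
    "train_losses": [0.9, 0.7, 0.6],
    "valid_losses": [0.8, 0.7, 0.65],
    "valid_loss_min": 0.65,
}

path = os.path.join(tempfile.mkdtemp(), "clf_epoch_3.checkpoint")
with open(path, "wb") as fh:
    pickle.dump(checkpoint, fh)

with open(path, "rb") as fh:
    restored = pickle.load(fh)

print(sorted(restored))
print(restored["current_epoch"], restored["valid_loss_min"])  # 3 0.65
```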
info(self)
Displays all information about the ImageClassifier object.