
Data radtorch.data

One of the core functions of radtorch is handling different types of images (medical or non-medical, DICOM or non-DICOM) efficiently. Below is a list of classes that make the magic happen.

ImageObject

Creates a 3D tensor whose dimensions = [channels, width, height] from an image path.

Parameters:

Name Type Description Default
path str

Path to an image.

required
out_channels int

Number of output channels. Only 1 and 3 channels supported.

required
transforms list

Albumentations transformations. See Image Augmentation.

required
WW int or list

Window width for DICOM images. Single value if using 1 channel or list of 3 values for 3 channels. See https://radiopaedia.org/articles/windowing-ct.

required
WL int or list

Window level for DICOM images. Single value if using 1 channel or list of 3 values for 3 channels. See https://radiopaedia.org/articles/windowing-ct.

required
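The WW and WL parameters follow the usual CT windowing convention: intensities outside the range [WL - WW/2, WL + WW/2] are clipped and the remainder is rescaled. The sketch below illustrates that convention in plain Python. The helper names `apply_window` and `apply_multi_window` are hypothetical and not part of radtorch, and the library's internal implementation may differ.

```python
def apply_window(pixels, ww, wl):
    """Clip intensities to [wl - ww/2, wl + ww/2] and rescale them to [0, 1]."""
    lower = wl - ww / 2
    upper = wl + ww / 2
    return [(min(max(p, lower), upper) - lower) / (upper - lower) for p in pixels]


def apply_multi_window(pixels, ww_list, wl_list):
    """Build one output channel per (WW, WL) pair, e.g. three windows for 3 channels."""
    return [apply_window(pixels, ww, wl) for ww, wl in zip(ww_list, wl_list)]


# Hypothetical pixel values in Hounsfield units (assumption for this sketch).
hu_values = [-1000, 0, 40, 1000]

# Single channel: one window width and one window level.
single = apply_window(hu_values, ww=350, wl=50)

# Three channels: a list of three widths and three levels, one pair per channel.
multi = apply_multi_window(hu_values, ww_list=[1500, 350, 80], wl_list=[-600, 50, 40])
print(len(multi), len(multi[0]))
```

With three (WW, WL) pairs, each output channel shows the same image through a different window, which is why a list of 3 values is expected when out_channels is 3.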

Returns:

Type Description
tensor

3D tensor whose dimensions = [channels, width, height]

Examples:

>>> i = radtorch.data.ImageObject(path='data/PROTOTYPE/DIRECTORY/abdomen/abd_1/1-001.dcm')
>>> i.shape
torch.Size([1, 512, 512])

VolumeObject

Creates an Image Volume Object (4D tensor) from series images contained in a folder.

Parameters:

Name Type Description Default
directory str

Folder containing series/sequence images. Images must be DICOM files.

required
out_channels int

Number of output channels. Only 1 and 3 channels supported.

required
transforms list

Albumentations transformations. See https://albumentations.ai/docs/getting_started/image_augmentation/.

required
WW int or list

Window width for DICOM images. Single value if using 1 channel or list of 3 values for 3 channels. See https://radiopaedia.org/articles/windowing-ct.

required
WL int or list

Window level for DICOM images. Single value if using 1 channel or list of 3 values for 3 channels. See https://radiopaedia.org/articles/windowing-ct.

required

Returns:

Type Description
tensor

4D tensor with dimensions = [channels, number_images/depth, width, height]. See https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html

Examples:

>>> i = radtorch.data.VolumeObject(directory='data/PROTOTYPE/DIRECTORY/abdomen/abd_1')
>>> i.shape
torch.Size([1, 40, 512, 512])
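To make the dimension order concrete, the following sketch stacks per-image arrays of shape [channels, width, height] into a volume of shape [channels, depth, width, height] using nested lists. It only illustrates the layout: `nested_shape` and `stack_slices` are hypothetical helpers, and radtorch itself returns a torch tensor.

```python
def nested_shape(nested):
    """Return the shape of a regular (non-empty) nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def stack_slices(slices):
    """Stack images of shape [C, W, H] into a volume of shape [C, D, W, H]."""
    channels = len(slices[0])
    return [[image[c] for image in slices] for c in range(channels)]


# 40 hypothetical single-channel images of 4 x 4 pixels each.
images = [[[[0.0] * 4 for _ in range(4)]] for _ in range(40)]
volume = stack_slices(images)
print(nested_shape(volume))
```

The depth axis is the number of images in the series, which matches the 40 in the example output above.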

ImageDataset (Dataset)

Creates PyTorch dataset(s) and dataloader(s) objects from a parent folder. Use this class for image tasks that involve handling each single image as a single instance of your dataset.

Examples:

import radtorch
import albumentations as A

# Specify image transformations
T = A.Compose([A.Resize(256, 256)])

# Create dataset object
ds = radtorch.data.ImageDataset(
    folder='data/4CLASS/',
    split={'valid': 0.2, 'test': 0.2},
    out_channels=1,
    transforms={'train': T, 'valid': T, 'test': T},
)

ds.data_stat()
ds.table
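As a rough illustration of how a one-subfolder-per-class layout (the structure documented for the folder parameter) can be turned into a table of paths and labels, here is a minimal standard-library sketch. `scan_class_folders` is a hypothetical helper, not radtorch's implementation, and the temporary folder tree exists only so the sketch can run anywhere.

```python
import tempfile
from pathlib import Path


def scan_class_folders(root, extension="dcm"):
    """Return sorted (path, label) rows, taking each subfolder name as the label."""
    rows = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.glob(f"*.{extension}")):
            rows.append((str(image_path), class_dir.name))
    return rows


# Build a tiny throwaway folder tree: root/class_x/image_y.dcm
with tempfile.TemporaryDirectory() as root:
    for label in ("class_1", "class_2"):
        class_dir = Path(root) / label
        class_dir.mkdir()
        for i in range(2):
            (class_dir / f"image_{i}.dcm").touch()

    table = scan_class_folders(root)
    classes = sorted({label for _, label in table})
    class_to_idx = {label: idx for idx, label in enumerate(classes)}

print(classes)
print(class_to_idx)
```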

Parameters:

Name Type Description Default
folder str

Parent folder containing images. radtorch.data.ImageDataset expects images to be arranged in the following structure:

root/
    class_1/
            image_1
            image_2
            ...
    class_2/
            image_1
            image_2
            ...

required
name str

Name to be given to the dataset. If none is provided, the current date and time will be used to create a generic dataset name. (default=None)

None
label_table str|dataframe

The table containing data labels for your images. The table should contain at least 2 columns: an image path column and a label column. It can be a string path to a CSV file or a pandas dataframe object. (default=None)

None
instance_id bool

True if the image path column in label_table contains image ids rather than absolute image paths. (default=False)

False
add_extension bool

If instance_id=True, use this to add an extension to the image path as needed. The extension must be provided without ".", e.g. "dcm". (default=False)

False
out_channels int

Number of output channels. (default=1)

1
WW int or list

Window width for DICOM images. Single value if using 1 channel or list of 3 values for 3 channels. See https://radiopaedia.org/articles/windowing-ct.

None
WL int or list

Window level for DICOM images. Single value if using 1 channel or list of 3 values for 3 channels. See https://radiopaedia.org/articles/windowing-ct.

None
path_col str

Name of the column containing image path data in the label_table. (default='path')

'path'
label_col str

Name of the column containing label data in the label_table. (default='label')

'label'
extension str

Type/Extension of images. (default='dcm')

'dcm'
transforms dict

Dictionary of Albumentations transformations in the form of {'train': .. , 'valid': .. , 'test': .. }. See https://albumentations.ai/docs/getting_started/image_augmentation/ . (default=None)

None
random_state int

Random seed (default=100)

100
sample float

Sample or percent of the overall data to be used. (default=1.0)

1.0
split dict

Dictionary defining how data will be split for training/validation/testing. Follows the structure {'valid': float, 'test': float}, or {'valid': float} in case no testing subset is needed. The percentage of the training subset is inferred automatically.

False
ignore_zero_img bool

True to ignore images containing all zero pixels. (default=False)

False
normalize bool

True to normalize image data between 0 and 1. (default=True)

True
batch_size int

Dataloader batch size. (default = 16)

16
shuffle bool

True to shuffle images during training. (default=True)

True
weighted_sampler bool

True to use a weighted sampler for unbalanced datasets. See https://pytorch.org/docs/stable/data.html#torch.utils.data.WeightedRandomSampler. (default=False)

False
num_workers int

Dataloader CPU workers. (default = 0)

0
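The split parameter only names the validation and test fractions, and the training fraction is whatever remains. The sketch below shows one way such a split could be derived from a fixed random seed. It is an assumption-laden illustration: `split_indices` is a hypothetical helper, and the documentation does not say whether radtorch shuffles this way or stratifies by label.

```python
import random


def split_indices(n_items, split, random_state=100):
    """Shuffle indices with a fixed seed and cut them into train/valid/test lists."""
    indices = list(range(n_items))
    random.Random(random_state).shuffle(indices)

    n_valid = int(round(n_items * split.get("valid", 0.0)))
    n_test = int(round(n_items * split.get("test", 0.0)))

    idx_valid = indices[:n_valid]
    idx_test = indices[n_valid:n_valid + n_test]
    idx_train = indices[n_valid + n_test:]  # whatever remains is the training subset
    return {"train": idx_train, "valid": idx_valid, "test": idx_test}


subsets = split_indices(100, split={"valid": 0.2, "test": 0.2})
print({name: len(idx) for name, idx in subsets.items()})
```

With valid=0.2 and test=0.2, the remaining 60% of the instances form the training subset. Omitting the 'test' key leaves the test list empty.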

Attributes:

Name Type Description
classes list

List of generated classes/labels.

class_to_idx dict

Dictionary of generated classes/labels and corresponding class/label id.

idx_train list

List of index values of images/instances used for training subset. These refer to index of ImageDataset.table.

idx_valid list

List of index values of images/instances used for validation subset. These refer to index of ImageDataset.table.

idx_test list

List of index values of images/instances used for testing subset. These refer to index of ImageDataset.table.

table pandas dataframe

Table of images, paths, and their labels.

table_train pandas dataframe

Table of images used for training. Subset of ImageDataset.table.

table_valid pandas dataframe

Table of images used for validation. Subset of ImageDataset.table.

table_test pandas dataframe

Table of images used for testing. Subset of ImageDataset.table.

tables dict

Dictionary of all generated tables in the form of: {'train': table, 'valid':table, 'test':table}.

dataset_train pytorch dataset object

Training pytorch Dataset

dataset_valid pytorch dataset object

Validation pytorch Dataset

dataset_test pytorch dataset object

Testing pytorch Dataset

datasets dict

Dictionary of all generated Datasets in the form of: {'train': Dataset, 'valid':Dataset, 'test':Dataset}.

dataloader_train pytorch dataloader object

Training pytorch DataLoader

dataloader_valid pytorch dataloader object

Validation pytorch DataLoader

dataloader_test pytorch dataloader object

Testing pytorch DataLoader

dataloaders dict

Dictionary of all generated Dataloaders in the form of: {'train': Dataloader, 'valid':Dataloader, 'test':Dataloader}.

class_weights tensor

Values of class weights, for imbalanced datasets, to be used to weight loss functions. See https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss.

sampler_weights tensor

Values of class weights, for imbalanced datasets, to be used to sample from the dataset using PyTorch WeightedRandomSampler. Affects only the training dataset, not validation or testing. See https://pytorch.org/docs/stable/data.html#torch.utils.data.WeightedRandomSampler
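The documentation does not state how class_weights and sampler_weights are computed. One common scheme for imbalanced data is inverse class frequency, sketched below in plain Python. The helper `inverse_frequency_weights` and the label values are hypothetical, and radtorch may use a different formula.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


# Hypothetical imbalanced label column: 8 'normal' images and 2 'tumor' images.
labels = ["normal"] * 8 + ["tumor"] * 2

class_weights = inverse_frequency_weights(labels)

# One weight per sample, so rare classes are drawn more often by a weighted sampler.
sampler_weights = [class_weights[label] for label in labels]

print(class_weights)
```

In a PyTorch workflow, per-class values like these would typically be passed as the weight argument of torch.nn.CrossEntropyLoss, and per-sample values to torch.utils.data.WeightedRandomSampler.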

Methods

data_stat(self, plot=True, figsize=(8, 6), cmap='viridis')

Displays distribution of classes across subsets as table or plot.

Parameters:

Name Type Description Default
plot bool

True, display data as figure. False, display data as table.

True
figsize tuple

Size of the displayed figure.

(8, 6)
cmap string

Name of Matplotlib color map to be used. See Matplotlib colormaps

'viridis'

Returns:

Type Description
pandas dataframe

Returned only if plot=False.

Examples:

ds = radtorch.data.ImageDataset(folder='data/4CLASS/', out_channels=1)
ds.data_stat()
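For readers who want to see what a class distribution across subsets amounts to, the following sketch counts labels per subset with the standard library. It is not how data_stat is implemented. The function `class_distribution` and the label lists are made up for illustration.

```python
from collections import Counter


def class_distribution(labels_by_subset):
    """Count how many instances of each class fall into each subset."""
    return {
        subset: dict(sorted(Counter(labels).items()))
        for subset, labels in labels_by_subset.items()
    }


# Hypothetical label lists for each subset.
labels_by_subset = {
    "train": ["a", "a", "b", "b", "b", "c"],
    "valid": ["a", "b"],
    "test": ["b", "c"],
}

stats = class_distribution(labels_by_subset)
for subset, counts in stats.items():
    print(subset, counts)
```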

view_batch(self, subset='train', figsize=(15, 5), cmap='gray', num_images=False, rows=2)

Displays a batch from a certain subset.

Parameters:

Name Type Description Default
subset string

Data subset to use: either 'train', 'valid', or 'test'.

'train'
figsize tuple

Size of the displayed figure.

(15, 5)
cmap string

Name of Matplotlib color map to be used. See Matplotlib colormaps

'gray'
num_images int

Number of images to display. Usually equals batch_size unless otherwise specified.

False
rows int

Number of rows.

2

Returns:

Type Description
figure

figure containing samples

Examples:

ds = radtorch.data.ImageDataset(folder='data/4CLASS/', out_channels=1)
ds.view_batch()

view_image(self, id=0, figsize=(25, 5), cmap='gray')

Displays separate images/channels of an image.

Parameters:

Name Type Description Default
id int

Target image id in dataset.table (row index).

0
figsize tuple

Size of the displayed figure.

(25, 5)
cmap string

Name of Matplotlib color map to be used. See Matplotlib colormaps

'gray'

Returns:

Type Description
figure

figure containing samples

Examples:

# Three channels without explicit windowing
ds = radtorch.data.ImageDataset(folder='data/4CLASS/', out_channels=3)
ds.view_image(id=9)

# Three channels, each with its own window width and level
ds = radtorch.data.ImageDataset(folder='data/4CLASS/', out_channels=3, WW=[1500, 350, 80], WL=[-600, 50, 40])
ds.view_image(id=9)

VolumeDataset (Dataset)

Dataset object for DICOM volumes. Creates dataset(s) and dataloader(s) ready for training using radtorch or PyTorch directly.

Parameters:

Name Type Description Default
folder str

Parent folder containing images. radtorch.data.VolumeDataset expects files to be arranged in the following structure:

root/
    class_1/
            sequence_1/
                        image_1
                        image_2
                        ...
            sequence_2/
                        image_1
                        image_2
                        ...
    class_2/
            sequence_1/
                        image_1
                        image_2
                        ...
            sequence_2/
                        image_1
                        image_2
                        ...
    ...

required
name str

Name to be given to the dataset. If none is provided, the current date and time will be used to create a generic dataset name. (default=None)

None
label_table str|dataframe

The table containing data labels for your images. The table should contain at least 2 columns: an image path column and a label column. It can be a string path to a CSV file or a pandas dataframe object. (default=None)

None
use_file bool

True to use pre-generated/resampled volume files. To use Volume files:

  1. Volume files should be created using radtorch.data.VolumeObject

  2. Saved with extension .pt.

  3. Placed in the sequence folder.

False
extension str

Type/Extension of images.

'dcm'
out_channels int

Number of output channels.

1
WW int or list

Window width for DICOM images. Single value if using 1 channel or list of 3 values for 3 channels. See https://radiopaedia.org/articles/windowing-ct.

None
WL int or list

Window level for DICOM images. Single value if using 1 channel or list of 3 values for 3 channels. See https://radiopaedia.org/articles/windowing-ct.

None
path_col str

Name of the column containing image path data in the label_table.

'path'
label_col str

Name of the column containing label data in the label_table.

'label'
study_col str

Name of the column containing study id in label_table.

'study_id'
transforms dict

Dictionary of Albumentations transformations in the form of {'train': .. , 'valid': .. , 'test': .. }. See https://albumentations.ai/docs/getting_started/image_augmentation/. NOTE: If using already resampled/created volume files, transformations should be applied during volume creation, not in the dataset, i.e. transforms specified here have no effect during training.

None
random_state int

Random seed.

100
sample float

Sample or percent of the overall data to be used.

1.0
split dict

Dictionary defining how data will be split for training/validation/testing. Follows the structure {'valid': float, 'test': float}, or {'valid': float} in case no testing subset is needed. The percentage of the training subset is inferred automatically.

False
normalize bool

True to normalize image data between 0 and 1.

True
batch_size int

Dataloader batch size.

16
shuffle bool

True to shuffle images during training.

True
weighted_sampler bool

True to use a weighted sampler for unbalanced datasets. See https://pytorch.org/docs/stable/data.html#torch.utils.data.WeightedRandomSampler.

False
num_workers int

Dataloader CPU workers.

0
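The normalize option is described as scaling image data between 0 and 1. A min-max rescaling is one natural reading of that description, sketched below. This is an assumption, since the documentation does not specify the exact method, and `min_max_normalize` is a hypothetical helper.

```python
def min_max_normalize(pixels):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    low, high = min(pixels), max(pixels)
    if high == low:
        # Constant image (for example all zeros): avoid dividing by zero.
        return [0.0 for _ in pixels]
    return [(p - low) / (high - low) for p in pixels]


print(min_max_normalize([0, 5, 10]))
print(min_max_normalize([0, 0, 0]))
```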

Attributes:

Name Type Description
classes list

List of generated classes/labels.

class_to_idx dict

Dictionary of generated classes/labels and corresponding class/label id.

idx_train list

List of index values of images/instances used for training subset. These refer to index of VolumeDataset.table.

idx_valid list

List of index values of images/instances used for validation subset. These refer to index of VolumeDataset.table.

idx_test list

List of index values of images/instances used for testing subset. These refer to index of VolumeDataset.table.

table pandas dataframe

Table of images, paths, and their labels.

table_train pandas dataframe

Table of images used for training. Subset of VolumeDataset.table.

table_valid pandas dataframe

Table of images used for validation. Subset of VolumeDataset.table.

table_test pandas dataframe

Table of images used for testing. Subset of VolumeDataset.table.

tables dict

Dictionary of all generated tables in the form of: {'train': table, 'valid':table, 'test':table}.

dataset_train pytorch dataset object

Training pytorch Dataset

dataset_valid pytorch dataset object

Validation pytorch Dataset

dataset_test pytorch dataset object

Testing pytorch Dataset

datasets dict

Dictionary of all generated Datasets in the form of: {'train': Dataset, 'valid':Dataset, 'test':Dataset}.

dataloader_train pytorch dataloader object

Training pytorch DataLoader

dataloader_valid pytorch dataloader object

Validation pytorch DataLoader

dataloader_test pytorch dataloader object

Testing pytorch DataLoader

dataloaders dict

Dictionary of all generated Dataloaders in the form of: {'train': Dataloader, 'valid':Dataloader, 'test':Dataloader}.

class_weights tensor

Values of class weights, for imbalanced datasets, to be used to weight loss functions. See https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss.

sampler_weights tensor

Values of class weights, for imbalanced datasets, to be used to sample from the dataset using PyTorch WeightedRandomSampler. Affects only the training dataset, not validation or testing. See https://pytorch.org/docs/stable/data.html#torch.utils.data.WeightedRandomSampler

Examples:

import radtorch
import albumentations as A

# Specify image transformations
T = A.Compose([A.Resize(256, 256)])

# Create dataset object
ds = radtorch.data.VolumeDataset(
    folder='data/PROTOTYPE/DIRECTORY/',
    split={'valid': 0.3, 'test': 0.3},
    out_channels=1,
    transforms={'train': T, 'valid': T, 'test': T},
)

ds.data_stat()
ds.table

Methods

data_stat(self, plot=True, figsize=(8, 6), cmap='viridis')

Displays distribution of classes across subsets as table or plot.

Parameters:

Name Type Description Default
plot bool

True, display data as figure. False, display data as table.

True
figsize tuple

Size of the displayed figure.

(8, 6)
cmap string

Name of Matplotlib color map to be used. See Matplotlib colormaps

'viridis'

Returns:

Type Description
pandas dataframe

Returned only if plot=False.

view_study(self, id, plane='axial', figsize=(15, 15), cols=5, rows=5, start=0, end=-1)

Show sample images from a study.

Warning

This works only with single channel images. Multiple channels are not supported yet here.

Parameters:

Name Type Description Default
id int

Target study id in dataset.table (row index).

required
plane str

Anatomical plane to display the images in. Options: 'axial', 'coronal' or 'sagittal'.

'axial'
figsize tuple

Size of the displayed figure.

(15, 15)
cols int

Number of columns.

5
rows int

Number of rows.

5
start int

id of the first image to display.

0
end int

id of the last image to display.

-1

Returns:

Type Description
figure

figure containing images from study.

Examples:

import radtorch
import albumentations as A

# Specify image transformations
T = A.Compose([A.Resize(256, 256)])

# Create dataset object
ds = radtorch.data.VolumeDataset(
    folder='data/PROTOTYPE/DIRECTORY/',
    split={'valid': 0.3, 'test': 0.3},
    out_channels=1,
    transforms={'train': T, 'valid': T, 'test': T},
)

# Show the first study in the axial plane
ds.view_study(id=0, plane='axial')

# Show the same study in the coronal plane, starting from image 150
ds.view_study(id=0, plane='coronal', start=150)
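The plane option can be pictured as choosing which axis of the volume is sliced. The sketch below uses a nested list of shape [depth, width, height] for a single channel. The mapping of 'coronal' and 'sagittal' to the second and third axes is an assumption made for illustration, since the true mapping depends on image orientation. `take_slice` is a hypothetical helper, not a radtorch function.

```python
def take_slice(volume, plane, index):
    """Extract one 2D slice from a [depth, width, height] nested list.

    Axis assignment is an assumption for this sketch: 'axial' indexes the
    first axis, 'coronal' the second, and 'sagittal' the third.
    """
    if plane == "axial":
        return [list(row) for row in volume[index]]
    if plane == "coronal":
        return [list(image[index]) for image in volume]
    if plane == "sagittal":
        return [[row[index] for row in image] for image in volume]
    raise ValueError(f"Unknown plane: {plane!r}")


def slice_shape(slice_2d):
    """Return (rows, columns) of a 2D nested list."""
    return (len(slice_2d), len(slice_2d[0]))


# Hypothetical volume: 3 images of 4 x 5 pixels, each voxel storing its own coordinates.
volume = [[[(d, w, h) for h in range(5)] for w in range(4)] for d in range(3)]

for plane in ("axial", "coronal", "sagittal"):
    print(plane, slice_shape(take_slice(volume, plane, 0)))
```

Under this assumption, the number of coronal or sagittal slices equals an in-plane image dimension rather than the number of images in the series.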