Python Client Library

Warning

In active development. Currently pre-alpha -- API may change significantly in future releases.

Labels and annotations created on the MD.ai annotator can be exported for training deep learning models.

The MD.ai Python client library is designed to perform authentication, automatically download images and annotations, prepare datasets, and train and evaluate deep learning models using libraries such as Google's TensorFlow/Keras and fast.ai/PyTorch.

See the GitHub Repo

Installation

Install and update using pip:

pip install --upgrade mdai

Getting Started

Import the mdai library

import mdai

# check version
mdai.__version__

Create an mdai client

The mdai client requires an access token, which authenticates you as the user. To create a new token or select an existing token, navigate to the "Personal Access Tokens" tab on your user settings page at the specified MD.ai domain (e.g., public.md.ai).

Important: keep your access tokens safe. Never share your tokens.

mdai_client = mdai.Client(domain='public.md.ai', access_token="$YOUR-PERSONAL-TOKEN")

Example output:

    Successfully authenticated to public.md.ai.
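
To avoid hard-coding the token in notebooks or scripts, one option is to read it from an environment variable first. This is only a sketch -- the variable name MDAI_ACCESS_TOKEN is a convention chosen here, not something the library requires:

import os
import mdai

# Read the personal access token from the environment (hypothetical variable name)
access_token = os.environ["MDAI_ACCESS_TOKEN"]
mdai_client = mdai.Client(domain='public.md.ai', access_token=access_token)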

Define a project

Define a project you have access to by passing in the project id. The project id can be found in the URL in the following format: https://public.md.ai/annotator/project/{project_id}. For example, project_id would be LxR6zdR2 for https://public.md.ai/annotator/project/LxR6zdR2. Optionally, specify path as the data directory (if omitted, it defaults to the current working directory).

Example:

p = mdai_client.project('LxR6zdR2', path='./lesson3-data')

A project object will attempt to download the dataset (images and annotations separately) and extract the images and annotations to the specified path. However, if the latest version of the images or annotations has already been downloaded and extracted, the cached data is used.

    Using path './lesson3-data' for data.
    Preparing annotations export for project LxR6zdR2...
    Preparing images export for project LxR6zdR2...
    Using cached images data for project LxR6zdR2.
    Using cached annotations data for project LxR6zdR2.

Create Project if annotations_only=True

If you are unable to export images from within the Annotator, this feature has been turned off to save costs for your institution/company. To continue using the MD.ai API in this case, two things are needed.

  • First, your images need to be in a specific format. Project requires the images to be in StudyInstanceUID/SeriesInstanceUID/SOPInstanceUID.dcm format.
  • Second, download the annotations and create the project:
PATH_TO_IMAGES = 'path to my data goes here'
PROJECT_ID = 'get the project id from the Annotator'
mdai_client.project(PROJECT_ID, path=PATH_TO_IMAGES,  annotations_only=True)

This downloads the annotations JSON only. Grab the path of the downloaded JSON file and create the project with:

from mdai import preprocessing

# json_path is the path to the downloaded annotations JSON file
p = preprocessing.Project(
    annotations_fp=json_path,
    images_dir=PATH_TO_IMAGES,
)
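
In the snippet above, json_path must point to the annotations JSON downloaded in the previous step. As a minimal sketch (it assumes the export was saved under PATH_TO_IMAGES and simply picks the most recently modified JSON file; adjust the glob to match your setup), you could locate it like this:

import glob
import os

# Pick the most recently modified JSON file in the data directory (assumed download location)
candidates = glob.glob(os.path.join(PATH_TO_IMAGES, '*.json'))
json_path = max(candidates, key=os.path.getmtime)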

Important

The images in PATH_TO_IMAGES need to be in a specific format to work with the rest of the API. Convert your images to StudyInstanceUID/SeriesInstanceUID/SOPInstanceUID.dcm format first.
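
As a minimal sketch of such a conversion (the source directory is a placeholder, and it assumes every .dcm file under it is a readable DICOM), you could reorganize a flat folder of DICOM files with pydicom:

import shutil
from pathlib import Path

import pydicom

src_dir = Path('path/to/original/dicoms')  # placeholder: wherever your DICOM files currently live
dst_dir = Path(PATH_TO_IMAGES)             # destination in StudyInstanceUID/SeriesInstanceUID/SOPInstanceUID.dcm layout

for dcm_file in src_dir.rglob('*.dcm'):
    ds = pydicom.dcmread(dcm_file, stop_before_pixels=True)  # read headers only; pixel data is not needed
    out_dir = dst_dir / ds.StudyInstanceUID / ds.SeriesInstanceUID
    out_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(dcm_file, out_dir / f'{ds.SOPInstanceUID}.dcm')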

Prepare data

Show available label groups

p.show_label_groups()

Set label ids and their corresponding class id

Label ids and their corresponding class ids must be explicitly set using the Project#set_labels_dict method in order to prepare datasets.

Example:

# this maps label ids to class ids
labels_dict = {
    'L_ylR0L8': 0, # background
    'L_DlqEAl': 1, # lung opacity
}
p.set_labels_dict(labels_dict)

Show available datasets

p.show_datasets()

Get dataset by id, or by name

dataset = p.get_dataset_by_id('D_ao3XWQ')
dataset.prepare()

Show each label's label id, class id, and class text

dataset.show_classes()

Example output:

    Label id: L_ylR0L8, Class id: 0, Class text: No Lung Opacity
    Label id: L_DlqEAl, Class id: 1, Class text: Lung Opacity

Split data into training and validation datasets

train_dataset, valid_dataset = mdai.common_utils.train_test_split(dataset)

By default, the dataset is shuffled, and the train/validation split ratio is 0.9 to 0.1.
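
If you need a different split, the same helper typically accepts the shuffling behavior and validation fraction as arguments. The parameter names below are an assumption and may differ between library versions:

# Hypothetical call: the `shuffle` and `validation_split` parameter names are assumptions
train_dataset, valid_dataset = mdai.common_utils.train_test_split(
    dataset, shuffle=True, validation_split=0.2
)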

Visualization

Visualization of different annotation modes

Display images
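
As a minimal, library-agnostic sketch (the file path is a placeholder, and this does not use the mdai visualization helpers), a single DICOM image from the downloaded dataset can be displayed with pydicom and matplotlib:

import matplotlib.pyplot as plt
import pydicom

# Placeholder path to one of the extracted DICOM files
ds = pydicom.dcmread('./lesson3-data/<StudyInstanceUID>/<SeriesInstanceUID>/<SOPInstanceUID>.dcm')
plt.imshow(ds.pixel_array, cmap='gray')
plt.axis('off')
plt.show()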


Import Annotations

Machine learning model outputs can be loaded into the project as annotations. Again, import the mdai library and create a client mdai_client as in the "Getting Started" section above. Make sure to use the correct domain and provide your access token.

import mdai
mdai_client = mdai.Client(domain='public.md.ai', access_token="$YOUR-PERSONAL-TOKEN")

Resource IDs

We will be using the load_model_annotations(project_id, dataset_id, model_id, annotations) method on the client. To obtain the project and dataset IDs, click on the "Info" button in the bottom left sidebar. When more than one dataset is available within the project, select the dataset you will be loading your annotations into.

Imported annotations are attached to a machine learning model. To create a new model, click the "Models" button in the top left sidebar, and click the "Create Model" button. You will then find the model ID in the newly created model card.

Example IDs:

project_id = "LxR6zdR2"
dataset_id = 'D_vBnWb1'
model_id = 'M_qvlG84'

Annotations format

The annotations variable must be a list of dicts, with the set of required fields dependent on the label type. Every annotation must contain its corresponding labelId. For exam-scoped labels, StudyInstanceUID is required. For series-scoped labels, SeriesInstanceUID is required. For image-scoped labels, SOPInstanceUID is required. For local image-scoped labels, an additional data field is required, with the format of data depending on the annotation mode, as shown in the example below:

annotations = [
  # global exam-scoped label
  {
    'labelId': 'L_jZRPN2',
    'StudyInstanceUID': '1.2.276.0.7230010.3.1.2.8323329.19529.1513659878.548505',
  },
  # global series-scoped label
  {
    'labelId': 'L_nZYWNg',
    'SeriesInstanceUID': '1.2.276.0.7230010.3.1.3.8323329.19529.1513659878.548504',
  },
  # global image-scoped label
  {
    'labelId': 'L_3EMmEY',
    'SOPInstanceUID': '1.2.276.0.7230010.3.1.4.8323329.19529.1513659878.548506',
  },
  # local image-scoped label where annotation mode is 'bbox' (Bounding Box)
  {
    'labelId': 'L_xZgpEM',
    'SOPInstanceUID': '1.2.276.0.7230010.3.1.4.8323329.25053.1513659982.870619',
    'data': {'x': 200, 'y': 200, 'width': 200, 'height': 400}
  },
  # local image-scoped label where annotation mode is 'polygon' (Polygon)
  {
    'labelId': 'L_nEK9AV',
    'SOPInstanceUID': '1.2.276.0.7230010.3.1.4.8323329.14861.1513659866.150292',
    'data': {'vertices': [[300,300],[500,500],[600,700],[200,800]]}
  },
  # local image-scoped label where annotation mode is 'freeform' (Freeform)
  {
    'labelId': 'L_9ZymZx',
    'SOPInstanceUID': '1.2.276.0.7230010.3.1.4.8323329.20913.1513659971.911318',
    'data': {'vertices': [[300,300],[500,500],[600,700],[200,800]]}
  },
  # local image-scoped label where annotation mode is 'location' (Location)
  {
    'labelId': 'L_BA3JN0',
    'SOPInstanceUID': '1.2.276.0.7230010.3.1.4.8323329.2010.1513660008.830854',
    'data': {'x': 400, 'y': 400}
  },
  # local image-scoped label where annotation mode is 'line' (Line)
  {
    'labelId': 'L_WN98Eg',
    'SOPInstanceUID': '1.2.276.0.7230010.3.1.4.8323329.29889.1513659996.912413',
    'data': {'vertices': [[300,300],[500,500],[600,700],[200,800]]}
  },
]

The annotations will also be validated during actual processing and importing. If annotations is very large, it is recommended to run this method on smaller chunks (see the chunking sketch after the example output below).

mdai_client.load_model_annotations(project_id, dataset_id, model_id, annotations)

Example output:

    Importing annotations into project LxR6zdR2, dataset D_vBnWb1, model M_qvlG84...
    Successfully imported annotations into project LxR6zdR2.
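
For large imports, a simple chunking loop works well. The chunk size of 1000 below is an arbitrary choice, not a library requirement:

# Import annotations in fixed-size chunks to keep each request small (chunk size is arbitrary)
CHUNK_SIZE = 1000
for start in range(0, len(annotations), CHUNK_SIZE):
    chunk = annotations[start:start + CHUNK_SIZE]
    mdai_client.load_model_annotations(project_id, dataset_id, model_id, chunk)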

Tutorials with Jupyter Notebooks

To get started, we recommend looking at several Jupyter notebooks we have prepared. The following notebooks show how to perform classification of chest vs. abdomen X-rays using TensorFlow/Keras with TFRecords, and using the fast.ai library, which is based on PyTorch.

Chest/abdomen X-ray image classification using different deep learning libraries

Chest/Abdomen X-Ray Annotator Project URL: https://public.md.ai/annotator/project/PVq9raBJ/workspace

Introduction to deep learning for medical imaging lessons

Additionally, we created Jupyter notebooks covering the basics of using the client library for downloading and parsing annotation data, and for training and evaluating different deep learning models for classification, semantic and instance segmentation, and object detection problems in the medical imaging domain.

See the links for these lessons below.

  • Lesson 1. Classification of chest vs. abdominal X-rays using TensorFlow/Keras Github Annotator
  • Lesson 2. Lung X-Rays Semantic Segmentation using UNets. Github Annotator
  • Lesson 3. RSNA Pneumonia detection using Kaggle data format Github Annotator
  • Lesson 3. RSNA Pneumonia detection using MD.ai python client library Github Annotator

Running Jupyter notebooks on Google Colab

It's easy to run a Jupyter notebook on Google Colab with free (time-limited) GPU use. For example, you can add the GitHub Jupyter notebook path at https://colab.research.google.com/notebook: select the "GITHUB" tab, and add the Lesson 1 URL: https://github.com/mdai/ml-lessons/blob/master/lesson1-xray-images-classification.ipynb

To use the GPU, in the notebook menu, go to Runtime -> Change runtime type -> switch to Python 3, and turn on GPU. See more colab tips.

Advanced: How to run on Google Cloud Platform with Deep Learning Images

You can also run the notebooks with powerful GPUs on the Google Cloud Platform. In this case, you need to authenticate to the Google Cloud Platform, create a private virtual machine instance running one of Google's Deep Learning images, and import the lessons. See the instructions below.

GCP Deep Learning Images How-To


API Documentation

Technical API Documentation here