MD.ai Interface Code

Consider the situation where you have a trained model of your own, or an existing model, that you want to run on a dataset so you can visualize the predictions in a simple, user-friendly manner. MD.ai makes this possible: you simply upload a zip file containing the model code, arranged in a format compatible with our interface. This document summarizes the files needed to glue your model to the MD.ai interface so that you can run an inference task without any hassle.

Inside the folder that contains your model weights file, first create a folder named .mdai/. This folder will hold everything needed to interact with the interface. Three main files are required:

  • config.yaml
  • mdai_deploy.py
  • requirements.txt

In addition to these files, you can add helper Python files with methods that aid the inference task, for example preprocessing/postprocessing functions. A typical layout is shown below.
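For example, an uploaded zip might be laid out as follows (the model/ folder, weights filename, and helpers/ files are illustrative; only the .mdai/ folder and the three files inside it are required by the interface):

model/
├── model_file.pth            # model weights/checkpoint
├── helpers/                  # optional helper code
│   ├── preprocessing.py
│   └── postprocessing.py
└── .mdai/
    ├── config.yaml
    ├── mdai_deploy.py
    └── requirements.txt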

Explicit additions of folders to sys.path

We explicitly add every folder within the zip file that you upload to the MD.ai interface to sys.path. You can therefore place helper code files in any folder you like and import modules freely from any of them (even from within .mdai/), as sketched below.
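For instance, assuming the illustrative helpers/preprocessing.py module above defines a normalize function (a hypothetical name), mdai_deploy.py could import it directly by module name, since every folder in the zip is on sys.path:

# in .mdai/mdai_deploy.py
# helpers/ lives elsewhere in the zip, but because every folder is added
# to sys.path, the module can be imported directly by name.
from preprocessing import normalize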

To make things easier, you can download the zip file containing skeleton code for the MD.ai interface here and fill in the blanks as you read through this documentation. Note that the .mdai folder inside the model folder is hidden, so you might not see it after extracting the zip file unless your system has the Show hidden files option turned on.

Let's go into the details of what goes inside the necessary files now.

config.yaml

The config.yaml file is the main configuration file that defines the runtime requirements for your model. As of now, it should contain the following tags:

  • base_image: Tells Docker which base Python environment image to use. Currently, we only support py37, which sets up a Python 3.7 Conda environment. (required)
  • device_type: Specifies whether to run the model on a cpu or a gpu. We are working on adding support for deep learning framework-specific device types. (default: cpu)
  • cuda_version: Specifies which CUDA version to use in the Python environment. It is needed only if you have specified gpu as the device_type. Currently supported CUDA versions are 10.0, 10.1, and 11.0. (default: 11.0)

For example, a basic config file looks like the following -

base_image: py37
device_type: gpu
cuda_version: 10.1

We use Google Cloud Platform's CPU/GPU Docker images (as parent images) to create derivative containers, depending on what you select as the device_type. These container images provide a Python 3 environment and include Conda, along with the NVIDIA stack for GPU images (CUDA 11.0, cuDNN 7.x, NCCL 2.x). We build a Conda environment on top of this and install all the frameworks your model needs, reading from the requirements.txt file explained later on this page.

mdai_deploy.py

The mdai_deploy.py file contains the main Python code that will be called to run the model on the given dataset and generate predictions. The main requirement for this file is that it define a class named MDAIModel with two methods:

  • __init__: Sets the values needed to initialize the model, for example the path to the model checkpoint file and the model definition itself.
  • predict: Contains the code for reading the input files, preprocessing them, and passing them through the model to generate predictions. This method should return a list of dictionaries, outputs, that will be read by the MD.ai interface.

Let's dig deeper into how to transform your code so that it fits the description of the MDAIModel class. But first, we need to understand the schema of the input data that is read from the interface and the schema of the outputs list that will be returned.

Once you create a project and upload data on our interface, the data gets stored as msgpack-serialized binary data and has the following schema:

{
    # image file
    "files": [
        {
            "content": "bytes",
            "content_type": "str", # MIME type, e.g. 'application/dicom'
        },
        ...
    ],

    # annotations on md.ai for the image, if any
    "annotations": [
        {
            "study_uid": "str",
            "series_uid": "str",
            "instance_uid": "str",
            "frame_number": "int",
            "class_index": "int",
            "data": "any",
        },
        ...
    ],
    "args": {
        "arg1": "str",
        "arg2": "str",
        ...
    }
}

For a file with file["content_type"] = 'application/dicom', file["content"] is the raw binary data representing a DICOM file and can be loaded using ds = pydicom.dcmread(BytesIO(file["content"])), as we'll show in an example later. If you have annotated the image on MD.ai, the annotations for this specific input image can also be accessed under data["annotations"].

Another thing to note: once you create a new model version on your project (as explained on the Deploying models page), you have the option to specify the model scope, i.e., whether the model runs and produces outputs for each instance individually, or for a series or a study/exam as a whole. The model scope thus determines whether an entire study, a series, or a single instance is returned to the model class from the storage bucket.

  • If the model scope is 'INSTANCE', then files will be a single instance (list length of 1).
  • If the model scope is 'SERIES', then files will be a list of all instances in a series.
  • If the model scope is 'STUDY', then files will be a list of all instances in a study.

If your model supports multi-frame instances, the model scope must be 'SERIES' or 'STUDY', because internally we treat multi-frame instances as DICOM series.

The additional args dictionary supplies values that may be used in a given run.
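For example, inside predict you might read an optional argument and filter the files before inference. This is a minimal sketch; the threshold argument name is purely illustrative, and args contains whatever key/value pairs are configured for the run:

# Minimal sketch inside MDAIModel.predict(self, data)
threshold = float(data["args"].get("threshold", 0.5))  # hypothetical argument

# With 'SERIES' or 'STUDY' scope, data["files"] contains every instance,
# so filter/sort the list as needed before running inference.
dicom_files = [f for f in data["files"] if f["content_type"] == "application/dicom"]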

Once the images are loaded this way, the next step is to run this input through the model. You can do this by passing the input through existing functions in your helper files, or by adding the code directly inside the predict method. Once the model returns an output, the predict method must return results in a particular schema (required for our interface to read and display outputs correctly), as shown below:

[
    {
        "type": "str", # choose from {'NONE', 'ANNOTATION', 'IMAGE', 'DICOM', 'TEXT'}
        "study_uid": "str",
        "series_uid": "str",
        "instance_uid": "str",
        "frame_number": "int",
        "class_index": "int",
        "data": {},
        "probability": "float",
        "explanations": [
            {
                "name": "str",
                "description": "str",
                "content": "bytes",
                "content_type": "str",
            },
            ...
        ],
    },
    ...
]

  • type: The type of output produced, whether it is an annotation, image, or text (needs to be entered explicitly).
  • study_uid: The unique ID of the study to which the particular instance belongs (present in the instance DICOM tags).
  • series_uid: The unique ID of the series to which the particular instance belongs (present in the instance DICOM tags).
  • instance_uid: The unique ID of the particular instance (present in the instance DICOM tags).
  • class_index: The output class according to your model definition for the particular instance; it should map to the labels created on the MD.ai interface.
  • data: A dictionary of resulting annotations such as bounding box coordinates. (optional)
  • probability: The probability of the output belonging to the specified class_index, if your model produces a probability value. (optional)
  • explanations: Additional exploratory results such as GradCAM or SmoothGrad analyses, or any other instance-related results that you want to display. (optional)

The DICOM UIDs must be supplied based on the scope of the label attached to class_index.
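As a quick illustration, for a classification model the class_index and probability values are typically derived from the model's per-class scores. This is a sketch under the assumption that your labels on the MD.ai interface are ordered to match the model's output classes:

import numpy as np

# Hypothetical per-class scores produced by the model (e.g. a softmax output)
scores = np.array([0.1, 0.7, 0.2])

class_index = int(np.argmax(scores))      # should map to a label on the MD.ai interface
probability = float(scores[class_index])  # optional probability for that class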

An example file might look like the following -

# Import statements for the python packages used
import os
from io import BytesIO

import pydicom

# Import methods from helper files if any

# Create an MDAIModel class with __init__ and predict methods
class MDAIModel:
    def __init__(self):
        modelpath = # Path to your model checkpoint file
        # examples
        # modelpath = os.path.join(os.path.dirname(os.path.dirname(__file__)), "model_file.pth")
        # modelpath = os.path.join(os.path.dirname(os.path.dirname(__file__)), "model_file.h5")

        self.model = #Load the model file

    def predict(self, data):
        # Load the input files
        input_files = data["files"]

        # Load the input data arguments (if any)
        input_args = data["args"]

        # Load human annotations as input (if any)
        input_annotations = data["annotations"]

        # List for storing results for each instance
        outputs = []

        # Loop through the data points
        for file in input_files:

            # Check if the file type is dicom or any other format.
            if file['content_type'] != 'application/dicom':
                continue

            # Read the dicom file (if using pydicom)
            ds = pydicom.dcmread(BytesIO(file["content"]))

            # Convert dicom to a numpy array of pixels
            image = ds.pixel_array

            # Code for preprocessing the image

            # Code for passing the image through the model
            # and generating predictions

            # Store results in a dict following the schema mentioned, for example -
            result = {
                "type": "ANNOTATION",
                "study_uid": str(ds.StudyInstanceUID),
                "series_uid": str(ds.SeriesInstanceUID),
                "instance_uid": str(ds.SOPInstanceUID),
                "class_index": int(class_index),
                "data": {},
                "probability": float(probability),
                "explanations": [
                    {
                    # Add explanations if any
                    },
                ],
            }

            # Add results to the list
            outputs.append(result)

        # Return list to be read by the MD.ai interface
        return outputs

This example file can also be used as skeleton code that you can edit to fit your needs. For more examples, check out the code for our X-ray classification model and Lung segmentation model, both of which are already deployed on our platform for reference.

requirements.txt

For your model to run successfully on our interface, the correct versions of the Python packages it depends on must be installed in our environment. These are mostly the packages you import in mdai_deploy.py, along with those used by any helper files. They are installed automatically from a text file named requirements.txt, which lists each package name along with the specific version to install.

An example requirements.txt file looks like this:

numpy==1.19.2
opencv-python==4.4.0.42
pillow==7.2.0
pydicom==2.0.0
tensorflow==2.1.0

Or, to install a library from a Git source, such as detectron2, use:

git+https://github.com/facebookresearch/detectron2.git

Or, for libraries installed with pip's --find-links (-f) option, add the link to the archive index (sdist .tar.gz or wheel .whl files) on one line, followed by the package name on the next line. For example, a pre-built detectron2 that uses torch 1.6 and CUDA 10.1 can be installed by adding the following two lines to requirements.txt:

-f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.6/index.html
detectron2

To find the version of a required package, print PACKAGE.__version__ in a Python session or Jupyter notebook.
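For instance, using numpy and pydicom purely as examples:

import numpy
import pydicom

print(numpy.__version__)    # e.g. 1.19.2
print(pydicom.__version__)  # e.g. 2.0.0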

We provide the following packages, with their corresponding versions, pre-installed. You can save build time by not mentioning these in your requirements.txt file if the versions match your dependency requirements:

fastapi==0.65.1
msgpack==1.0.2
hypercorn==0.11.2
numpy==1.20.3
pydicom==2.1.2
pylibjpeg==1.3.0
pylibjpeg-libjpeg==1.2.0
pylibjpeg-openjpeg==1.1.1
pylibjpeg-rle==1.1.0

TensorFlow/PyTorch requirements

Currently, you need to pin the version of tensorflow or torch explicitly in the requirements file, depending on the framework you are using. Once we add support for framework-tied device types, this will no longer be required.
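For example, depending on your framework, requirements.txt would include a pinned entry like one of the following (versions shown only as illustrations; pin the versions your model was actually built with):

tensorflow==2.1.0

or

torch==1.6.0
torchvision==0.7.0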

Once these files are ready, store them in a folder named .mdai and follow the next steps as mentioned on the page Deploying models.