
MD.ai Interface Code

Consider the situation where you have trained a model of your own, or have an existing model, that you want to run on the MD.ai platform and view predictions from. This document summarizes the main files needed to glue your model to the MD.ai interface so that you can run an inference task without any hassle.

Along with a folder that contains your pretrained model file, you need to create another folder named .mdai/ that will contain the files necessary for interacting with the interface. Three main files are needed to achieve this goal:

  • config.yaml
  • mdai_deploy.py
  • requirements.txt

In addition to these files, you can also add helper Python files that store methods for aiding the inference task.

Explicit additions of folders to sys.path

We explicitly add all the folders that you upload to the MD.ai interface to sys.path. This means you can place additional helper code files in any folder you want and import modules freely from any of them (even from within .mdai/).
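
For example, if you add a hypothetical helper file .mdai/preprocess.py containing a preprocess_image function, mdai_deploy.py can import it directly:

# preprocess.py and preprocess_image are hypothetical names used for illustration
from preprocess import preprocess_image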

Let's go into the details of what goes inside the necessary files now.

config.yaml

The config.yaml file is the main configuration file for the containerized model image. As of now, it should contain the following tags:

  • base_image: Tells Docker what to use as the base Python environment image. Currently we only support py37, which sets up a Python 3.7 conda environment. (required)
  • device_type: Specifies whether to run the model on a cpu or a gpu. We are working on adding support for deep-learning-framework-specific device types. (default: cpu)
  • cuda_version: Specifies which CUDA version to use in the Python environment; this applies only if you have specified gpu as the device_type. Currently supported CUDA versions are 10.0, 10.1, and 11.0. (default: 11.0)

For example, a simple config file looks like the following -

base_image: py37
device_type: gpu
cuda_version: 10.1
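
Since device_type defaults to cpu, a CPU-only config can omit the cuda_version tag entirely:

base_image: py37
device_type: cpu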

We use Google Cloud Platform CPU/GPU container images (as parent Docker images) to create derivative containers, depending on what you select as the device_type. These container images provide a Python 3 environment and include conda, along with the NVIDIA stack for GPU images (CUDA 11.0, cuDNN 7.x, NCCL 2.x). We build a conda environment on top of this and install all the frameworks used by your model, reading from the requirements.txt file explained later on this page.

mdai_deploy.py

The mdai_deploy.py file contains the main Python code used for running the model on the dataset and generating predictions. The main requirement is to create a class named MDAIModel that has two methods:

  • __init__: Defines the path to the model checkpoint file and initializes the model by loading this file.
  • predict: Contains the code for reading the DICOM files, preprocessing them, and passing them through the model to generate predictions. This method returns a list of dictionaries, called outputs, that will be read by the MD.ai interface.

Let's now go through the schema of the input data that is read from the interface and the schema of the outputs list that you will produce, to be read and displayed on our interface.

The input data is msgpack-serialized binary data that has the following schema:

{
    "files": [
        {
            "content": "bytes",
            "content_type": "str", # MIME type, e.g. 'application/dicom'
        },
        ...
    ],
    "annotations": [
        {
            "study_uid": "str",
            "series_uid": "str",
            "instance_uid": "str",
            "frame_number": "int",
            "class_index": "int",
            "data": "any",
        },
        ...
    ],
    "args": {
        "arg1": "str",
        "arg2": "str",
        ...
    }
}
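
The platform deserializes this payload before calling your model, but if you want to inspect one locally, a minimal sketch using the pre-installed msgpack package (assuming raw_payload holds the serialized bytes) looks like this:

import msgpack

# raw_payload: msgpack-serialized bytes following the schema above
data = msgpack.unpackb(raw_payload, raw=False)
files = data["files"]              # list of {"content", "content_type"} dicts
annotations = data["annotations"]  # human annotations as input, if any
args = data["args"]                # run-specific string arguments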

Model scope specifies whether an entire study, series, or instance is given to the model.

  • If the model scope is 'INSTANCE', then files will be a single instance (list length of 1).
  • If the model scope is 'SERIES', then files will be a list of all instances in a series.
  • If the model scope is 'STUDY', then files will be a list of all instances in a study.

If multi-frame instances are supported, the model scope must be 'SERIES' or 'STUDY', because internally we treat these as DICOM series.
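
For example, a SERIES-scope model that needs slices in the correct order might sort the decoded instances before stacking them into a volume. A sketch, assuming data is the deserialized input dictionary, every instance carries an InstanceNumber tag, and all slices share the same dimensions:

from io import BytesIO

import numpy as np
import pydicom

# Decode every DICOM instance in the series
datasets = [
    pydicom.dcmread(BytesIO(f["content"]))
    for f in data["files"]
    if f["content_type"] == "application/dicom"
]

# Sort slices into acquisition order and stack into a 3D volume
datasets.sort(key=lambda ds: int(ds.InstanceNumber))
volume = np.stack([ds.pixel_array for ds in datasets])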

The additional args dictionary supplies values that may be used in a given run.

For a file with content_type='application/dicom', content is the raw binary data representing a DICOM file, and can be loaded using: ds = pydicom.dcmread(BytesIO(file["content"])).

The results returned by the predict function should have the following schema:

[
    {
        "type": "str", # 'NONE', 'ANNOTATION', 'IMAGE', 'DICOM', 'TEXT'
        "study_uid": "str",
        "series_uid": "str",
        "instance_uid": "str",
        "frame_number": "int",
        "class_index": "int",
        "data": {},
        "probability": "float",
        "explanations": [
            {
                "name": "str",
                "description": "str",
                "content": "bytes",
                "content_type": "str",
            },
            ...
        ],
    },
    ...
]

type defines the type of output that is produced: an annotation, image, DICOM, or text (needs to be entered explicitly).

study_uid defines the unique ID of the study to which the particular instance belongs (present in the instance DICOM tags).

series_uid defines the unique ID of the series to which the particular instance belongs (present in the instance DICOM tags).

instance_uid defines the unique ID of the particular instance (present in the instance DICOM tags).

class_index defines the output class for the particular instance and should map to the labels created on the MD.ai interface.

data defines a dictionary of resulting annotations such as bounding box coordinates.

probability defines the probability of the output belonging to the specified class_index if your model produces a probability value.

explanations defines additional exploratory outputs, such as Grad-CAM or SmoothGrad analyses, or any other instance-related results that you want to display.

The DICOM UIDs must be supplied based on the scope of the label attached to class_index.
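
As a concrete illustration, a single bounding-box prediction might be packaged as shown below (assuming ds is the pydicom dataset for the instance). The exact keys inside data depend on the label type configured on the MD.ai interface, so the x/y/width/height keys here are illustrative assumptions rather than a fixed contract:

output = {
    "type": "ANNOTATION",
    "study_uid": str(ds.StudyInstanceUID),
    "series_uid": str(ds.SeriesInstanceUID),
    "instance_uid": str(ds.SOPInstanceUID),
    "class_index": 0,     # must map to a label created on the MD.ai interface
    "probability": 0.97,  # illustrative value
    "data": {"x": 120, "y": 84, "width": 64, "height": 48},  # illustrative keys
}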

An example file might look like the following -

# Import statements for the python packages used
import os
from io import BytesIO

import pydicom

# Import methods from helper files if any

# Create an MDAIModel class with __init__ and predict methods
class MDAIModel:
    def __init__(self):
        # Path to your model checkpoint file, for example:
        # modelpath = os.path.join(os.path.dirname(os.path.dirname(__file__)), "model_file.pth")
        # modelpath = os.path.join(os.path.dirname(os.path.dirname(__file__)), "model_file.h5")
        modelpath = ...

        # Load the model file into self.model
        self.model = ...

    def predict(self, data):
        # Load the input files
        input_files = data["files"]

        # Load the input data arguments (if any)
        input_args = data["args"]

        # Load human annotations as input (if any)
        input_annotations = data["annotations"]

        # List for storing results for each instance
        outputs = []

        # Loop through the data points
        for file in input_files:

            # Check if the file type is dicom or any other format.
            if file['content_type'] != 'application/dicom':
                continue

            # Read the dicom file (if using pydicom)
            ds = pydicom.dcmread(BytesIO(file["content"]))

            # Convert dicom to a numpy array of pixels
            image = ds.pixel_array

            # Code for preprocessing the image

            # Code for passing the image through the model
            # and generating predictions

            # Store results in a dict following the schema mentioned, for example -
            output = {
                "type": "ANNOTATION",
                "study_uid": str(ds.StudyInstanceUID),
                "series_uid": str(ds.SeriesInstanceUID),
                "instance_uid": str(ds.SOPInstanceUID),
                "class_index": int(class_index),
                "probability": float(probability),
                "explanations": [
                    {
                    # Add explanations if any
                    },
                ],
            }

            # Add results to the list
            outputs.append(output)

        # Return list to be read by the MD.ai interface
        return outputs
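
Before deploying, you can sanity-check the class locally by constructing a data dictionary in the same shape that the interface sends. A minimal sketch, assuming a local test file named sample.dcm (a stand-in path):

if __name__ == "__main__":
    with open("sample.dcm", "rb") as f:  # sample.dcm is a stand-in path
        content = f.read()

    payload = {
        "files": [{"content": content, "content_type": "application/dicom"}],
        "annotations": [],
        "args": {},
    }

    model = MDAIModel()
    print(model.predict(payload))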

requirements.txt

In order for your model to run successfully on our interface, it is important that the correct versions of the Python packages used by your model for generating predictions are installed inside the containerized model image. These are mostly the packages that you import in the mdai_deploy.py file, along with any helper files that you add. List them in a plain text file named requirements.txt that contains each package name along with the specific version that needs to be installed.

An example requirements.txt file looks like this

numpy==1.19.2
opencv-python==4.4.0.42
pillow==7.2.0
pydicom==2.0.0
tensorflow==2.1.0

or, to add a git library from source such as detectron2, use

git+https://github.com/facebookresearch/detectron2.git

To find the versions of the required packages, run PACKAGE.__version__ in your Jupyter notebook.
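
For example -

import numpy
print(numpy.__version__)  # e.g. '1.19.2'; pin this exact version in requirements.txt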

We provide the following packages, with their corresponding versions, pre-installed. You can save build time by not listing these in your requirements.txt file if the versions match your dependency requirements:

fastapi==0.61.1
msgpack==1.0.0
hypercorn==0.10.2
numpy==1.19.2
pydicom==2.0.0
pylibjpeg==1.1.1
pylibjpeg-libjpeg==1.1.0
pylibjpeg-openjpeg==1.0.1

TensorFlow/PyTorch requirements

Currently, you need to pin the version of tensorflow or pytorch explicitly in the requirements file, depending on the framework that you are using. Once we add support for framework-tied device types, this will no longer be required.
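
For example, a PyTorch model would add lines like these to requirements.txt (versions are illustrative):

torch==1.6.0
torchvision==0.7.0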

Once these files are ready, store them in the folder named .mdai/ and follow the next steps as described on the Deploying models page.