How-To Guides
Warning
In active development. Currently pre-alpha -- API may change significantly in future releases.
Quick start: Download annotations and convert to dataframe/csv
Access project
import mdai
# Get variables from project info tab and user settings
DOMAIN = 'public.md.ai'
YOUR_PERSONAL_TOKEN = 'a1s2d3f4g4h5h59797kllh8vk'
PROJECT_ID = 'MwBe19Br'
# Instantiate the MD.ai client
mdai_client = mdai.Client(domain=DOMAIN, access_token=YOUR_PERSONAL_TOKEN)
Download annotations and images
# Download all annotations and images across all datasets and all label groups
p = mdai_client.project(project_id=PROJECT_ID, path='.')
# Download all annotations across all label groups and images only for a specific dataset
p = mdai_client.project(project_id=PROJECT_ID, dataset_id=DATASET_ID, path='.')
# Download annotations from a specific label group for all images across all datasets
p = mdai_client.project(project_id=PROJECT_ID, label_group_id=LABEL_GROUP_ID, path='.')
# Download annotations from a specific label group and images only for a specific dataset
p = mdai_client.project(project_id=PROJECT_ID, dataset_id=DATASET_ID, label_group_id=LABEL_GROUP_ID, path='.')
Download annotations only
# Download the annotation data only for all datasets (all label groups)
p = mdai_client.project(project_id=PROJECT_ID, path='.', annotations_only=True)
# Download the annotation data only for a specific dataset (all label groups)
p = mdai_client.project(project_id=PROJECT_ID, dataset_id=DATASET_ID, path='.', annotations_only=True)
# Download the annotation data only for a specific label group (by label group hash ID) for all images
p = mdai_client.project(PROJECT_ID, path='.', label_group_id=LABEL_GROUP_ID, annotations_only=True)
Download model outputs
# Download only the model outputs data for all models in the project
p = mdai_client.download_model_outputs(PROJECT_ID, DATASET_ID, path='.')
# Download only the model outputs data for a specific model in the project
p = mdai_client.download_model_outputs(PROJECT_ID, DATASET_ID, MODEL_ID, path='.')
Download DICOM metadata
You can download the DICOM metadata in either json or csv format.
# Download only the DICOM metadata
p = mdai_client.download_dicom_metadata(PROJECT_ID, DATASET_ID, format='json', path='.')
Annotations dataframe and csv
There is a method to convert the downloaded annotations json file to a pandas dataframe, which you can then write out as a csv. Copy the downloaded file name from the output above or from your downloaded json file. If you get a json error, try downloading outside of your firewall.
# Replace with your filename
JSON = 'mdai_public_project_MwBe19Br_annotations_labelgroup_all_2020-09-23-214038.json'
results = mdai.common_utils.json_to_dataframe(JSON)
# Annotations dataframe
annots_df = results['annotations']
# csv
annots_df.to_csv('project_csv.csv', index=False)
Notebooks
Download and/or upload annotations
The linked notebook gives examples of using the mdai library to
- download annotations from your project
- turn the exported file into a Pandas dataframe
- import data back into your project

You'll need to supply the specifics for your project, but this should get you started: Get and/or upload annotations notebook
Create assignments and get user info using CLI
Get and create assignments notebook
Get user progress
Use the Manager class and override the get_done_exams function to specify your own conditions for exam completion.
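A minimal sketch of the idea; the import path and the exam access used inside are assumptions, not the documented API:

from mdai.utils.common_utils import Manager  # import path is an assumption

class StageCompleteManager(Manager):
    def get_done_exams(self):
        # an exam counts as done once it carries at least one staging label
        done_exams = []
        for exam in self.exams:  # 'exams' attribute is an assumption
            labels = {a['labelName'] for a in exam.get('annotations', [])}
            if labels & {'Stage 1', 'Stage 2', 'Stage 3', 'Stage 4'}:
                done_exams.append(exam['number'])
        return done_exams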
Images
Display images
mdai.visualize.display_images(image_ids)
# additional arguments
mdai.visualize.display_images(image_ids, titles=None, cols=3, cmap="gray", norm=None, interpolation=None)
Get DICOM pixel array
to_RGB returns a 3D array; rescale returns uint8 scaled to 255.
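As a rough sketch of what those options do, here is the equivalent by hand with pydicom and numpy (this mirrors the described behavior rather than the library internals):

import numpy as np
import pydicom

# 'image.dcm' is a placeholder path to one of your DICOM files
ds = pydicom.dcmread('image.dcm')
arr = ds.pixel_array                  # raw 2D pixel array

# rescale: uint8 scaled to 255
scaled = (arr / arr.max() * 255).astype(np.uint8)

# to_RGB: stack grayscale into a 3D (H, W, 3) array
rgb = np.stack([scaled] * 3, axis=-1)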
Convert Mask annotation output to a binary mask
Function to load a single Mask instance from one row of annotation data. This will turn the data output for a Mask annotation into a binary mask sized to the corresponding image.
results = mdai.common_utils.json_to_dataframe(JSON_FILENAME)
a = results['annotations']
all_masks = a[a.annotationMode == 'mask']
# grab one row from all_masks
row = all_masks.iloc[0]
mask = mdai.common_utils.convert_mask_annotation_to_array(row)
Get binary mask from shape annotations using library
mask = mdai.visualize.load_mask(image_id, dataset)
image_plus_mask = mdai.visualize.apply_mask(image, mask, color, alpha=0.3)
This will output a tuple of all the masks for the image and a list of the label numbers corresponding to each mask layer. The label numbers are those you created with labels_dict.
To see the masks applied to an image, use
import cv2
import numpy as np
import pydicom
import matplotlib.pyplot as plt

# dataset is created with mdai_client setup, see code at top of page
image_filenames = dataset.get_image_ids()

# Show the masks for the first image
fn = image_filenames[0]
image = pydicom.dcmread(fn)
img = image.pixel_array
masks, class_ids = mdai.visualize.load_mask(fn, dataset)

number_of_masks = len(class_ids)
cols = 3
rows = int(np.ceil(number_of_masks / float(cols)))
fig = plt.figure()
for i in range(number_of_masks):
    ax = fig.add_subplot(rows, cols, i + 1)
    ax.axis('off')
    # keep only the pixels covered by the i-th mask layer
    ax.imshow(cv2.bitwise_and(img, img, mask=masks[:, :, i].astype(np.uint8)))
    ax.set_title(class_ids[i])
plt.show()
Using the mask on the original image allows you to get segmented pixels for ROI and radiomic measurements.
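For example, the pixel values inside one mask layer can be pulled out with plain numpy for quick ROI statistics:

# values of the segmented pixels in the first mask layer
roi_pixels = img[masks[:, :, 0].astype(bool)]
print(roi_pixels.mean(), roi_pixels.std())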
Get binary mask without library
Function to load a single mask instance from one row of annotation data. This will turn one box, freeform, polygon, etc. into a binary mask sized to the corresponding image.
import cv2
import numpy as np

def load_mask_instance(row):
    """Load an instance mask for the given annotation row.

    Annotations can be of different modes; the returned mask is a binary
    true/false map of the same size as the image.
    """
    mask = np.zeros((row.height, row.width), dtype=np.uint8)

    annotation_mode = row.annotationMode

    # Bounding Box
    if annotation_mode == "bbox":
        x = int(row["data"]["x"])
        y = int(row["data"]["y"])
        w = int(row["data"]["width"])
        h = int(row["data"]["height"])
        mask_instance = mask[:, :].copy()
        cv2.rectangle(mask_instance, (x, y), (x + w, y + h), 255, -1)
        mask[:, :] = mask_instance

    # FreeForm or Polygon
    elif annotation_mode == "freeform" or annotation_mode == "polygon":
        vertices = np.array(row["data"]["vertices"])
        vertices = vertices.reshape((-1, 2))
        mask_instance = mask[:, :].copy()
        cv2.fillPoly(mask_instance, np.int32([vertices]), (255, 255, 255))
        mask[:, :] = mask_instance

    # Line
    elif annotation_mode == "line":
        vertices = np.array(row["data"]["vertices"])
        vertices = vertices.reshape((-1, 2))
        mask_instance = mask[:, :].copy()
        cv2.polylines(mask_instance, np.int32([vertices]), False, (255, 255, 255), 12)
        mask[:, :] = mask_instance

    # Location (point)
    elif annotation_mode == "location":
        x = int(row["data"]["x"])
        y = int(row["data"]["y"])
        mask_instance = mask[:, :].copy()
        cv2.circle(mask_instance, (x, y), 7, (255, 255, 255), -1)
        mask[:, :] = mask_instance

    # Mask
    elif annotation_mode == "mask":
        mask_instance = mask[:, :].copy()
        if row["data"]["foreground"]:
            for i in row["data"]["foreground"]:
                mask_instance = cv2.fillPoly(mask_instance, [np.array(i, dtype=np.int32)], (255, 255, 255))
        if row["data"]["background"]:
            for i in row["data"]["background"]:
                mask_instance = cv2.fillPoly(mask_instance, [np.array(i, dtype=np.int32)], (0, 0, 0))
        mask[:, :] = mask_instance

    elif annotation_mode is None:
        print("Not a local instance")

    return mask.astype(bool)
Get image with all annotations and masks
Display image and masks
mdai.visualize.display_annotations(
    image,
    boxes,
    masks,
    class_ids,
    scores=None,
    title="",
    figsize=(16, 16),
    ax=None,
    show_mask=True,
    show_bbox=True,
    colors=None,
    captions=None,
)
Getting UIDs from your original files
Use this code on your original data to create dictionaries of the UIDs from the image filenames
from pathlib import Path
import pydicom as py
images_path = Path('MY_PATH')
original_fn = list(images_path.glob('**/*.dcm'))
file_dict_sop = dict()
file_dict_series = dict()
file_dict_study = dict()
for f in original_fn:
    d = py.dcmread(str(f))
    file_dict_sop[f] = d.SOPInstanceUID
    file_dict_series[f] = d.SeriesInstanceUID
    file_dict_study[f] = d.StudyInstanceUID
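You can then invert a dictionary to map a UID from the annotations dataframe back to its original file:

# reverse lookup: SOPInstanceUID -> original file path
sop_to_file = {uid: fn for fn, uid in file_dict_sop.items()}
# e.g. fn = sop_to_file[row.SOPInstanceUID]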
Convert json file to dataframe
Obtain the json file either from the export tab in the Annotator tool or by using mdai_client.project(PROJECT_ID, annotations_only=True). Below, json_path is the path to the resulting json file. You can choose specific datasets, or it will default to all datasets.
Simple
results = mdai.common_utils.json_to_dataframe(json_path)
anno_df = results['annotations']
studies_df = results['studies']
labels_df = results['labels']
Optional: choose datasets
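A sketch of the optional form; the datasets keyword name is an assumption, so check the signature in your installed version:

# datasets keyword is an assumption; pass a list of dataset ids
results = mdai.common_utils.json_to_dataframe(json_path, datasets=[DATASET_ID])
anno_df = results['annotations']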
Custom annotations dataset
- Create a project and dataset using the quickstart section
- Get annotation data
- Edit annotations and feed back into dataset data
- Initialize custom dataset with edited data
annotations = dataset.all_annotations
# edit annotations...
annotations_edited = annotations
# load the edited annotations back into the dataset data
dataset.dataset_data['annotations'] = annotations_edited
dataset_custom = mdai.preprocess.Dataset(dataset.dataset_data, images_dir)
# now use this new dataset for creating training/testing datasets with train_test_split
Split data into training and validation datasets
train_test_split(dataset, shuffle=True, validation_split=0.1)
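A usage sketch, assuming the function is exposed as mdai.common_utils.train_test_split and returns a (train, validation) pair:

train_dataset, valid_dataset = mdai.common_utils.train_test_split(
    dataset, shuffle=True, validation_split=0.1
)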
DataGenerator
DataGenerator(dataset, batch_size=32, dim=(32, 32), n_channels=1, n_classes=10, shuffle=True, to_RGB=True, rescale=False)
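A usage sketch; the import path below is an assumption, and the generator is used like a Keras Sequence:

from mdai.utils.keras_utils import DataGenerator  # import path is an assumption

train_gen = DataGenerator(train_dataset, batch_size=32, dim=(32, 32),
                          n_channels=1, n_classes=10, shuffle=True,
                          to_RGB=True, rescale=False)
# model.fit(train_gen, epochs=10)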
Write to TFRecords
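A minimal sketch using TensorFlow directly: it serializes each image's raw pixel bytes and dimensions. The record layout is an assumption, not a library-defined format.

import pydicom
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

# dataset prepared as in the training section below
with tf.io.TFRecordWriter('dataset.tfrecord') as writer:
    for fn in dataset.get_image_ids():
        img = pydicom.dcmread(fn).pixel_array
        example = tf.train.Example(features=tf.train.Features(feature={
            'image': _bytes_feature(img.tobytes()),
            'height': _int64_feature(img.shape[0]),
            'width': _int64_feature(img.shape[1]),
        }))
        writer.write(example.SerializeToString())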
QA Python Examples
Get all user ids
result = mdai.common_utils.json_to_dataframe(JSON_FILE)
a = result['annotations']
users = a.createdById.unique()
Get all labels
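Using the annotations dataframe from the previous example:

labels = a.labelName.unique()
print(labels)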
Check for label conflicts in data
import pandas as pd

# set of labels for XOR, replace this with the label names
pick_one = {'Stage 1', 'Stage 2', 'Stage 3', 'Stage 4'}

def check_XOR_for_group(df_group):
    study_assess = pd.Series(dtype='object')
    number = str(df_group.number.unique()[0])
    dataset = df_group.dataset.unique()[0]
    value = len(set(df_group.labelName).intersection(pick_one))
    if value == 0:
        study_assess['missing'] = f'Dataset: {dataset}, Exam: {number}'
    if value > 1:
        study_assess['conflict'] = f'Dataset: {dataset}, Exam: {number}'
    return study_assess

study_assess = a.groupby(['StudyInstanceUID']).apply(check_XOR_for_group).reset_index()
study_assess.columns = ['StudyInstanceUID', 'Problem', 'Exam Number']
Results
StudyInstanceUID                Problem    Exam Number
1.3.6.1.4.1.9328.50.6.103213    conflict   Dataset: Dataset, Exam: 1645
1.3.6.1.4.1.9328.50.6.112227    missing    Dataset: Dataset, Exam: 1663
Training models using library
import mdai
# Get variables from project info tab and user settings
DOMAIN = 'public.md.ai'
YOUR_PERSONAL_TOKEN = 'a1s2d3f4g4h5h59797kllh8vk'
PROJECT_ID = 'LxR6zdR2' # project info
DATASET_ID = 'D_ao3XWQ' # project info
PATH_FOR_DATA = '.'
PATH_TO_IMAGES = './mydata' # location of images if not downloaded from project
mdai_client = mdai.Client(domain=DOMAIN, access_token=YOUR_PERSONAL_TOKEN)
# download images and annotation data
p = mdai_client.project(PROJECT_ID, path=PATH_FOR_DATA)
# or, give path to images and download only the annotation data
p = mdai_client.project(PROJECT_ID, path=PATH_TO_IMAGES, annotations_only=True)
p = mdai.preprocess.Project(annotations_fp=JSONPATH_FROM_FUNCTION_ABOVE, images_dir=PATH_TO_IMAGES)
# show labels to get desired label ids for project
p.show_label_groups()
# create class labels_dict from desired labels and give class value
labels_dict = {
    'L_ylR0L8': 0,  # background
    'L_DlqEAl': 1,  # lung opacity
}
# initiate project with labels_dict
p.set_labels_dict(labels_dict)
# prepare dataset to instantiate annotations and image ids
dataset = p.get_dataset_by_id(DATASET_ID)
dataset.prepare()
Display label classes
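To print the label id, class id, and class text for each class (the show_classes method name is assumed from the output format below):

dataset.show_classes()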
Example output:
Label id: L_ylR0L8, Class id: 0, Class text: No Lung Opacity
Label id: L_DlqEAl, Class id: 1, Class text: Lung Opacity
Working with Large Language Models (LLMs)
GPT Chat Completion
Create Chat Completion
Create a new chat completion request with MD.ai client
messages = [{"role": "system", "content": "You are a bot helping with medical AI training"},
{"role": "user", "content": "Tell me about MD.ai in 20 words"}]
mdai_client.chat_completion.create(messages, model='gpt-4', temperature=0)
Example Output:
{'id': 'chatcmpl-7R09i1Qe5Hhi6fZXGnTbdOS80Ee18',
'object': 'chat.completion',
'created': 1686669810,
'model': 'gpt-4-0314',
'usage': {'prompt_tokens': 29, 'completion_tokens': 24, 'total_tokens': 53},
'choices': [{'message': {'role': 'assistant',
'content': 'MD.ai is a collaborative platform for medical AI development, enabling data annotation, model training, and deployment for healthcare applications.'},
'finish_reason': 'stop',
'index': 0}]}
Function Calling
Create a completion call by describing functions to GPT and have the model intelligently choose to output a JSON object containing arguments to call those functions. The Chat Completions API does not call the function; instead, the model generates JSON that you can use to call the function in your code.
messages = [{"role": "system", "content": "You are a bot helping with weather reporting"},
{"role": "user", "content": "What's the weather like in Boston?"}]
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location"]
        }
    }
]
mdai_client.chat_completion.create(messages, functions=functions, function_call="auto", model='gpt-4-0613', temperature=0)
Example Output:
{'id': 'chatcmpl-7W5ZEVNVR5cLKgx1gjpmxLzULT1C2',
'object': 'chat.completion',
'created': 1687882252,
'model': 'gpt-4-0613',
'choices': [{'index': 0,
'message': {'role': 'assistant',
'content': None,
'function_call': {'name': 'get_current_weather',
'arguments': '{\n "location": "Boston"\n}'}},
'finish_reason': 'function_call'}],
'usage': {'prompt_tokens': 91, 'completion_tokens': 16, 'total_tokens': 107}}
Note
Function calling is currently only supported in the 0613 models, specifically gpt-4-0613 and gpt-3.5-turbo-0613.