How-To Guides
Warning
In active development. Currently pre-alpha -- API may change significantly in future releases.
Quick start: Download annotations and convert to dataframe/csv
Access project
import mdai
# Get variables from project info tab and user settings
DOMAIN = 'public.md.ai'
YOUR_PERSONAL_TOKEN = 'a1s2d3f4g4h5h59797kllh8vk'
PROJECT_ID = 'MwBe19Br'
# Instantiate the MD.ai client
mdai_client = mdai.Client(domain=DOMAIN, access_token=YOUR_PERSONAL_TOKEN)
Download annotations and images
# Download all annotations and images across all datasets and all label groups
p = mdai_client.project(project_id=PROJECT_ID, path='.')
# Download all annotations across all label groups and images only for a specific dataset
p = mdai_client.project(project_id=PROJECT_ID, dataset_id=DATASET_ID, path='.')
# Download annotations from a specific label group for all images across all datasets
p = mdai_client.project(project_id=PROJECT_ID, label_group_id=LABEL_GROUP_ID, path='.')
# Download annotations from a specific label group and images only for a specific dataset
p = mdai_client.project(project_id=PROJECT_ID, dataset_id=DATASET_ID, label_group_id=LABEL_GROUP_ID, path='.')
Download annotations only
# Download the annotation data only for all datasets (all label groups)
p = mdai_client.project(project_id=PROJECT_ID, path='.', annotations_only=True)
# Download the annotation data only for a specific dataset (all label groups)
p = mdai_client.project(project_id=PROJECT_ID, dataset_id=DATASET_ID, path='.', annotations_only=True)
# Download the annotation data only for a specific label group (by label group hash ID) for all images
p = mdai_client.project(PROJECT_ID, path='.', label_group_id=LABEL_GROUP_ID, annotations_only=True)
Download model outputs
# Download only the model outputs data for all models in the project
p = mdai_client.download_model_outputs(PROJECT_ID, DATASET_ID, path='.')
# Download only the model outputs data for a specific model in the project
p = mdai_client.download_model_outputs(PROJECT_ID, DATASET_ID, MODEL_ID, path='.')
Download DICOM metadata
You can download the DICOM metadata in either json or csv format.
# Download only the DICOM metadata
p = mdai_client.download_dicom_metadata(PROJECT_ID, DATASET_ID, format='json', path='.')
Annotations dataframe and csv
There is a method to convert the downloaded annotations json file to a pandas dataframe, which you can then write out as a csv. Copy the downloaded file name from the output above or from your downloaded json file. If you get a json error, try downloading outside of your firewall.
# Replace with your filename
JSON = 'mdai_public_project_MwBe19Br_annotations_labelgroup_all_2020-09-23-214038.json'
results = mdai.common_utils.json_to_dataframe(JSON)
# Annotations dataframe
annots_df = results['annotations']
# csv
annots_df.to_csv('project_csv.csv', index=False)
Notebooks
Download and/or upload annotations
The linked notebook gives examples of using the mdai library to
- download annotations from your project
- turn the exported file into a Pandas dataframe
- import data back into your project

You'll need to supply the specifics for your project, but this should get you started: Get and/or upload annotations notebook
Create assignments and get user info using CLI
Get and create assignments notebook
Get user progress
Use the Manager class and override the get_done_exams function to specify your own conditions for exam completion.
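A minimal sketch of the idea; the import path and the exam access used inside are assumptions, not the documented API:

from mdai.utils.common_utils import Manager  # import path is an assumption

class StageCompleteManager(Manager):
    def get_done_exams(self):
        # an exam counts as done once it carries at least one staging label
        done_exams = []
        for exam in self.exams:  # 'exams' attribute is an assumption
            labels = {a['labelName'] for a in exam.get('annotations', [])}
            if labels & {'Stage 1', 'Stage 2', 'Stage 3', 'Stage 4'}:
                done_exams.append(exam['number'])
        return done_exams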
Images
Display images
mdai.visualize.display_images(image_ids)
# additional arguments
mdai.visualize.display_images(image_ids, titles=None, cols=3, cmap="gray", norm=None, interpolation=None)
Get DICOM pixel array
to_RGB returns a 3D array; rescale returns uint8 scaled to 255.
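As a rough sketch of what those options do, here is the equivalent by hand with pydicom and numpy (this mirrors the described behavior rather than the library internals):

import numpy as np
import pydicom

# 'image.dcm' is a placeholder path to one of your DICOM files
ds = pydicom.dcmread('image.dcm')
arr = ds.pixel_array                  # raw 2D pixel array

# rescale: uint8 scaled to 255
scaled = (arr / arr.max() * 255).astype(np.uint8)

# to_RGB: stack grayscale into a 3D (H, W, 3) array
rgb = np.stack([scaled] * 3, axis=-1)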
Convert Mask annotation output to a binary mask
Function to load a single Mask instance from one row of annotation data. This will turn the data output for a Mask annotation into a binary mask sized to the corresponding image.
results = mdai.common_utils.json_to_dataframe(JSON_FILENAME)
a = results['annotations']
all_masks = a[a.annotationMode == 'mask']
# grab one row from all_masks
row = all_masks.iloc[0]
mask = mdai.common_utils.convert_mask_annotation_to_array(row)
Get binary mask from shape annotations using library
mask = mdai.visualize.load_mask(image_id, dataset)
image_plus_mask = mdai.visualize.apply_mask(image, mask, color, alpha=0.3)
This will output a tuple of all the masks for the image and a list of the label numbers corresponding to each mask layer. The label numbers are those you created with labels_dict.
To see the masks applied to an image, use
import cv2
import numpy as np
import pydicom
import matplotlib.pyplot as plt

# dataset is created with mdai_client setup, see code at top of page
image_filenames = dataset.get_image_ids()

# Show the masks for the first image
fn = image_filenames[0]
image = pydicom.dcmread(fn)
img = image.pixel_array
masks, class_ids = mdai.visualize.load_mask(fn, dataset)

number_of_masks = len(class_ids)
cols = 3
rows = int(np.ceil(number_of_masks / float(cols)))
fig = plt.figure()
for i in range(number_of_masks):
    ax = fig.add_subplot(rows, cols, i + 1)
    ax.axis('off')
    # keep only the pixels covered by the i-th mask layer
    ax.imshow(cv2.bitwise_and(img, img, mask=masks[:, :, i].astype(np.uint8)))
    ax.set_title(class_ids[i])
plt.show()
Using the mask on the original image allows you to get segmented pixels for ROI and radiomic measurements.
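For example, the pixel values inside one mask layer can be pulled out with plain numpy for quick ROI statistics:

# values of the segmented pixels in the first mask layer
roi_pixels = img[masks[:, :, 0].astype(bool)]
print(roi_pixels.mean(), roi_pixels.std())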
Get binary mask without library
Function to load a single mask instance from one row of annotation data. This will turn one box, freeform, polygon, etc. into a binary mask sized to the corresponding image.
import cv2
import numpy as np

def load_mask_instance(row):
    """Load an instance mask for the given annotation row.

    Annotations can be of different modes; the returned mask is a binary
    true/false map of the same size as the image.
    """
    mask = np.zeros((row.height, row.width), dtype=np.uint8)

    annotation_mode = row.annotationMode

    # Bounding Box
    if annotation_mode == "bbox":
        x = int(row["data"]["x"])
        y = int(row["data"]["y"])
        w = int(row["data"]["width"])
        h = int(row["data"]["height"])
        mask_instance = mask[:, :].copy()
        cv2.rectangle(mask_instance, (x, y), (x + w, y + h), 255, -1)
        mask[:, :] = mask_instance

    # FreeForm or Polygon
    elif annotation_mode == "freeform" or annotation_mode == "polygon":
        vertices = np.array(row["data"]["vertices"])
        vertices = vertices.reshape((-1, 2))
        mask_instance = mask[:, :].copy()
        cv2.fillPoly(mask_instance, np.int32([vertices]), (255, 255, 255))
        mask[:, :] = mask_instance

    # Line
    elif annotation_mode == "line":
        vertices = np.array(row["data"]["vertices"])
        vertices = vertices.reshape((-1, 2))
        mask_instance = mask[:, :].copy()
        cv2.polylines(mask_instance, np.int32([vertices]), False, (255, 255, 255), 12)
        mask[:, :] = mask_instance

    # Location (point)
    elif annotation_mode == "location":
        x = int(row["data"]["x"])
        y = int(row["data"]["y"])
        mask_instance = mask[:, :].copy()
        cv2.circle(mask_instance, (x, y), 7, (255, 255, 255), -1)
        mask[:, :] = mask_instance

    # Mask
    elif annotation_mode == "mask":
        mask_instance = mask[:, :].copy()
        if row["data"]["foreground"]:
            for i in row["data"]["foreground"]:
                mask_instance = cv2.fillPoly(mask_instance, [np.array(i, dtype=np.int32)], (255, 255, 255))
        if row["data"]["background"]:
            for i in row["data"]["background"]:
                mask_instance = cv2.fillPoly(mask_instance, [np.array(i, dtype=np.int32)], (0, 0, 0))
        mask[:, :] = mask_instance

    elif annotation_mode is None:
        print("Not a local instance")

    return mask.astype(bool)
Get image with all annotations and masks
Display image and masks
mdai.visualize.display_annotations(
    image,
    boxes,
    masks,
    class_ids,
    scores=None,
    title="",
    figsize=(16, 16),
    ax=None,
    show_mask=True,
    show_bbox=True,
    colors=None,
    captions=None,
)
Getting UIDs from your original files
Use this code on your original data to create dictionaries of the UIDs from the image filenames
from pathlib import Path
import pydicom as py
images_path = Path('MY_PATH')
original_fn = list(images_path.glob('**/*.dcm'))
file_dict_sop = dict()
file_dict_series = dict()
file_dict_study = dict()
for f in original_fn:
    d = py.dcmread(str(f))
    file_dict_sop[f] = d.SOPInstanceUID
    file_dict_series[f] = d.SeriesInstanceUID
    file_dict_study[f] = d.StudyInstanceUID
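You can then invert a dictionary to map a UID from the annotations dataframe back to its original file:

# reverse lookup: SOPInstanceUID -> original file path
sop_to_file = {uid: fn for fn, uid in file_dict_sop.items()}
# e.g. fn = sop_to_file[row.SOPInstanceUID]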
Convert json file to dataframe
Obtain the json file either from the export tab in the Annotator tool or by using mdai_client.project(PROJECT_ID, annotations_only=True). Below, json_path is the path to the resulting json file. You can choose specific datasets, or it will default to all datasets.
Simple
results = mdai.common_utils.json_to_dataframe(json_path)
anno_df = results['annotations']
studies_df = results['studies']
labels_df = results['labels']
Optional: choose datasets
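A sketch of the optional form; the datasets keyword name is an assumption, so check the signature in your installed version:

# datasets keyword is an assumption; pass a list of dataset ids
results = mdai.common_utils.json_to_dataframe(json_path, datasets=[DATASET_ID])
anno_df = results['annotations']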
Custom annotations dataset
- Create a project and dataset using the quickstart section
- Get annotation data
- Edit annotations and feed back into dataset data
- Initialize custom dataset with edited data
annotations = dataset.all_annotations
# edit annotations...
annotations_edited = annotations
# load the edited annotations back into the dataset data
dataset.dataset_data['annotations'] = annotations_edited
dataset_custom = mdai.preprocess.Dataset(dataset.dataset_data, images_dir)
# now use this new dataset for creating training/testing datasets with train_test_split
Split data into training and validation datasets
train_test_split(dataset, shuffle=True, validation_split=0.1)
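A usage sketch, assuming the function is exposed as mdai.common_utils.train_test_split and returns a (train, validation) pair:

train_dataset, valid_dataset = mdai.common_utils.train_test_split(
    dataset, shuffle=True, validation_split=0.1
)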
DataGenerator
DataGenerator(dataset, batch_size=32, dim=(32, 32), n_channels=1, n_classes=10, shuffle=True, to_RGB=True, rescale=False)
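A usage sketch; the import path below is an assumption, and the generator is used like a Keras Sequence:

from mdai.utils.keras_utils import DataGenerator  # import path is an assumption

train_gen = DataGenerator(train_dataset, batch_size=32, dim=(32, 32),
                          n_channels=1, n_classes=10, shuffle=True,
                          to_RGB=True, rescale=False)
# model.fit(train_gen, epochs=10)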
Write to TFRecords
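A minimal sketch using TensorFlow directly: it serializes each image's raw pixel bytes and dimensions. The record layout is an assumption, not a library-defined format.

import pydicom
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

# dataset prepared as in the training section below
with tf.io.TFRecordWriter('dataset.tfrecord') as writer:
    for fn in dataset.get_image_ids():
        img = pydicom.dcmread(fn).pixel_array
        example = tf.train.Example(features=tf.train.Features(feature={
            'image': _bytes_feature(img.tobytes()),
            'height': _int64_feature(img.shape[0]),
            'width': _int64_feature(img.shape[1]),
        }))
        writer.write(example.SerializeToString())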
QA Python Examples
Get all user ids
result = mdai.common_utils.json_to_dataframe(JSON_FILE)
a = result['annotations']
users = a.createdById.unique()
Get all labels
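Using the annotations dataframe from the previous example:

labels = a.labelName.unique()
print(labels)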
Check for label conflicts in data
import pandas as pd

# set of labels for XOR, replace this with the label names
pick_one = {'Stage 1', 'Stage 2', 'Stage 3', 'Stage 4'}

def check_XOR_for_group(df_group):
    study_assess = pd.Series(dtype='object')
    number = str(df_group.number.unique()[0])
    dataset = df_group.dataset.unique()[0]
    value = len(set(df_group.labelName).intersection(pick_one))
    if value == 0:
        study_assess['missing'] = f'Dataset: {dataset}, Exam: {number}'
    if value > 1:
        study_assess['conflict'] = f'Dataset: {dataset}, Exam: {number}'
    return study_assess

study_assess = a.groupby(['StudyInstanceUID']).apply(check_XOR_for_group).reset_index()
study_assess.columns = ['StudyInstanceUID', 'Problem', 'Exam Number']
Results
StudyInstanceUID                Problem    Exam Number
1.3.6.1.4.1.9328.50.6.103213    conflict   Dataset: Dataset, Exam: 1645
1.3.6.1.4.1.9328.50.6.112227    missing    Dataset: Dataset, Exam: 1663
Training models using library
import mdai
# Get variables from project info tab and user settings
DOMAIN = 'public.md.ai'
YOUR_PERSONAL_TOKEN = 'a1s2d3f4g4h5h59797kllh8vk'
PROJECT_ID = 'LxR6zdR2' # project info
DATASET_ID = 'D_ao3XWQ' # project info
PATH_FOR_DATA = '.'
PATH_TO_IMAGES = './mydata' # location of images if not downloaded from project
mdai_client = mdai.Client(domain=DOMAIN, access_token=YOUR_PERSONAL_TOKEN)
# download images and annotation data
p = mdai_client.project(PROJECT_ID, path=PATH_FOR_DATA)
# or, give path to images and download only the annotation data
p = mdai_client.project(PROJECT_ID, path=PATH_TO_IMAGES, annotations_only=True)
p = mdai.preprocess.Project(annotations_fp=JSONPATH_FROM_FUNCTION_ABOVE, images_dir=PATH_TO_IMAGES)
# show labels to get desired label ids for project
p.show_label_groups()
# create class labels_dict from desired labels and give class value
labels_dict = {
    'L_ylR0L8': 0,  # background
    'L_DlqEAl': 1,  # lung opacity
}
# initiate project with labels_dict
p.set_labels_dict(labels_dict)
# prepare dataset to instantiate annotations and image ids
dataset = p.get_dataset_by_id(DATASET_ID)
dataset.prepare()
Display label classes
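To print the label id, class id, and class text for each class (the show_classes method name is assumed from the output format below):

dataset.show_classes()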
Example output:
Label id: L_ylR0L8, Class id: 0, Class text: No Lung Opacity
Label id: L_DlqEAl, Class id: 1, Class text: Lung Opacity
Working with Large Language Models (LLMs)
GPT Chat Completion
Create Chat Completion
Create a new chat completion request with MD.ai client
messages = [{"role": "system", "content": "You are a bot helping with medical AI training"},
{"role": "user", "content": "Tell me about MD.ai in 20 words"}]
mdai_client.chat_completion.create(messages, model='gpt-4', temperature=0)
Example Output:
{'id': 'chatcmpl-7R09i1Qe5Hhi6fZXGnTbdOS80Ee18',
'object': 'chat.completion',
'created': 1686669810,
'model': 'gpt-4-0314',
'usage': {'prompt_tokens': 29, 'completion_tokens': 24, 'total_tokens': 53},
'choices': [{'message': {'role': 'assistant',
'content': 'MD.ai is a collaborative platform for medical AI development, enabling data annotation, model training, and deployment for healthcare applications.'},
'finish_reason': 'stop',
'index': 0}]}
Function Calling
Create a completion call by describing functions to GPT and have the model intelligently choose to output a JSON object containing arguments to call those functions. The Chat Completions API does not call the function; instead, the model generates JSON that you can use to call the function in your code.
messages = [{"role": "system", "content": "You are a bot helping with weather reporting"},
{"role": "user", "content": "What's the weather like in Boston?"}]
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location"]
        }
    }
]
mdai_client.chat_completion.create(messages, functions=functions, function_call="auto", model='gpt-4-0613', temperature=0)
Example Output:
{'id': 'chatcmpl-7W5ZEVNVR5cLKgx1gjpmxLzULT1C2',
'object': 'chat.completion',
'created': 1687882252,
'model': 'gpt-4-0613',
'choices': [{'index': 0,
'message': {'role': 'assistant',
'content': None,
'function_call': {'name': 'get_current_weather',
'arguments': '{\n "location": "Boston"\n}'}},
'finish_reason': 'function_call'}],
'usage': {'prompt_tokens': 91, 'completion_tokens': 16, 'total_tokens': 107}}
Note
Function calling is currently only supported in the 0613 models, specifically gpt-4-0613 and gpt-3.5-turbo-0613.