
Add Datasets

Every project should contain at least one dataset. A common reason to have more than one dataset in a project is to keep training and test sets separate. After creating a new project, click the New Dataset button to create a dataset. This dataset serves as the container into which your data is loaded. Note the project and dataset IDs.

For each dataset, you can also set Dataset access to either Project-Wide or Restricted.

Supported Medical Modalities

  • CR: Computed Radiography
  • CT: Computed Tomography
  • DX: Digital Radiography
  • IVOCT: Intravascular Optical Coherence Tomography
  • MG: Mammography
  • MR: Magnetic Resonance Imaging (MRI)
  • NM: Nuclear Medicine
  • OCT: Optical Coherence Tomography (non-ophthalmic)
  • OPT: Ophthalmic Tomography
  • OT: Other
  • PT: Positron Emission Tomography (PET)
  • RF: Radio Fluoroscopy
  • RG: Radiographic imaging (conventional film/screen)
  • US: Ultrasound
  • XA: X-Ray Angiography
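Each code above is the value a file carries in its DICOM Modality attribute (0008,0060). As a minimal local pre-check, assuming you want to confirm a file's modality is on this list before loading it, a lookup like the following can help. `SUPPORTED_MODALITIES` and `is_supported_modality` are hypothetical helpers, not part of any MD.ai tool, and the platform's own handling of unlisted modalities may differ.

```python
# Hypothetical helper: the supported modality codes listed above,
# keyed by the value of the DICOM Modality attribute (0008,0060).
SUPPORTED_MODALITIES = {
    "CR": "Computed Radiography",
    "CT": "Computed Tomography",
    "DX": "Digital Radiography",
    "IVOCT": "Intravascular Optical Coherence Tomography",
    "MG": "Mammography",
    "MR": "Magnetic Resonance Imaging (MRI)",
    "NM": "Nuclear Medicine",
    "OCT": "Optical Coherence Tomography (non-ophthalmic)",
    "OPT": "Ophthalmic Tomography",
    "OT": "Other",
    "PT": "Positron Emission Tomography (PET)",
    "RF": "Radio Fluoroscopy",
    "RG": "Radiographic imaging (conventional film/screen)",
    "US": "Ultrasound",
    "XA": "X-Ray Angiography",
}


def is_supported_modality(modality: str) -> bool:
    """Return True if a Modality value appears in the supported list.

    DICOM code strings can be padded with trailing spaces, so the value is
    stripped (and upper-cased, to be lenient) before the lookup.
    """
    return modality.strip().upper() in SUPPORTED_MODALITIES
```

If you have pydicom installed, you could pass it the attribute read from a file, e.g. `is_supported_modality(pydicom.dcmread(path).Modality)`.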

Load external data to a dataset

Choose a dataset type, e.g. DICOM. There are several ways to add external data to a dataset in your project.

Upload

If your dataset source is set to Upload, there are two ways to load external data:

  1. Use the web UI directly: drag and drop a folder containing your DICOM images, or use the upload files/upload folder buttons. Files inside folders are detected recursively.

  2. Use the MD.ai CLI tool, which is highly recommended for larger datasets (over 100 GB). See the CLI Usage page for command descriptions.
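Before dropping a large folder into the UI, it can be useful to see which files in it actually look like DICOM. The sketch below walks a folder recursively and checks each file for the DICOM Part 10 signature (a 128-byte preamble followed by the bytes `DICM`). The function names are hypothetical, and this is only an assumption-based local check: MD.ai's own detection logic is not documented here, and older DICOM files written without the Part 10 preamble would be missed by this test.

```python
from pathlib import Path


def looks_like_dicom(path: Path) -> bool:
    """Check for the DICOM Part 10 signature: 128-byte preamble + b"DICM"."""
    try:
        with path.open("rb") as f:
            header = f.read(132)
    except OSError:
        # Unreadable files (permissions, broken links) are treated as non-DICOM.
        return False
    return len(header) == 132 and header[128:132] == b"DICM"


def find_dicom_files(root: str) -> list[Path]:
    """Recursively collect files under root that carry the Part 10 signature."""
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and looks_like_dicom(p)
    )
```

For example, `len(find_dicom_files("/path/to/study"))` gives a rough count to compare against the number of files the UI reports.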

Zip files are also supported when uploading.
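If you prefer to upload a single archive, a folder can be packed with Python's standard `zipfile` module. This is a minimal sketch; `zip_folder` is a hypothetical helper, and it assumes (without confirmation from this page) that keeping the folder's relative paths inside the archive is a reasonable default for the uploader.

```python
import zipfile
from pathlib import Path


def zip_folder(src: str, dest_zip: str) -> int:
    """Pack every file under src into dest_zip, keeping paths relative to src.

    Returns the number of files written. The archive itself is skipped if
    dest_zip happens to live inside src.
    """
    root = Path(src)
    dest = Path(dest_zip).resolve()
    count = 0
    with zipfile.ZipFile(dest_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file() and path.resolve() != dest:
                # Store forward-slash relative names, e.g. "series1/img001.dcm".
                zf.write(path, arcname=path.relative_to(root).as_posix())
                count += 1
    return count
```

DICOM pixel data that is already compressed will not shrink much, so the main benefit here is moving one file instead of many.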

Once the upload is complete, the data is processed in the background, and the DICOM series thumbnails appear when processing finishes.

DICOM Push (C-STORE)

You can stream images to the project using the DICOM Push (C-STORE) protocol. The Hostname, Port and Remote AE values are provided for each dataset.

By default, the dataset is Unlocked and accepts all incoming DICOM pushes. You can Lock the dataset to stop incoming connections.
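Any standard C-STORE client (for example a PACS or a DICOM toolkit) can send to these values. As one hedged illustration, the sketch below uses the third-party pynetdicom and pydicom libraries, assuming the dataset's Remote AE value is used as the called AE title. `validate_ae_title` and `push_files` are hypothetical helpers, and the local AE title `MY_SCU` is a placeholder; whether MD.ai checks the calling AE title is not stated here. The dataset must be Unlocked for the association to be accepted.

```python
from typing import Iterable, Optional


def validate_ae_title(title: str) -> str:
    """Basic check of an AE title against the DICOM AE value representation:
    1-16 printable ASCII characters, no backslash. Leading/trailing spaces
    are not significant, so they are stripped.
    """
    stripped = title.strip()
    if not 1 <= len(stripped) <= 16:
        raise ValueError("AE title must be 1-16 characters")
    if "\\" in stripped or not all(32 <= ord(c) <= 126 for c in stripped):
        raise ValueError("AE title must be printable ASCII without backslashes")
    return stripped


def push_files(paths: Iterable[str], host: str, port: int, remote_ae: str,
               local_ae: str = "MY_SCU") -> list[Optional[int]]:
    """Send each file with C-STORE and return the DIMSE status codes.

    Requires pydicom and pynetdicom; 0x0000 means success, None means no
    valid response was received for that file.
    """
    # Imported here so validate_ae_title works without these packages installed.
    from pydicom import dcmread
    from pynetdicom import AE

    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    datasets = [dcmread(p) for p in paths]
    ae = AE(ae_title=validate_ae_title(local_ae))
    # One presentation context per (SOP Class UID, Transfer Syntax UID) pair;
    # pynetdicom allows at most 128 requested contexts per association.
    pairs = {(ds.SOPClassUID, ds.file_meta.TransferSyntaxUID) for ds in datasets}
    for sop_class_uid, transfer_syntax_uid in sorted(pairs):
        ae.add_requested_context(sop_class_uid, transfer_syntax_uid)
    assoc = ae.associate(host, port, ae_title=validate_ae_title(remote_ae))
    if not assoc.is_established:
        raise ConnectionError(f"association with {host}:{port} was not established")
    statuses: list[Optional[int]] = []
    try:
        for ds in datasets:
            status = assoc.send_c_store(ds)
            # An empty status dataset means no valid response was received.
            statuses.append(int(status.Status) if status else None)
    finally:
        assoc.release()
    return statuses
```

This reads every file into memory before sending, which keeps the sketch short; for large studies you would send in batches instead.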

Google Cloud Storage

To attach a Google Cloud Storage (GCS) bucket to your project:

  1. Add the bucket name.
  2. Add a folder prefix (optional).
  3. Add permissions to your GCS bucket as outlined.
  4. Confirm the bucket permissions once they are added.
  5. Press Connect.
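To sanity-check the bucket name and folder prefix before connecting, you can list a few objects with the official `google-cloud-storage` client. This is a sketch under stated assumptions: `normalize_prefix` and `list_gcs_objects` are hypothetical helpers, and a successful listing only proves that *your* credentials can read the bucket, not that the permissions from step 3 have been granted to MD.ai.

```python
def normalize_prefix(prefix: str) -> str:
    """Turn a folder prefix such as '/scans/2024' into 'scans/2024/'.

    GCS prefixes match object names literally, so without the trailing slash
    'scans' would also match 'scans-old/...'. An empty prefix stays empty.
    """
    cleaned = prefix.strip().strip("/")
    return f"{cleaned}/" if cleaned else ""


def list_gcs_objects(bucket_name: str, prefix: str = "", limit: int = 10) -> list[str]:
    """List up to `limit` object names under the prefix.

    Requires the google-cloud-storage package and application default credentials.
    """
    from google.cloud import storage

    client = storage.Client()
    blobs = client.list_blobs(
        bucket_name, prefix=normalize_prefix(prefix), max_results=limit
    )
    return [blob.name for blob in blobs]
```

An empty result usually means the prefix does not match any object names in the bucket.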

Google Healthcare API

You can also connect a Google Cloud Healthcare API DICOM store to add data to your project.

  1. Add the GCP Project ID, GCP Region, Dataset ID and DICOM Store ID.
  2. Optionally, activate a GCP Annotation Store and add its Annotation Store ID. If the annotation store already exists, we will attempt to import the annotation records it contains; otherwise, the annotation store is created.
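The four IDs from step 1 together identify a single DICOM store resource in the Healthcare API. The sketch below shows how they combine into the store's resource name and its DICOMweb study-search (QIDO-RS) URL in the `v1` API. The helper names and example values are hypothetical; the snippet only builds strings and does not contact Google Cloud.

```python
HEALTHCARE_BASE = "https://healthcare.googleapis.com/v1"


def dicom_store_name(project_id: str, region: str, dataset_id: str,
                     dicom_store_id: str) -> str:
    """Resource name built from the GCP Project ID, Region, Dataset ID and DICOM Store ID."""
    return (
        f"projects/{project_id}/locations/{region}"
        f"/datasets/{dataset_id}/dicomStores/{dicom_store_id}"
    )


def dicomweb_studies_url(project_id: str, region: str, dataset_id: str,
                         dicom_store_id: str) -> str:
    """DICOMweb endpoint that searches the store for studies (QIDO-RS)."""
    name = dicom_store_name(project_id, region, dataset_id, dicom_store_id)
    return f"{HEALTHCARE_BASE}/{name}/dicomWeb/studies"
```

Sending an authenticated GET to that URL (for example with a token from `gcloud auth print-access-token`) is one way to confirm the four IDs point at the store you expect before entering them.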

Amazon S3

To attach an Amazon S3 bucket to your project:

  1. Add the bucket name.
  2. Add a folder prefix (optional).
  3. Add permissions to your Amazon S3 bucket as outlined.
  4. Confirm the bucket permissions once they are added.
  5. Press Connect.
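Before connecting, it can help to know how many objects and bytes sit under the chosen prefix, for example to judge how long processing may take. This sketch uses boto3's `list_objects_v2` paginator; `summarize_listing` and `summarize_s3_prefix` are hypothetical helpers. As with GCS, it runs with *your* AWS credentials and does not verify the permissions granted to MD.ai in step 3.

```python
def summarize_listing(pages) -> tuple[int, int]:
    """Count objects and total bytes across list_objects_v2 response pages.

    Pages with no matching objects have no "Contents" key, so it defaults to [].
    """
    count = total_bytes = 0
    for page in pages:
        for obj in page.get("Contents", []):
            count += 1
            total_bytes += obj["Size"]
    return count, total_bytes


def summarize_s3_prefix(bucket: str, prefix: str = "") -> tuple[int, int]:
    """Return (object count, total bytes) under a prefix. Requires boto3."""
    import boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    return summarize_listing(paginator.paginate(Bucket=bucket, Prefix=prefix))
```

Splitting the pure counting logic from the network call keeps the counting testable without AWS access.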

Troubleshooting: Processing Is Stuck

If file processing stalls at a count that is a multiple of 1,000, the likely cause is an unstable internet connection. Cancel the current processing task and reload your images.

Cancel Processing

Go to the Project Card and click the three horizontal dots on the card of the dataset whose processing you want to cancel and restart. Choose Cancel Processing, then reload the data with either the CLI tool or the UI.

Turn off sleep mode

Temporarily turn off your computer's sleep mode while loading large datasets. If the computer goes to sleep, processing is disconnected. If that happens, choose Cancel Processing on the Edit page of the project and reload the data.