Annotation Formats
Explore how to work with semantic segmentation annotation formats including CVAT for images 1.1 and Segmentation mask 1.1. Learn to convert these annotations into PyTorch tensors with correct data types for training segmentation models effectively. Master the process of mapping category labels and colors to indices and preparing target tensors.
We learned to save the semantic segmentation annotation data in two formats: CVAT for images 1.1 and Segmentation mask 1.1.
Training a semantic segmentation model with PyTorch requires target tensors of type torch.int64. Each pixel in the target tensor must hold a long integer indicating the category index of the corresponding pixel in the original image. So, we need to run some code to convert the data exported by CVAT into suitable target tensors.
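As a minimal sketch of this conversion, assuming a hypothetical color-to-category mapping (the real mapping comes from the labels defined in your CVAT task), an RGB mask can be turned into a torch.int64 target tensor like this:

```python
import numpy as np
import torch

# Hypothetical mapping from annotation colors to category indices;
# in practice, derive this from the labels defined in the CVAT task.
COLOR_TO_INDEX = {
    (0, 0, 0): 0,      # background
    (255, 0, 0): 1,    # e.g. "car"
    (0, 255, 0): 2,    # e.g. "person"
}

def mask_to_target(rgb_mask: np.ndarray) -> torch.Tensor:
    """Convert an (H, W, 3) RGB mask into an (H, W) torch.int64 target."""
    target = np.zeros(rgb_mask.shape[:2], dtype=np.int64)
    for color, index in COLOR_TO_INDEX.items():
        # Select pixels whose three channels all match this color.
        target[np.all(rgb_mask == color, axis=-1)] = index
    return torch.from_numpy(target)  # int64 array -> torch.int64 tensor

# Tiny 2x2 example mask: background, car / person, background
mask = np.array([[[0, 0, 0], [255, 0, 0]],
                 [[0, 255, 0], [0, 0, 0]]], dtype=np.uint8)
target = mask_to_target(mask)
print(target.dtype)     # torch.int64
print(target.tolist())  # [[0, 1], [2, 0]]
```

Because `torch.from_numpy` preserves the NumPy dtype, building the index array as `np.int64` is enough to get the `torch.int64` target that loss functions such as `CrossEntropyLoss` expect.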
CVAT for images 1.1
The CVAT for images 1.1 format exports the semantic segmentation data as a single XML file. At the top level, the root <annotations> element contains a <version> element and a <meta> element, where data about the annotation task is stored.
The semantic segmentation data itself is stored in <image> elements, one for each image in the dataset.
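To illustrate how these elements can be traversed, here is a sketch using Python's standard library. The XML snippet below is hand-written in the spirit of a CVAT for images 1.1 export, not output from a real task, so its attributes and labels are assumptions:

```python
import xml.etree.ElementTree as ET

# Hand-written example resembling a CVAT for images 1.1 export;
# names, labels, and points are illustrative assumptions.
XML = """
<annotations>
  <version>1.1</version>
  <meta></meta>
  <image id="0" name="photo_0.jpg" width="640" height="480">
    <polygon label="car" points="10.0,10.0;50.0,10.0;50.0,40.0"/>
  </image>
  <image id="1" name="photo_1.jpg" width="640" height="480">
    <polygon label="person" points="5.0,5.0;20.0,5.0;20.0,30.0"/>
  </image>
</annotations>
"""

root = ET.fromstring(XML)
for image in root.iter("image"):
    # Collect the label of each polygon annotated on this image.
    labels = [poly.get("label") for poly in image.iter("polygon")]
    print(image.get("name"), labels)
# photo_0.jpg ['car']
# photo_1.jpg ['person']
```

Iterating this way gives us, per image, the shapes and labels we later rasterize into the target tensors described above.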
If we expand one of the <image> elements, this is what we see: