How to import COCO annotations into Kili
In this tutorial, we will demonstrate how to import COCO annotations into Kili.
Setup
Let's start by installing Kili:
%pip install kili numpy opencv-python
import json
from pprint import pprint
import numpy as np
from kili.client import Kili
Data collection
We will use the COCO dataset to illustrate how to import COCO annotations into Kili.
For this tutorial, we will use a subset of the val2017 dataset. The full dataset can be downloaded here.
!curl https://raw.githubusercontent.com/kili-technology/kili-python-sdk/main/recipes/datasets/coco2017/annotations/captions_val2017_filtered.json --output captions_val2017_filtered.json
!curl https://raw.githubusercontent.com/kili-technology/kili-python-sdk/main/recipes/datasets/coco2017/annotations/instances_val2017_filtered.json --output instances_val2017_filtered.json
!curl https://raw.githubusercontent.com/kili-technology/kili-python-sdk/main/recipes/datasets/coco2017/annotations/person_keypoints_val2017_filtered.json --output person_keypoints_val2017_filtered.json
COCO format
The format is described here.
The file instances_val2017_filtered.json contains the following keys:
instances_val2017 = json.load(open("instances_val2017_filtered.json"))
print(instances_val2017.keys())
dict_keys(['annotations', 'categories', 'images', 'info', 'licenses'])
Each annotation contains the id of the image it belongs to, the category id, the segmentation, and the bounding box:
pprint(instances_val2017["annotations"][0])
{'area': 88.52115000000006,
'bbox': [102.49, 118.47, 7.9, 17.31],
'category_id': 64,
'id': 22328,
'image_id': 37777,
'iscrowd': 0,
'segmentation': [[110.39,
135.78,
110.39,
127.62,
110.01,
119.6,
106.87,
118.47,
104.37,
120.1,
102.49,
122.73,
103.74,
125.49,
105.24,
128.88,
106.87,
132.39,
107.38,
135.78,
110.39,
135.65]]}
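As a minimal sketch of how these fields can be read, assuming the usual COCO convention that `bbox` is `[x, y, width, height]` in pixels with the origin at the top-left corner, and that a polygon `segmentation` is a flat `[x1, y1, x2, y2, ...]` list. The helpers `bbox_corners` and `polygon_points` are hypothetical names introduced for illustration; they are not part of the Kili SDK:

```python
from typing import Dict, List, Tuple


def bbox_corners(bbox: List[float]) -> Dict[str, Tuple[float, float]]:
    """Return the four corners of a COCO [x, y, width, height] box."""
    x, y, width, height = bbox
    return {
        "top_left": (x, y),
        "top_right": (x + width, y),
        "bottom_right": (x + width, y + height),
        "bottom_left": (x, y + height),
    }


def polygon_points(flat: List[float]) -> List[Tuple[float, float]]:
    """Pair up a flat [x1, y1, x2, y2, ...] polygon into (x, y) tuples."""
    return list(zip(flat[::2], flat[1::2]))


# Values taken from the annotation printed above (polygon truncated to 3 points).
corners = bbox_corners([102.49, 118.47, 7.9, 17.31])
points = polygon_points([110.39, 135.78, 110.39, 127.62, 110.01, 119.6])
print(corners["top_left"], len(points))
```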
We can print the categories of COCO this way:
for category in instances_val2017["categories"]:
    print(category["id"], category["name"])
1 person
2 bicycle
3 car
4 motorcycle
5 airplane
6 bus
7 train
8 truck
9 boat
10 traffic light
11 fire hydrant
13 stop sign
14 parking meter
15 bench
16 bird
17 cat
18 dog
19 horse
20 sheep
21 cow
22 elephant
23 bear
24 zebra
25 giraffe
27 backpack
28 umbrella
31 handbag
32 tie
33 suitcase
34 frisbee
35 skis
36 snowboard
37 sports ball
38 kite
39 baseball bat
40 baseball glove
41 skateboard
42 surfboard
43 tennis racket
44 bottle
46 wine glass
47 cup
48 fork
49 knife
50 spoon
51 bowl
52 banana
53 apple
54 sandwich
55 orange
56 broccoli
57 carrot
58 hot dog
59 pizza
60 donut
61 cake
62 chair
63 couch
64 potted plant
65 bed
67 dining table
70 toilet
72 tv
73 laptop
74 mouse
75 remote
76 keyboard
77 cell phone
78 microwave
79 oven
80 toaster
81 sink
82 refrigerator
84 book
85 clock
86 vase
87 scissors
88 teddy bear
89 hair drier
90 toothbrush
The file captions_val2017_filtered.json contains transcription data:
captions_val2017 = json.load(open("captions_val2017_filtered.json"))
print(captions_val2017.keys())
dict_keys(['annotations', 'images', 'info', 'licenses'])
print(captions_val2017["annotations"][0])
{'caption': 'A small closed toilet in a cramped space.', 'id': 441, 'image_id': 331352}
In this dataset, each image has 5 captions given by different annotators.
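One way to see all captions of an image at once is to group the caption annotations by `image_id`. The sketch below assumes each annotation has the `image_id` and `caption` keys shown above; `group_captions_by_image` is a hypothetical helper, and the second sample caption is made up for illustration:

```python
from collections import defaultdict
from typing import Dict, List


def group_captions_by_image(annotations: List[dict]) -> Dict[int, List[str]]:
    """Map each image id to the list of its captions, in file order."""
    captions: Dict[int, List[str]] = defaultdict(list)
    for ann in annotations:
        captions[ann["image_id"]].append(ann["caption"])
    return dict(captions)


sample_annotations = [
    {"caption": "A small closed toilet in a cramped space.", "id": 441, "image_id": 331352},
    # Made-up caption, only here to show several captions for one image.
    {"caption": "A second caption for the same image.", "id": 442, "image_id": 331352},
]
grouped = group_captions_by_image(sample_annotations)
print(len(grouped[331352]))
```

With the real file, `group_captions_by_image(captions_val2017["annotations"])` would give the same kind of mapping for every image in the subset.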
The file person_keypoints_val2017_filtered.json contains keypoints data:
person_keypoints_val2017 = json.load(open("person_keypoints_val2017_filtered.json"))
print(person_keypoints_val2017.keys())
dict_keys(['annotations', 'categories', 'images', 'info', 'licenses'])
pprint(person_keypoints_val2017["annotations"][0])
{'area': 17376.91885,
'bbox': [388.66, 69.92, 109.41, 277.62],
'category_id': 1,
'id': 200887,
'image_id': 397133,
'iscrowd': 0,
'keypoints': [433,
94,
2,
434,
90,
2,
0,
0,
0,
443,
98,
2,
0,
0,
0,
420,
128,
2,
474,
133,
2,
396,
162,
2,
489,
173,
2,
0,
0,
0,
0,
0,
0,
419,
214,
2,
458,
215,
2,
411,
274,
2,
458,
273,
2,
402,
333,
2,
465,
334,
2],
'num_keypoints': 13,
'segmentation': [[446.71,
70.66,
466.07,
72.89,
471.28,
78.85,
473.51,
88.52,
473.51,
98.2,
462.34,
111.6,
475.74,
126.48,
484.67,
136.16,
494.35,
157.74,
496.58,
174.12,
498.07,
182.31,
485.42,
189.75,
474.25,
189.01,
470.53,
202.4,
475.74,
337.12,
469.04,
347.54,
455.65,
343.08,
450.44,
323.72,
441.5,
255.99,
433.32,
250.04,
406.52,
340.1,
397.59,
344.56,
388.66,
330.42,
408.01,
182.31,
396.85,
186.77,
392.38,
177.84,
389.4,
166.68,
390.89,
147.32,
418.43,
119.04,
434.06,
111.6,
429.6,
98.94,
428.85,
81.08,
441.5,
72.89,
443.74,
69.92]]}
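The `keypoints` list is a flat sequence of `(x, y, visibility)` triplets, one per keypoint of the category. The sketch below decodes it, assuming the standard COCO person keypoint order and the convention that a visibility of 0 means the keypoint is not labeled. `decode_keypoints` is a hypothetical helper, not a Kili or COCO function:

```python
from typing import Dict, List, Tuple

# Standard COCO person keypoint order (also available in the file under
# person_keypoints_val2017["categories"][0]["keypoints"]).
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def decode_keypoints(
    flat: List[int], names: List[str] = COCO_KEYPOINT_NAMES
) -> Dict[str, Tuple[int, int, int]]:
    """Return {keypoint name: (x, y, visibility)} for labeled keypoints only."""
    triplets = zip(flat[::3], flat[1::3], flat[2::3])
    return {name: (x, y, v) for name, (x, y, v) in zip(names, triplets) if v > 0}


# The keypoints of the annotation printed above.
keypoints = [
    433, 94, 2, 434, 90, 2, 0, 0, 0, 443, 98, 2, 0, 0, 0,
    420, 128, 2, 474, 133, 2, 396, 162, 2, 489, 173, 2,
    0, 0, 0, 0, 0, 0, 419, 214, 2, 458, 215, 2,
    411, 274, 2, 458, 273, 2, 402, 333, 2, 465, 334, 2,
]
labeled = decode_keypoints(keypoints)
print(len(labeled))  # 13, matching 'num_keypoints' in the annotation
```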
Kili project creation
Let's create the Kili project that will contain the images and annotations of the COCO dataset.
Below, we initialize the Kili client:
kili = Kili(
    # api_endpoint="https://cloud.kili-technology.com/api/label/v2/graphql",
    # the line above can be uncommented and changed if you are working with an on-premise version of Kili
)
The json_interface variable contains the ontology of the project.
json_interface = {"jobs": {}}
We start by defining the transcription job, which aims to describe the image content:
json_interface["jobs"]["TRANSCRIPTION_JOB"] = {
    "content": {"input": "textField"},
    "instruction": "Caption",
    "mlTask": "TRANSCRIPTION",
    "required": 0,
    "isChild": False,
}
In the dictionary below, we map the category ids to the category names:
category_id_to_name = {
    category["id"]: category["name"] for category in instances_val2017["categories"]
}
categories = {
    category["name"]: {"children": [], "name": category["name"], "id": category["id"]}
    for category in instances_val2017["categories"]
}
We also define object detection jobs:
json_interface["jobs"]["OBJECT_DETECTION_JOB"] = {
    "content": {"categories": categories, "input": "radio"},
    "instruction": "BBox",
    "mlTask": "OBJECT_DETECTION",
    "required": 0,
    "tools": ["rectangle"],
    "isChild": False,
}
json_interface["jobs"]["SEGMENTATION_JOB"] = {
    "content": {"categories": categories, "input": "radio"},
    "instruction": "Segment",
    "mlTask": "OBJECT_DETECTION",
    "required": 0,
    "tools": ["semantic"],
    "isChild": False,
}
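The two jobs above differ only in their instruction and tool, so they could also be produced by a small factory function. This is only a sketch: `make_object_detection_job` is a hypothetical helper, and the one-entry `categories` dictionary stands in for the full COCO category mapping built earlier:

```python
from typing import Dict


def make_object_detection_job(instruction: str, tool: str, categories: Dict) -> Dict:
    """Build a job definition with the same keys as the jobs defined above."""
    return {
        "content": {"categories": categories, "input": "radio"},
        "instruction": instruction,
        "mlTask": "OBJECT_DETECTION",
        "required": 0,
        "tools": [tool],
        "isChild": False,
    }


# Stand-in for the full category dictionary.
sample_categories = {"person": {"children": [], "name": "person", "id": 1}}
sample_jobs = {
    "OBJECT_DETECTION_JOB": make_object_detection_job("BBox", "rectangle", sample_categories),
    "SEGMENTATION_JOB": make_object_detection_job("Segment", "semantic", sample_categories),
}
print(sample_jobs["SEGMENTATION_JOB"]["tools"])
```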
And a pose estimation job:
map_key_cat_to_body_part = {
    "nose": "face",
    "left_eye": "face",
    "right_eye": "face",
    "left_ear": "face",
    "right_ear": "face",
    "left_shoulder": "upper_body_left",
    "right_shoulder": "upper_body_right",
    "left_elbow": "upper_body_left",
    "right_elbow": "upper_body_right",
    "left_wrist": "upper_body_left",
    "right_wrist": "upper_body_right",
    "left_hip": "lower_body_left",
    "right_hip": "lower_body_right",
    "left_knee": "lower_body_left",
    "right_knee": "lower_body_right",
    "left_ankle": "lower_body_left",
    "right_ankle": "lower_body_right",
}
json_interface["jobs"]["POSE_ESTIMATION_JOB"] = {
    "content": {
        "categories": {
            "face": {
                "children": [],
                "name": "face",
                "points": [
                    {"code": "nose", "name": "nose"},
                    {"code": "left_eye", "name": "left_eye"},
                    {"code": "right_eye", "name": "right_eye"},
                    {"code": "left_ear", "name": "left_ear"},
                    {"code": "right_ear", "name": "right_ear"},
                ],
            },
            "upper_body_left": {
                "children": [],
                "name": "upper_body_left",
                "points": [
                    {"code": "left_shoulder", "name": "left_shoulder"},
                    {"code": "left_elbow", "name": "left_elbow"},
                    {"code": "left_wrist", "name": "left_wrist"},
                ],
            },
            "upper_body_right": {
                "children": [],
                "name": "upper_body_right",
                "points": [
                    {"code": "right_shoulder", "name": "right_shoulder"},
                    {"code": "right_elbow", "name": "right_elbow"},
                    {"code": "right_wrist", "name": "right_wrist"},
                ],
            },
            "lower_body_left": {
                "children": [],
                "name": "lower_body_left",
                "points": [
                    {"code": "left_hip", "name": "left_hip"},
                    {"code": "left_knee", "name": "left_knee"},
                    {"code": "left_ankle", "name": "left_ankle"},
                ],
            },
            "lower_body_right": {
                "children": [],
                "name": "lower_body_right",
                "points": [
                    {"code": "right_hip", "name": "right_hip"},
                    {"code": "right_knee", "name": "right_knee"},
                    {"code": "right_ankle", "name": "right_ankle"},
                ],
            },
        },
        "input": "radio",
    },
    "instruction": "Pose estimation",
    "mlTask": "OBJECT_DETECTION",
    "required": 0,
    "tools": ["pose"],
    "isChild": False,
}
project = kili.create_project(
    title="[Kili SDK Notebook]: COCO 2017",
    input_type="IMAGE",
    json_interface=json_interface,
)
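The pose estimation categories written out by hand above mirror the keypoint-to-body-part mapping, so they could also be derived from it. The following is a sketch under that assumption; `build_pose_categories` is a hypothetical helper, and it relies on Python dictionaries preserving insertion order:

```python
from typing import Dict

# Same mapping as in the tutorial, repeated so the sketch is self-contained.
map_key_cat_to_body_part = {
    "nose": "face",
    "left_eye": "face",
    "right_eye": "face",
    "left_ear": "face",
    "right_ear": "face",
    "left_shoulder": "upper_body_left",
    "right_shoulder": "upper_body_right",
    "left_elbow": "upper_body_left",
    "right_elbow": "upper_body_right",
    "left_wrist": "upper_body_left",
    "right_wrist": "upper_body_right",
    "left_hip": "lower_body_left",
    "right_hip": "lower_body_right",
    "left_knee": "lower_body_left",
    "right_knee": "lower_body_right",
    "left_ankle": "lower_body_left",
    "right_ankle": "lower_body_right",
}


def build_pose_categories(point_to_part: Dict[str, str]) -> Dict[str, Dict]:
    """Group keypoint codes by body part into the category structure used above."""
    categories: Dict[str, Dict] = {}
    for point, part in point_to_part.items():
        category = categories.setdefault(part, {"children": [], "name": part, "points": []})
        category["points"].append({"code": point, "name": point})
    return categories


pose_categories = build_pose_categories(map_key_cat_to_body_part)
print(list(pose_categories))
```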
Importing images
Now that our project is created, let's import the images:
content_array = []
external_id_array = []
for image in instances_val2017["images"]:
    content_array.append(image["flickr_url"].replace("http://", "https://"))
    external_id_array.append(str(image["id"]))
kili.append_many_to_dataset(
    project["id"], content_array=content_array, external_id_array=external_id_array
)
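The external ids are what later tie each label to its asset, so it can be worth checking that they are unique before uploading. The sketch below rebuilds the two arrays from a list of image records and performs that check. `build_import_arrays` is a hypothetical helper and the sample URLs are made up:

```python
from typing import Dict, List, Tuple


def build_import_arrays(images: List[Dict]) -> Tuple[List[str], List[str]]:
    """Return (content_array, external_id_array) for a list of COCO image records."""
    content_array = [img["flickr_url"].replace("http://", "https://") for img in images]
    external_id_array = [str(img["id"]) for img in images]
    if len(set(external_id_array)) != len(external_id_array):
        raise ValueError("Duplicate image ids would produce duplicate external ids")
    return content_array, external_id_array


# Made-up image records with the same keys as the COCO "images" entries.
sample_images = [
    {"id": 37777, "flickr_url": "http://example.com/000000037777.jpg"},
    {"id": 397133, "flickr_url": "http://example.com/000000397133.jpg"},
]
sample_content, sample_ids = build_import_arrays(sample_images)
print(sample_ids)
```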
Importing annotations
Below, we import some useful functions to convert annotations to Kili label format:
from typing import Dict, List
from kili.utils.labels.bbox import bbox_points_to_normalized_vertices, point_to_normalized_point
from kili.utils.labels.image import mask_to_normalized_vertices
def coco_bbox_annotation_to_normalized_vertices(
    coco_ann: Dict, *, img_width: int, img_height: int
) -> List[Dict]:
    # COCO bounding boxes are [x, y, width, height] with the origin at the top-left corner
    x, y, width, height = coco_ann["bbox"]
    ret = bbox_points_to_normalized_vertices(
        bottom_left={"x": x, "y": y + height},
        bottom_right={"x": x + width, "y": y + height},
        top_right={"x": x + width, "y": y},
        top_left={"x": x, "y": y},
        img_height=img_height,
        img_width=img_width,
        origin_location="top_left",
    )
    return ret
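To make the arithmetic behind this conversion concrete, here is a rough sketch that does not use the Kili helper. It assumes that normalized vertices are simply pixel coordinates divided by the image width and height, with the origin at the top-left corner. `normalize_bbox`, the vertex order, and the 640x480 image size are all illustrative assumptions; in practice the `bbox_points_to_normalized_vertices` helper should be preferred:

```python
from typing import Dict, List


def normalize_bbox(
    bbox: List[float], *, img_width: int, img_height: int
) -> List[Dict[str, float]]:
    """Turn a COCO [x, y, width, height] box into four normalized corner points."""
    x, y, width, height = bbox
    corners = [
        (x, y + height),  # bottom left
        (x, y),  # top left
        (x + width, y),  # top right
        (x + width, y + height),  # bottom right
    ]
    return [{"x": cx / img_width, "y": cy / img_height} for cx, cy in corners]


# Hypothetical box on a hypothetical 640x480 image.
vertices = normalize_bbox([160.0, 120.0, 320.0, 240.0], img_width=640, img_height=480)
print(vertices[0])  # {'x': 0.25, 'y': 0.75}
```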
def coco_segm_annotation_to_normalized_vertices(
    coco_ann: Dict, *, img_width: int, img_height: int
) -> List[List[Dict]]:
    coco_segmentations = coco_ann["segmentation"]
    ret = []
    if coco_ann["iscrowd"] == 0:
        # a single object (iscrowd=0), in which case a list of polygons is used
        for coco_segm in coco_segmentations:
            vertices = [
                point_to_normalized_point(
                    point={"x": x, "y": y},
                    img_height=img_height,
                    img_width=img_width,
                    origin_location="top_left",
                )
                for x, y in zip(coco_segm[::2], coco_segm[1::2])
            ]
            ret.append(vertices)
    else:
        # a crowd (iscrowd=1), in which case RLE (run-length encoding) is used:
        # the segmentation is a single dict with a "counts" key,
        # so it is decoded once rather than once per dict key
        rle_counts = coco_segmentations["counts"]
        mask = np.zeros(img_height * img_width, dtype=np.uint8)  # flat image
        pixel_index = 0
        for i, count in enumerate(rle_counts):
            if i % 2 == 1:
                # odd-indexed runs are foreground pixels
                mask[pixel_index : pixel_index + count] = 255
            pixel_index += count
        # RLE is stored column by column: we reshape to (width, height)
        # and transpose to get the image shape (height, width)
        mask = mask.reshape((img_width, img_height)).T
        # we convert the mask to normalized vertices
        # hierarchy is not used here. It is used for polygons with holes.
        normalized_vertices, hierarchy = mask_to_normalized_vertices(mask)
        ret.extend(normalized_vertices)
    return ret
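The crowd branch above decodes an uncompressed run-length encoding. As a small worked example, the sketch below does the same decoding in plain Python on a tiny mask, assuming the COCO convention that `counts` alternates runs of background and foreground pixels (starting with background) and that pixels are stored column by column. `decode_uncompressed_rle` is a hypothetical helper:

```python
from typing import List


def decode_uncompressed_rle(counts: List[int], *, height: int, width: int) -> List[List[int]]:
    """Decode column-major RLE counts into a height x width mask of 0s and 1s."""
    flat: List[int] = []
    for i, count in enumerate(counts):
        # Even-indexed runs are background (0), odd-indexed runs are foreground (1).
        flat.extend([i % 2] * count)
    if len(flat) != height * width:
        raise ValueError("RLE counts do not cover the whole image")
    # Column-major storage: pixel (row, col) sits at index col * height + row.
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


# 1 background pixel, 2 foreground pixels, 3 background pixels on a 2x3 image.
tiny_mask = decode_uncompressed_rle([1, 2, 3], height=2, width=3)
print(tiny_mask)  # [[0, 1, 0], [1, 0, 0]]
```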
json_response_array = []
for image_id in external_id_array:
    img_info = [img for img in instances_val2017["images"] if img["id"] == int(image_id)][0]
    img_width = img_info["width"]
    img_height = img_info["height"]
    # json response contains the label data for the image
    json_resp = {}
    ### Transcription job
    img_captions = [
        ann for ann in captions_val2017["annotations"] if ann["image_id"] == int(image_id)
    ]
    json_resp["TRANSCRIPTION_JOB"] = {
        "text": img_captions[0]["caption"]  # we only take the 1st caption for sake of simplicity
    }
    ### Object detection and segmentation annotations
    coco_annotations = [
        ann for ann in instances_val2017["annotations"] if ann["image_id"] == int(image_id)
    ]
    kili_bbox_annotations = []
    kili_segm_annotations = []
    for coco_ann in coco_annotations:
        ### Object detection job
        if coco_ann["iscrowd"] == 0:
            # we skip crowd annotations bbox since they tend to be very large
            kili_bbox_ann = {
                "children": {},
                "boundingPoly": [
                    {
                        "normalizedVertices": coco_bbox_annotation_to_normalized_vertices(
                            coco_ann, img_width=img_width, img_height=img_height
                        )
                    }
                ],
                "categories": [{"name": category_id_to_name[coco_ann["category_id"]]}],
                "type": "rectangle",
                "mid": str(coco_ann["id"]) + "_bbox",
            }
            kili_bbox_annotations.append(kili_bbox_ann)
        ### Segmentation job
        for i, norm_vertices in enumerate(
            coco_segm_annotation_to_normalized_vertices(
                coco_ann, img_width=img_width, img_height=img_height
            )
        ):
            kili_segm_ann = {
                "children": {},
                "boundingPoly": [{"normalizedVertices": norm_vertices}],
                "categories": [{"name": category_id_to_name[coco_ann["category_id"]]}],
                "type": "semantic",
                "mid": str(coco_ann["id"]) + "_segm_" + str(i),
            }
            kili_segm_annotations.append(kili_segm_ann)
    ### Pose estimation annotations
    coco_annotations = [
        ann for ann in person_keypoints_val2017["annotations"] if ann["image_id"] == int(image_id)
    ]
    kili_keypoints_annotations = []
    for coco_ann in coco_annotations:
        keypoints = coco_ann["keypoints"]
        for body_part in (
            "face",
            "upper_body_left",
            "upper_body_right",
            "lower_body_left",
            "lower_body_right",
        ):
            kili_keypoint_ann = {
                "categories": [{"name": body_part}],
                "children": {},
                "jobName": "POSE_ESTIMATION_JOB",
                "kind": "POSE_ESTIMATION",
                "mid": str(coco_ann["id"]) + "_keypoints_" + body_part,
                "points": [],
                "type": "pose",
            }
            for x, y, visibility, point_type in zip(
                keypoints[::3],
                keypoints[1::3],
                keypoints[2::3],
                person_keypoints_val2017["categories"][0]["keypoints"],
            ):
                # unlabeled keypoints are stored as (0, 0, 0)
                if x == y == visibility == 0:
                    continue
                if map_key_cat_to_body_part[point_type] != body_part:
                    continue
                kili_keypoint_ann["points"].append(
                    {
                        "children": {},
                        "code": point_type,
                        "jobName": "POSE_ESTIMATION_JOB",
                        "mid": str(coco_ann["id"]) + "_keypoints_" + point_type,
                        "name": point_type,
                        "type": "marker",
                        "point": point_to_normalized_point(
                            point={"x": x, "y": y},
                            img_height=img_height,
                            img_width=img_width,
                            origin_location="top_left",
                        ),
                    }
                )
            kili_keypoints_annotations.append(kili_keypoint_ann)
    json_resp["OBJECT_DETECTION_JOB"] = {"annotations": kili_bbox_annotations}
    json_resp["SEGMENTATION_JOB"] = {"annotations": kili_segm_annotations}
    json_resp["POSE_ESTIMATION_JOB"] = {"annotations": kili_keypoints_annotations}
    json_response_array.append(json_resp)
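Before uploading, it may be worth checking that every coordinate in a JSON response lies in the range [0, 1], since a value outside that range usually points to a wrong image size or origin. The sketch below walks a response and collects offending points. It assumes points are stored as dictionaries with exactly the keys `x` and `y`; `iter_xy` and `out_of_range_points` are hypothetical helpers, and the sample response is made up:

```python
from typing import Any, Iterator, List, Tuple


def iter_xy(node: Any) -> Iterator[Tuple[float, float]]:
    """Yield every (x, y) pair found in nested dicts and lists."""
    if isinstance(node, dict):
        if set(node) == {"x", "y"}:
            yield node["x"], node["y"]
        else:
            for value in node.values():
                yield from iter_xy(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_xy(item)


def out_of_range_points(json_response: Any) -> List[Tuple[float, float]]:
    """Return the points whose coordinates fall outside [0, 1]."""
    return [(x, y) for x, y in iter_xy(json_response) if not (0 <= x <= 1 and 0 <= y <= 1)]


# Made-up response with one deliberately invalid vertex.
sample_response = {
    "OBJECT_DETECTION_JOB": {
        "annotations": [
            {
                "boundingPoly": [
                    {"normalizedVertices": [{"x": 0.25, "y": 0.75}, {"x": 1.2, "y": 0.5}]}
                ],
                "categories": [{"name": "person"}],
                "type": "rectangle",
            }
        ]
    }
}
bad_points = out_of_range_points(sample_response)
print(bad_points)  # [(1.2, 0.5)]
```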
kili.append_labels(
    asset_external_id_array=external_id_array,
    project_id=project["id"],
    json_response_array=json_response_array,
)
In the Kili labeling interface, we can now see the imported images along with their annotations.