How to export data from a Kili project
Outline
This tutorial explains several ways to export data from a Kili project. It describes:
- Methods to export the labels one by one, after filtering
- The solutions for performing a full-project export
The methods are illustrated with code snippets.
Export methods
With Kili, once you have annotated enough assets, you can export the data programmatically and use it to train a machine learning model. There are several ways to do it:
- Fetch the assets and/or the labels one by one using .assets or .labels, perform the data transformation yourself, and then write the data to one or several output files.
- Export the whole project as a dataset. To do that, use the .export_labels method, which creates an archive containing the labels in your chosen format.
Preliminary steps
1) Fetch the project ID from the Kili UI (in Settings / Admin).
2) Ensure that your Kili API key has been set as an environment variable:
export KILI_API_KEY=<YOUR_API_KEY>
3) If the Kili SDK is not installed yet, install it:
%pip install kili
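4) Import the packages and instantiate the Kili client:
from pathlib import Path

from kili.client import Kili

kili = Kili()  # reads the API key from the KILI_API_KEY environment variable
your_project_id = "<YOUR_PROJECT_ID>"  # the project ID fetched in step 1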
Exporting assets and labels one by one
To retrieve all assets of a project one by one, perform the following steps:
Exporting the latest labels per asset
First, fetch the assets:
assets = kili.assets(
your_project_id,
fields=["externalId", "latestLabel.jsonResponse"],
label_output_format="parsed_label",
)
If you now print an asset, you can see that its latestLabel is available:
print(assets[0])
{'latestLabel': {'jsonResponse': {'JOB_0': {'annotations': [{'categories': [{'name': 'OBJECT_A'}], 'mid': '20230111125258113-44528', 'type': 'rectangle', 'boundingPoly': [{'normalizedVertices': [{'x': 0.6101435505380516, 'y': 0.7689773770786136}, {'x': 0.6101435505380516, 'y': 0.39426226491370664}, {'x': 0.8962087421313937, 'y': 0.39426226491370664}, {'x': 0.8962087421313937, 'y': 0.7689773770786136}]}], 'polyline': [], 'children': {}}]}}}, 'externalId': 'car_1'}
You can now read the label and, for example, write the category name of its first annotation into a text file:
for asset in assets:
    if asset["latestLabel"]:  # skip assets that have no label yet
        class_ = asset["latestLabel"].jobs["JOB_0"].annotations[0].category.name
        with Path(asset["externalId"] + ".txt").open("w", encoding="utf-8") as f:
            f.write(class_)
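The snippet above keeps only the first annotation of JOB_0. If you work with the raw dictionary form of jsonResponse (the shape printed above) and want every category of an object detection job, a minimal sketch could look like this. The helper name category_names is hypothetical, and it assumes the layout json_response[job]["annotations"][i]["categories"][j]["name"] shown in the printed asset; classification jobs, which store categories at the job level, are not covered.

```python
from typing import Any, Dict, List


def category_names(json_response: Dict[str, Any], job_name: str = "JOB_0") -> List[str]:
    """Collect the category names of every annotation of one job.

    Assumes the dictionary layout of the printed asset above:
    json_response[job_name]["annotations"][i]["categories"][j]["name"].
    """
    job = json_response.get(job_name, {})
    names: List[str] = []
    for annotation in job.get("annotations", []):
        for category in annotation.get("categories", []):
            names.append(category["name"])
    return names


# Payload copied from the printed asset above, with the coordinates omitted.
sample = {"JOB_0": {"annotations": [{"categories": [{"name": "OBJECT_A"}], "type": "rectangle"}]}}
print(category_names(sample))
```

Returning an empty list for a missing job keeps the caller free of extra checks for assets whose label does not contain JOB_0.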
Filtering specific labels per asset through the method filters
You can specify filters directly in the .assets and .labels methods. The available filters are listed in the argument list of each method.
Then write the conversion code that produces the data in the format you need.
Get only the assets with a consensus mark above 0.5:
assets = kili.assets(
your_project_id, fields=["externalId", "id", "consensusMark"], consensus_mark_gt=0.5
)
print(assets)
# + asset conversion code
[{'externalId': 'car_1', 'id': 'clcyuykzd0000bgvze2z3wk81', 'consensusMark': 0.6504290982818591}]
Get all the labels with a honeypot mark greater than or equal to 0.1:
labels = kili.labels(
your_project_id,
fields=["labelOf.externalId", "honeypotMark", "author.email", "id"],
honeypot_mark_gte=0.1,
)
print(labels)
# + label conversion code
[{'labelOf': {'externalId': 'car_1'}, 'author': {'email': 'john.doe@kili-technology.com'}, 'honeypotMark': 0.16527040499137607, 'id': 'clcyuynri2fnl0krf0d7pgabo'}, {'labelOf': {'externalId': 'car_1'}, 'author': {'email': 'john.smith@kili-technology.com'}, 'honeypotMark': 0.20754115450190522, 'id': 'clcyuynri2fnm0krfhx934jee'}]
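As an example of "label conversion code", here is a minimal sketch that flattens label dictionaries shaped like the output above into CSV text using only the standard library. The helper name labels_to_csv and the column names are hypothetical; it assumes you requested the fields labelOf.externalId, author.email, honeypotMark and id, as in the call above.

```python
import csv
import io
from typing import Any, Dict, List


def labels_to_csv(labels: List[Dict[str, Any]]) -> str:
    """Flatten label dicts (as returned by kili.labels above) into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["label_id", "external_id", "author_email", "honeypot_mark"])
    for label in labels:
        writer.writerow([
            label["id"],
            label["labelOf"]["externalId"],  # nested field requested as labelOf.externalId
            label["author"]["email"],  # nested field requested as author.email
            label["honeypotMark"],
        ])
    return buffer.getvalue()


# Sample shaped like the printed output above.
sample = [
    {
        "labelOf": {"externalId": "car_1"},
        "author": {"email": "john.doe@kili-technology.com"},
        "honeypotMark": 0.16527040499137607,
        "id": "clcyuynri2fnl0krf0d7pgabo",
    }
]
print(labels_to_csv(sample))
```

To write a file, pass the returned text to Path("labels.csv").write_text(..., encoding="utf-8").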
Get all the labels added by a specific project member:
labels = kili.labels(
your_project_id, fields=["labelOf.externalId", "author.email", "id"], user_id=john_doe_id
)
print(labels)
# + label conversion code
[{'labelOf': {'externalId': 'car_1'}, 'author': {'email': 'john.doe@kili-technology.com'}, 'id': 'clcyuynri2fnl0krf0d7pgabo'}]
Here, john_doe_id holds the Kili user ID of John Doe, so this call returns only the labels he authored.
You can also use the author_in parameter to filter by name directly.
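Before filtering by member, it can help to see how many labels each person authored. A minimal sketch, assuming you fetched labels with the author.email field as in the calls above; the helper name labels_per_author is hypothetical and the sample IDs are made up.

```python
from collections import Counter
from typing import Any, Dict, List


def labels_per_author(labels: List[Dict[str, Any]]) -> Counter:
    """Count labels per author email (requires the author.email field)."""
    return Counter(label["author"]["email"] for label in labels)


# Made-up sample shaped like the kili.labels output above.
sample = [
    {"labelOf": {"externalId": "car_1"}, "author": {"email": "john.doe@kili-technology.com"}, "id": "a"},
    {"labelOf": {"externalId": "car_1"}, "author": {"email": "john.smith@kili-technology.com"}, "id": "b"},
    {"labelOf": {"externalId": "car_2"}, "author": {"email": "john.doe@kili-technology.com"}, "id": "c"},
]
print(labels_per_author(sample).most_common())
```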
Filtering specific labels per asset through the label properties
You can also select labels by their properties, for example the latest review label per user, and dump the result into a JSON file. The field "labels.isLatestReviewLabelForUser" indicates whether a label is the latest review label of its author.
import json
from pathlib import Path

assets = kili.assets(
    your_project_id,
    fields=["externalId", "labels.jsonResponse", "labels.isLatestReviewLabelForUser"],
)
for asset in assets:
    if asset["labels"]:  # skip assets without labels
        for label in asset["labels"]:
            if label["isLatestReviewLabelForUser"] and "JOB_0" in label["jsonResponse"]:
                annotation = label["jsonResponse"]["JOB_0"]
                with Path(asset["externalId"] + ".json").open("w", encoding="utf-8") as f:
                    f.write(json.dumps(annotation))
                break  # once a latest review label is found, move on to the next asset
Filtering the latest label per annotator
When working on a project with consensus enabled, it can be useful to export the latest label made by each annotator:
import json
from collections import defaultdict
from pathlib import Path

assets = kili.assets(
    your_project_id,
    fields=[
        "externalId",
        "labels.author.email",
        "labels.createdAt",
        "labels.labelType",
        "labels.jsonResponse",
    ],
)
for asset in assets:
    if asset["labels"]:
        # Group the annotators' labels (type DEFAULT) by author email
        labels_by_user = defaultdict(list)
        for label in asset["labels"]:
            if label["labelType"] == "DEFAULT":
                labels_by_user[label["author"]["email"]].append(label)
        # Keep only the most recent label of each annotator
        latest_label_per_user = {
            email: max(user_labels, key=lambda x: x["createdAt"])
            for email, user_labels in labels_by_user.items()
        }
        with (Path("/tmp") / (asset["externalId"] + ".json")).open("w", encoding="utf-8") as f:
            f.write(json.dumps(latest_label_per_user))
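The code above compares createdAt values as strings, which only orders them correctly when all timestamps share the same format and timezone. A slightly more defensive sketch parses them first. It assumes createdAt is an ISO 8601 string, possibly ending in "Z" for UTC (an assumption about the field's format), and the helper names are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List


def parse_created_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing "Z" for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_default_label_per_author(labels: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Return, for each author email, their most recent DEFAULT label."""
    latest: Dict[str, Dict[str, Any]] = {}
    for label in labels:
        if label["labelType"] != "DEFAULT":
            continue  # ignore review and other label types
        email = label["author"]["email"]
        current = latest.get(email)
        if current is None or parse_created_at(label["createdAt"]) > parse_created_at(current["createdAt"]):
            latest[email] = label
    return latest
```

You could call latest_default_label_per_author(asset["labels"]) in place of the grouping and max steps in the loop above.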
Exporting a whole project
You can export your project data from the Kili UI (see the documentation), but the Kili SDK also lets you export your labels and assets into several formats.
Available formats
Format | UI | Python Client | Command Line Interface |
---|---|---|---|
Kili (raw) | ✅ | ✅ | ✅ |
YOLO V4 | ✅ | ✅ | ✅ |
YOLO V5 | ✅ | ✅ | ✅ |
YOLO V7 | ❌ | ✅ | ✅ |
Pascal VOC | ✅ | ✅ | ✅ |
COCO | ❌ | ✅ | ✅ |
The .export_labels method
The .export_labels method exports a full project. It performs the following steps:
- Fetches only the labels of types "DEFAULT" and "REVIEW" (see the label types explanations).
- If specified, restricts the export to a subset of asset IDs.
- Exports the labels to one of the standard formats (only available for a restricted set of ML tasks).
Using the method arguments, you can also decide:
- Whether or not to include the assets in the export
- Whether to export only the latest label or all the labels
- Whether to create one folder for all the jobs or one folder per job
- Whether or not to export the label-related data into a single file
Note that by default, some formats produce a single file while others produce multiple files:
Format | Single file | Multiple files |
---|---|---|
Kili | ✅ | ✅ |
YOLO | ❌ | ✅ |
Pascal VOC | ❌ | ✅ |
COCO | ✅ | ❌ |
For all formats, the output archive also contains a README.kili.txt file. Here is an example of its contents:
Exported Labels from KILI
=========================
- Project name: Awesome annotation project
- Project identifier: abcdefghijklmnop
- Project description: This project contains labels, most of which are awesome.
- Export date: 20221125-093324
- Exported format: kili
- Exported labels: latest
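If you process many exports, you may want to read this metadata programmatically. A minimal sketch, assuming the "- Key: value" layout of the example above (this layout is not documented here as a stable contract and may change between SDK versions); the helper name parse_kili_readme is hypothetical.

```python
from typing import Dict


def parse_kili_readme(text: str) -> Dict[str, str]:
    """Parse the "- Key: value" lines of a README.kili.txt file into a dict."""
    metadata: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("- ") and ": " in line:
            key, value = line[2:].split(": ", 1)  # split on the first ": " only
            metadata[key] = value
    return metadata


SAMPLE_README = """Exported Labels from KILI
=========================
- Project name: Awesome annotation project
- Project identifier: abcdefghijklmnop
- Project description: This project contains labels, most of which are awesome.
- Export date: 20221125-093324
- Exported format: kili
- Exported labels: latest
"""
print(parse_kili_readme(SAMPLE_README))
```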
Kili format, one file per asset
The following code snippet exports the whole asset payload and the associated labels, with one JSON file per asset, into the /tmp/export.zip archive.
kili.export_labels(
project_id=your_project_id,
filename="/tmp/export.zip",
fmt="kili",
)
Fetching assets...
/tmp/export.zip
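To check what an export contains before unpacking it, you can open the archive with Python's standard zipfile module. This is a minimal sketch: the helper names list_export and extract_export are hypothetical, and since the folder layout inside the archive depends on the format and options, no particular file names are assumed.

```python
import zipfile
from pathlib import Path
from typing import List


def list_export(archive_path: str) -> List[str]:
    """Return the sorted names of all files (not folders) in an export archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(name for name in archive.namelist() if not name.endswith("/"))


def extract_export(archive_path: str, target_dir: str) -> Path:
    """Extract the archive into target_dir (created if needed) and return it."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return target


# Only runs if the export above has been produced on this machine.
if Path("/tmp/export.zip").exists():
    for name in list_export("/tmp/export.zip"):
        print(name)
```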
Kili format, one file for the whole project
This code snippet exports the whole asset payload and the associated labels as a single file for the whole project, into the /tmp/export.zip archive.
kili.export_labels(
project_id=your_project_id,
filename="/tmp/export.zip",
fmt="kili",
single_file=True,
)
Fetching assets...
/tmp/export.zip
YOLO formats
When you have at least one object detection job with bounding boxes, you can also export to one of the YOLO formats: "yolo_v4", "yolo_v5" or "yolo_v7". The formats differ in the structure of the metadata YAML file, which lists the object classes. In all cases, one file is produced per asset, containing its latest DEFAULT or REVIEW label. Each YOLO label line has the following shape:
2 0.25 0.67 0.26 0.34
^ ^ ^ ^ ^
class x y w h
where:
- class is the class index in the classes list contained in the YOLO metadata file.
- x is the x-coordinate of the center of the bounding box, relative to the image width (between 0.0 and 1.0).
- y is the y-coordinate of the center of the bounding box, relative to the image height (between 0.0 and 1.0).
- w is the width of the bounding box, relative to the image width (between 0.0 and 1.0).
- h is the height of the bounding box, relative to the image height (between 0.0 and 1.0).
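To make these definitions concrete, here is a minimal sketch of the arithmetic. It is not the SDK's implementation. It converts a YOLO line back to pixel corners, and it converts Kili normalizedVertices (as printed at the beginning of this tutorial) into YOLO center and size values. It assumes both coordinate systems are normalized to [0, 1] with the origin at the top-left corner of the image. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_yolo_line(line: str) -> Tuple[int, float, float, float, float]:
    """Split a YOLO label line into (class_index, x_center, y_center, width, height)."""
    class_index, x, y, w, h = line.split()
    return int(class_index), float(x), float(y), float(w), float(h)


def yolo_to_pixel_box(line: str, image_width: int, image_height: int) -> Tuple[int, float, float, float, float]:
    """Convert a YOLO line to (class_index, x_min, y_min, x_max, y_max) in pixels."""
    class_index, x, y, w, h = parse_yolo_line(line)
    x_min = (x - w / 2) * image_width
    y_min = (y - h / 2) * image_height
    x_max = (x + w / 2) * image_width
    y_max = (y + h / 2) * image_height
    return class_index, x_min, y_min, x_max, y_max


def vertices_to_yolo(vertices: List[Dict[str, float]]) -> Tuple[float, float, float, float]:
    """Convert the normalizedVertices of a rectangle to YOLO (x_center, y_center, w, h)."""
    xs = [v["x"] for v in vertices]
    ys = [v["y"] for v in vertices]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    return (x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min


print(yolo_to_pixel_box("2 0.25 0.67 0.26 0.34", 640, 480))
```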
Here is an example of a YOLO annotation over an image:
Here is how to export to YOLO (in this example, YOLOv5):
kili.export_labels(
project_id=your_project_id,
filename="/tmp/export.zip",
fmt="yolo_v5",
)
Fetching assets...
/tmp/export.zip
Note that a standard YOLO dataset must also include:
- The root path to the assets
- The train, val and test subfolders
Deciding which data goes into which folder is up to the ML engineer or data scientist, so no code snippet is provided here.
COCO format
To export your data into the COCO format, run the following code:
kili.export_labels(
project_id=your_project_id,
filename="/tmp/export.zip",
fmt="coco",
)
Fetching assets...
Convert to coco format: 1it [00:00, 54.94it/s]
/tmp/export.zip
This creates an archive containing both:
- The COCO annotation file
- The data/ folder with all the assets
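After extracting the archive, you can sanity-check the COCO annotation file by counting annotations per category. This minimal sketch relies only on the standard COCO keys "categories" (with "id" and "name") and "annotations" (with "category_id"). The helper names are hypothetical, and the exact name of the annotation file inside the archive is not assumed: pass its path to load_coco.

```python
import json
from collections import Counter
from typing import Any, Dict


def load_coco(path: str) -> Dict[str, Any]:
    """Load a COCO annotation JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def coco_summary(coco: Dict[str, Any]) -> Dict[str, int]:
    """Count annotations per category name in a COCO annotation dict."""
    names = {category["id"]: category["name"] for category in coco.get("categories", [])}
    counts = Counter(
        names.get(annotation["category_id"], "<unknown>")  # flag IDs missing from categories
        for annotation in coco.get("annotations", [])
    )
    return dict(counts)


example = {
    "categories": [{"id": 1, "name": "OBJECT_A"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
}
print(coco_summary(example))
```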
Cleanup
If you created a project just for this tutorial, you can now delete it:
kili.delete_project(your_project_id)
Summary
In this tutorial, we have seen several ways to export labels from a Kili project:
- Using .assets and .labels with their filtering arguments, you can select a subset of assets or labels and then export it.
- Using .export_labels, you can export the whole project into a standard output format.