How to export data from a Kili project

Outline

This tutorial explains the different ways to export data from a Kili project. It describes:

  • Methods to export the labels one by one, after filtering
  • The solutions for performing a full-project export

The methods are illustrated with code snippets.

Export methods

With Kili, once you have annotated enough assets, you can export the data programmatically and use it to train a machine learning model. There are two ways to do it:

  • Fetch the assets and/or the labels one by one using .assets or .labels, perform the data transformation yourself and then write the data to one or several output files.
  • Export the whole project as a dataset. To do that, use the .export_labels method that creates an archive containing the labels in your chosen format.

Preliminary steps

1) Fetch the project ID from the Kili UI (in Settings / Admin). The snippets in this tutorial assume that it is stored in a variable named your_project_id.

2) Ensure that your Kili API key has been set as an environment variable:

export KILI_API_KEY=<YOUR_API_KEY>

3) If the Kili SDK is not installed yet, install it:

%pip install kili

4) Import packages and instantiate Kili:

from pathlib import Path

from kili.client import Kili

kili = Kili()
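
Step 2 relies on the KILI_API_KEY environment variable being visible to the Python process. As a minimal sketch, the check below fails early with a readable message when the variable is missing. The helper name require_api_key is hypothetical and not part of the Kili SDK.

```python
import os


def require_api_key(env_var: str = "KILI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error if it is unset."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"{env_var} is not set; export it before creating the Kili client.")
    return api_key


# Hypothetical usage: call require_api_key() once, just before `kili = Kili()`.
```

In a notebook, this kind of check avoids a less explicit authentication error later in the session.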

Exporting assets and labels one by one

To retrieve all assets of a project one by one, perform the following steps:

Exporting the latest labels per asset

First, fetch the assets:

assets = kili.assets(
    your_project_id,
    fields=["externalId", "latestLabel.jsonResponse"],
    label_output_format="parsed_label",
)

Now if you print an asset, you will see that you can access its latestLabel:

print(assets[0])
{'latestLabel': {'jsonResponse': {'JOB_0': {'annotations': [{'categories': [{'name': 'OBJECT_A'}], 'mid': '20230111125258113-44528', 'type': 'rectangle', 'boundingPoly': [{'normalizedVertices': [{'x': 0.6101435505380516, 'y': 0.7689773770786136}, {'x': 0.6101435505380516, 'y': 0.39426226491370664}, {'x': 0.8962087421313937, 'y': 0.39426226491370664}, {'x': 0.8962087421313937, 'y': 0.7689773770786136}]}], 'polyline': [], 'children': {}}]}}}, 'externalId': 'car_1'}

You can now read the label and, for example, write the category name of its first annotation into a text file:

for asset in assets:
    if asset["latestLabel"]:  # check if asset has annotations
        class_ = asset["latestLabel"].jobs["JOB_0"].annotations[0].category.name
        with Path(asset["externalId"] + ".txt").open("w", encoding="utf-8") as f:
            f.write(class_)
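
The loop above keeps only the first annotation of JOB_0. As a hedged sketch, the helper below collects every category name of a job from a plain dictionary shaped like the jsonResponse printed earlier. The function name category_names is hypothetical, and the sample dictionary is a shortened copy of that printed output.

```python
def category_names(json_response: dict, job_name: str = "JOB_0") -> list[str]:
    """Collect every category name found in the annotations of one job."""
    annotations = json_response.get(job_name, {}).get("annotations", [])
    return [
        category["name"]
        for annotation in annotations
        for category in annotation.get("categories", [])
    ]


# Shortened copy of the jsonResponse shown above.
sample_response = {
    "JOB_0": {"annotations": [{"categories": [{"name": "OBJECT_A"}], "type": "rectangle"}]}
}
print(category_names(sample_response))  # ['OBJECT_A']
```

A job that is absent from the dictionary simply yields an empty list, which keeps the calling loop free of key checks.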

Filtering specific labels per asset through the method filters

You can specify label filters directly in the .assets and the .labels methods. The available filters are listed in the arguments list for each one of these methods.

When done, you can write the conversion code to get the data in the format that you need.

Get only the assets with a consensus mark above 0.5:

assets = kili.assets(
    your_project_id, fields=["externalId", "id", "consensusMark"], consensus_mark_gt=0.5
)
print(assets)
# + asset conversion code
[{'externalId': 'car_1', 'id': 'clcyuykzd0000bgvze2z3wk81', 'consensusMark': 0.6504290982818591}]
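
One possible form for the asset conversion code mentioned above is a CSV export. The sketch below uses only the standard library and operates on a list shaped like the printed result. The helper name assets_to_csv is hypothetical.

```python
import csv
import io


def assets_to_csv(assets: list[dict], fieldnames: list[str]) -> str:
    """Serialize the selected fields of each asset as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(assets)
    return buffer.getvalue()


# Same shape as the list printed above.
sample_assets = [
    {"externalId": "car_1", "id": "clcyuykzd0000bgvze2z3wk81", "consensusMark": 0.6504290982818591}
]
csv_text = assets_to_csv(sample_assets, ["externalId", "id", "consensusMark"])
print(csv_text)
```

The returned string can then be written to disk with Path(...).write_text(csv_text, encoding="utf-8").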

Get all the labels with a honeypot mark of at least 0.1:

labels = kili.labels(
    your_project_id,
    fields=["labelOf.externalId", "honeypotMark", "author.email", "id"],
    honeypot_mark_gte=0.1,
)
print(labels)
# + label conversion code
[{'labelOf': {'externalId': 'car_1'}, 'author': {'email': 'john.doe@kili-technology.com'}, 'honeypotMark': 0.16527040499137607, 'id': 'clcyuynri2fnl0krf0d7pgabo'}, {'labelOf': {'externalId': 'car_1'}, 'author': {'email': 'john.smith@kili-technology.com'}, 'honeypotMark': 0.20754115450190522, 'id': 'clcyuynri2fnm0krfhx934jee'}]

Get all the labels added by a specific project member:

labels = kili.labels(
    your_project_id, fields=["labelOf.externalId", "author.email", "id"], user_id=john_doe_id
)
print(labels)
# + label conversion code
[{'labelOf': {'externalId': 'car_1'}, 'author': {'email': 'john.doe@kili-technology.com'}, 'id': 'clcyuynri2fnl0krf0d7pgabo'}]

This code will return a list of labels authored by John Doe.

You can also use the author_in parameter to filter by name directly.

Filtering specific labels per asset through the label properties

You can also look for specific labels, for example the latest "review" label per user, and dump the result into a JSON file. The field "labels.isLatestReviewLabelForUser" tells you whether a label is the latest review label for its author.

import json

assets = kili.assets(
    your_project_id,
    fields=["externalId", "labels.jsonResponse", "labels.isLatestReviewLabelForUser"],
)

for asset in assets:
    if asset["labels"]:  # check if asset has annotations
        for label in asset["labels"]:
            if label["isLatestReviewLabelForUser"] and "JOB_0" in label["jsonResponse"]:
                annotation = label["jsonResponse"]["JOB_0"]
                with Path(asset["externalId"] + ".json").open("w", encoding="utf-8") as f:
                    f.write(json.dumps(annotation))
                break  # once we find a latest label done by a reviewer, we move on to the next asset.

Filtering the latest label per annotator

When working on a project with consensus enabled, it can be useful to export the latest label made by each annotator:

import json
from collections import defaultdict

assets = kili.assets(
    your_project_id,
    fields=[
        "externalId",
        "labels.author.email",
        "labels.createdAt",
        "labels.labelType",
        "labels.jsonResponse",
    ],
)

for asset in assets:
    if asset["labels"]:
        # Group the DEFAULT labels of this asset by author email.
        labels_by_user = defaultdict(list)
        for label in asset["labels"]:
            if label["labelType"] == "DEFAULT":
                labels_by_user[label["author"]["email"]].append(label)
        # Keep only the most recently created label of each author.
        latest_label_per_user = {
            email: max(labels, key=lambda x: x["createdAt"])
            for email, labels in labels_by_user.items()
        }
        with (Path("/tmp") / (asset["externalId"] + ".json")).open("w", encoding="utf-8") as f:
            f.write(json.dumps(latest_label_per_user))
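
The selection step can also be isolated in a small function that is easy to test without a Kili project. The sketch below assumes that createdAt holds an ISO 8601 timestamp such as "2023-01-11T12:52:58.113Z". That format is an assumption, and the sample labels and their timestamps are invented for illustration. The function names are hypothetical.

```python
from datetime import datetime


def parse_created_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing "Z" is rewritten for Python < 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_label_per_author(labels: list[dict], label_type: str = "DEFAULT") -> dict[str, dict]:
    """Map each author email to that author's most recent label of the given type."""
    latest: dict[str, dict] = {}
    for label in labels:
        if label["labelType"] != label_type:
            continue
        email = label["author"]["email"]
        current = latest.get(email)
        if current is None or parse_created_at(label["createdAt"]) > parse_created_at(
            current["createdAt"]
        ):
            latest[email] = label
    return latest


# Invented labels, shaped like the "labels" field fetched above.
sample_labels = [
    {"author": {"email": "john.doe@kili-technology.com"}, "labelType": "DEFAULT",
     "createdAt": "2023-01-11T12:52:58.113Z", "jsonResponse": {}},
    {"author": {"email": "john.doe@kili-technology.com"}, "labelType": "DEFAULT",
     "createdAt": "2023-01-12T09:00:00.000Z", "jsonResponse": {}},
    {"author": {"email": "john.smith@kili-technology.com"}, "labelType": "REVIEW",
     "createdAt": "2023-01-13T10:00:00.000Z", "jsonResponse": {}},
]
latest = latest_label_per_author(sample_labels)
```

Parsing the timestamps rather than comparing raw strings keeps the comparison correct even if the timestamps do not all share the same textual form.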

Exporting a whole project

You can export your project data from the Kili UI (see documentation), but the Kili SDK also lets you export your labels and assets into several export formats.

Available formats

Format UI Python Client Command Line Interface
Kili (raw)
YOLO V4
YOLO V5
YOLO V7
YOLO V8
Pascal VOC
COCO
GeoJSON

The .export_labels method

The .export_labels method enables the export of a full project. It does the following preprocessing:

  • Only fetches the labels of types "DEFAULT" and "REVIEW" (see the label types explanations).
  • If specified, selects a subset of asset ids.
  • Exports labels to one of the standard formats (only available for a restricted set of ML tasks).
  • Using various method arguments, you can decide:
    • Whether or not to include the assets in the export
    • Whether to export just the latest label or all the labels
    • Whether to create one folder for all the jobs or one folder per job
    • Whether or not to export the label-related data into one single file

Note that some formats are by default single-file, while others use many files:

Format Single file Multiple files
Kili
Yolo
Pascal VOC
COCO

For all the formats, in the output archive, a README.kili.txt file is also created. Here is an example of its contents:

Exported Labels from KILI
=========================

- Project name: Awesome annotation project
- Project identifier: abcdefghijklmnop
- Project description: This project contains labels, most of which are awesome.
- Export date: 20221125-093324
- Exported format: kili
- Exported labels: latest
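
Since the README uses "- key: value" lines, its metadata can be read back programmatically, for instance to check which format an archive contains. The following is a minimal sketch based only on the example contents shown above. The function name parse_export_readme is hypothetical.

```python
def parse_export_readme(text: str) -> dict[str, str]:
    """Turn the '- key: value' lines of a README.kili.txt into a dictionary."""
    metadata = {}
    for line in text.splitlines():
        if line.startswith("- ") and ": " in line:
            key, value = line[2:].split(": ", 1)
            metadata[key.strip()] = value.strip()
    return metadata


# Copy of the example README contents shown above.
readme_example = """Exported Labels from KILI
=========================

- Project name: Awesome annotation project
- Project identifier: abcdefghijklmnop
- Project description: This project contains labels, most of which are awesome.
- Export date: 20221125-093324
- Exported format: kili
- Exported labels: latest
"""
readme_metadata = parse_export_readme(readme_example)
print(readme_metadata["Exported format"])  # kili
```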

Kili format, one file per asset

The following code snippet exports the whole asset payload and the associated labels, with one JSON file per asset, into the /tmp/export.zip archive.

kili.export_labels(
    project_id=your_project_id,
    filename="/tmp/export.zip",
    fmt="kili",
)
Fetching assets...
/tmp/export.zip

Kili format, one file for the whole project

This code snippet exports the whole asset payload and the associated labels as one file for the whole project, into the /tmp/export.zip archive.

kili.export_labels(
    project_id=your_project_id,
    filename="/tmp/export.zip",
    fmt="kili",
    single_file=True,
)
Fetching assets...
/tmp/export.zip
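
Once the archive is written, it can be inspected with the standard zipfile module without extracting it. The sketch below builds a small stand-in archive so that it runs without a Kili project. The layout of that stand-in (file names and folders) is hypothetical, and a real export may be organized differently. The helper names are also hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def find_members(archive_path, suffix: str) -> list[str]:
    """Return the archive members whose name ends with `suffix`."""
    with zipfile.ZipFile(archive_path) as archive:
        return [name for name in archive.namelist() if name.endswith(suffix)]


def read_member(archive_path, member: str, encoding: str = "utf-8") -> str:
    """Read one archive member as text."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.read(member).decode(encoding)


# Stand-in archive with a hypothetical layout, created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_archive = Path(tmp_dir) / "export.zip"
    with zipfile.ZipFile(demo_archive, "w") as archive:
        archive.writestr("README.kili.txt", "- Exported format: kili\n")
        archive.writestr("labels/car_1.json", "{}")
    readme_members = find_members(demo_archive, "README.kili.txt")
    readme_text = read_member(demo_archive, readme_members[0])
```

With a real export, the same two helpers would be pointed at "/tmp/export.zip" instead of the stand-in archive.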

YOLO formats

When you have at least one object detection job, you can also export to one of the following YOLO formats: "yolo_v4", "yolo_v5", "yolo_v7" or "yolo_v8". The formats differ in the structure of the metadata YAML file, which specifies the object classes. In all cases, one file per asset is produced, containing the last created DEFAULT or REVIEW label.

For bounding boxes, each YOLO label has the following shape:

2        0.25 0.67 0.26 0.34
^        ^    ^    ^    ^
class    x    y    w    h

where:

  • class is the class index in the classes list contained in the YOLO metadata file.
  • x is the x-coordinate relative to the image width (between 0.0 and 1.0) of the center of the bounding box.
  • y is the y-coordinate relative to the image height (between 0.0 and 1.0) of the center of the bounding box.
  • w is the width relative to the image width (between 0.0 and 1.0) of the bounding box.
  • h is the height relative to the image height (between 0.0 and 1.0) of the bounding box.
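
To make the relation with Kili's own representation concrete, the sketch below converts the normalizedVertices of a rectangle, as printed earlier in this tutorial, into the YOLO center, width and height values described above. It illustrates the arithmetic only and is not the code used by .export_labels. The function names and the six-decimal formatting are assumptions.

```python
def vertices_to_yolo_bbox(normalized_vertices: list[dict]) -> tuple[float, float, float, float]:
    """Return (x_center, y_center, width, height) from normalized corner points."""
    xs = [vertex["x"] for vertex in normalized_vertices]
    ys = [vertex["y"] for vertex in normalized_vertices]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return min(xs) + width / 2, min(ys) + height / 2, width, height


def yolo_bbox_line(class_index: int, normalized_vertices: list[dict]) -> str:
    """Format one YOLO bounding-box line: class x y w h."""
    x, y, w, h = vertices_to_yolo_bbox(normalized_vertices)
    return f"{class_index} {x:.6f} {y:.6f} {w:.6f} {h:.6f}"


# Vertices copied from the 'car_1' asset printed earlier in this tutorial.
car_vertices = [
    {"x": 0.6101435505380516, "y": 0.7689773770786136},
    {"x": 0.6101435505380516, "y": 0.39426226491370664},
    {"x": 0.8962087421313937, "y": 0.39426226491370664},
    {"x": 0.8962087421313937, "y": 0.7689773770786136},
]
print(yolo_bbox_line(0, car_vertices))
```

The class index 0 is a placeholder here; in an actual export it comes from the position of the class in the YOLO metadata file.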

For polygons or segmentations, each YOLO label has the following shape:

2        0.25 0.67 0.26 0.34 0.4 0.5 0.6 0.7  ...
^        ^    ^    ^    ^    ^   ^   ^   ^
class    x1   y1   x2   y2   x3  y3  x4  y4   ...

where:

  • class is the class index in the classes list contained in the YOLO metadata file.
  • xi is the x-coordinate relative to the image width (between 0.0 and 1.0) of the i-th point of the polygon.
  • yi is the y-coordinate relative to the image height (between 0.0 and 1.0) of the i-th point of the polygon.
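
When reading exported label files back, a polygon line can be split into its class index and its list of points. The sketch below follows the line layout described above. The function name parse_yolo_polygon is hypothetical.

```python
def parse_yolo_polygon(line: str) -> tuple[int, list[tuple[float, float]]]:
    """Split a YOLO polygon line into its class index and its (x, y) points."""
    class_token, *values = line.split()
    coordinates = [float(value) for value in values]
    if len(coordinates) % 2 != 0:
        raise ValueError("A polygon line needs an even number of coordinates.")
    points = list(zip(coordinates[0::2], coordinates[1::2]))
    return int(class_token), points


class_index, points = parse_yolo_polygon("2 0.25 0.67 0.26 0.34 0.4 0.5 0.6 0.7")
print(class_index, points)
# 2 [(0.25, 0.67), (0.26, 0.34), (0.4, 0.5), (0.6, 0.7)]
```

Multiplying each x by the image width and each y by the image height gives the points in pixel coordinates.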

[Figure: example of a YOLO annotation drawn over an image.]

Here is how to export to YOLO (in this example, YOLOv5):

kili.export_labels(
    project_id=your_project_id,
    filename="/tmp/export.zip",
    fmt="yolo_v5",
)
Fetching assets...
/tmp/export.zip

Note that a standard YOLO file format must also include:

  • The path root to the assets
  • The train, val and test subfolders

How the data is split across these folders is a decision for the ML engineer or data scientist, so we are not providing a code snippet here.

COCO format

To export your data into the COCO format, run the following code:

kili.export_labels(
    project_id=your_project_id,
    filename="/tmp/export.zip",
    fmt="coco",
)
Fetching assets...
Convert to coco format: 1it [00:00, 54.94it/s]
/tmp/export.zip

This will create an archive containing both:

  • The COCO annotation file
  • The data/ folder with all the assets
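
As a quick sanity check on the export, the annotations can be counted per category. The sketch below assumes the annotation file follows the standard COCO layout, with "images", "annotations" and "categories" lists. It works on an already loaded dictionary because the name of the annotation file inside the archive is not stated here. The sample data and the function name are invented for illustration.

```python
from collections import Counter


def annotations_per_category(coco: dict) -> dict[str, int]:
    """Count the annotations of a COCO dictionary, grouped by category name."""
    names = {category["id"]: category["name"] for category in coco.get("categories", [])}
    counts = Counter(
        names.get(annotation["category_id"], "unknown")
        for annotation in coco.get("annotations", [])
    )
    return dict(counts)


# Invented COCO content with the standard top-level keys.
sample_coco = {
    "images": [{"id": 1, "file_name": "car_1.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 0, "name": "OBJECT_A"}, {"id": 1, "name": "OBJECT_B"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 0, "bbox": [10, 20, 30, 40]},
        {"id": 2, "image_id": 1, "category_id": 0, "bbox": [50, 60, 20, 20]},
        {"id": 3, "image_id": 1, "category_id": 1, "bbox": [5, 5, 15, 25]},
    ],
}
print(annotations_per_category(sample_coco))  # {'OBJECT_A': 2, 'OBJECT_B': 1}
```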

Cleanup

If the project was created only for this tutorial, you can delete it. Note that this removes the project from Kili:

kili.delete_project(your_project_id)

Summary

In this tutorial, we have seen several ways to export labels from a Kili project:

  • Using .assets and .labels and their filtering arguments, a subset of assets or labels can be selected and then exported.
  • Using .export_labels, the whole project can be exported into a standard output format.