
Label parsing module

The module kili.utils.labels.parsing provides a ParsedLabel class that is used to parse labels.

Using labels as ParsedLabel instances is recommended when manipulating the label data, as it will provide autocompletion to access the meaningful fields of the label. If you prefer not to use it, you can still access the labeling data through the label dictionaries.
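
As a point of comparison, the sketch below reads a classification result straight from a label dictionary, without ParsedLabel. The label content is hand-written illustrative data that reuses the JOB_0 and CATEGORY_A names from the classification example on this page; labels fetched from a real project will carry more fields.

```python
# A hand-written label dictionary (illustrative data, not fetched from Kili).
my_label = {
    "jsonResponse": {
        "JOB_0": {
            "categories": [{"confidence": 100, "name": "CATEGORY_A"}],
        }
    }
}

# Without ParsedLabel, every level is a plain dict or list lookup,
# so key names are typed by hand and typos only surface at runtime.
first_category = my_label["jsonResponse"]["JOB_0"]["categories"][0]
category_name = first_category["name"]
confidence = first_category["confidence"]

print(category_name, confidence)  # CATEGORY_A 100
```

With a parsed label, the same lookups become attribute accesses that an IDE can autocomplete.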

Read more about this feature in the label parsing tutorial.

Warning

This feature is currently in beta. The classes and methods can still change marginally.

ParsedLabel

Class that represents a parsed label.

Source code in kili/utils/labels/parsing.py
class ParsedLabel(Dict):
    """Class that represents a parsed label."""

    def __init__(self, label: Dict, json_interface: Dict, input_type: InputType) -> None:
        # pylint: disable=line-too-long
        """Class that represents a parsed label.

        The class behaves like a dict but adds the attribute `.jobs`.

        The original input label passed to this class is not modified.

        Args:
            label: Label to parse.
            json_interface: Json interface of the project.
            input_type: Type of assets of the project.

        !!! Example
            ```python
            from kili.utils.labels.parsing import ParsedLabel

            my_label = kili.labels("project_id")[0]  # my_label is a dict

            my_parsed_label = ParsedLabel(my_label, json_interface, input_type)  # ParsedLabel object

            # Access the job "JOB_0" data through the attribute ".jobs":
            print(my_parsed_label.jobs["JOB_0"])
            ```

        !!! info
            More information about the label parsing can be found in this [tutorial](https://python-sdk-docs.kili-technology.com/latest/sdk/tutorials/label_parsing/).
        """
        label_copy = deepcopy(label)
        json_response = label_copy.pop("jsonResponse", {})

        super().__init__(label_copy)

        project_info = Project(inputType=input_type, jsonInterface=json_interface["jobs"])

        self.jobs = json_response_module.ParsedJobs(
            project_info=project_info, json_response=json_response
        )

    def to_dict(self) -> Dict:
        """Return a copy of the parsed label as a dict.

        !!! Example
            ```python
            my_parsed_label = ParsedLabel(my_dict_label, json_interface, input_type)

            # Convert back to native Python dictionary
            my_label_as_dict = my_parsed_label.to_dict()

            assert isinstance(my_label_as_dict, dict)  # True
            ```
        """
        ret = {k: deepcopy(v) for k, v in self.items() if k != "jsonResponse"}
        ret["jsonResponse"] = self.json_response
        return ret

    def __repr__(self) -> str:
        """Return the representation of the object."""
        return repr(self.to_dict())

    def __str__(self) -> str:
        """Return the string representation of the object."""
        return str(self.to_dict())

    @property
    def json_response(self) -> Dict:
        """Returns a copy of the json response of the parsed label."""
        return self.jobs.to_dict()

__init__(self, label, json_interface, input_type) special

Class that represents a parsed label.

The class behaves like a dict but adds the attribute .jobs.

The original input label passed to this class is not modified.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| label | Dict | Label to parse. | required |
| json_interface | Dict | Json interface of the project. | required |
| input_type | Literal['IMAGE', 'PDF', 'TEXT', 'VIDEO', 'LLM_RLHF', 'LLM_INSTR_FOLLOWING'] | Type of assets of the project. | required |

Example

from kili.utils.labels.parsing import ParsedLabel

my_label = kili.labels("project_id")[0]  # my_label is a dict

my_parsed_label = ParsedLabel(my_label, json_interface, input_type)  # ParsedLabel object

# Access the job "JOB_0" data through the attribute ".jobs":
print(my_parsed_label.jobs["JOB_0"])

Info

More information about the label parsing can be found in this tutorial.

to_dict(self)

Return a copy of the parsed label as a dict.

Example

my_parsed_label = ParsedLabel(my_dict_label, json_interface, input_type)

# Convert back to native Python dictionary
my_label_as_dict = my_parsed_label.to_dict()

assert isinstance(my_label_as_dict, dict)  # True
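
The copy semantics described above (the input label is left untouched, and to_dict() returns an independent copy) can be sketched without the SDK. MiniParsedLabel below is a hypothetical stand-in written for this illustration only: it mimics how ParsedLabel deep-copies its input and moves jsonResponse out of the dict part, but it stores the response as a plain dict rather than parsing it into job objects.

```python
from copy import deepcopy
from typing import Dict


class MiniParsedLabel(dict):
    """Hypothetical stand-in that mimics only the copy behaviour of ParsedLabel."""

    def __init__(self, label: Dict) -> None:
        # Work on a deep copy so the caller's dictionary is never modified.
        label_copy = deepcopy(label)
        # jsonResponse is removed from the dict part and exposed as .jobs.
        self.jobs = label_copy.pop("jsonResponse", {})
        super().__init__(label_copy)

    def to_dict(self) -> Dict:
        # Rebuild a plain dict, putting jsonResponse back under its key.
        ret = {k: deepcopy(v) for k, v in self.items()}
        ret["jsonResponse"] = deepcopy(self.jobs)
        return ret


original = {"id": "label_1", "jsonResponse": {"JOB_0": {"text": "hello"}}}
parsed = MiniParsedLabel(original)

# Editing the parsed object does not leak back into the original label.
parsed.jobs["JOB_0"]["text"] = "changed"

print(original["jsonResponse"]["JOB_0"]["text"])  # hello
print("jsonResponse" in parsed)  # False
print(parsed.to_dict()["jsonResponse"]["JOB_0"]["text"])  # changed
```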

Task specific attributes and methods

Classification tasks

For classification tasks, the following attributes are available:

.categories

Returns a CategoryList object that contains the categories of an asset.

label.jobs["CLASSIF_JOB"].categories

.category

Returns a Category object that contains the category of an asset.

Only available if the classification job is a one-class classification job.

label.jobs["CLASSIF_JOB"].category
# Same as:
label.jobs["CLASSIF_JOB"].categories[0]

.name

Retrieves the category name.

label.jobs["CLASSIF_JOB"].category.name

Example

json_interface = {
    "jobs": {
        "JOB_0": {
            "mlTask": "CLASSIFICATION",
            "content": {
                "categories": {
                    "CATEGORY_A": {"name": "A"},
                    "CATEGORY_B": {"name": "B"},
                },
                "input": "radio",
            },
        }
    }
}
json_response_dict = {
    "JOB_0": {
        "categories": [
            {
                "confidence": 100,
                "name": "CATEGORY_A",
            }
        ]
    }
}
my_label = {"jsonResponse": json_response_dict}

parsed_label = ParsedLabel(label=my_label, json_interface=json_interface, input_type="IMAGE")

print(parsed_label.jobs["JOB_0"].categories[0].name)  # CATEGORY_A
print(parsed_label.jobs["JOB_0"].categories[0].display_name)  # A

.display_name

Retrieves the category name as it is displayed in the interface.

label.jobs["CLASSIF_JOB"].category.display_name
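
In the example above, the display name "A" comes from the "name" field of the CATEGORY_A entry in the json interface. The sketch below reproduces that lookup on plain dictionaries. lookup_display_name is a hypothetical helper written for this illustration and is not part of the SDK.

```python
from typing import Dict

json_interface = {
    "jobs": {
        "JOB_0": {
            "mlTask": "CLASSIFICATION",
            "content": {
                "categories": {
                    "CATEGORY_A": {"name": "A"},
                    "CATEGORY_B": {"name": "B"},
                },
                "input": "radio",
            },
        }
    }
}


def lookup_display_name(json_interface: Dict, job_name: str, category_name: str) -> str:
    """Hypothetical helper: map a category key to the name shown in the interface."""
    categories = json_interface["jobs"][job_name]["content"]["categories"]
    return categories[category_name]["name"]


print(lookup_display_name(json_interface, "JOB_0", "CATEGORY_A"))  # A
```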

.confidence

Retrieves the confidence (when available).

label.jobs["CLASSIF_JOB"].category.confidence

Transcription tasks

.text

Retrieves the transcription text.

label.jobs["TRANSCRIPTION_JOB"].text

Object detection tasks

For more information about the different object detection tasks and their label formats, please refer to the Kili documentation.

Standard object detection

.bounding_poly

Returns a list of bounding polygons for an annotation.

label.jobs["DETECTION_JOB"].annotations[0].bounding_poly

.normalized_vertices

Returns a list of normalized vertices for a bounding polygon.

label.jobs["DETECTION_JOB"].annotations[0].bounding_poly[0].normalized_vertices

.bounding_poly_annotations

This attribute is an alias for .annotations.

The benefit of using this attribute is that it will only show in your IDE autocompletions the attributes that are relevant for the object detection task.

# the .content attribute is not relevant for object detection tasks!

# IDE autocompletion will suggest this attribute, but accessing it will crash at runtime
label.jobs["BBOX_JOB"].annotations.content

# IDE autocompletion will not suggest this attribute, and a Python linter will raise an error
label.jobs["BBOX_JOB"].bounding_poly_annotations.content
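
Normalized vertices usually need to be scaled before they can be drawn on an image. The sketch below assumes that each vertex is a dictionary with "x" and "y" values between 0 and 1, relative to the image width and height, and that the image size is known from elsewhere. to_pixel_coordinates is a hypothetical helper and the vertex values are made up.

```python
from typing import Dict, List, Tuple


def to_pixel_coordinates(
    normalized_vertices: List[Dict[str, float]], width: int, height: int
) -> List[Tuple[float, float]]:
    """Scale normalized (0 to 1) vertices to pixel coordinates."""
    return [(v["x"] * width, v["y"] * height) for v in normalized_vertices]


# Illustrative vertices, standing in for the value of .normalized_vertices
vertices = [
    {"x": 0.25, "y": 0.5},
    {"x": 0.25, "y": 0.25},
    {"x": 0.75, "y": 0.25},
    {"x": 0.75, "y": 0.5},
]

pixels = to_pixel_coordinates(vertices, width=800, height=600)
print(pixels)
# [(200.0, 300.0), (200.0, 150.0), (600.0, 150.0), (600.0, 300.0)]
```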

Point detection

.point

Returns the x and y coordinates of the point.

label.jobs["POINT_JOB"].annotations[0].point

Line detection

.polyline

Returns the list of points for a line annotation.

label.jobs["LINE_JOB"].annotations[0].polyline

Pose estimation

.points

Returns the list of points for an annotation.

label.jobs["POSE_JOB"].annotations[0].points

.point

Returns the point data.

label.jobs["POSE_JOB"].annotations[0].points[0].point

.point.point

Returns a dictionary with the coordinates of the point.

label.jobs["POSE_JOB"].annotations[0].points[0].point.point

.code

Returns the point identifier (unique for each point in an object).

label.jobs["POSE_JOB"].annotations[0].points[0].point.code

.name

Returns the point name.

label.jobs["POSE_JOB"].annotations[0].points[0].point.name

.job_name

Returns the name of the job to which the annotated point belongs.

label.jobs["POSE_JOB"].annotations[0].points[0].point.job_name

Video tasks

.frames

Returns a list of parsed label data for each frame.

label.jobs["FRAME_CLASSIF_JOB"].frames
label.jobs["FRAME_CLASSIF_JOB"].frames[5]  # 6th frame

# get category name of the 6th frame (for a frame classification job only)
label.jobs["FRAME_CLASSIF_JOB"].frames[5].category.name

Named entities recognition tasks

.content

Returns the content of the mention.

label.jobs["NER_JOB"].annotations[0].content

.begin_offset

Returns the position of the first character of the mention in the text.

label.jobs["NER_JOB"].annotations[0].begin_offset

.end_offset

When available, returns the position of the last character of the mention in the text.

label.jobs["NER_JOB"].annotations[0].end_offset
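
A quick consistency check is to slice the asset text with the offset and compare the result to the mention content. The sketch below works on plain values standing in for .content and .begin_offset, and assumes that begin_offset is a zero-based character index into the text. It deliberately avoids end_offset, since that attribute is not always available. mention_matches_text is a hypothetical helper.

```python
text = "Kili Technology is based in Paris."

# Illustrative values, standing in for annotation.content and annotation.begin_offset
content = "Paris"
begin_offset = text.index(content)


def mention_matches_text(text: str, content: str, begin_offset: int) -> bool:
    """Check that the mention content is found in the text at begin_offset."""
    return text[begin_offset : begin_offset + len(content)] == content


print(mention_matches_text(text, content, begin_offset))  # True
print(mention_matches_text(text, content, begin_offset + 1))  # False
```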

.entity_annotations

This attribute is an alias for .annotations.

The benefit of using this attribute is that it will only show in your IDE autocompletions the attributes that are relevant for the NER task.

# the .points attribute is not relevant for NER tasks, it is only used for pose estimation tasks!

# IDE autocompletion will accept this attribute, but will crash at runtime
label.jobs["NER_JOB"].annotations.points

# IDE autocompletion will not display this attribute and Python linter will raise an error
label.jobs["NER_JOB"].entity_annotations.points

Named entities recognition in PDFs tasks

.content

Returns the content of the mention.

label.jobs["NER_PDF_JOB"].annotations[0].content

.annotations

NER in PDFs annotations have an additional layer of annotations. See the documentation for more information.

.polys

Returns a list of dictionaries containing the normalized vertices of the mention.

label.jobs["NER_PDF_JOB"].annotations[0].annotations[0].polys

.page_number_array

Returns the list of page numbers on which the mention appears.

label.jobs["NER_PDF_JOB"].annotations[0].annotations[0].page_number_array

.bounding_poly

Returns a list of dictionaries containing the normalized vertices of the mention.

label.jobs["NER_PDF_JOB"].annotations[0].annotations[0].bounding_poly

Relation tasks

Named entities relation

.start_entities

Returns a list of dictionaries containing the IDs of the start entities of the relation.

label.jobs["NER_RELATION_JOB"].annotations[0].start_entities

.end_entities

Returns a list of dictionaries containing the IDs of the end entities of the relation.

label.jobs["NER_RELATION_JOB"].annotations[0].end_entities

Object detection relation

.start_objects

Returns a list of dictionaries containing the IDs of the start objects of the relation.

label.jobs["OBJECT_RELATION_JOB"].annotations[0].start_objects

.end_objects

Returns a list of dictionaries containing the IDs of the end objects of the relation.

label.jobs["OBJECT_RELATION_JOB"].annotations[0].end_objects

Children tasks

.children

Depending on the task, the .children attribute can be found in different places:

# For classification tasks
label.jobs["CLASSIF_JOB"].category.children

# For several kinds of tasks: object detection, NER, pose estimation, etc.
label.jobs["OBJECT_DETECTION_JOB"].annotations[0].children

You can find more information about the children jobs in the label parsing tutorial.

Migrating from the jsonResponse format

In most cases, the attributes of a parsed label are the snake case version of the keys present in the json response.

For example, with a NER (named entities recognition) label, you can access the beginOffset data of an annotation with parsed_label.jobs["NER_JOB"].annotations[0].begin_offset.
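
The naming rule can be sketched as a small conversion function. camel_to_snake is a hypothetical helper written for this illustration, not an SDK function, and since the rule holds in most cases rather than all, its output is a first guess at the attribute name.

```python
import re


def camel_to_snake(key: str) -> str:
    """Convert a camelCase json response key to its snake_case form."""
    # Insert an underscore before every uppercase letter except a leading one.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()


for key in ("beginOffset", "boundingPoly", "normalizedVertices", "pageNumberArray"):
    print(key, "->", camel_to_snake(key))
# beginOffset -> begin_offset
# boundingPoly -> bounding_poly
# normalizedVertices -> normalized_vertices
# pageNumberArray -> page_number_array
```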

The different json response keys are listed in the Kili documentation: