Label parsing module

The module kili.utils.labels.parsing provides a ParsedLabel class that is used to parse labels.

Using labels as ParsedLabel instances is recommended when manipulating the label data, as it will provide autocompletion to access the meaningful fields of the label. If you prefer not to use it, you can still access the labeling data through the label dictionaries.

Read more about this feature in the label parsing tutorial.

Warning

This feature is currently in beta. The classes and methods can still change marginally.

ParsedLabel

Class that represents a parsed label.

Source code in kili/utils/labels/parsing.py

class ParsedLabel(Dict):
    """Class that represents a parsed label."""

    def __init__(self, label: Dict, json_interface: Dict, input_type: InputType) -> None:
        # pylint: disable=line-too-long
        """Class that represents a parsed label.

        The class behaves like a dict but adds the attribute `.jobs`.

        The original input label passed to this class is not modified.

        Args:
            label: Label to parse.
            json_interface: Json interface of the project.
            input_type: Type of assets of the project.

        !!! Example
            ```python
            from kili.utils.labels.parsing import ParsedLabel

            my_label = kili.labels("project_id")[0]  # my_label is a dict

            my_parsed_label = ParsedLabel(my_label, json_interface, input_type)  # ParsedLabel object

            # Access the job "JOB_0" data through the attribute ".jobs":
            print(my_parsed_label.jobs["JOB_0"])
            ```

        !!! info
            More information about the label parsing can be found in this [tutorial](https://python-sdk-docs.kili-technology.com/latest/sdk/tutorials/label_parsing/).
        """
        label_copy = deepcopy(label)
        json_response = label_copy.pop("jsonResponse", {})

        super().__init__(label_copy)

        project_info = Project(inputType=input_type, jsonInterface=json_interface["jobs"])

        self.jobs = json_response_module.ParsedJobs(
            project_info=project_info, json_response=json_response
        )

    def to_dict(self) -> Dict:
        """Return a copy of the parsed label as a dict.

        !!! Example
            ```python
            my_parsed_label = ParsedLabel(my_dict_label, json_interface, input_type)

            # Convert back to native Python dictionary
            my_label_as_dict = my_parsed_label.to_dict()

            assert isinstance(my_label_as_dict, dict)  # True
            ```
        """
        ret = {k: deepcopy(v) for k, v in self.items() if k != "jsonResponse"}
        ret["jsonResponse"] = self.json_response
        return ret

    def __repr__(self) -> str:
        """Return the representation of the object."""
        return repr(self.to_dict())

    def __str__(self) -> str:
        """Return the string representation of the object."""
        return str(self.to_dict())

    @property
    def json_response(self) -> Dict:
        """Returns a copy of the json response of the parsed label."""
        return self.jobs.to_dict()

`__init__(self, label, json_interface, input_type)` `special`

Class that represents a parsed label.

The class behaves like a dict but adds the attribute .jobs.

The original input label passed to this class is not modified.

Parameters:

Name	Type	Description	Default
`label`	`Dict`	Label to parse.	required
`json_interface`	`Dict`	Json interface of the project.	required
`input_type`	`Literal['IMAGE', 'PDF', 'TEXT', 'VIDEO', 'VIDEO_LEGACY']`	Type of assets of the project.	required

Example

from kili.utils.labels.parsing import ParsedLabel

my_label = kili.labels("project_id")[0]  # my_label is a dict

my_parsed_label = ParsedLabel(my_label, json_interface, input_type)  # ParsedLabel object

# Access the job "JOB_0" data through the attribute ".jobs":
print(my_parsed_label.jobs["JOB_0"])

Info

More information about the label parsing can be found in this tutorial.


`to_dict(self)`

Return a copy of the parsed label as a dict.

Example

my_parsed_label = ParsedLabel(my_dict_label, json_interface, input_type)

# Convert back to native Python dictionary
my_label_as_dict = my_parsed_label.to_dict()

assert isinstance(my_label_as_dict, dict)  # True
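
The copy semantics described for `__init__` and `to_dict` can be illustrated with a simplified stand-in. The class `MiniParsedLabel` below is hypothetical and is not the Kili class: it keeps the json response as a plain dictionary instead of wrapping it in `ParsedJobs`, and it takes no `json_interface` or `input_type`. It only sketches how the input label stays untouched and how `to_dict` rebuilds an equivalent dictionary.

```python
from copy import deepcopy
from typing import Any, Dict


class MiniParsedLabel(dict):
    """Hypothetical, simplified stand-in for ParsedLabel (not the Kili class)."""

    def __init__(self, label: Dict[str, Any]) -> None:
        label_copy = deepcopy(label)  # the caller's dict is never mutated
        # The real class wraps this in ParsedJobs; here it stays a plain dict.
        self.jobs: Dict[str, Any] = label_copy.pop("jsonResponse", {})
        super().__init__(label_copy)

    def to_dict(self) -> Dict[str, Any]:
        """Return a deep copy of the label, with the response put back in place."""
        ret = {k: deepcopy(v) for k, v in self.items() if k != "jsonResponse"}
        ret["jsonResponse"] = deepcopy(self.jobs)
        return ret


original = {"id": "label_1", "jsonResponse": {"JOB_0": {"text": "hello"}}}
parsed = MiniParsedLabel(original)

# The response moved from the "jsonResponse" key to the .jobs attribute.
print("jsonResponse" in parsed)      # False
print(parsed.jobs["JOB_0"]["text"])  # hello

# The input dict is untouched, and to_dict() rebuilds an equal dictionary.
print("jsonResponse" in original)    # True
print(parsed.to_dict() == original)  # True
```

Because both the constructor and `to_dict` work on deep copies, modifying the returned dictionary does not affect the parsed object, and the reverse also holds.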


Task specific attributes and methods

Classification tasks

For classification tasks, the following attributes are available:

`.categories`

Returns a CategoryList object that contains the categories of an asset.

label.jobs["CLASSIF_JOB"].categories

`.category`

Returns a Category object that contains the category of an asset.

Only available if the classification job is a one-class classification job.

label.jobs["CLASSIF_JOB"].category
# Same as:
label.jobs["CLASSIF_JOB"].categories[0]

`.name`

Retrieves the category name.

label.jobs["CLASSIF_JOB"].category.name

Example

json_interface = {
    "jobs": {
        "JOB_0": {
            "mlTask": "CLASSIFICATION",
            "content": {
                "categories": {
                    "CATEGORY_A": {"name": "A"},
                    "CATEGORY_B": {"name": "B"},
                },
                "input": "radio",
            },
        }
    }
}
json_response_dict = {
    "JOB_0": {
        "categories": [
            {
                "confidence": 100,
                "name": "CATEGORY_A",
            }
        ]
    }
}
my_label = {"jsonResponse": json_response_dict}

parsed_label = ParsedLabel(label=my_label, json_interface=json_interface, input_type="IMAGE")

print(parsed_label.jobs["JOB_0"].categories[0].name)  # CATEGORY_A
print(parsed_label.jobs["JOB_0"].categories[0].display_name)  # A

`.display_name`

Retrieves the category name as it is displayed in the interface.

label.jobs["CLASSIF_JOB"].category.display_name

`.confidence`

Retrieves the confidence (when available).

label.jobs["CLASSIF_JOB"].category.confidence

Transcription tasks

`.text`

Retrieves the transcription text.

label.jobs["TRANSCRIPTION_JOB"].text

Object detection tasks

For more information about the different object detection tasks and their label formats, please refer to the Kili documentation.

Standard object detection

`.bounding_poly`

Returns a list of bounding polygons for an annotation.

label.jobs["DETECTION_JOB"].annotations[0].bounding_poly

`.normalized_vertices`

Returns a list of normalized vertices for a bounding polygon.

label.jobs["DETECTION_JOB"].annotations[0].bounding_poly[0].normalized_vertices
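
As a rough sketch of how normalized vertices might be used, the snippet below scales them to pixel coordinates. It assumes that each vertex is a dictionary with `x` and `y` values between 0 and 1, expressed relative to the image width and height. The helper `to_pixel_coordinates`, the image size, and the vertex values are made up for illustration and are not part of the Kili SDK.

```python
from typing import Dict, List, Tuple


def to_pixel_coordinates(
    normalized_vertices: List[Dict[str, float]],
    image_width: int,
    image_height: int,
) -> List[Tuple[float, float]]:
    """Scale vertices from the [0, 1] range to pixel coordinates."""
    return [
        (vertex["x"] * image_width, vertex["y"] * image_height)
        for vertex in normalized_vertices
    ]


# Made-up values standing in for
# label.jobs["DETECTION_JOB"].annotations[0].bounding_poly[0].normalized_vertices
vertices = [
    {"x": 0.25, "y": 0.25},
    {"x": 0.75, "y": 0.25},
    {"x": 0.75, "y": 0.5},
    {"x": 0.25, "y": 0.5},
]

pixels = to_pixel_coordinates(vertices, image_width=800, image_height=600)
print(pixels[0])  # (200.0, 150.0)
```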

`.bounding_poly_annotations`

This attribute is an alias for .annotations.

The benefit of using this attribute is that it will only show in your IDE autocompletions the attributes that are relevant for the object detection task.

# The .content attribute is not relevant for object detection tasks!

# IDE autocompletion will accept this attribute, but the code will crash at runtime
label.jobs["BBOX_JOB"].annotations.content

# IDE autocompletion will not display this attribute, and a Python linter will raise an error
label.jobs["BBOX_JOB"].bounding_poly_annotations.content

Point detection

`.point`

Returns the x and y coordinates of the point.

label.jobs["POINT_JOB"].annotations[0].point

Line detection

`.polyline`

Returns the list of points for a line annotation.

label.jobs["LINE_JOB"].annotations[0].polyline

Pose estimation

`.points`

Returns the list of points for an annotation.

label.jobs["POSE_JOB"].annotations[0].points

`.point`

Returns the point data.

label.jobs["POSE_JOB"].annotations[0].points[0].point

`.point.point`

Returns a dictionary with the coordinates of the point.

label.jobs["POSE_JOB"].annotations[0].points[0].point.point

`.code`

Returns the point identifier (unique for each point in an object).

label.jobs["POSE_JOB"].annotations[0].points[0].point.code

`.name`

Returns the point name.

label.jobs["POSE_JOB"].annotations[0].points[0].point.name

`.job_name`

Returns the name of the job that the annotated point belongs to.

label.jobs["POSE_JOB"].annotations[0].points[0].point.job_name

Video tasks

`.frames`

Returns a list of parsed label data for each frame.

label.jobs["FRAME_CLASSIF_JOB"].frames
label.jobs["FRAME_CLASSIF_JOB"].frames[5]  # 6th frame

# get category name of the 6th frame (for a frame classification job only)
label.jobs["FRAME_CLASSIF_JOB"].frames[5].category.name
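
The per-frame access pattern above lends itself to simple aggregations. The sketch below is only an illustration: the `frames` list is built from `SimpleNamespace` objects that mimic the `.category.name` attribute chain, and the category names are invented. With a real parsed label, `frames` would be `label.jobs["FRAME_CLASSIF_JOB"].frames`.

```python
from collections import Counter
from types import SimpleNamespace

# Hypothetical stand-in for label.jobs["FRAME_CLASSIF_JOB"].frames:
# one object per frame, each exposing .category.name.
frames = [
    SimpleNamespace(category=SimpleNamespace(name=name))
    for name in ["CAR", "CAR", "TRUCK", "CAR", "TRUCK", "TRUCK"]
]

# Count how many frames carry each category.
counts = Counter(frame.category.name for frame in frames)
print(counts["CAR"])  # 3

# Indices of the frames whose category differs from the previous frame.
changes = [
    index
    for index in range(1, len(frames))
    if frames[index].category.name != frames[index - 1].category.name
]
print(changes)  # [2, 3, 4]
```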

Named entities recognition tasks

`.content`

Returns the content of the mention.

label.jobs["NER_JOB"].annotations[0].content

`.begin_offset`

Returns the position of the first character of the mention in the text.

label.jobs["NER_JOB"].annotations[0].begin_offset

`.end_offset`

When available, returns the position of the last character of the mention in the text.

label.jobs["NER_JOB"].annotations[0].end_offset
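
To show how `.begin_offset` and `.content` relate to the source text, here is a minimal sketch. The `mention` object is a `SimpleNamespace` stand-in for `label.jobs["NER_JOB"].annotations[0]`, and the text is invented. The end of the span is derived from the content length rather than from `.end_offset`, since that attribute is only available in some cases.

```python
from types import SimpleNamespace

text = "Kili Technology is based in Paris."

# Hypothetical stand-in for label.jobs["NER_JOB"].annotations[0]
mention = SimpleNamespace(content="Paris", begin_offset=text.index("Paris"))

start = mention.begin_offset
end = start + len(mention.content)  # exclusive end, derived from the content length

# Slicing the text with these bounds gives back the mention content.
print(text[start:end])  # Paris
```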

`.entity_annotations`

This attribute is an alias for .annotations.

The benefit of using this attribute is that it will only show in your IDE autocompletions the attributes that are relevant for the NER task.

# The .points attribute is not relevant for NER tasks; it is only used for pose estimation tasks!

# IDE autocompletion will accept this attribute, but the code will crash at runtime
label.jobs["NER_JOB"].annotations.points

# IDE autocompletion will not display this attribute, and a Python linter will raise an error
label.jobs["NER_JOB"].entity_annotations.points

Named entities recognition in PDFs tasks

`.content`

Returns the content of the mention.

label.jobs["NER_PDF_JOB"].annotations[0].content

`.annotations`

Annotations for NER in PDFs contain an additional, nested layer of annotations. See the documentation for more information.

`.polys`

Returns a list of dictionaries containing the normalized vertices of the mention.

label.jobs["NER_PDF_JOB"].annotations[0].annotations[0].polys

`.page_number_array`

label.jobs["NER_PDF_JOB"].annotations[0].annotations[0].page_number_array

`.bounding_poly`

Returns a list of dictionaries containing the normalized vertices of the mention.

label.jobs["NER_PDF_JOB"].annotations[0].annotations[0].bounding_poly

Relation tasks

Named entities relation

`.start_entities`

Returns a list of dictionaries containing the IDs of the start entities of the relation.

label.jobs["NER_RELATION_JOB"].annotations[0].start_entities

`.end_entities`

Returns a list of dictionaries containing the IDs of the end entities of the relation.

label.jobs["NER_RELATION_JOB"].annotations[0].end_entities

Object detection relation

`.start_objects`

Returns a list of dictionaries containing the IDs of the start objects of the relation.

label.jobs["OBJECT_RELATION_JOB"].annotations[0].start_objects

`.end_objects`

Returns a list of dictionaries containing the IDs of the end objects of the relation.

label.jobs["OBJECT_RELATION_JOB"].annotations[0].end_objects

Children tasks

`.children`

Depending on the task, the .children attribute can be found in different places:

# For classification tasks
label.jobs["CLASSIF_JOB"].category.children

# For several kinds of tasks: object detection, NER, pose estimation, etc.
label.jobs["OBJECT_DETECTION_JOB"].annotations[0].children

You can find more information about the children jobs in the label parsing tutorial.

Migrating from jsonResponse format

In most cases, the attributes of a parsed label are the snake case version of the keys present in the json response.

For example, with a NER (named entities recognition) label, you can access the beginOffset data of an annotation with parsed_label.jobs["NER_JOB"].annotations[0].begin_offset.
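
The naming convention can be sketched with a small helper that converts camelCase keys to snake case. The function `to_snake_case` is written for this illustration and is not part of the Kili SDK. Since the rule holds in most cases but not all, treat the output as a hint for where to look rather than a guaranteed attribute name.

```python
import re


def to_snake_case(key: str) -> str:
    """Convert a camelCase json response key to its snake case form."""
    # Insert an underscore before every uppercase letter except at the start.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()


for key in ["beginOffset", "boundingPoly", "normalizedVertices", "pageNumberArray"]:
    print(key, "->", to_snake_case(key))

# beginOffset -> begin_offset
# boundingPoly -> bounding_poly
# normalizedVertices -> normalized_vertices
# pageNumberArray -> page_number_array
```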

The different json response keys are listed in the Kili documentation.
