Label parsing module
The module kili.utils.labels.parsing provides a ParsedLabel class that is used to parse labels.
Using labels as ParsedLabel instances is recommended when manipulating label data, because it provides autocompletion for the meaningful fields of the label. If you prefer not to use it, you can still access the labeling data through the label dictionaries.
Read more about this feature in the label parsing tutorial.
Warning
This feature is currently in beta. The classes and methods can still change marginally.
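The dictionary route mentioned in the introduction can be sketched without the parser. The snippet below is a minimal illustration, assuming a single-choice classification job whose interface and response have the same shape as the classification example further down this page; `raw_label` and the other variable names are placeholders, not part of the SDK.

```python
# Project interface and raw label, shaped like the classification example on this page.
json_interface = {
    "jobs": {
        "JOB_0": {
            "mlTask": "CLASSIFICATION",
            "content": {
                "categories": {
                    "CATEGORY_A": {"name": "A"},
                    "CATEGORY_B": {"name": "B"},
                },
                "input": "radio",
            },
        }
    }
}
raw_label = {
    "jsonResponse": {
        "JOB_0": {"categories": [{"confidence": 100, "name": "CATEGORY_A"}]}
    }
}

# Without ParsedLabel, every field is reached through nested dictionary keys.
category = raw_label["jsonResponse"]["JOB_0"]["categories"][0]
category_name = category["name"]
confidence = category["confidence"]

# The display name is stored in the interface, not in the label itself.
job_categories = json_interface["jobs"]["JOB_0"]["content"]["categories"]
display_name = job_categories[category_name]["name"]

print(category_name, display_name, confidence)
```

Each lookup here is a string key that an IDE cannot check, which is the gap the parsed attributes are meant to close.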
ParsedLabel
Class that represents a parsed label.
Source code in kili/utils/labels/parsing.py
class ParsedLabel(Dict):
    """Class that represents a parsed label."""

    def __init__(self, label: Dict, json_interface: Dict, input_type: InputType) -> None:
        # pylint: disable=line-too-long
        """Class that represents a parsed label.

        The class behaves like a dict but adds the attribute `.jobs`.

        The original input label passed to this class is not modified.

        Args:
            label: Label to parse.
            json_interface: Json interface of the project.
            input_type: Type of assets of the project.

        !!! Example
            ```python
            from kili.utils.labels.parsing import ParsedLabel

            my_label = kili.labels("project_id")[0]  # my_label is a dict

            my_parsed_label = ParsedLabel(my_label, json_interface, input_type)  # ParsedLabel object

            # Access the job "JOB_0" data through the attribute ".jobs":
            print(my_parsed_label.jobs["JOB_0"])
            ```

        !!! info
            More information about the label parsing can be found in this [tutorial](https://python-sdk-docs.kili-technology.com/latest/sdk/tutorials/label_parsing/).
        """
        label_copy = deepcopy(label)
        json_response = label_copy.pop("jsonResponse", {})

        super().__init__(label_copy)

        project_info = Project(inputType=input_type, jsonInterface=json_interface["jobs"])

        self.jobs = json_response_module.ParsedJobs(
            project_info=project_info, json_response=json_response
        )

    def to_dict(self) -> Dict:
        """Return a copy of the parsed label as a dict.

        !!! Example
            ```python
            my_parsed_label = ParsedLabel(my_dict_label, json_interface, input_type)

            # Convert back to native Python dictionary
            my_label_as_dict = my_parsed_label.to_dict()

            assert isinstance(my_label_as_dict, dict)  # True
            ```
        """
        ret = {k: deepcopy(v) for k, v in self.items() if k != "jsonResponse"}
        ret["jsonResponse"] = self.json_response
        return ret

    def __repr__(self) -> str:
        """Return the representation of the object."""
        return repr(self.to_dict())

    def __str__(self) -> str:
        """Return the string representation of the object."""
        return str(self.to_dict())

    @property
    def json_response(self) -> Dict:
        """Returns a copy of the json response of the parsed label."""
        return self.jobs.to_dict()
__init__(self, label, json_interface, input_type)
Class that represents a parsed label.
The class behaves like a dict but adds the attribute .jobs.
The original input label passed to this class is not modified.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
label | Dict | Label to parse. | required |
json_interface | Dict | Json interface of the project. | required |
input_type | Literal['IMAGE', 'PDF', 'TEXT', 'VIDEO', 'LLM_RLHF', 'LLM_INSTR_FOLLOWING'] | Type of assets of the project. | required |
Example
from kili.utils.labels.parsing import ParsedLabel
my_label = kili.labels("project_id")[0] # my_label is a dict
my_parsed_label = ParsedLabel(my_label, json_interface, input_type) # ParsedLabel object
# Access the job "JOB_0" data through the attribute ".jobs":
print(my_parsed_label.jobs["JOB_0"])
Info
More information about the label parsing can be found in this tutorial.
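The statement that the input label is not modified follows from the deep copy taken before the JSON response is removed. Below is a minimal sketch of that pattern, using a hypothetical `MiniParsedLabel` class that is not the SDK class and that keeps the raw response instead of building parsed job objects.

```python
from copy import deepcopy
from typing import Any, Dict


class MiniParsedLabel(dict):
    """Hypothetical, simplified stand-in for ParsedLabel (not the SDK class)."""

    def __init__(self, label: Dict[str, Any]) -> None:
        # Work on a deep copy so the caller's dictionary is left untouched.
        label_copy = deepcopy(label)
        # The JSON response is taken out of the dict and exposed as an attribute.
        json_response = label_copy.pop("jsonResponse", {})
        super().__init__(label_copy)
        # The real class builds parsed job objects here; this sketch keeps the raw dict.
        self.jobs = json_response


original = {"id": "label_1", "jsonResponse": {"JOB_0": {"text": "hello"}}}
parsed = MiniParsedLabel(original)

# Mutating the parsed copy does not leak into the original label.
parsed.jobs["JOB_0"]["text"] = "changed"

print(original["jsonResponse"]["JOB_0"]["text"])
print("jsonResponse" in parsed)
```

The remaining keys of the label stay reachable with normal dictionary access, while the response lives only under `.jobs`.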
to_dict(self)
Return a copy of the parsed label as a dict.
Example
my_parsed_label = ParsedLabel(my_dict_label, json_interface, input_type)
# Convert back to native Python dictionary
my_label_as_dict = my_parsed_label.to_dict()
assert isinstance(my_label_as_dict, dict) # True
Task specific attributes and methods
Classification tasks
For classification tasks, the following attributes are available:
.categories
Returns a CategoryList object that contains the categories of an asset.
label.jobs["CLASSIF_JOB"].categories
.category
Returns a Category object that contains the category of an asset.
Only available if the classification job is a one-class classification job.
label.jobs["CLASSIF_JOB"].category
# Same as:
label.jobs["CLASSIF_JOB"].categories[0]
.name
Retrieves the category name.
label.jobs["CLASSIF_JOB"].category.name
Example
json_interface = {
"jobs": {
"JOB_0": {
"mlTask": "CLASSIFICATION",
"content": {
"categories": {
"CATEGORY_A": {"name": "A"},
"CATEGORY_B": {"name": "B"},
},
"input": "radio",
},
}
}
}
json_response_dict = {
"JOB_0": {
"categories": [
{
"confidence": 100,
"name": "CATEGORY_A",
}
]
}
}
my_label = {"jsonResponse": json_response_dict}
parsed_label = ParsedLabel(label=my_label, json_interface=json_interface, input_type="IMAGE")
print(parsed_label.jobs["JOB_0"].categories[0].name) # CATEGORY_A
print(parsed_label.jobs["JOB_0"].categories[0].display_name) # A
.display_name
Retrieves the category name as it is displayed in the interface.
label.jobs["CLASSIF_JOB"].category.display_name
.confidence
Retrieves the confidence (when available).
label.jobs["CLASSIF_JOB"].category.confidence
Transcription tasks
.text
Retrieves the transcription text.
label.jobs["TRANSCRIPTION_JOB"].text
Object detection tasks
For more information about the different object detection tasks and their label formats, please refer to the Kili documentation.
Standard object detection
.bounding_poly
Returns a list of bounding polygons for an annotation.
label.jobs["DETECTION_JOB"].annotations[0].bounding_poly
.normalized_vertices
Returns a list of normalized vertices for a bounding polygon.
label.jobs["DETECTION_JOB"].annotations[0].bounding_poly[0].normalized_vertices
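Normalized vertices are usually scaled to pixel coordinates before drawing or cropping. The sketch below assumes each vertex is a dictionary with `x` and `y` values between 0 and 1, relative to the image width and height; the helper `to_pixel_vertices` and the sample values are illustrative and not part of the SDK.

```python
from typing import Dict, List, Tuple


def to_pixel_vertices(
    normalized_vertices: List[Dict[str, float]], width: int, height: int
) -> List[Tuple[int, int]]:
    """Scale normalized (0 to 1) vertices to integer pixel coordinates."""
    return [
        (round(vertex["x"] * width), round(vertex["y"] * height))
        for vertex in normalized_vertices
    ]


# Assumed shape of the vertices of one bounding polygon (values are made up).
vertices = [
    {"x": 0.25, "y": 0.5},
    {"x": 0.25, "y": 0.75},
    {"x": 0.5, "y": 0.75},
    {"x": 0.5, "y": 0.5},
]
pixels = to_pixel_vertices(vertices, width=800, height=600)
print(pixels)
```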
.bounding_poly_annotations
This attribute is an alias for .annotations.
The benefit of using this attribute is that your IDE autocompletion only shows the attributes that are relevant for the object detection task.
# the .content attribute is not relevant for object detection tasks!
# IDE autocompletion will accept this attribute, but the code will crash at runtime
label.jobs["BBOX_JOB"].annotations.content
# IDE autocompletion will not display this attribute and the Python linter will report an error
label.jobs["BBOX_JOB"].bounding_poly_annotations.content
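One way an alias can narrow autocompletion is through its return type annotation. The following is only a sketch of the idea, with hypothetical classes (`SketchJob`, `BoundingPolyAnnotation`); it does not claim to reproduce how the SDK implements its aliases.

```python
from typing import Any, Dict, List


class BoundingPolyAnnotation:
    """Hypothetical annotation type exposing only object detection fields."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data

    @property
    def bounding_poly(self) -> List[Dict[str, Any]]:
        return self._data["boundingPoly"]


class SketchJob:
    """Hypothetical job object with a generic accessor and a typed alias."""

    def __init__(self, annotations: List[Dict[str, Any]]) -> None:
        self._annotations = [BoundingPolyAnnotation(a) for a in annotations]

    @property
    def annotations(self) -> List[Any]:
        # Loosely typed: an IDE cannot tell which attributes exist on the items.
        return self._annotations

    @property
    def bounding_poly_annotations(self) -> List[BoundingPolyAnnotation]:
        # Same list, but the return type tells the IDE and linter the item type.
        return self._annotations


job = SketchJob([{"boundingPoly": [{"normalizedVertices": []}]}])
same_objects = job.annotations is job.bounding_poly_annotations
has_content = hasattr(job.bounding_poly_annotations[0], "content")
print(same_objects, has_content)
```

Both accessors return the same objects at runtime; only the static type information differs, which is what a linter and autocompletion rely on.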
Point detection
.point
Returns the x and y coordinates of the point.
label.jobs["POINT_JOB"].annotations[0].point
Line detection
.polyline
Returns the list of points for a line annotation.
label.jobs["LINE_JOB"].annotations[0].polyline
Pose estimation
.points
Returns the list of points for an annotation.
label.jobs["POSE_JOB"].annotations[0].points
.point
Returns the point data.
label.jobs["POSE_JOB"].annotations[0].points[0].point
.point.point
Returns a dictionary with the coordinates of the point.
label.jobs["POSE_JOB"].annotations[0].points[0].point.point
.code
Returns the point identifier (unique for each point in an object).
label.jobs["POSE_JOB"].annotations[0].points[0].point.code
.name
Returns the point name.
label.jobs["POSE_JOB"].annotations[0].points[0].point.name
.job_name
Returns the name of the job that the annotated point belongs to.
label.jobs["POSE_JOB"].annotations[0].points[0].point.job_name
Video tasks
.frames
Returns a list of parsed label data for each frame.
label.jobs["FRAME_CLASSIF_JOB"].frames
label.jobs["FRAME_CLASSIF_JOB"].frames[5] # 6th frame
# get category name of the 6th frame (for a frame classification job only)
label.jobs["FRAME_CLASSIF_JOB"].frames[5].category.name
Named entities recognition tasks
.content
Returns the content of the mention.
label.jobs["NER_JOB"].annotations[0].content
.begin_offset
Returns the position of the first character of the mention in the text.
label.jobs["NER_JOB"].annotations[0].begin_offset
.end_offset
When available, returns the position of the last character of the mention in the text.
label.jobs["NER_JOB"].annotations[0].end_offset
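The offsets can be checked against the source text. The sketch below assumes that `begin_offset` is a zero-based character index and uses a plain dictionary in place of a parsed annotation; the text and values are invented for illustration.

```python
text = "The meeting takes place in Paris next week."

# Stand-in for one NER annotation; only the fields discussed above are shown.
annotation = {"content": "Paris", "begin_offset": text.index("Paris")}

start = annotation["begin_offset"]
# Derive the end from the mention length rather than relying on an end offset,
# which is only available for some labels.
end = start + len(annotation["content"])

mention = text[start:end]
print(mention)
```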
.entity_annotations
This attribute is an alias for .annotations.
The benefit of using this attribute is that your IDE autocompletion only shows the attributes that are relevant for the NER task.
# the .points attribute is not relevant for NER tasks, it is only used for pose estimation tasks!
# IDE autocompletion will accept this attribute, but the code will crash at runtime
label.jobs["NER_JOB"].annotations.points
# IDE autocompletion will not display this attribute and the Python linter will report an error
label.jobs["NER_JOB"].entity_annotations.points
Named entities recognition in PDFs tasks
.content
Returns the content of the mention.
label.jobs["NER_PDF_JOB"].annotations[0].content
.annotations
NER in PDFs annotations have an additional layer of annotations. See the documentation for more information.
.polys
Returns a list of dictionaries containing the normalized vertices of the mention.
label.jobs["NER_PDF_JOB"].annotations[0].annotations[0].polys
.page_number_array
label.jobs["NER_PDF_JOB"].annotations[0].annotations[0].page_number_array
.bounding_poly
Returns a list of dictionaries containing the normalized vertices of the mention.
label.jobs["NER_PDF_JOB"].annotations[0].annotations[0].bounding_poly
Relation tasks
Named entities relation
.start_entities
Returns a list of dictionaries containing the IDs of the start entities of the relation.
label.jobs["NER_RELATION_JOB"].annotations[0].start_entities
.end_entities
Returns a list of dictionaries containing the IDs of the end entities of the relation.
label.jobs["NER_RELATION_JOB"].annotations[0].end_entities
Object detection relation
.start_objects
Returns a list of dictionaries containing the IDs of the start objects of the relation.
label.jobs["OBJECT_RELATION_JOB"].annotations[0].start_objects
.end_objects
Returns a list of dictionaries containing the IDs of the end objects of the relation.
label.jobs["OBJECT_RELATION_JOB"].annotations[0].end_objects
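Since relations only carry identifiers, they usually have to be joined back to the annotations they point to. The sketch below uses plain dictionaries and assumes that each reference holds its identifier under a `mid` key matching the `mid` of an annotation; that key name, the `resolve` helper, and the sample data are assumptions for illustration, so check them against your own labels.

```python
from typing import Any, Dict, List

# Stand-ins for entity annotations (assumed identifier key: "mid").
entities: List[Dict[str, Any]] = [
    {"mid": "entity-1", "content": "Alice"},
    {"mid": "entity-2", "content": "Acme"},
]

# Stand-in for one relation annotation referencing the entities above.
relation = {
    "start_entities": [{"mid": "entity-1"}],
    "end_entities": [{"mid": "entity-2"}],
}

# Index the annotations once so each reference is a constant-time lookup.
entities_by_id = {entity["mid"]: entity for entity in entities}


def resolve(references: List[Dict[str, str]]) -> List[str]:
    """Return the content of each referenced entity."""
    return [entities_by_id[ref["mid"]]["content"] for ref in references]


start_mentions = resolve(relation["start_entities"])
end_mentions = resolve(relation["end_entities"])
print(start_mentions, "->", end_mentions)
```

The same join applies to object relations, with the object annotations indexed instead of the entities.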
Children tasks
.children
Depending on the task, the .children attribute can be found in different places:
# For classification tasks
label.jobs["CLASSIF_JOB"].category.children
# For several kinds of tasks: object detection, NER, pose estimation, etc.
label.jobs["OBJECT_DETECTION_JOB"].annotations[0].children
You can find more information about the children jobs in the label parsing tutorial.
Migrating from jsonResponse format
In most cases, the attributes of a parsed label are the snake case version of the keys present in the json response.
For example, with a NER (named entities recognition) label, you can access the beginOffset data of an annotation with parsed_label.jobs["NER_JOB"].annotations[0].begin_offset.
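The naming rule can be sketched as a small helper that converts a camelCase key to snake case. `to_snake_case` is a hypothetical function written for this illustration, not part of the SDK, and since the rule holds only "in most cases", its output is a first guess to confirm against the attributes listed above.

```python
import re


def to_snake_case(key: str) -> str:
    """Convert a camelCase key to snake_case."""
    # Insert an underscore before every uppercase letter that is not at the start.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()


for key in ["beginOffset", "boundingPoly", "normalizedVertices", "pageNumberArray"]:
    print(key, "->", to_snake_case(key))
```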
The different json response keys are listed in the Kili documentation: