How to Set Up a Kili LLM Static Project
In this tutorial, you'll learn how to create a Kili project with a custom interface for comparing LLM outputs, and how to import conversations into it.
Here are the steps we will follow:
- Creating a Kili project with a custom interface
- Importing three conversations into the project
Creating a Kili Project with a Custom Interface
We will create a Kili project with a custom interface that includes several jobs for comparing LLM outputs.
Defining Three Levels of Annotation Jobs
To streamline the annotation process, we define three distinct levels of annotation jobs:
- Completion: This job enables annotators to evaluate individual responses generated by LLMs. Each response is annotated separately.
- Round: This job allows annotators to assess a single round of conversation, grouping all the LLM responses within that round under a single annotation.
- Conversation: This job facilitates annotation at the conversation level, where the entire exchange can be evaluated as a whole.
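To make the difference between the three levels concrete, the arithmetic below shows how many annotation targets each level produces for a conversation with a given number of rounds and model completions per round. This is an illustrative sketch of the counting logic only, not a Kili API:

```python
# Sketch: how many annotation targets each job level produces for a
# conversation with `rounds` rounds and `completions_per_round` model
# responses per round. Illustrative arithmetic only, not a Kili API.

def annotation_targets(level: str, rounds: int, completions_per_round: int) -> int:
    if level == "completion":
        # Each response is annotated separately.
        return rounds * completions_per_round
    if level == "round":
        # One annotation per round, covering all responses in that round.
        return rounds
    if level == "conversation":
        # A single annotation for the whole exchange.
        return 1
    raise ValueError(f"unknown level: {level}")

# A 3-round conversation comparing two models:
for level in ("completion", "round", "conversation"):
    print(level, annotation_targets(level, rounds=3, completions_per_round=2))
# prints: completion 6 / round 3 / conversation 1
```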
In this example, we use a JSON interface that incorporates classifications at all these levels, enabling comprehensive annotation:
interface = {
"jobs": {
"CLASSIFICATION_JOB_AT_COMPLETION_LEVEL": {
"content": {
"categories": {
"TOO_SHORT": {"children": [], "name": "Too short", "id": "category1"},
"JUST_RIGHT": {"children": [], "name": "Just right", "id": "category2"},
"TOO_VERBOSE": {"children": [], "name": "Too verbose", "id": "category3"},
},
"input": "radio",
},
"instruction": "Verbosity",
"level": "completion",
"mlTask": "CLASSIFICATION",
"required": 0,
"isChild": False,
"isNew": False,
},
"CLASSIFICATION_JOB_AT_COMPLETION_LEVEL_1": {
"content": {
"categories": {
"NO_ISSUES": {"children": [], "name": "No issues", "id": "category4"},
"MINOR_ISSUES": {"children": [], "name": "Minor issue(s)", "id": "category5"},
"MAJOR_ISSUES": {"children": [], "name": "Major issue(s)", "id": "category6"},
},
"input": "radio",
},
"instruction": "Instructions Following",
"level": "completion",
"mlTask": "CLASSIFICATION",
"required": 0,
"isChild": False,
"isNew": False,
},
"CLASSIFICATION_JOB_AT_COMPLETION_LEVEL_2": {
"content": {
"categories": {
"NO_ISSUES": {"children": [], "name": "No issues", "id": "category7"},
"MINOR_INACCURACY": {
"children": [],
"name": "Minor inaccuracy",
"id": "category8",
},
"MAJOR_INACCURACY": {
"children": [],
"name": "Major inaccuracy",
"id": "category9",
},
},
"input": "radio",
},
"instruction": "Truthfulness",
"level": "completion",
"mlTask": "CLASSIFICATION",
"required": 0,
"isChild": False,
"isNew": False,
},
"CLASSIFICATION_JOB_AT_COMPLETION_LEVEL_3": {
"content": {
"categories": {
"NO_ISSUES": {"children": [], "name": "No issues", "id": "category10"},
"MINOR_SAFETY_CONCERN": {
"children": [],
"name": "Minor safety concern",
"id": "category11",
},
"MAJOR_SAFETY_CONCERN": {
"children": [],
"name": "Major safety concern",
"id": "category12",
},
},
"input": "radio",
},
"instruction": "Harmlessness/Safety",
"level": "completion",
"mlTask": "CLASSIFICATION",
"required": 0,
"isChild": False,
"isNew": False,
},
"COMPARISON_JOB": {
"content": {
"options": {
"IS_MUCH_BETTER": {"children": [], "name": "Is much better", "id": "option13"},
"IS_BETTER": {"children": [], "name": "Is better", "id": "option14"},
"IS_SLIGHTLY_BETTER": {
"children": [],
"name": "Is slightly better",
"id": "option15",
},
"TIE": {"children": [], "name": "Tie", "mutual": True, "id": "option16"},
},
"input": "radio",
},
"instruction": "Pick the best answer",
"mlTask": "COMPARISON",
"required": 1,
"isChild": False,
"isNew": False,
},
"CLASSIFICATION_JOB_AT_ROUND_LEVEL": {
"content": {
"categories": {
"BOTH_ARE_GOOD": {"children": [], "name": "Both are good", "id": "category17"},
"BOTH_ARE_BAD": {"children": [], "name": "Both are bad", "id": "category18"},
},
"input": "radio",
},
"instruction": "Overall quality",
"level": "round",
"mlTask": "CLASSIFICATION",
"required": 0,
"isChild": False,
"isNew": False,
},
"CLASSIFICATION_JOB_AT_CONVERSATION_LEVEL": {
"content": {
"categories": {
"GLOBAL_GOOD": {"children": [], "name": "Globally good", "id": "category19"},
"BOTH_ARE_BAD": {"children": [], "name": "Globally bad", "id": "category20"},
},
"input": "radio",
},
"instruction": "Global",
"level": "conversation",
"mlTask": "CLASSIFICATION",
"required": 0,
"isChild": False,
"isNew": False,
},
"TRANSCRIPTION_JOB_AT_CONVERSATION_LEVEL": {
"content": {"input": "textField"},
"instruction": "Additional comments...",
"level": "conversation",
"mlTask": "TRANSCRIPTION",
"required": 0,
"isChild": False,
"isNew": False,
},
}
}
Now, we create the project using the `create_project` method, with type `LLM_STATIC`:
from kili.client import Kili
kili = Kili(
# api_endpoint="https://cloud.kili-technology.com/api/label/v2/graphql",
)
project = kili.create_project(
title="[Kili SDK Notebook]: LLM Static",
description="Project Description",
input_type="LLM_STATIC",
json_interface=interface,
)
project_id = project["id"]
Importing Conversations
We will import three conversations into the project. The conversations are stored in a JSON file, which we will load and import using the `import_conversations` method.
import requests
conversations = requests.get(
"https://storage.googleapis.com/label-public-staging/demo-projects/LLM_static/llm-conversations.json"
).json()
kili.llm.import_conversations(project_id, conversations=conversations)
You can now see the imported conversations in the UI.
In this tutorial, we've:
- Created a Kili project with a custom interface for LLM output comparison.
- Imported three conversations using the Kili LLM format.