How to import OCR pre-annotations

In this tutorial, we will see how to import OCR pre-annotations into Kili using the Google Vision API.

Pre-annotating your data with OCR will save you a lot of time when annotating transcriptions in Kili.

The data that we use comes from the Street View Text dataset.

Loading an image from the Street View Text dataset in Kili

Follow this link to get the image for this tutorial:

We will use the Google Vision API to perform Optical Character Recognition on the different inscriptions in this image.

To use the Google API, we need to install some packages:

%pip install google-cloud google-cloud-vision Pillow kili google-cloud-storage wget

import io
import json
import os

import wget
from google.cloud import vision
from google.oauth2 import service_account
from PIL import Image

from kili.client import Kili

We can now create the project ontology (JSON interface).

For a transcription task on images, the ontology is an object detection job (bounding boxes) in which each category has a nested transcription job:

json_interface = {
    "jobs": {
        "JOB_0": {  # this job is for annotating the bounding boxes
            "mlTask": "OBJECT_DETECTION",
            "tools": ["rectangle"],
            "instruction": "Categories",
            "required": 1,
            "isChild": False,
            "content": {
                "categories": {
                    "STORE_INFORMATIONS": {"name": "Store informations", "children": ["JOB_1"]},
                    "PRODUCTS": {"name": "Products", "children": ["JOB_2"]},
                },
                "input": "radio",
            },
        },
        "JOB_1": {
            "mlTask": "TRANSCRIPTION",
            "instruction": "Transcription of store informations",
            "required": 1,
            "isChild": True,
        },
        "JOB_2": {
            "mlTask": "TRANSCRIPTION",
            "instruction": "Transcription of products",
            "required": 1,
            "isChild": True,
        },
    }
}
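Since each category points to a child job by name, a quick consistency check can catch a mistyped job name before the project is created. The sketch below is only an illustration: `check_child_jobs` is a hypothetical helper, not part of the Kili SDK, and it runs on a trimmed copy of the interface above.

```python
def check_child_jobs(interface):
    """Return a list of problems in the parent/child wiring of a JSON interface."""
    jobs = interface["jobs"]
    problems = []
    for job_name, job in jobs.items():
        categories = job.get("content", {}).get("categories", {})
        for category_name, category in categories.items():
            for child in category.get("children", []):
                if child not in jobs:
                    problems.append(f"{job_name}.{category_name}: unknown child job {child}")
                elif not jobs[child].get("isChild", False):
                    problems.append(f"{job_name}.{category_name}: {child} is not marked isChild")
    return problems


# Trimmed copy of the interface above so that the snippet runs on its own.
example_interface = {
    "jobs": {
        "JOB_0": {
            "mlTask": "OBJECT_DETECTION",
            "isChild": False,
            "content": {
                "categories": {
                    "STORE_INFORMATIONS": {"name": "Store informations", "children": ["JOB_1"]},
                    "PRODUCTS": {"name": "Products", "children": ["JOB_2"]},
                },
                "input": "radio",
            },
        },
        "JOB_1": {"mlTask": "TRANSCRIPTION", "isChild": True},
        "JOB_2": {"mlTask": "TRANSCRIPTION", "isChild": True},
    }
}

problems = check_child_jobs(example_interface)
print(problems if problems else "child jobs are wired consistently")
```

An empty list means every `children` entry refers to an existing job that is flagged with `isChild`.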

Let's initialize the Kili client and create our project:

kili = Kili(
    # api_endpoint="https://cloud.kili-technology.com/api/label/v2/graphql",
    # the line above can be uncommented and changed if you are working with an on-premise version of Kili
)

# Create an OCR project
project = kili.create_project(
    description="OCR street view",
    input_type="IMAGE",
    json_interface=json_interface,
    title="[Kili SDK Notebook]: Street text OCR annotation project",
)
project_id = project["id"]

Creating OCR annotations using Google Vision API

We will now see how to perform OCR pre-annotation on our image using the Google Vision API.

First you will need to create an account on Google Cloud:

create a project (or use an existing one)
then go to the Cloud Vision API page
activate the API for your project

Now that the API is activated, we need a secret key to call the API later in our project:

go to the APIs & Services page
and create a service account with authorization to use the Vision API

On the service account details page:

click on add a key
download the key in JSON format
place the key in your environment variables or in a file

google_key_str = os.getenv("KILI_API_CLOUD_VISION")

if not google_key_str:
    path_to_json_key = "./google_cloud_key.json"
    with open(path_to_json_key) as file:
        google_key_str = file.read()

GOOGLE_KEY = json.loads(google_key_str)

assert GOOGLE_KEY
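The `assert` above only verifies that the key is not empty. As a slightly stricter check, the sketch below looks for a few fields that a downloaded service account key is assumed to contain. Both the field list and the `missing_key_fields` helper are assumptions made for illustration, and the key used here is a fake placeholder.

```python
import json

# Fields assumed to be present in a downloaded service account key file.
EXPECTED_KEY_FIELDS = ("type", "private_key", "client_email", "token_uri")


def missing_key_fields(key_info):
    """Return the expected fields that are absent or empty in the parsed key."""
    return [field for field in EXPECTED_KEY_FIELDS if not key_info.get(field)]


# Placeholder key with fake values, for illustration only.
fake_key = json.loads(
    '{"type": "service_account", "client_email": "ocr@example.iam.gserviceaccount.com"}'
)
print(missing_key_fields(fake_key))
```

In the tutorial, the same check could be applied to `GOOGLE_KEY` so that an incomplete key is reported before the first API call.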

We can now start adding OCR pre-annotations to the asset metadata:

def implicit():
    from google.cloud import storage

    # If you don't specify credentials when constructing the client, the
    # client library will look for credentials in the environment.
    storage_client = storage.Client()

    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)

def detect_text(path):
    """Detects text in the file."""
    credentials = service_account.Credentials.from_service_account_info(GOOGLE_KEY)
    client = vision.ImageAnnotatorClient(credentials=credentials)

    with open(path, "rb") as image_file:
        content = image_file.read()

    response = client.text_detection({"content": content})

    # Check for an API error before reading the annotations.
    if response.error.message:
        raise RuntimeError(
            f"{response.error.message}\nFor more info on error messages, check: "
            "https://cloud.google.com/apis/design/errors"
        )

    text_annotations = []

    # Convert each detected text block to a plain, JSON-serializable dict.
    for text in response.text_annotations:
        vertices = [{"x": vertex.x, "y": vertex.y} for vertex in text.bounding_poly.vertices]

        text_annotations.append(
            {
                "description": text.description,
                "boundingPoly": {"vertices": vertices},
            }
        )

    return text_annotations

PATH_TO_IMG = wget.download(
    "https://raw.githubusercontent.com/kili-technology/kili-python-sdk/main/recipes/img/store_front.jpeg"
)

text_annotations = detect_text(PATH_TO_IMG)
assert len(text_annotations) > 0
print(f"Found {len(text_annotations)} boxes of text.")
print(text_annotations[0])

Found 22 boxes of text.
{'description': "CD\nITALIAN $1\nESPRESSO Shot\nplus tax\nFINE ITALIAN\nIMPORTS & DELI\nJIM\nIMMY'S FRESH MEATS\nSAUSAGES\nFOOD STORE\nIX", 'boundingPoly': {'vertices': [{'x': 24, 'y': 6}, {'x': 1668, 'y': 6}, {'x': 1668, 'y': 553}, {'x': 24, 'y': 553}]}}
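As the output above shows, the first entry holds the whole detected text with a bounding box that covers all of it, while the following entries describe individual pieces of text. The sketch below separates the two. `split_full_text` is a hypothetical helper, and the sample list is made-up data that only mimics the structure returned by `detect_text`.

```python
def split_full_text(annotations):
    """Split annotations into (full text, list of the remaining entries)."""
    if not annotations:
        return "", []
    return annotations[0]["description"], annotations[1:]


def box(x0, y0, x1, y1):
    """Build a rectangular boundingPoly from two opposite corners."""
    return {
        "vertices": [
            {"x": x0, "y": y0},
            {"x": x1, "y": y0},
            {"x": x1, "y": y1},
            {"x": x0, "y": y1},
        ]
    }


# Made-up data with the same structure as the output of detect_text.
sample_annotations = [
    {"description": "FOOD STORE", "boundingPoly": box(10, 10, 200, 50)},
    {"description": "FOOD", "boundingPoly": box(10, 10, 90, 50)},
    {"description": "STORE", "boundingPoly": box(100, 10, 200, 50)},
]

full_text, words = split_full_text(sample_annotations)
print(full_text)
print([word["description"] for word in words])
```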

im = Image.open(PATH_TO_IMG)
IMG_WIDTH, IMG_HEIGHT = im.size
print(im.size)

(1680, 1050)

We now need to convert the OCR predictions to the Kili asset metadata format:

full_text_annotations = {
    "fullTextAnnotation": {
        "pages": [{"height": IMG_HEIGHT, "width": IMG_WIDTH}],
    },
    "textAnnotations": text_annotations,
}

We follow the Google Vision API AnnotateImageResponse format. The OCR data inserted into Kili as JSON metadata therefore contains:

A full text annotation: a list of the pages in the document with their respective heights and widths.
A list of text annotations, each with:
- the text content
- the bounding box coordinates.

{
  "fullTextAnnotation": { "pages": [{ "height": 914, "width": 813 }] },
  "textAnnotations": [
    {
      "description": "7SB75",
      "boundingPoly": {
        "vertices": [
          { "x": 536, "y": 259 },
          { "x": 529, "y": 514 },
          { "x": 449, "y": 512 },
          { "x": 456, "y": 257 }
        ]
      }
    },
    {
      "description": "09TGG",
      "boundingPoly": {
        "vertices": [
          { "x": 436, "y": 256 },
          { "x": 435, "y": 515 },
          { "x": 360, "y": 515 },
          { "x": 361, "y": 256 }
        ]
      }
    }
  ]
}
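Before uploading, it can be useful to check that every bounding box lies inside the page dimensions declared in `fullTextAnnotation`. The sketch below is one way to do this on the example metadata above. `out_of_bounds_boxes` is a hypothetical helper, and treating a missing coordinate as 0 is an assumption.

```python
def out_of_bounds_boxes(metadata):
    """Return the descriptions of annotations with a vertex outside the first page."""
    page = metadata["fullTextAnnotation"]["pages"][0]
    width, height = page["width"], page["height"]
    bad = []
    for annotation in metadata["textAnnotations"]:
        for vertex in annotation["boundingPoly"]["vertices"]:
            # Assumption: a missing coordinate is treated as 0.
            x, y = vertex.get("x", 0), vertex.get("y", 0)
            if not (0 <= x <= width and 0 <= y <= height):
                bad.append(annotation["description"])
                break
    return bad


# Same values as the example metadata above.
example_metadata = {
    "fullTextAnnotation": {"pages": [{"height": 914, "width": 813}]},
    "textAnnotations": [
        {
            "description": "7SB75",
            "boundingPoly": {
                "vertices": [
                    {"x": 536, "y": 259},
                    {"x": 529, "y": 514},
                    {"x": 449, "y": 512},
                    {"x": 456, "y": 257},
                ]
            },
        },
        {
            "description": "09TGG",
            "boundingPoly": {
                "vertices": [
                    {"x": 436, "y": 256},
                    {"x": 435, "y": 515},
                    {"x": 360, "y": 515},
                    {"x": 361, "y": 256},
                ]
            },
        },
    ],
}

print(out_of_bounds_boxes(example_metadata))
```

An empty list means that all boxes fit within the declared page size.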

Let's upload the asset with its pre-annotations to Kili:

external_id = "store"
content = PATH_TO_IMG

kili.append_many_to_dataset(
    project_id=project_id,
    content_array=[content],
    external_id_array=[external_id],
    json_metadata_array=[full_text_annotations],
)
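The call above uploads a single asset, but its arguments are parallel lists, so several images could be sent in one call. The sketch below shows one possible way to assemble these lists. `build_upload_arrays` is a hypothetical helper, and the file names and metadata are placeholders.

```python
def build_upload_arrays(items):
    """Turn (image_path, external_id, metadata) tuples into parallel lists."""
    external_ids = [external_id for _, external_id, _ in items]
    if len(set(external_ids)) != len(external_ids):
        raise ValueError("external ids must be unique")
    return {
        "content_array": [image_path for image_path, _, _ in items],
        "external_id_array": external_ids,
        "json_metadata_array": [metadata for _, _, metadata in items],
    }


# Placeholder metadata; in practice it would be built as shown earlier.
empty_metadata = {
    "fullTextAnnotation": {"pages": [{"height": 1050, "width": 1680}]},
    "textAnnotations": [],
}

# Placeholder file names and external ids.
arrays = build_upload_arrays(
    [
        ("store_front.jpeg", "store", empty_metadata),
        ("second_store.jpeg", "store-2", empty_metadata),
    ]
)
print(arrays["external_id_array"])
```

The resulting dictionary uses the same parameter names as the call above, so it could be passed as `kili.append_many_to_dataset(project_id=project_id, **arrays)`.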

Annotate in Kili

You can now annotate your images in Kili, where the automatically extracted text is displayed on each asset.

Congrats! 👏

Pre-annotating your assets can save you a significant amount of time and improve the accuracy of your labeling ⏳🎯.

Cleanup

To clean up, we simply need to remove the project that we created:

kili.delete_project(project_id)