

Train an object detection model with Vertex AI AutoML and Kili for faster annotation

What is Google Vertex AI?

Vertex AI is a comprehensive machine learning platform designed for the training, deployment, and customization of ML models and AI applications, including large language models (LLMs). It seamlessly integrates with Kili to create a sophisticated model-in-the-loop data annotation workflow.

For more information on Google Vertex AI AutoML, you can visit Introduction to Vertex AI and the AutoML beginner's guide.

Tutorial Objectives

In this tutorial, we will demonstrate how to train an object detection model with annotated data on Google Vertex AI AutoML, then use it to make predictions on unlabeled data and speed up the annotation process. The tutorial covers the following steps:

1. Fetching and preparing data from Kili
2. Training a model with Vertex AI
3. Running predictions on unlabeled images from your Kili dataset
4. Uploading the predictions to your Kili project for faster annotation

All steps can be performed through the Google Cloud Console UI, but this tutorial focuses on performing actions exclusively through the AI Platform Python SDK and Kili Python SDK to provide a deeper integration.

For this tutorial, we will use the BCCD dataset (Blood Cell Count and Detection), which is accessible on GitHub and also hosted on Hugging Face or Roboflow. It consists of 3 highly unbalanced classes: platelets, RBCs (red blood cells), and WBCs (white blood cells).

example-1.jpg

Installation Requirements

!pip install -U google-cloud-aiplatform
!pip install kili
import json
import mimetypes
import random
from pathlib import Path
from typing import List, Union

import requests
from google.cloud import aiplatform, storage
from google.cloud.aiplatform.gapic.schema import trainingjob
from google.colab import auth
from PIL import Image
from tqdm import tqdm

from kili.client import Kili

We first initialize the Kili client.

The API key can be found in your project settings. The object detection job name can be found in the "Interface" section of your project settings.

kili_api_key = "[KILI API KEY]"
project_id = "[KILI PROJECT ID]"
OBJECT_DETECTION_JOB_NAME = "[OBJECT DETECTION JOB NAME]"  # e.g: OBJECT_DETECTION_JOB

kili = Kili(api_key=kili_api_key)
# Authenticate with Google Cloud
auth.authenticate_user()

# Initialize the GCS client
storage_client = storage.Client()
project_name = "[GCP PROJECT NAME]"
location = "[GCP PROJECT LOCATION]"
bucket_name = "[GCS BUCKET NAME]"
dataset_name = "blood_cell"  # name of the dataset you will create in Vertex AI
bucket_dataset_dir = (
    f"experiments/{dataset_name}"  # name of the folder in yout bucket where files will be stored.
)

aiplatform.init(project=project_name, location=location)

Prepare the Data

To train our Vertex AI AutoML model, we need to prepare the data within our Kili project. This section follows the guidelines provided in the Vertex AI data preparation documentation.

This section covers the following steps:

  • Downloading annotated images from Kili to your local machine, along with their labels.
  • Uploading the images to Google Cloud Storage.
  • Splitting the annotated data into train/validation/test sets.
  • Converting the images and labels into the required format for Vertex AI datasets.
  • Uploading the converted input data to Google Cloud Storage.

In this particular example, we have annotated 150 assets on the Kili app.

Retrieving and downloading labeled assets from Kili

We first call the Kili Python SDK assets function in order to retrieve assets.

The download_media argument allows you to download the media (images, here) into the folder given in the local_media_dir argument. When doing so, the content field is automatically replaced by the local path of the downloaded asset.

For each asset, we query its id, externalId, and the jsonResponse field of its latest label (the last one submitted on Kili). For more information on the assets function or on other fields that you can query, you can have a look at the function documentation.

assets = kili.assets(
    project_id=project_id,
    download_media=True,
    local_media_dir="./images",
    status_in=["LABELED"],
    fields=["latestLabel.jsonResponse", "content", "id", "externalId"],
    disable_tqdm=False,
)
# Plot an example image
Image.open(assets[0]["content"])


Uploading images to Google Cloud Storage

When importing data into a Vertex AI dataset, the images must already be stored in a Google Cloud Storage bucket:

def upload_assets_to_bucket(assets: List[dict], bucket_name: str, bucket_dataset_dir: str):
    bucket = storage_client.get_bucket(bucket_name)
    for asset in tqdm(assets, desc="uploading assets to bucket"):
        image_bucket_path = f"{bucket_dataset_dir}/images/{Path(asset['content']).name}"
        image_local_path = asset["content"]
        blob = bucket.blob(image_bucket_path)
        blob.upload_from_filename(image_local_path)
upload_assets_to_bucket(assets, bucket_name, bucket_dataset_dir)

Splitting the Dataset into Train/Validation/Test Sets

We will divide our annotated images using the following proportions:

  • Training set: 70%
  • Validation set: 20%
  • Test set: 10%
def split_assets_in_train_val_test(assets: list[dict]):
    # shuffle the assets before splitting
    random.shuffle(assets)

    # Calculate the lengths of each split
    total_len = len(assets)
    train_len = int(0.7 * total_len)
    val_len = int(0.2 * total_len)

    # Split the list into train, validation, and test
    train_assets = assets[:train_len]
    val_assets = assets[train_len : train_len + val_len]
    test_assets = assets[train_len + val_len :]

    return train_assets, val_assets, test_assets
train_assets, val_assets, test_assets = split_assets_in_train_val_test(assets)
print(len(train_assets), len(val_assets), len(test_assets))
105 30 15

Converting images and labels into Vertex AI required format

Data that we upload to the Vertex AI dataset must follow a predefined schema. The schema is provided as a YAML file based on the OpenAPI format.

In the following function, we generate JSON data that adheres to the required schema for an asset. This includes the location of the asset on Google Cloud Storage, the bounding boxes with their coordinates and categories, as well as the asset split.

def get_asset_io_input(asset: dict, split: str, bucket_name: str, image_bucket_path: str):
    image_gcs_uri = f"gs://{bucket_name}/{image_bucket_path}"
    labels = []
    for annotation in asset["latestLabel"]["jsonResponse"][OBJECT_DETECTION_JOB_NAME][
        "annotations"
    ]:
        normalizedVertices = annotation["boundingPoly"][0]["normalizedVertices"]
        label = {
            "displayName": annotation["categories"][0]["name"],
            "xMin": min(map(lambda vertice: vertice["x"], normalizedVertices)),
            "yMin": min(map(lambda vertice: vertice["y"], normalizedVertices)),
            "xMax": max(map(lambda vertice: vertice["x"], normalizedVertices)),
            "yMax": max(map(lambda vertice: vertice["y"], normalizedVertices)),
        }
        labels.append(label)

    return {
        "imageGcsUri": image_gcs_uri,
        "boundingBoxAnnotations": labels,
        "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": split},
    }

We will generate one JSONL file storing all the image JSON inputs. Each line of the JSONL file corresponds to one data item to import, in the format defined above.

This JSONL file then needs to be uploaded to the Google Cloud Storage bucket; it will be the input provided at dataset creation.
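
For illustration, a single (hypothetical) line of the JSONL file, matching the schema produced by get_asset_io_input above (bucket name and values are placeholders):

{"imageGcsUri": "gs://my-bucket/experiments/blood_cell/images/image-001.jpg", "boundingBoxAnnotations": [{"displayName": "RBC", "xMin": 0.1, "yMin": 0.2, "xMax": 0.3, "yMax": 0.4}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "training"}}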

def generate_and_upload_inputs_to_bucket(
    train_assets: List[dict],
    val_assets: List[dict],
    test_assets: List[dict],
    bucket_name: str,
    bucket_dataset_dir: str,
):
    output_jsonl_file = "inputs.jsonl"
    bucket = storage_client.get_bucket(bucket_name)
    with open(output_jsonl_file, "w") as output_file:
        for split_name, split_assets in [
            ("training", train_assets),
            ("validation", val_assets),
            ("test", test_assets),
        ]:
            for asset in split_assets:
                # get the input object to be sent when importing the dataset
                image_bucket_path = f"{bucket_dataset_dir}/images/{Path(asset['content']).name}"
                asset_input = get_asset_io_input(asset, split_name, bucket_name, image_bucket_path)

                # add the example input to the jsonl file
                json_line = json.dumps(asset_input)
                output_file.write(json_line + "\n")

    # upload the inputs file to the bucket
    blob = bucket.blob(f"{bucket_dataset_dir}/inputs.jsonl")
    blob.upload_from_filename(output_jsonl_file)
generate_and_upload_inputs_to_bucket(
    train_assets, val_assets, test_assets, bucket_name, bucket_dataset_dir
)

Create a dataset in Vertex AI

To train an AutoML model, we need a Vertex AI dataset.

When creating this dataset, we also provide the JSONL file generated in the last section, to import images with their labels into the dataset.

For more information on dataset creation, you can have a look at the Vertex AI documentation.

def create_and_import_dataset_image_sample(
    project: str,
    location: str,
    display_name: str,
    src_uris: Union[str, List[str]],
    import_schema_uri: str,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    ds = aiplatform.ImageDataset.create(
        display_name=display_name,
        gcs_source=src_uris,
        import_schema_uri=import_schema_uri,
        sync=sync,
    )

    ds.wait()

    print(ds.display_name)
    print(ds.resource_name)
    return ds
inputs_uri = f"gs://{bucket_name}/{bucket_dataset_dir}/inputs.jsonl"
import_schema_uri = (
    "gs://google-cloud-aiplatform/schema/dataset/ioformat/image_bounding_box_io_format_1.0.0.yaml"
)
ds = create_and_import_dataset_image_sample(
    project_name, location, dataset_name, inputs_uri, import_schema_uri
)
INFO:google.cloud.aiplatform.datasets.dataset:Creating ImageDataset
INFO:google.cloud.aiplatform.datasets.dataset:Create ImageDataset backing LRO: projects/**********/locations/europe-west4/datasets/2314474175491735552/operations/7318488207820062720
INFO:google.cloud.aiplatform.datasets.dataset:ImageDataset created. Resource name: projects/**********/locations/europe-west4/datasets/2314474175491735552
INFO:google.cloud.aiplatform.datasets.dataset:To use this ImageDataset in another session:
INFO:google.cloud.aiplatform.datasets.dataset:ds = aiplatform.ImageDataset('projects/**********/locations/europe-west4/datasets/2314474175491735552')
INFO:google.cloud.aiplatform.datasets.dataset:Importing ImageDataset data: projects/**********/locations/europe-west4/datasets/2314474175491735552
INFO:google.cloud.aiplatform.datasets.dataset:Import ImageDataset data backing LRO: projects/**********/locations/europe-west4/datasets/2314474175491735552/operations/7021250632413609984
INFO:google.cloud.aiplatform.datasets.dataset:ImageDataset data imported. Resource name: projects/**********/locations/europe-west4/datasets/2314474175491735552


blood_cell
projects/**********/locations/europe-west4/datasets/2314474175491735552
dataset_id = ds.name.split("/")[-1]
print(dataset_id)
2314474175491735552

2023-10-30_12-48.jpg

Train a Model

We are now ready to train our model!

When creating the training pipeline, we provide the task definition schema (object detection here) and a filter_split configuration that tells the AutoML training algorithm which splits we defined when importing the images into the dataset.

We also provide some task inputs like the model type and a time budget.

We use the default model type CLOUD_HIGH_ACCURACY_1, which is expected to have higher latency but also higher prediction quality than other model types such as CLOUD_LOW_LATENCY_1.

The time budget is the maximum training cost we are prepared to allocate. If the model converges before this budget is exhausted, training stops early. We have set the budget to the minimum for these initial experiments, but you are free to increase it when transitioning from experimentation to production. For a detailed explanation of AutoML costs, you can visit the Vertex AI Pricing page.

def create_training_pipeline_image_object_detection_sample(
    project: str, display_name: str, dataset_id: str, model_display_name: str, location: str
):
    client_options = {"api_endpoint": f"{location}-aiplatform.googleapis.com"}
    client = aiplatform.gapic.PipelineServiceClient(client_options=client_options)

    training_task_inputs = trainingjob.definition.AutoMlImageObjectDetectionInputs(
        model_type="CLOUD_HIGH_ACCURACY_1",
        budget_milli_node_hours=20000,  # The minimum time budget
        disable_early_stopping=False,
    ).to_value()

    training_pipeline = {
        "display_name": display_name,
        "training_task_definition": "gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_image_object_detection_1.0.0.yaml",
        "training_task_inputs": training_task_inputs,
        "input_data_config": {
            "dataset_id": dataset_id,
            "filter_split": {
                "training_filter": "labels.aiplatform.googleapis.com/ml_use=training",
                "validation_filter": "labels.aiplatform.googleapis.com/ml_use=validation",
                "test_filter": "labels.aiplatform.googleapis.com/ml_use=test",
            },
        },
        "model_to_upload": {"display_name": model_display_name},
    }
    parent = f"projects/{project}/locations/{location}"
    response = client.create_training_pipeline(parent=parent, training_pipeline=training_pipeline)
    print("response:", response)
    return response

Calling this function launches an asynchronous training pipeline.

Once launched, the pipeline can be monitored at the following address: https://console.cloud.google.com/vertex-ai/training/training-pipelines

training_name = "autoML-training-blood_cell-poc-v2"
model_display_name = "blood_cell-poc-v2"
response = create_training_pipeline_image_object_detection_sample(
    project_name, training_name, dataset_id, model_display_name, location
)
response: name: "projects/**********/locations/europe-west4/trainingPipelines/5236749032969207808"
display_name: "autoML-training-blood_cell-poc-v2"
input_data_config {
  dataset_id: "2314474175491735552"
  filter_split {
    training_filter: "labels.aiplatform.googleapis.com/ml_use=training"
    validation_filter: "labels.aiplatform.googleapis.com/ml_use=validation"
    test_filter: "labels.aiplatform.googleapis.com/ml_use=test"
  }
}
training_task_definition: "gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_image_object_detection_1.0.0.yaml"
training_task_inputs {
  struct_value {
    fields {
      key: "budgetMilliNodeHours"
      value {
        string_value: "20000"
      }
    }
    fields {
      key: "modelType"
      value {
        string_value: "CLOUD_HIGH_ACCURACY_1"
      }
    }
  }
}
model_to_upload {
  display_name: "blood_cell-poc-v2"
}
state: PIPELINE_STATE_PENDING
create_time {
  seconds: 1698654485
  nanos: 864404000
}
update_time {
  seconds: 1698654485
  nanos: 864404000
}
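
If you prefer to monitor the pipeline programmatically rather than in the console, here is a minimal sketch, reusing the response object returned above and the same regional endpoint as the client created earlier:

# Minimal sketch: poll the training pipeline state with the same GAPIC client
# setup as above. `response.name` is the pipeline resource name.
pipeline_client = aiplatform.gapic.PipelineServiceClient(
    client_options={"api_endpoint": f"{location}-aiplatform.googleapis.com"}
)
pipeline = pipeline_client.get_training_pipeline(name=response.name)
print(pipeline.state)  # e.g. PIPELINE_STATE_RUNNING, then PIPELINE_STATE_SUCCEEDED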

Once trained, your model should appear at the following address: https://console.cloud.google.com/vertex-ai/models

You can now evaluate it, deploy it, run predictions, etc.

To proceed with additional operations, we need the model's unique identifier (ID) or resource name, which can be found in the model's information section in the Google Cloud Console. While the ID can also be fetched programmatically with the Python SDK for a deeper integration, for the purposes of this tutorial we will simply retrieve it from the console UI.

# MODEL ID to be found on the model registry of Vertex AI: https://console.cloud.google.com/vertex-ai/models
model_resource_name = "[MODEL ID]"
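
Alternatively, here is a sketch of how the resource name could be fetched programmatically by listing models with a display-name filter, assuming the model_display_name defined for training above:

# Sketch: look the model up by display name instead of copying its ID from
# the console. Assumes the training pipeline has completed and uploaded it.
models = aiplatform.Model.list(filter=f'display_name="{model_display_name}"')
if models:
    model_resource_name = models[0].resource_name
    print(model_resource_name)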

Batch inference prediction

Now that our model is trained, we can download unlabeled data from Kili, upload it to Cloud Storage, and prepare it in the batch prediction input format defined in the Vertex AI documentation.

# Retrieve and download unlabeled assets from Kili
unlabeled_assets = kili.assets(
    project_id=project_id,
    download_media=True,
    local_media_dir="./images",
    status_in=["TODO"],
    fields=["content", "id", "externalId"],
    disable_tqdm=False,
)
upload_assets_to_bucket(unlabeled_assets, bucket_name, bucket_dataset_dir)
def upload_test_source_to_bucket(assets: List[dict], bucket_name: str, bucket_dataset_dir: str):
    output_jsonl_file = "batch_inference_inputs.jsonl"
    bucket = storage_client.get_bucket(bucket_name)
    with open(output_jsonl_file, "w") as output_file:
        for asset in assets:
            # build the batch prediction input object for this asset
            asset_input = {
                "content": f"gs://{bucket_name}/{bucket_dataset_dir}/images/{Path(asset['content']).name}",
                "mimeType": mimetypes.guess_type(asset["content"])[0],
            }

            # add the test input to the jsonl file
            json_line = json.dumps(asset_input)
            output_file.write(json_line + "\n")

    # upload the inputs file to the bucket
    blob = bucket.blob(f"{bucket_dataset_dir}/batch_inference_inputs.jsonl")
    blob.upload_from_filename(output_jsonl_file)
upload_test_source_to_bucket(unlabeled_assets, bucket_name, bucket_dataset_dir)

We now create a batch prediction job and wait for its result.

def create_batch_prediction_job_sample(
    project: str,
    location: str,
    model_resource_name: str,
    job_display_name: str,
    gcs_source: str,
    gcs_destination: str,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    my_model = aiplatform.Model(model_resource_name)

    batch_prediction_job = my_model.batch_predict(
        job_display_name=job_display_name,
        gcs_source=gcs_source,
        gcs_destination_prefix=gcs_destination,
        sync=sync,
    )

    batch_prediction_job.wait()

    print(batch_prediction_job.display_name)
    print(batch_prediction_job.resource_name)
    print(batch_prediction_job.state)
    return batch_prediction_job
job_display_name = "batch_prediction_test"
gcs_source = f"gs://{bucket_name}/{bucket_dataset_dir}/batch_inference_inputs.jsonl"
gcs_destination = f"gs://{bucket_name}/{bucket_dataset_dir}/batch_inference"

batch_prediction_job = create_batch_prediction_job_sample(
    project_name, location, model_resource_name, job_display_name, gcs_source, gcs_destination
)
INFO:google.cloud.aiplatform.jobs:Creating BatchPredictionJob
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob created. Resource name: projects/**********/locations/europe-west4/batchPredictionJobs/466893868839731200
INFO:google.cloud.aiplatform.jobs:To use this BatchPredictionJob in another session:
INFO:google.cloud.aiplatform.jobs:bpj = aiplatform.BatchPredictionJob('projects/**********/locations/europe-west4/batchPredictionJobs/466893868839731200')
INFO:google.cloud.aiplatform.jobs:View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/europe-west4/batch-predictions/466893868839731200?project=**********
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/**********/locations/europe-west4/batchPredictionJobs/466893868839731200 current state:
JobState.JOB_STATE_RUNNING
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/**********/locations/europe-west4/batchPredictionJobs/466893868839731200 current state:
JobState.JOB_STATE_SUCCEEDED
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob run completed. Resource name: projects/**********/locations/europe-west4/batchPredictionJobs/466893868839731200


batch_prediction_test
projects/**********/locations/europe-west4/batchPredictionJobs/466893868839731200
JobState.JOB_STATE_SUCCEEDED

The URL of the batch prediction output file can be inferred from the destination folder given above and the BatchPredictionJob's output_info attribute. For simplicity, however, we will retrieve the URL by locating the output file in the Google Cloud Storage bucket, under that destination folder.

# To be found in the cloud storage at the previously given destination folder
output_gcp_url = "[URL of the prediction output]"
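
For reference, here is a sketch of how the output files could be located programmatically from the finished job (attribute names as exposed by the SDK's BatchPredictionJob; verify them against your SDK version):

# Sketch: derive the prediction file locations from the job's output_info
# instead of browsing the console. The job writes one or more
# predictions_*.jsonl files under output_info.gcs_output_directory.
output_dir = batch_prediction_job.output_info.gcs_output_directory
prefix = output_dir.replace(f"gs://{bucket_name}/", "")
bucket = storage_client.get_bucket(bucket_name)
for blob in bucket.list_blobs(prefix=prefix):
    if "predictions" in blob.name:
        print(f"gs://{bucket_name}/{blob.name}")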

Import predictions to Kili

jsonl_file_path = "batch_prediction_output.jsonl"
response = requests.get(output_gcp_url)
with open(jsonl_file_path, "wb") as jsonl_file:
    jsonl_file.write(response.content)

Now that the prediction outputs are retrieved, we will convert them to the Kili format described in the Kili documentation and upload them to your Kili project.
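
Each line of the output file pairs the original instance with its prediction. A hypothetical line, restricted to the fields read by the conversion function below (bucket name and values are placeholders), looks like this:

{"instance": {"content": "gs://my-bucket/experiments/blood_cell/images/image-042.jpg", "mimeType": "image/jpeg"}, "prediction": {"displayNames": ["RBC", "WBC"], "bboxes": [[0.12, 0.3, 0.41, 0.6], [0.5, 0.72, 0.1, 0.33]], "confidence": [0.98, 0.87]}}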

def vertex_to_kili(json_output):
    external_id = json_output["instance"]["content"].split("/")[-1]
    annotations = []
    for category_name, bbox, confidence in zip(
        json_output["prediction"]["displayNames"],
        json_output["prediction"]["bboxes"],
        json_output["prediction"]["confidence"],
    ):
        # Vertex AI bboxes are ordered [xMin, xMax, yMin, yMax]
        normalized_vertices = [
            {"x": bbox[0], "y": bbox[3]},
            {"x": bbox[0], "y": bbox[2]},
            {"x": bbox[1], "y": bbox[2]},
            {"x": bbox[1], "y": bbox[3]},
        ]
        annotations.append(
            {
                "boundingPoly": [{"normalizedVertices": normalized_vertices}],
                "categories": [{"name": category_name.upper(), "confidence": confidence}],
            }
        )
    json_response = {OBJECT_DETECTION_JOB_NAME: {"annotations": annotations}}
    return json_response, external_id
json_response_array = []
external_id_array = []

with open(jsonl_file_path) as jsonl_file:
    for line in jsonl_file:
        line = line.strip()
        json_output = json.loads(line)
        json_response, external_id = vertex_to_kili(json_output)
        json_response_array.append(json_response)
        external_id_array.append(external_id)
kili.create_predictions(
    project_id=project_id,
    json_response_array=json_response_array,
    external_id_array=external_id_array,
)

Results and conclusion

You can visualize your predictions in Kili and use them as preannotations for your project:

2023-10-30_12-57.png

We trained the model on a small dataset and with the minimum time budget, but we are already able to get satisfactory results.

In this tutorial, we have successfully demonstrated the synergy between Google Vertex AI AutoML and the Kili platform for efficient object detection model training and preannotation generation. By combining the strengths of Vertex AI and Kili, we have established a seamless workflow that streamlines the training process and speeds up data annotation.

Feel free to use this tutorial as a starting point for your integration and to implement a more complex active learning pipeline.