
Iris


  1. Iris Workflow
    • Dataset
  2. Create a Notebook
  3. Download the data and notebook
    • Upload Iris pipeline notebook file
  4. Explore the ML code of the Iris usecase
  5. Convert your notebook to a Katonic Pipeline
  6. Katonic Pipeline Dashboard
  7. Pipeline components execution

Iris Workflow#

  • Load the Iris dataset.
  • Transform the raw data into meaningful features through data preprocessing.
  • Store the created features in the feature store (sketched below).
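
A minimal sketch of these three steps using scikit-learn and pandas follows; since the Katonic feature store API is not covered in this lab, the final write is shown as a plain file save stand-in:

```python
# Illustrative sketch of the Iris workflow (not the lab's exact code).
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# 1. Load the Iris dataset.
iris = load_iris(as_frame=True)
features, target = iris.data, iris.target

# 2. Preprocess: scale the raw measurements into standardized features.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(features),
    columns=features.columns,
)
scaled["species"] = target

# 3. Persist the engineered features (stand-in for a feature-store write).
scaled.to_csv("iris_features.csv", index=False)
```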

What we are going to build

  • Iris Workflow

Dataset#

The dataset used here is the publicly available Iris Dataset. The task is to predict the species of an iris flower, given the measurements recorded in the dataset.
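
As a quick illustration of that prediction task (the notebook's own model may differ), a few lines of scikit-learn suffice to fit a species classifier on the four flower measurements:

```python
# Fit a simple classifier on the Iris measurements and check accuracy.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=200).fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
```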

Create a Notebook#

Navigate to the Notebook link on the Katonic central dashboard.

Click on Create Notebook

Make sure you have selected one of the images:

Select the CPU and Memory required:

Click Create to create the notebook.

When the notebook server is available, click Connect to connect to it.

Download the data and notebook#

A new tab will open up with the JupyterLab landing page. Create a new Terminal in JupyterLab.

Upload Iris pipeline notebook file#

In the Terminal window, run the following command to download the notebook and the data that you will use for the remainder of the lab.

Note

```bash
git clone https://github.com/katonic-dev/Examples.git
```

This repository contains a series of curated examples with data and annotated notebooks. In the sidebar, navigate to Examples/iris/ and open the Iris notebook.

Explore the ML code of the Iris usecase#

Run the notebook step-by-step. Note that the code fails because a library is missing.

You can install the required libraries either from the Terminal or directly in a cell of the notebook.

Run the cell right above to install the missing libraries:
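
For example, a notebook cell like the one below installs packages into the running kernel's environment (the library names here are illustrative; install whatever the failing import reports as missing):

```python
# Illustrative only: replace these names with the libraries the
# failing cell actually reports as missing.
!pip install --quiet scikit-learn pandas
```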

Restart the notebook kernel by clicking on the Refresh icon.

Convert your notebook to a Katonic Pipeline#

Enable Kale by clicking on the Kale slider in the Kale Deployment Panel (left pane of the notebook).

Kale

  • Kale is a project that aims at simplifying the data science experience of deploying Kubeflow Pipelines workflows.
  • Kale bridges the gap between notebooks and pipelines by providing a simple UI to define Kubeflow Pipelines workflows directly from your JupyterLab interface, without the need to change a single line of code.
  • Kale was designed to address these difficulties by providing a tool that simplifies deploying a Jupyter Notebook as a Katonic Pipelines workflow. Translating a notebook directly into a Katonic pipeline ensures that all the processing building blocks are well organized and independent of each other, while also leveraging experiment tracking and workflow organization.
  • Kale takes the annotated Jupyter Notebook as input and generates a standalone Python script that defines the Katonic pipeline, based on the notebook and cell annotations.

Explore per-cell dependencies.

See how multiple notebook cells can be part of a single pipeline step, as indicated by the color bars on the left of the cells, and how a pipeline step may depend on previous ones, as indicated by the “depends on” labels above the cells. For example, the image below shows multiple cells that are part of the same pipeline step: they share the same brown color and depend on a previous pipeline step named “load_data”.
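
Under the hood, these groupings come from per-cell metadata tags that Kale reads. The exact tag syntax depends on your Kale version, but the common convention looks roughly like the sketch below (step names and file paths are illustrative):

```python
# --- Notebook cell whose metadata carries the tag: block:load_data ---
import pandas as pd
df = pd.read_csv("iris.csv")  # "iris.csv" is a placeholder path

# --- Notebook cell tagged: block:data_processing, prev:load_data ---
# Every cell sharing the block:data_processing tag is grouped into one
# pipeline step, and prev:load_data makes that step run after load_data.
features = df.drop(columns=["species"])
```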

Normally, to run this notebook as a Katonic pipeline, you should create a new Docker image that includes the newly installed libraries.

Click Advanced Settings and add Docker image

Docker image:

  • Docker is a tool for running applications in an isolated environment. It gives you advantages similar to running your applications inside a virtual machine.
  • Docker gives you these advantages without the overhead and hassle of running and managing a virtual machine. Instead, you have containers: the code and the environment are all wrapped up inside a container, but a container is not a full virtual machine.
  • Docker uses special features of the UNIX file system to create these isolated environments.
  • Images are defined using a Dockerfile, which is just a text file with a list of steps to perform to create that image. You write a Dockerfile, build it, and get an image that you can run to create containers (see the sketch below).
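
As a rough illustration (the base image below is a placeholder, not Katonic's actual notebook image), a Dockerfile that bakes in the missing libraries could look like this:

```dockerfile
# Placeholder base image; in practice you would extend the image your
# notebook server already runs so the pipeline sees the same environment.
FROM python:3.9-slim

# Bake the libraries the notebook needs into the image so every pipeline
# step starts with them preinstalled.
RUN pip install --no-cache-dir scikit-learn pandas
```

Building it with `docker build -t <your-image> .` produces an image you can then point to under Advanced Settings.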

Click the Volume access mode and select the mode.

  • ReadOnlyMany - read-only by many nodes
  • ReadWriteOnce - read-write by a single node
  • ReadWriteMany - read-write by many nodes

Click the Compile and Run button.

Watch the progress of Compiling Notebook.

Watch the progress of Running Pipeline.

Click the link to go to the Katonic Pipelines UI and view the run.

Katonic Pipeline Dashboard#

After clicking View, select the Iris experiment.

Expand the experiment and select the latest pipeline run that was created.

Wait for it to complete.

Pipeline components execution#

Visualization of the Iris load data preprocessing component

Logs of the Iris model evaluation component

Congratulations! You just ran an end-to-end Katonic Pipeline starting from your notebook!