- Automobile Workflow
- Create a Notebook
- Download the data and notebook
- Upload Automobile pipeline notebook file
- Explore the ML code of the Automobile usecase
- Convert your notebook to a Katonic Pipeline
- Katonic Pipeline Dashboard
- Pipeline components execution
- Load the automobile dataset.
- Transform raw data into meaningful information through data preprocessing.
- Perform feature engineering to create features.
- Store the created features in the feature store.
What we are going to build
The dataset used here is the publicly available Automobile Dataset; the goal is to predict the price of an automobile given the other information in the dataset.
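As a rough sketch of what the notebook's steps will do (this is illustrative only; the column names and model below are stand-ins, not the lab's actual code):

```python
# Illustrative sketch of the pipeline steps; column names are hypothetical
# stand-ins for fields in the Automobile Dataset.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# load_data: stand-in for reading the automobile CSV
df = pd.DataFrame({
    "horsepower": [111, 102, 154, 115, 110, 140, 160, 101],
    "curb_weight": [2548, 2823, 2337, 2824, 2507, 2844, 3086, 2395],
    "price": [13495, 16500, 13950, 17450, 15250, 17710, 23875, 11845],
})

# data_preprocessing: drop rows with missing values
df = df.dropna()

# feature_engineering: derive a simple power-to-weight feature
df["power_to_weight"] = df["horsepower"] / df["curb_weight"]

# model training/evaluation: fit a baseline regressor on the features
X = df[["horsepower", "curb_weight", "power_to_weight"]]
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", round(model.score(X_test, y_test), 2))
```

Each of these stages will later become an independent step of the Katonic pipeline.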
Navigate to the Notebook link on the Katonic central dashboard.
Click on Create Notebook
Make sure you have selected one of the images:
Select the CPU and Memory required:
Click Create to create the notebook.
When the notebook server is available, click Connect to connect to it.
A new tab will open up with the JupyterLab landing page. Create a new Terminal in JupyterLab.
In the Terminal window, run these commands to download the notebook and the data that you will use for the remainder of the lab.
```shell
git clone https://github.com/katonic-dev/Examples.git
```
This repository contains a series of curated examples with data and annotated Notebooks. Navigate to the folder in the sidebar and open the notebook Automobile inside Examples/automobile/
Run the notebook step-by-step. Note that the code fails because a library is missing.
You can install the required libraries either from the Terminal or directly in a cell in the notebook.
Run the cell right above to install the missing libraries:
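The exact libraries depend on what the notebook reports as missing; as an example (the package names below are assumptions, not the lab's actual list):

```shell
# From the JupyterLab Terminal (prefix the line with "!" to run it in a
# notebook cell instead). Install whichever library the error reports missing.
pip install --user pandas scikit-learn
```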
Restart the notebook kernel by clicking on the Refresh icon.
Enable Kale by clicking on the Kale slider in the Kale Deployment Panel (left pane of the notebook).
- Kale is a project that aims to simplify the data science experience of deploying pipeline workflows.
- Kale bridges this gap by providing a simple UI to define Kubeflow Pipelines workflows directly from your JupyterLab interface, without the need to change a single line of code.
- Kale was designed to address these difficulties by providing a tool that simplifies deploying a Jupyter Notebook as a Katonic Pipelines workflow. Translating a Jupyter Notebook directly into a Katonic pipeline ensures that all the processing building blocks are well organized and independent from each other, while also leveraging experiment tracking and workflow organization.
- Kale takes the annotated Jupyter Notebook as input and generates a standalone Python script that defines the Katonic pipeline, based on the Notebook and cell annotations.
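Under the hood, Kale stores the annotations you set in the UI as tags in each cell's metadata. For example, a cell whose metadata looks like the fragment below (the step names are illustrative) becomes part of a `feature_engineering` step that runs after `load_data`:

```json
{
  "tags": [
    "block:feature_engineering",
    "prev:load_data"
  ]
}
```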
Explore per-cell dependencies.
See how multiple notebook cells can be part of a single pipeline step, as indicated by color bars on the left of the cells, and how a pipeline step may depend on previous ones, as indicated by "depends on" labels above the cells. For example, the image below shows multiple cells that are part of the same pipeline step. They have the same brown color and depend on a previous pipeline step named "load_data".
Normally, to run this notebook as a Katonic pipeline, you should create a new Docker image that includes the newly installed libraries.
Click Advanced Settings and add the Docker image.
- Docker is a tool for running applications in an isolated environment. It gives you advantages similar to running your applications inside a virtual machine.
- Docker gives you these advantages without the overhead and hassle of running and managing a virtual machine. Instead, you have containers: the code and the environment are wrapped up inside a container, but a container is not a full virtual machine.
- Docker uses special features of the Linux kernel and file system to create these isolated environments.
- Images are defined using a Dockerfile, which is just a text file with a list of steps to perform to create that image. You write a Dockerfile, build it to get an image, and run the image to get containers.
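A minimal Dockerfile for this scenario might look like the following sketch (the base image name and package list are assumptions; use the image your notebook server actually runs and the libraries you installed):

```dockerfile
# Illustrative Dockerfile: each instruction is one step that creates a layer
# of the image. Base image name below is an assumption.
FROM jupyter/scipy-notebook:latest

# Bake the extra libraries into the image so pipeline steps can import them
RUN pip install --no-cache-dir pandas scikit-learn
```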
Click the Volume access mode dropdown and select a mode.
- ReadOnlyMany - can be mounted read-only by many nodes
- ReadWriteOnce - can be mounted read-write by a single node
- ReadWriteMany - can be mounted read-write by many nodes
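These options correspond to the access modes of a Kubernetes PersistentVolumeClaim. For reference, an illustrative PVC manifest (the claim name and storage size are assumptions) would declare the mode like this:

```yaml
# Illustrative Kubernetes PersistentVolumeClaim; name and size are assumptions
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: automobile-workspace
spec:
  accessModes:
    - ReadWriteOnce   # one of the three modes listed above
  resources:
    requests:
      storage: 5Gi
```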
Click the Compile and Run button.
Watch the progress of Compiling Notebook.
Watch the progress of the running pipeline.
Click the link to go to the Katonic Pipelines UI and view the run.
After clicking View, select the automobile experiment.
Expand the experiment and select the latest pipeline run that was created.
Wait for it to complete.
Visualization of Automobile Data preprocessing Components
Visualization of Automobile Model Evaluation Components
Congratulations! You just ran an end-to-end Katonic Pipeline starting from your notebook!