Version: 1.0.0

E-Commerce


  1. E-Commerce Workflow
    • Dataset
  2. Create a Notebook
  3. Download the data and notebook
    • Upload E-Commerce pipeline notebook file
  4. Explore the ML code of the E-Commerce use case
  5. Convert your notebook to a Katonic Pipeline
  6. Katonic Pipeline Dashboard
  7. Pipeline components execution

E-Commerce Workflow#

  • Load the e_commerce dataset.
  • Profile the data to get a statistical summary of the dataset.
  • Explore and visualize the data through exploratory data analysis.
  • Transform the raw data into meaningful information with data preprocessing.
  • Perform feature engineering to create features.
  • Split the data into training and testing sets.
  • Label the X and Y axes of graphs and figures.
  • Train the different models.
  • Evaluate the different models and choose the best one among them.
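The steps above can be sketched end-to-end in code. The snippet below is a minimal illustration only: it uses synthetic data and scikit-learn stand-ins, so the column names, features, and models are assumptions, not the actual notebook code.

```python
# Minimal sketch of the workflow above, on synthetic data (not the real
# e_commerce dataset -- columns and models here are illustrative).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "weight": rng.uniform(0.1, 5.0, 200),
    "rating": rng.uniform(1.0, 5.0, 200),
})
df["price"] = 10 * df["weight"] + 2 * df["rating"] + rng.normal(0, 1, 200)

# Data profiling: a quick statistical summary of the dataset.
print(df.describe())

# Preprocessing / feature engineering: add a derived feature.
df["weight_x_rating"] = df["weight"] * df["rating"]

# Split the data into training and testing sets.
X = df.drop(columns="price")
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train several models and keep the best by R^2 score on the test set.
models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(random_state=42),
}
scores = {name: r2_score(y_test, m.fit(X_train, y_train).predict(X_test))
          for name, m in models.items()}
best = max(scores, key=scores.get)
print("best model:", best, "R^2:", scores[best])
```

Each of these stages will later become one step of the Katonic pipeline.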

What we are going to build

  • E-Commerce Workflow

Dataset#

The dataset used here is publicly available as the E-Commerce Dataset. The goal is to predict the price of e-commerce products given all the information in the dataset.

Create a Notebook#

Navigate to the Notebook link on the Katonic central dashboard.

Click on Create Notebook

Make sure you have selected one of the images:

Select the CPU and Memory required:

Click Create to create the notebook.

When the notebook server is available, click Connect to connect to it.

Download the data and notebook#

A new tab will open up with the JupyterLab landing page. Create a new Terminal in JupyterLab.

Upload E-Commerce pipeline notebook file#

In the Terminal window, run the following command to download the notebook and the data that you will use for the remainder of the lab.

Note

git clone https://github.com/katonic-dev/Examples.git

This repository contains a series of curated examples with data and annotated Notebooks. Navigate to the folder in the sidebar and open the notebook E-Commerce inside Examples/e-commerce/

Explore the ML code of the E-Commerce use case#

Run the notebook step-by-step. Note that the code fails because a library is missing.

You can install the required libraries either from the Terminal or directly in a cell in the notebook.

Run the cell to install the missing libraries:

Restart the notebook kernel by clicking on the Refresh icon.

Convert your notebook to a Katonic Pipeline#

Enable Kale by clicking on the Kale slider in the Kale Deployment Panel (left pane of the notebook).

Kale

  • Kale is a project that aims at simplifying the data science experience of deploying pipeline workflows.
  • Kale bridges this gap by providing a simple UI to define Kubeflow Pipelines workflows directly from your JupyterLab interface, without the need to change a single line of code.
  • Kale was designed to address these difficulties by providing a tool that simplifies the deployment of a Jupyter Notebook into Katonic Pipelines workflows. Translating a Jupyter Notebook directly into a Katonic pipeline ensures that all the processing building blocks are well organized and independent of each other, while also leveraging experiment tracking and workflow organization.
  • Kale takes the annotated Jupyter Notebook as input and generates a standalone Python script that defines the Katonic pipeline, based on the Notebook and cell annotations.

Explore per-cell dependencies.

See how multiple notebook cells can be part of a single pipeline step, as indicated by color bars on the left of the cells, and how a pipeline step may depend on previous ones, as indicated by depends on labels above the cells. For example, the image below shows multiple cells that are part of the same pipeline step. They have the same brown color and they depend on a previous pipeline step named “load_data”.
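Under the hood, Kale stores these annotations as tags in each cell's notebook metadata; the Kale UI manages them for you. The sketch below is a hypothetical illustration of that metadata, assuming Kale's `block:`/`prev:` tag convention — it is not code you need to write yourself.

```python
# Hypothetical notebook-cell metadata as Kale annotates it (illustrative only).
# A cell tagged "block:<name>" starts a pipeline step; "prev:<name>" declares a
# dependency on an earlier step, which produces the "depends on" labels above
# the cells.
load_data_cell = {"metadata": {"tags": ["block:load_data"]}}
preprocess_cell = {"metadata": {"tags": ["block:data_preprocessing", "prev:load_data"]}}

def step_name(cell):
    """Return the pipeline step a cell belongs to, from its block: tag."""
    for tag in cell["metadata"]["tags"]:
        if tag.startswith("block:"):
            return tag.split(":", 1)[1]
    return None  # untagged cells merge into the step of the cell above

print(step_name(preprocess_cell))  # prints "data_preprocessing"
```

Cells carrying the same `block:` tag are rendered with the same color bar and compiled into a single pipeline step.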

Normally, you would create a new Docker image to be able to run this notebook as a Katonic pipeline, so that it includes the newly installed libraries.

Click Advanced Settings and add the Docker image.

Docker image:

  • Docker is a tool for running applications in an isolated environment. It gives you advantages similar to running your applications inside a virtual machine.
  • Docker provides these advantages without the overhead and hassle of running and managing a virtual machine. Instead, you have containers: the code and the environment are all wrapped up inside a container, but a container is not a full virtual machine.
  • Docker uses special features of the UNIX file system to create these isolated environments.
  • Images are defined using a Dockerfile, which is just a text file with a list of steps to perform to create that image. You write a Dockerfile, build it to get an image, and run the image to get containers.
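As a concrete illustration of the last bullet, a minimal Dockerfile for a notebook image could look like the sketch below. The base image and library name are assumptions for illustration, not Katonic's actual images.

```dockerfile
# Hypothetical Dockerfile: start from an assumed notebook base image and
# bake in the extra libraries the notebook needs.
FROM jupyter/scipy-notebook:latest
RUN pip install --no-cache-dir pandas-profiling
```

Building this file produces an image that you could then reference in the Advanced Settings panel.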

Click the Volume access mode and select the mode.

  • ReadOnlyMany - read-only by many nodes
  • ReadWriteOnce - read-write by a single node
  • ReadWriteMany - read-write by many nodes
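These access modes come from Kubernetes persistent volumes. For context, the sketch below shows where an access mode appears in a PersistentVolumeClaim spec; the claim name and storage size are made up for illustration.

```yaml
# Illustrative PersistentVolumeClaim -- name and size are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ecommerce-workspace
spec:
  accessModes:
    - ReadWriteOnce   # read-write, mountable by a single node at a time
  resources:
    requests:
      storage: 1Gi
```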

Click the Compile and Run button.

Watch the progress of Compiling Notebook.

Watch the progress of Running pipeline

Click the link to go to the Katonic Pipelines UI and view the run.

Katonic Pipeline Dashboard#

After clicking View, select the E-Commerce experiment.

Expand the experiment dropdown and select the latest pipeline that was created.

Wait for it to complete.

Pipeline components execution#

Visualization of the E-Commerce Load data component

Visualization of the E-Commerce Data profiling component

Visualization of the E-Commerce Exploratory data analysis component

Visualization of the E-Commerce Data preprocessing component

Visualization of the E-Commerce Train test split component

Similarly, you can see the visualizations and logs for the other containers as well.

Congratulations! You just ran an end-to-end Katonic Pipeline starting from your notebook!