Image Compression with K-Means Clustering

Introduction

We live in an analog universe. Signals like pressure, temperature, voltage, etc., are continuous in nature. However, in the world of digital computing, signals are discrete. To use analog signals, we sample, hold, and then quantize them to the nearest predetermined fixed values. This process results in some loss of information, but these details are often so minute that we humans may not be able to discern the difference.

Certain applications of digital computing require more speed than accuracy. An example is multiplay gaming over the Internet. One would rather enjoy the game with less image quality than having it in high definition (HD) and enduring chopy gameplay. In these applications signals like image quality are compressed before they are sent over networks. Image compression algorithms could either be lossy or lossless. Lossy algorithms permanently discard details from the images while lossless maniplulate data such that the original version can be recovered. The goal of lossy algorithms is to compress the data without much loss in the quality of images.

K-Means is a poplular clustering algorithm. It produces a number of clusters, $k$, from $n$ observations by assigning observations to centroids and then moving the centroids to the mean of all of the observations in their cluster. Centroids are initialized at the beginning of the algorithm. Images are usually represented as pixels with each having three values with values representing different colors eg. red, green and blue. Technically as an image compression algorithm, $k$-means is quantizing vectors of $(R,G,B)$ in images into $k$ vectors (colors). So when $k$ is $2$ we will have two colors.

Implementation

The implementation below uses an argument parser to collect the path of the image, the name of the output file and also the number of colors to use (k).The function,compress_image, does the actual compression. It does so by first loading the image into a tensor with shape (Height,Width,Channels). In grayscale images the channels will be 1 while colored images has 3 channels. The channel data is the feature we wanto to quantize so the tensor is reshaped into (Height*Width, Channels) and then fed into the $k$-means algorithm. We used sklearn's implementation. The objective here is the application of the algorithm rather than its implementation so we used an existing implementation. Once all pixels are grouped into clusters we obtain the centroid values and use the cluster id as index to get the actual color values. Finally the data is reshaped into the original image and then saved. To compress an image to use 16 colors execute python main.py -i input.jpeg -o output.jpeg -k 16.

# filename: main.py
import numpy as np
import matplotlib.pyplot as plt
import argparse 
from typing import Tuple, List
from sklearn.cluster import KMeans

def compress_image(in_filename: str,out_filename:str, k : int ):
    img = plt.imread(in_filename)
    img_shape = img.shape
    X = img.reshape((-1,3))
    kmeans = KMeans(n_clusters=k,n_init='auto')
    cluster_id = kmeans.fit_predict(X)
    compressed_img = kmeans.cluster_centers_[cluster_id].reshape(img_shape).astype(np.uint8)
    plt.imsave(out_filename, compressed_img)

def main():
    arg_parser = argparse.ArgumentParser(prog='kmeans-compress-image', )
    arg_parser.add_argument('-i', required=True, help='input image file')
    arg_parser.add_argument('-o', required=True, help='output image file')
    arg_parser.add_argument('-k', default=64, type=int, help='number of clusters(colors)')
    arguments = arg_parser.parse_args()
    in_filename, out_filename, k = arguments.i, arguments.o, arguments.k
    print(f"compressing {in_filename} to {out_filename} using {k} colors")
    compress_image(in_filename, out_filename, k )
    
if __name__ == '__main__':
    main()

Result

It can be observed that we are able to compress the image while maintaing overall visual appeal. The compression ratio is calculated as $ratio = \frac{\text{k size in bytes}}{\text{original size in bytes}}$.

$K$OutputSize(bytes)Compresion Ratio
21983550.60
642437010.75
Original325860N/A

References