Projects


Building a Bounded-Memory Pipeline for 944 Million Zeek Events in Rust

This article explores the design and implementation of a high-performance Rust pipeline that processed nearly one billion Zeek connection records while maintaining bounded memory usage. Starting from a simple NDJSON-to-CSV conversion task, the project evolved into a practical study of streaming architectures, backpressure, and large-scale log processing. Along the way, it highlights lessons learned about thread pools, bounded queues, and why scalability is often more about system design than raw speed. The resulting solution processed 944 million records with zero errors using only 16 worker threads and a small bounded queue. The techniques discussed form a strong foundation for building larger SOC, SIEM, and network telemetry ingestion systems in Rust.

Exploring Ghana's Rent Data

This article explores the factors influencing rental prices in Ghana's real estate market using data sourced from a popular rental website. By analyzing features such as the number of bedrooms, amenities, and property condition, the article provides insights for investors to add value to their properties and helps renters make informed decisions. The analysis employs techniques like Principal Component Analysis (PCA) to visualize the relationships between different features and identify dominant factors affecting rental prices.

Spam Filtering, Bayesian Approach

This article explains the implementation of a Naive Bayes spam filter, using Bayes' theorem to classify messages as spam or ham. It covers the algorithm’s theoretical foundation, the dataset and implementation details, and the results, which show over 90% accuracy with minimal computational resources. The article also discusses limitations, such as the assumption of word independence and the challenges of imbalanced data, and suggests potential improvements.

Specializing Large Language Models for Telecom Networks

This article describes a project I worked on with an industry colleague. We specialized a large language model to answer multiple-choice questions on telecommunications engineering, using a retrieval-augmented generation (RAG) system. Our approach is simple and widely used, but we decided to share what we did anyway.

Image Compression with K-Means Clustering

This article describes how K-means clustering, an unsupervised machine learning algorithm, can be used to compress images without losing much quality.