Leonie Monigatti

Blog

Building an AI agent from scratch in Python

How to implement a single AI agent with an LLM API and no frameworks.
Sep 30, 2025

First impressions from testing 4 Coding Agents with Jupyter Notebooks

How well do Claude Code and Gemini CLI, with and without Cursor, and Gemini from within Google Colab handle Jupyter Notebook workflows for teaching and experimentation?
Jul 28, 2025

37 Things I Learned About Information Retrieval in Two Years at a Vector Database Company

Reflections on what I’ve learned about information retrieval in the last two years working at Weaviate
Jul 3, 2025

Notes on NeoBERT

Jun 25, 2025

Who wrote this?

And why?
Jun 24, 2025

2024 in Review: What I Got Right, Where I Was Wrong, and Bolder Predictions for 2025

What I got right (and wrong) about trends in 2024 and daring to make bolder predictions for the year ahead
Dec 17, 2024

The Challenges of Retrieving and Evaluating Relevant Context for RAG

A case study using a grade 1 text understanding exercise to show how to measure context relevance in your retrieval-augmented generation system with Ragas, TruLens, and DeepEval
Jun 10, 2024

Shifting Tides: The Competitive Edge of Open Source LLMs over Closed Source LLMs

Why I think smaller open source foundation models have already begun replacing proprietary models from providers such as OpenAI in Generative AI applications
Apr 29, 2024

Intro to DSPy: Goodbye Prompting, Hello Programming!

How the DSPy framework solves the fragility problem in LLM-based applications by replacing prompting with programming and compiling
Feb 27, 2024

Advanced Retrieval-Augmented Generation: From Theory to LlamaIndex Implementation

How to address limitations of naive RAG pipelines by implementing targeted advanced RAG techniques in Python
Feb 19, 2024

2023 in Review: Recapping the Post-ChatGPT Era and What to Expect for 2024

How the LLMOps landscape has evolved and why we haven’t seen many Generative AI applications in the wild yet — but maybe in 2024.
Dec 18, 2023

Evaluating RAG Applications with RAGAs

A framework with metrics and LLM-generated data to evaluate the performance of your Retrieval-Augmented Generation pipeline
Dec 13, 2023

A Guide on 12 Tuning Strategies for Production-Ready RAG Applications

How to improve the performance of your Retrieval-Augmented Generation (RAG) pipeline with these “hyperparameters” and tuning strategies
Dec 6, 2023

Improving Retrieval Performance in RAG Pipelines with Hybrid Search

How to find more relevant search results by combining traditional keyword-based search with modern vector search
Nov 28, 2023

Recreating Amazon’s New Generative AI Feature: Product Review Summaries

How to generate summaries from data in your Weaviate vector database with an OpenAI LLM in Python using a concept called “Generative Feedback Loops”
Nov 21, 2023

Retrieval-Augmented Generation (RAG): From Theory to LangChain Implementation

From the theory of the original academic paper to its Python implementation with OpenAI, Weaviate, and LangChain
Nov 14, 2023

Recreating Andrej Karpathy’s Weekend Project — a Movie Search Engine

Building a movie recommender system with OpenAI embeddings and a vector database
Nov 7, 2023

Why OpenAI’s API Is More Expensive for Non-English Languages

Beyond words: How byte pair encoding and Unicode encoding factor into pricing disparities
Aug 16, 2023

Easily Estimate Your OpenAI API Costs with Tiktoken

Count your tokens and avoid going bankrupt from using the OpenAI API
Aug 1, 2023

Getting Started with Weaviate: A Beginner’s Guide to Search with Vector Databases

How to use vector databases for semantic search, question answering, and generative search in Python with OpenAI and Weaviate
Jul 18, 2023

Explaining Vector Databases in 3 Levels of Difficulty

From noob to expert: Demystifying vector databases across different backgrounds
Jul 4, 2023

Matplotlib Tips to Instantly Improve Your Data Visualizations — According to “Storytelling with Data”

Recreating lessons learned from Cole Nussbaumer Knaflic’s book in Python using Matplotlib
Jun 20, 2023

Boosting PyTorch Inference on CPU: From Post-Training Quantization to Multithreading

How to reduce inference time on CPU with clever model selection, post-training quantization with ONNX Runtime or OpenVINO, and multithreading with ThreadPoolExecutor
Jun 13, 2023

10 Exciting Project Ideas Using Large Language Models (LLMs) for Your Portfolio

Learn how to build apps and showcase your skills with large language models (LLMs). Get started today!
May 15, 2023

PyTorch Image Classification Tutorial for Beginners

Fine-tuning pre-trained Deep Learning models in Python
May 9, 2023

Getting Started with LangChain: A Beginner’s Guide to Building LLM-Powered Applications

A LangChain tutorial to build anything with large language models in Python
Apr 25, 2023

Cutout, Mixup, and Cutmix: Implementing Modern Image Augmentations in PyTorch

Data augmentation techniques for Computer Vision implemented in Python
Apr 14, 2023

Stationarity in Time Series — A Comprehensive Guide

How to check if a time series is stationary and what you can do if it is non-stationary in Python
Apr 11, 2023

How to Save and Load Your Neural Networks in Python

A complete guide to saving and loading checkpoints and entire Deep Learning models in PyTorch and TensorFlow/Keras
Apr 5, 2023

Audio Classification with Deep Learning in Python

Fine-tuning image models with PyTorch and torchaudio to tackle domain shift and class imbalance in audio data
Apr 4, 2023

Data Augmentation Techniques for Audio Data in Python

How to augment audio in waveform (time domain) and as spectrograms (frequency domain) with librosa, numpy, and PyTorch
Mar 28, 2023

2 Simple Steps To Reduce the Memory Usage of Your Pandas Dataframe

How to fit a large dataset into your RAM in Python
Mar 21, 2023

A Simple Approach to Hierarchical Time Series Forecasting with Machine Learning

How to “boost” your cyclical sales data forecast with LightGBM and Python
Mar 14, 2023

Beginner’s Guide to the Must-Know LightGBM Hyperparameters

The most important LightGBM parameters, what they do, and how to tune them
Mar 7, 2023

Building a Recommender System using Machine Learning

“Candidate rerank” approach with co-visitation matrix and GBDT ranker model in Python
Mar 1, 2023

Intermediate Deep Learning with Transfer Learning

A practical guide for fine-tuning Deep Learning models for computer vision and natural language processing
Feb 22, 2023

Pandas vs. Polars: A Syntax and Speed Comparison

Understanding the major differences between the Python libraries Pandas and Polars for Data Science
Jan 11, 2023

Will We Be Using ChatGPT Instead of Google To Get a Christmas Cookie Recipe Next Year?

Will ChatGPT replace search engines? A walkthrough with the use case of looking up a sugar cookie recipe
Dec 22, 2022

A Visual Guide to Learning Rate Schedulers in PyTorch

LR decay and annealing strategies for Deep Learning in Python
Dec 6, 2022

Kaggle Days Paris 2022

Discussing Data Science with Kagglers while eating macarons
Nov 22, 2022

How to Create a PDF Report for Your Data Analysis in Python

Automate PDF generation with the FPDF library as part of your data analysis
Oct 25, 2022

How to Create a GIF from Matplotlib Plots in Python

A data visualization technique for 2-dimensional time series data using imageio
Oct 18, 2022

A Collection of Must-Know Techniques for Working with Time Series Data in Python

How to manipulate and visualize time series data in datetime format with ease
Oct 12, 2022

How to Easily Customize SHAP Plots in Python

Adjust the colors and figure size and add titles and labels to SHAP plots
Oct 4, 2022

Everything You Need to Know About the Binary Search Algorithm

Master the Binary Search algorithm in 8 minutes
Sep 27, 2022

A Beginner’s Guide to Prompt Design for Text-to-Image Generative Models

Learn these prompt engineering tricks before you waste your free trial credits
Sep 20, 2022

Intermediate Data Analysis Techniques for Text Data

How to perform Exploratory Data Analysis on text data for Natural Language Processing
Sep 13, 2022

AI-Generated Art: How to Get Started with Generating Your Own Images

A non-technical comparison of DALL·E 2, Midjourney, and Stable Diffusion
Sep 7, 2022

Fundamental Data Analysis Techniques for Text Data

EDA for NLP: From counts, lengths, and term frequencies to why you don’t need word clouds
Aug 31, 2022

Time Series Problems Simply Explained as Fast Food Combo Meals

The difference between univariate vs. multivariate, single-step vs. multistep, and sliding vs. expanding window time series problems
Aug 23, 2022

99 Lessons on Data Analysis from Placing Top 5 in 5 Kaggle Analytics Challenges

(Grand)Masterclass: How to approach (and win) a Kaggle Analytics Competition
Aug 16, 2022

Visualizing Part-of-Speech Tags with NLTK and spaCy

Customizing displaCy’s entity visualizer
Aug 9, 2022

Interpreting ACF and PACF Plots for Time Series Forecasting

How to determine the order of AR and MA models
Aug 2, 2022

How to Handle Large Datasets in Python

A Comparison of CSV, Pickle, Parquet, Feather, and HDF5
Jul 26, 2022

How to Merge Pandas DataFrames

How to Avoid Losing Valuable Data Points (incl. Cheat Sheet)
Jul 20, 2022

Why Your Data Visualizations Should Be Colorblind-Friendly

Especially if You Are Trying to Convince Men
Jul 12, 2022

5 Ideas to Create New Features from Polygons

How to Get the Area and Other Features From a WKT String with Shapely
Jul 6, 2022

Essential Techniques to Style Pandas DataFrames

How to Effectively Communicate Data with Tables (including Cheat Sheet)
Jun 27, 2022
    • Hi, I am Leonie, a machine learning engineer and technical writer. I help developers build vector-based AI solutions. My writing focuses on machine learning and AI engineering.
    • Copyright 2025, Leonie Monigatti