Leonie Monigatti
Blog
Building an AI agent from scratch in Python
How to implement a single AI agent with an LLM API and no frameworks.
Sep 30, 2025
First impressions from testing 4 Coding Agents with Jupyter Notebooks
How well do Claude Code and Gemini CLI with and without Cursor and Gemini from within Google Colab handle Jupyter Notebook workflows for teaching and experimentation?
Jul 28, 2025
37 Things I Learned About Information Retrieval in Two Years at a Vector Database Company
Reflections on what I’ve learned about information retrieval in the last two years working at Weaviate
Jul 3, 2025
Notes on NeoBERT
Jun 25, 2025
Who wrote this?
And why?
Jun 24, 2025
2024 in Review: What I Got Right, Where I Was Wrong, and Bolder Predictions for 2025
What I got right (and wrong) about trends in 2024 and daring to make bolder predictions for the year ahead
Dec 17, 2024
The Challenges of Retrieving and Evaluating Relevant Context for RAG
A case study with a grade 1 text understanding exercise for how to measure context relevance in your retrieval-augmented generation system using Ragas, TruLens, and DeepEval
Jun 10, 2024
Shifting Tides: The Competitive Edge of Open Source LLMs over Closed Source LLMs
Why I think smaller open source foundation models have already begun replacing proprietary models by providers, such as OpenAI, in Generative AI applications
Apr 29, 2024
Intro to DSPy: Goodbye Prompting, Hello Programming!
How the DSPy framework solves the fragility problem in LLM-based applications by replacing prompting with programming and compiling
Feb 27, 2024
Advanced Retrieval-Augmented Generation: From Theory to LlamaIndex Implementation
How to address limitations of naive RAG pipelines by implementing targeted advanced RAG techniques in Python
Feb 19, 2024
2023 in Review: Recapping the Post-ChatGPT Era and What to Expect for 2024
How the LLMOps landscape has evolved and why we haven’t seen many Generative AI applications in the wild yet — but maybe in 2024.
Dec 18, 2023
Evaluating RAG Applications with RAGAs
A framework with metrics and LLM-generated data to evaluate the performance of your Retrieval-Augmented Generation pipeline
Dec 13, 2023
A Guide on 12 Tuning Strategies for Production-Ready RAG Applications
How to improve the performance of your Retrieval-Augmented Generation (RAG) pipeline with these “hyperparameters” and tuning strategies
Dec 6, 2023
Improving Retrieval Performance in RAG Pipelines with Hybrid Search
How to find more relevant search results by combining traditional keyword-based search with modern vector search
Nov 28, 2023
Recreating Amazon’s New Generative AI Feature: Product Review Summaries
How to generate summaries from data in your Weaviate vector database with an OpenAI LLM in Python using a concept called “Generative Feedback Loops”
Nov 21, 2023
Retrieval-Augmented Generation (RAG): From Theory to LangChain Implementation
From the theory of the original academic paper to its Python implementation with OpenAI, Weaviate, and LangChain
Nov 14, 2023
Recreating Andrej Karpathy’s Weekend Project — a Movie Search Engine
Building a movie recommender system with OpenAI embeddings and a vector database
Nov 7, 2023
Why OpenAI’s API Is More Expensive for Non-English Languages
Beyond words: How byte pair encoding and Unicode encoding factor into pricing disparities
Aug 16, 2023
Easily Estimate Your OpenAI API Costs with Tiktoken
Count your tokens and avoid going bankrupt from using the OpenAI API
Aug 1, 2023
Getting Started with Weaviate: A Beginner’s Guide to Search with Vector Databases
How to use vector databases for semantic search, question answering, and generative search in Python with OpenAI and Weaviate
Jul 18, 2023
Explaining Vector Databases in 3 Levels of Difficulty
From noob to expert: Demystifying vector databases across different backgrounds
Jul 4, 2023
Matplotlib Tips to Instantly Improve Your Data Visualizations — According to “Storytelling with Data”
Recreating lessons learned from Cole Nussbaumer Knaflic’s book in Python using Matplotlib
Jun 20, 2023
Boosting PyTorch Inference on CPU: From Post-Training Quantization to Multithreading
How to reduce inference time on CPU with clever model selection, post-training quantization with ONNX Runtime or OpenVINO, and multithreading with ThreadPoolExecutor
Jun 13, 2023
10 Exciting Project Ideas Using Large Language Models (LLMs) for Your Portfolio
Learn how to build apps and showcase your skills with large language models (LLMs). Get started today!
May 15, 2023
PyTorch Image Classification Tutorial for Beginners
Fine-tuning pre-trained Deep Learning models in Python
May 9, 2023
Getting Started with LangChain: A Beginner’s Guide to Building LLM-Powered Applications
A LangChain tutorial to build anything with large language models in Python
Apr 25, 2023
Cutout, Mixup, and Cutmix: Implementing Modern Image Augmentations in PyTorch
Data augmentation techniques for Computer Vision implemented in Python
Apr 14, 2023
Stationarity in Time Series — A Comprehensive Guide
How to check if a time series is stationary and what you can do if it is non-stationary in Python
Apr 11, 2023
How to Save and Load Your Neural Networks in Python
A complete guide to saving and loading checkpoints and entire Deep Learning models in PyTorch and TensorFlow/Keras
Apr 5, 2023
Audio Classification with Deep Learning in Python
Fine-tuning image models to tackle domain shift and class imbalance with PyTorch and torchaudio in audio data
Apr 4, 2023
Data Augmentation Techniques for Audio Data in Python
How to augment audio in waveform (time domain) and as spectrograms (frequency domain) with librosa, numpy, and PyTorch
Mar 28, 2023
2 Simple Steps To Reduce the Memory Usage of Your Pandas Dataframe
How to fit a large dataset into your RAM in Python
Mar 21, 2023
A Simple Approach to Hierarchical Time Series Forecasting with Machine Learning
How to “boost” your cyclical sales data forecast with LightGBM and Python
Mar 14, 2023
Beginner’s Guide to the Must-Know LightGBM Hyperparameters
The most important LightGBM parameters, what they do, and how to tune them
Mar 7, 2023
Building a Recommender System using Machine Learning
“Candidate rerank” approach with co-visitation matrix and GBDT ranker model in Python
Mar 1, 2023
Intermediate Deep Learning with Transfer Learning
A practical guide for fine-tuning Deep Learning models for computer vision and natural language processing
Feb 22, 2023
Pandas vs. Polars: A Syntax and Speed Comparison
Understanding the major differences between the Python libraries Pandas and Polars for Data Science
Jan 11, 2023
Will We Be Using ChatGPT Instead of Google To Get a Christmas Cookie Recipe Next Year?
Will ChatGPT replace search engines? A walkthrough with the use case of looking up a sugar cookie recipe
Dec 22, 2022
A Visual Guide to Learning Rate Schedulers in PyTorch
LR decay and annealing strategies for Deep Learning in Python
Dec 6, 2022
Kaggle Days Paris 2022
Discussing Data Science with Kagglers while eating macarons
Nov 22, 2022
How to Create a PDF Report for Your Data Analysis in Python
Automate PDF generation with the FPDF library as part of your data analysis
Oct 25, 2022
How to Create a GIF from Matplotlib Plots in Python
A data visualization technique for 2-dimensional time series data using imageio
Oct 18, 2022
A Collection of Must-Know Techniques for Working with Time Series Data in Python
How to manipulate and visualize time series data in datetime format with ease
Oct 12, 2022
How to Easily Customize SHAP Plots in Python
Adjust the colors and figure size and add titles and labels to SHAP plots
Oct 4, 2022
Everything You Need to Know About the Binary Search Algorithm
Master the Binary Search algorithm in 8 minutes
Sep 27, 2022
A Beginner’s Guide to Prompt Design for Text-to-Image Generative Models
Learn these prompt engineering tricks before you waste your free trial credits
Sep 20, 2022
Intermediate Data Analysis Techniques for Text Data
How to perform Exploratory Data Analysis on text data for Natural Language Processing
Sep 13, 2022
AI-Generated Art: How to Get Started with Generating Your Own Images
A non-technical comparison of DALL·E 2, Midjourney, and Stable Diffusion
Sep 7, 2022
Fundamental Data Analysis Techniques for Text Data
EDA for NLP: From counts, lengths, and term frequencies to why you don’t need word clouds
Aug 31, 2022
Time Series Problems Simply Explained as Fast Food Combo Meals
The difference between univariate vs. multivariate, single-step vs. multistep, and sliding vs. expanding window time series problems
Aug 23, 2022
99 Lessons on Data Analysis from Placing Top 5 in 5 Kaggle Analytics Challenges
(Grand)Masterclass: How to approach (and win) a Kaggle Analytics Competition
Aug 16, 2022
Visualizing Part-of-Speech Tags with NLTK and SpaCy
Customizing displaCy’s entity visualizer
Aug 9, 2022
Interpreting ACF and PACF Plots for Time Series Forecasting
How to determine the order of AR and MA models
Aug 2, 2022
How to Handle Large Datasets in Python
A Comparison of CSV, Pickle, Parquet, Feather, and HDF5
Jul 26, 2022
How to Merge Pandas DataFrames
How to Avoid Losing Valuable Data Points (incl. Cheat Sheet)
Jul 20, 2022
Why Your Data Visualizations Should Be Colorblind-Friendly
Especially if You Are Trying to Convince Men
Jul 12, 2022
5 Ideas to Create New Features from Polygons
How to Get the Area and Other Features From a WKT String with Shapely
Jul 6, 2022
Essential Techniques to Style Pandas DataFrames
How to Effectively Communicate Data with Tables (including Cheat Sheet)
Jun 27, 2022