
Welcome to DataSanta

Open the Deep Learning Mystery: YouTube

Welcome to DataSanta, my digital diary where numbers dance and algorithms whisper the secrets of the universe. Read more About This Blog

You can also find me on these platforms!

I believe in the power of knowledge, the magic of math, and the art of programming.


Empirical Risk and Cross-Entropy in MicroTorch

In the previous chapter we built MicroTorch - Deep Learning from Scratch. Now it's time to dive into creating the loss functions that will guide our model during training. In this session, we'll focus on building two fundamental loss functions, Binary Cross-Entropy (BCE) and Cross-Entropy (CE), using the MicroTorch framework. These functions are essential for training models, especially for classification tasks, and I'll walk you through how to implement them from scratch.
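To make the idea concrete, here is a minimal NumPy sketch of the two losses. The actual MicroTorch versions operate on its Tensor class and track gradients, so the function names and signatures below are only illustrative.

```python
import numpy as np

def binary_cross_entropy(y_pred, y_true, eps=1e-12):
    """Mean BCE over a batch; y_pred are probabilities in (0, 1)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def cross_entropy(logits, targets):
    """Mean CE over a batch; logits has shape (N, C), targets are class indices."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(targets)), targets])
```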

Medieval loss discovery

MicroTorch - Deep Learning from Scratch!

Implementing deep learning algorithms involves managing data flow in two directions: forward and backward. While the forward pass is typically straightforward, handling the backward pass can be more challenging. As discussed in previous posts, implementing backpropagation requires a strong grasp of calculus, and even minor mistakes can lead to significant issues.

Fortunately, modern frameworks like PyTorch simplify this process with autograd, an automatic differentiation system that dynamically computes gradients during training. This eliminates the need for manually deriving and coding gradient calculations, making development more efficient and less error-prone.
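As a quick illustration of what autograd buys us, here is a tiny snippet using PyTorch's standard API (not MicroTorch), where the gradient is computed for us:

```python
import torch

# A scalar function of x; autograd records each operation in a graph.
x = torch.tensor(2.0, requires_grad=True)
y = x**3 + 2 * x      # forward pass
y.backward()          # backward pass: computes dy/dx automatically
print(x.grad)         # 3*x**2 + 2 = 14.0
```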

Now, let's build the backbone of such an algorithm - the Tensor class!
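Here is a heavily stripped-down sketch of what such a Tensor class can look like. The real MicroTorch implementation supports many more operations; all names below are illustrative.

```python
import numpy as np

class Tensor:
    """A tiny autograd-style tensor: stores data, grad, and a backward rule."""
    def __init__(self, data, _parents=()):
        self.data = np.asarray(data, dtype=float)
        self.grad = np.zeros_like(self.data)
        self._backward = lambda: None   # how to push gradients to parents
        self._parents = set(_parents)

    def __mul__(self, other):
        out = Tensor(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(out)/d(self) = other
            other.grad += self.data * out.grad   # d(out)/d(other) = self
        out._backward = _backward
        return out

    def backward(self):
        # Topological order so each node's grad is complete before its parents'.
        topo, visited = [], set()
        def build(t):
            if t not in visited:
                visited.add(t)
                for p in t._parents:
                    build(p)
                topo.append(t)
        build(self)
        self.grad = np.ones_like(self.data)
        for t in reversed(topo):
            t._backward()
```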

Build an autograd!

Classification - Cross-Entropy & Softmax

Fashion-MNIST is a dataset created by Zalando Research as a drop-in replacement for MNIST. It consists of 70,000 grayscale images (28×28 pixels) categorized into 10 different classes of clothing, such as shirts, sneakers, and coats. Your mission? Train a model to classify these fashion items correctly!
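Before diving in, here is a small NumPy sketch of softmax, which turns a model's ten raw scores into class probabilities that the cross-entropy loss can consume; the scores below are made up purely for illustration.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into a probability distribution over the 10 classes."""
    shifted = logits - logits.max()   # subtract max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical scores for one 28x28 image, one per clothing class.
logits = np.array([2.0, 0.5, -1.0, 0.1, 3.2, 0.0, 1.5, -0.5, 0.3, 0.7])
probs = softmax(logits)
print(probs.argmax())   # index of the predicted class (4 here)
```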

Fashion-MNIST Dataset Visualization

SGD, Momentum & Exploding Gradient

Gradient descent is a fundamental method for training a deep learning network. It aims to minimize the loss function \(\mathcal{L}\) by updating the model parameters in the direction that reduces the loss. Using only a batch of the data, we can estimate the direction of steepest descent. However, for large networks or more complicated challenges, this algorithm may not succeed! Let's find out why this happens and how we can fix it.
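As a preview, here is one common formulation of the vanilla SGD and momentum updates in NumPy; the learning rate, momentum coefficient, and function names are illustrative, not the post's exact code.

```python
import numpy as np

def sgd_step(param, grad, lr=0.1):
    """Vanilla SGD: step directly against the gradient."""
    param -= lr * grad

def sgd_momentum_step(param, grad, velocity, lr=0.1, beta=0.9):
    """Momentum keeps a running direction, smoothing noisy mini-batch gradients."""
    velocity *= beta          # remember a fraction of the previous update
    velocity += grad          # accumulate the current gradient
    param -= lr * velocity    # move along the smoothed direction
```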

Training Failure: SGD can't classify the spiral pattern