
Neural Networks

Empirical Risk and Cross-Entropy in MicroTorch

In the previous chapter we laid the groundwork in MicroTorch - Deep Learning from Scratch. Now it's time to create the loss functions that will guide our model during training. In this post, we focus on building two fundamental loss functions, Binary Cross-Entropy (BCE) and Cross-Entropy (CE), using the MicroTorch framework. These functions are essential for training models, especially on classification tasks, and I'll walk you through how to implement them from scratch.
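As a taste of what's ahead, here is a minimal NumPy sketch of the BCE forward pass. This is my own illustration, not MicroTorch code: the post's version builds the same formula from Tensor operations so autograd handles the backward pass, and the `eps` clamp is an assumption added here for numerical stability.

```python
import numpy as np

def binary_cross_entropy(y_pred, y_true, eps=1e-12):
    """Mean BCE over a batch of predicted probabilities in (0, 1).

    Plain-NumPy sketch of the forward pass only; the MicroTorch version
    wraps the same formula in Tensor ops so the gradient comes for free.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

# Confident, correct predictions give a small loss
print(binary_cross_entropy(np.array([0.9, 0.1]), np.array([1.0, 0.0])))  # ~0.105
```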

[Cover image: Medieval loss discovery]

MicroTorch - Deep Learning from Scratch!

Implementing deep learning algorithms involves managing data flow in two directions: forward and backward. While the forward pass is typically straightforward, handling the backward pass can be more challenging. As discussed in previous posts, implementing backpropagation requires a strong grasp of calculus, and even minor mistakes can lead to significant issues.

Fortunately, modern frameworks like PyTorch simplify this process with autograd, an automatic differentiation system that dynamically computes gradients during training. This eliminates the need for manually deriving and coding gradient calculations, making development more efficient and less error-prone.

Now, let's build the backbone of such an algorithm - the Tensor class!
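Before diving into the full implementation, here is a minimal sketch of the idea behind such a Tensor class, assuming a reverse-mode design in which every operation records a small `_backward` closure. The class below is a simplified illustration supporting only addition and multiplication, not the actual MicroTorch API.

```python
import numpy as np

class Tensor:
    """A tiny reverse-mode autograd node: holds data, a gradient,
    and a closure that propagates gradients to its parents."""

    def __init__(self, data, _parents=()):
        self.data = np.asarray(data, dtype=float)
        self.grad = np.zeros_like(self.data)
        self._parents = _parents
        self._backward = lambda: None  # filled in by the op that created this node

    def __add__(self, other):
        out = Tensor(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad         # d(a+b)/da = 1
            other.grad += out.grad        # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Tensor(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, visited = [], set()
        def build(node):
            if id(node) not in visited:
                visited.add(id(node))
                for p in node._parents:
                    build(p)
                order.append(node)
        build(self)
        self.grad = np.ones_like(self.data)  # dL/dL = 1
        for node in reversed(order):
            node._backward()

# y = a * b + a  ->  dy/da = b + 1, dy/db = a
a, b = Tensor(2.0), Tensor(3.0)
y = a * b + a
y.backward()
print(a.grad, b.grad)  # 4.0 2.0
```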

[Cover image: Build an autograd!]

Mastering Neural Networks - Linear Layer and SGD

The human brain remains one of the greatest mysteries: it is the most complex object in the universe that we know of. The processes underlying it, the source of consciousness, and consciousness itself remain unknown. Artificial neural networks are a convenient way to popularize deep learning algorithms, but we can't say for sure what mechanism behind biological neural networks enables intelligence to arise.

[Figure: Training result - visualized decision boundaries]

Weight Initialization Methods in Neural Networks

Weight initialization is crucial in training neural networks, as it sets the starting point for the optimization algorithm. The activation function applies a non-linear transformation in our network, and different activation functions serve different purposes. Choosing the right combination of weight initialization and activation function is key to better performance: Xavier initialization is ideal for Sigmoid or Tanh in feedforward networks, while He initialization pairs well with ReLU for faster convergence, especially in CNNs. Matching the two improves training efficiency and model performance.
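As a rough sketch of the two schemes, here is how Xavier (uniform) and He (normal) initialization can be written with NumPy. The helper names and signatures are my own illustration, not code from the post.

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng=None):
    """Xavier/Glorot uniform: keeps activation variance stable for Sigmoid/Tanh."""
    rng = rng if rng is not None else np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_init(fan_in, fan_out, rng=None):
    """He normal: variance 2/fan_in compensates for ReLU zeroing half the inputs."""
    rng = rng if rng is not None else np.random.default_rng()
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

W_tanh = xavier_init(256, 128)   # pair with Sigmoid/Tanh layers
W_relu = he_init(256, 128)       # pair with ReLU layers
```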

[Figure: Comparison of different weight initialization methods]