Feedforward neural network (FNN)

Objectives

Basic (simplest) overview

What is a single neuron?

Inputs: Always numerical; a neuron can have many inputs, which are often normalized
Weights: Numerical values, one per input, that determine how much each input contributes to the output
Bias: Provides flexibility for the neuron to activate even if all inputs are zero
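
The three ingredients above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `neuron`, the example values, and the choice of ReLU as the activation are all assumptions (the notes do not name an activation function).

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, z)  # ReLU is an assumption; any activation could be used


# Illustrative values, not taken from the notes.
weights = [0.5, -1.0, 2.0]
bias = 0.25

active = neuron([1.0, 0.5, 0.25], weights, bias)   # 0.5 - 0.5 + 0.5 + 0.25 = 0.75
all_zero = neuron([0.0, 0.0, 0.0], weights, bias)  # only the bias remains: 0.25
```

With all inputs at zero the weighted sum vanishes, so the output comes from the bias alone, which is the flexibility the bias provides.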

What is an FNN?

Input layer: Receives the inputs; each neuron in the input layer represents a single input feature
Hidden layers: Process the input data and extract complex features
Output layer: Produces the network’s prediction or output
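
As a rough sketch of how the three kinds of layers fit together, the snippet below pushes one input through a small network. The layer sizes, the random initialization, the ReLU activation on hidden layers, and all function names (`dense`, `init_layer`, `forward`) are assumptions made for illustration.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [
        sum(x * w for x, w in zip(inputs, row)) + b
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer with n_out neurons."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(features, layers):
    """Pass the features through every layer; ReLU on hidden layers only."""
    activations = features
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:  # the output layer is left linear here
            activations = relu(activations)
    return activations


rng = random.Random(0)
layer_sizes = [3, 4, 4, 2]  # 3 input features, two hidden layers of 4, 2 outputs
layers = [
    init_layer(n_in, n_out, rng)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

prediction = forward([0.2, -0.5, 1.0], layers)
print(len(prediction))  # one value per output neuron
```

Data only moves forward, from the input features through the hidden layers to the output; nothing is fed back to an earlier layer.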

FNN in the transformer block

The FNN in the transformer block takes the context vectors produced by the attention layer as its input
It first expands each context vector into a much higher-dimensional space (often 4 times the model dimension)
- Explores each context vector in a richer representation space
- i.e., unpacks the information held within each context vector
It then compresses the result back down to the model dimension, producing an enriched context vector
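
The expand-then-compress step can be sketched as two linear layers with a nonlinearity in between. The sizes (`d_model = 8`), the ReLU activation, the random weights, and the names `expand` and `compress` are illustrative assumptions; real models use much larger dimensions and often a different activation.

```python
import random


def linear(vector, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [
        sum(x * w for x, w in zip(vector, row)) + b
        for row, b in zip(weights, biases)
    ]


def random_matrix(n_out, n_in, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(n_in)] for _ in range(n_out)]


d_model = 8             # size of each context vector (illustrative)
d_hidden = 4 * d_model  # the "4 times" expansion

rng = random.Random(0)
w_up, b_up = random_matrix(d_hidden, d_model, rng), [0.0] * d_hidden
w_down, b_down = random_matrix(d_model, d_hidden, rng), [0.0] * d_model


def expand(context_vector):
    """Project up to d_hidden and apply ReLU (the activation is an assumption)."""
    return [max(0.0, v) for v in linear(context_vector, w_up, b_up)]


def compress(expanded):
    """Project back down to d_model."""
    return linear(expanded, w_down, b_down)


context_vector = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
expanded = expand(context_vector)
enriched = compress(expanded)

print(len(context_vector), len(expanded), len(enriched))  # 8 32 8
```

The output has the same size as the input, so the enriched context vector can be passed on to the next transformer block unchanged in shape.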

Performs complex calculations and feature extraction on each token individually in a richer representation space
Captures more nuanced and rich feature representations within each context vector
The attention mechanism figures out where to look (routing information between tokens); the feed-forward network then works out what each token's representation means
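
To make "each token individually" concrete, the sketch below applies the same feed-forward weights to every position in a short sequence and then changes only the last token. The dimensions, the random weights, the ReLU activation, and the omission of biases are simplifying assumptions.

```python
import random

rng = random.Random(0)
d_model, d_hidden = 4, 16  # illustrative sizes

w_up = [[rng.gauss(0.0, 0.5) for _ in range(d_model)] for _ in range(d_hidden)]
w_down = [[rng.gauss(0.0, 0.5) for _ in range(d_hidden)] for _ in range(d_model)]


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def feed_forward(token_vector):
    """Expand, apply ReLU, compress (biases omitted for brevity)."""
    hidden = [max(0.0, h) for h in matvec(w_up, token_vector)]
    return matvec(w_down, hidden)


def apply_to_sequence(sequence):
    # Same weights at every position; no information moves between tokens here.
    return [feed_forward(token) for token in sequence]


sequence = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(3)]
baseline = apply_to_sequence(sequence)

# Change only the last token and run the sequence again.
modified = [list(token) for token in sequence]
modified[2] = [v + 1.0 for v in modified[2]]
changed = apply_to_sequence(modified)

same_as_before = [baseline[i] == changed[i] for i in range(3)]
print(same_as_before)
```

The outputs for the first two positions are identical in both runs because the feed-forward network never looks at neighbouring tokens; mixing information across positions is left to attention.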

The FNN contains more trainable weights (parameters) than the self-attention layer: roughly twice as many in the standard design with a 4x expansion
It accounts for the bulk of each block's parameters, and therefore of the model's storage
It stores the generalized patterns the model learned during training
It acts as the main engine for computation within each transformer block
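
A back-of-the-envelope count shows where the parameter claim comes from. The sketch assumes a standard block: four `d_model x d_model` attention projections (query, key, value, output) with biases, and a two-layer FNN with 4x expansion. The value `d_model = 768` is only an example, and layer norms and embeddings are ignored.

```python
def attention_params(d_model):
    """Query, key, value and output projections, each d_model x d_model plus a bias."""
    return 4 * (d_model * d_model + d_model)


def ffn_params(d_model, expansion=4):
    """Up-projection and down-projection of the feed-forward network, with biases."""
    d_hidden = expansion * d_model
    up = d_model * d_hidden + d_hidden
    down = d_hidden * d_model + d_model
    return up + down


d_model = 768  # illustrative model dimension
attn = attention_params(d_model)
ffn = ffn_params(d_model)

ratio = ffn / attn
ffn_share = ffn / (attn + ffn)

print(attn)  # 2362368
print(ffn)   # 4722432
print(round(ratio, 2), round(ffn_share, 2))  # 2.0 0.67
```

Under these assumptions the FNN holds about two thirds of the weights in each block, which is why it dominates the model's storage.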