Feedforward neural network (FNN)

Objectives

  • Understand the role of FNN in a transformer block and why it is important

Basic (simplest) overview

What is a single neuron?

alt text

  • Inputs: Always numerical, can have many inputs, often normalized

  • Weights: Numerical values associated with each input, determine importance

  • Bias: Provides flexibility for the neuron to activate even if all inputs are zero

What is a FNN?

alt text

  • Input layer: Receives the inputs, a neuron in a input layer represents a single input feature

  • Hidden layers: process the input data and extract complex features

  • Output layer: Produces the network’s prediction or output

FNN in the transformer block

alt text

  • FNN in the transformer block accepts context vector from the attention layer as the input

  • FNN expands this input into a much higher-dimensional space (often 4 times the size)

    • Explore each context vector in a richer representation space

    • i.e., Uncompresses the information within each context vector

  • Compressing it back down to model dimensions and generate enriched context vector

Why FNN is important?

  • Perform complex calculations and feature extraction on each token individually in a richer representation space

  • Capture more nuanced and rich feature representations within each context vector

  • Attention mechanism figures out where to look (routing information between words), the feed-forward network figures out what the words actually mean

Main features

  • FNN contains significantly more trainable weights (parameters) than the self-attention layer

  • Account for the bulk of the model’s storage

  • Stores the generalized patterns the model learned during training

  • Acting as the main engine for computation within each transformer block