Introduction - Building Blocks of GPT-2 LLM
Introduction
These lesson notes are designed to
Explore the architecture of LLMs, with a special focus on the GPT-2 model
Introduce the fundamental components of LLMs (i.e., the building blocks of the LLM architecture)
Provide an understanding of the basic LLM architecture
Objectives
Identify the key components (building blocks) of decoder-only LLMs
Understand
why each building block of an LLM is important
how these building blocks work
Understand the dataflow and data parallelization of LLMs
Here, learners will gain the knowledge needed to interpret the processes that enable LLMs to predict the next word from a sequence of input words
Prerequisites
Basic understanding of deep learning concepts and methods
Python programming
Basic familiarity with PyTorch implementations
Lesson notes (content)
Introduction to Large Language Models (LLMs)
GPT - Generative Pretrained Transformer model
Introduction to tokenization
Introduction to embedding
Transformer blocks
Self-attention mechanism
Masked Attention
Multi-head self-attention
Feedforward neural network (FFN)
Language Modeling Head (LM Head)
Pre-trained GPT-2 model end to end
Dataflow across LLM
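To preview the kind of component the notes cover, here is a minimal, illustrative sketch of single-head masked (causal) self-attention in PyTorch. The function name and the randomly initialized projection matrices are assumptions for demonstration only, not the implementation used later in the notes:

```python
import torch
import torch.nn.functional as F

def masked_self_attention(x, W_q, W_k, W_v):
    """Single-head causal (masked) self-attention sketch.

    x: (seq_len, d_model) token embeddings
    W_q, W_k, W_v: (d_model, d_head) projection matrices (illustrative)
    """
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    d_head = Q.size(-1)
    # Scaled dot-product attention scores: (seq_len, seq_len)
    scores = Q @ K.T / d_head ** 0.5
    # Causal mask: position i may only attend to positions <= i
    seq_len = x.size(0)
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ V, weights

torch.manual_seed(0)
d_model, d_head, seq_len = 8, 4, 5
x = torch.randn(seq_len, d_model)
W_q, W_k, W_v = (torch.randn(d_model, d_head) for _ in range(3))
out, weights = masked_self_attention(x, W_q, W_k, W_v)
print(out.shape)  # torch.Size([5, 4])
```

In GPT-2 this operation is repeated across several heads (multi-head self-attention) and followed by the feedforward network inside each transformer block; the lesson notes walk through each of these steps in turn.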
Target audience
These lesson notes are for individuals who have deep learning knowledge and want a basic overview of the architecture of LLMs. The notes are designed not to dive deep into each component but to interpret the underlying processes of an LLM's key components.
Credits
Originally delivered as a two-part lecture in the monthly seminar series of Scientific Computing Services (RDE-SCS), University of Oslo
Subsequently expanded and adapted into these comprehensive lecture notes
The GitHub Pages site is built via the CodeRefinery sphinx-lesson-template