What Are Foundation Models?
- Large AI models pre-trained on massive, diverse datasets.
- Can be adapted to many downstream tasks without training from scratch.
- Examples: GPT, Claude, Llama, Titan, BERT, Stable Diffusion.
- Most modern FMs are built on the transformer architecture.
- Types: text-to-text (LLMs), text-to-image, and multimodal (multiple input/output types).
Key Concepts
- Pre-training: initial training on large unlabeled data (self-supervised).
- Fine-tuning: further training on task-specific labeled data.
- Transfer learning: applying knowledge from pre-training to new tasks.
- Tokens: basic units of text processed by FMs (words, subwords, or characters).
- Context window: maximum number of tokens the model can handle at once (input and output combined).
- Embeddings: dense vector representations of text that capture semantic meaning; similar texts map to nearby vectors (see the sketch after this list).
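A minimal sketch to make tokens and embeddings concrete. It assumes the `tiktoken` package (OpenAI's BPE tokenizer) is installed; the three-dimensional "embedding" vectors are invented for illustration, not output from a real embedding model, which returns hundreds of dimensions.

```python
import numpy as np
import tiktoken

# Tokens: text is split into subword units, each mapped to an integer id.
enc = tiktoken.get_encoding("cl100k_base")  # a common BPE encoding
ids = enc.encode("Foundation models process text as tokens.")
print(ids)                             # list of token ids
print([enc.decode([i]) for i in ids])  # the subword string for each id

# Embeddings: semantically related texts yield nearby vectors.
# These 3-d vectors are hypothetical, chosen only to show the idea.
cat = np.array([0.9, 0.1, 0.2])
dog = np.array([0.8, 0.2, 0.3])
car = np.array([0.1, 0.9, 0.7])

def cosine(a, b):
    # Cosine similarity: 1.0 means identical direction, 0 means unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(cat, dog))  # high: related concepts
print(cosine(cat, car))  # lower: unrelated concepts
```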
Inference Parameters
- Temperature: scales randomness; values near 0 make output nearly deterministic, higher values (1 and above) make it more varied and creative (see the sampling sketch after this list).
- Top-p (nucleus sampling): restricts sampling to the smallest set of tokens whose cumulative probability reaches p.
- Top-k: restricts sampling to the k most probable next tokens.
- Max tokens: caps the length of the generated response.
- Stop sequences: character sequences that, when generated, tell the model to stop.
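To see how these parameters interact, the sketch below implements temperature scaling, top-k, and top-p filtering over a made-up five-token vocabulary. The vocabulary and logits are invented; a real FM produces logits over tens of thousands of tokens at every decoding step.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "a", "cat", "dog", "pizza"]
logits = np.array([2.0, 1.5, 1.0, 0.9, -1.0])  # hypothetical model scores

def sample_next(logits, temperature=1.0, top_k=None, top_p=None):
    # Temperature: divide logits before softmax. Lower -> sharper
    # distribution (more deterministic); higher -> flatter (more random).
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top-k: zero out everything except the k most probable tokens.
    if top_k is not None:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches p.
    if top_p is not None:
        order = np.argsort(probs)[::-1]            # descending by probability
        cum = np.cumsum(probs[order])
        keep = order[: int(np.searchsorted(cum, top_p)) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask

    probs /= probs.sum()  # renormalize after filtering
    return rng.choice(len(logits), p=probs)

print(vocab[sample_next(logits, temperature=0.2)])            # near-greedy
print(vocab[sample_next(logits, temperature=1.0, top_k=3)])   # 3 candidates only
print(vocab[sample_next(logits, temperature=1.0, top_p=0.9)]) # nucleus of mass 0.9
```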
Customization Approaches
- Prompt engineering: craft input prompts to guide output (no model change).
- RAG (Retrieval-Augmented Generation): augment prompts with retrieved external data (no model change; see the retrieval sketch after this list).
- Fine-tuning: train the model on custom data (changes model weights).
- Continued pre-training: extend the model's knowledge with additional unlabeled, domain-specific data (changes model weights).
- Cost and complexity: prompt engineering < RAG < fine-tuning < pre-training.
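A minimal sketch of the RAG flow: retrieve relevant text, then prepend it to the prompt. The three documents, the word-overlap retriever, and the prompt template are all invented for illustration; production RAG systems use embedding vectors and a vector database, and the augmented prompt is then sent to whatever FM API you use.

```python
import re

# Toy knowledge base; in practice these would be chunks in a vector store.
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Shipping to Europe takes 5-7 business days.",
    "Support is available 24/7 via chat and email.",
]

def words(text):
    # Crude tokenization: lowercase alphanumeric runs.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, docs, k=1):
    # Rank documents by word overlap with the query, a stand-in for
    # cosine similarity between embeddings in a real vector store.
    q = words(query)
    return sorted(docs, key=lambda d: len(q & words(d)), reverse=True)[:k]

def build_prompt(query):
    # Prepend retrieved context to the user question. The model's weights
    # are never modified; only the prompt grows.
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is the refund policy?"))
```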