Types of Learning
- Supervised: learn from labeled data (classification, regression).
- Unsupervised: find patterns in unlabeled data (clustering, dimensionality reduction).
- Reinforcement: learn through trial and error with rewards.
- Self-supervised: derive labels from the unlabeled data itself, e.g. predicting a masked or next token (used in pre-training foundation models, FMs).
- Semi-supervised: combine small labeled set with large unlabeled set.
Key Concepts
- Features: input variables used to make predictions.
- Labels: target output in supervised learning.
- Training set: data used to train the model.
- Validation set: held-out data used to tune hyperparameters and compare candidate models.
- Test set: held-out data used only at the end to estimate final model performance on unseen data.
- Overfitting: model learns noise in the training data; performs well on training data but poorly on new data.
- Underfitting: model is too simple and misses patterns; performs poorly even on training data.
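The three-way split described above can be sketched in plain Python. This is a minimal illustration, not a standard API: the function name `split_dataset`, the 70/15/15 proportions, and the fixed seed are all assumptions chosen for the example.

```python
import random


def split_dataset(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle rows and return (train, validation, test) lists.

    The fractions and seed are illustrative defaults, not recommendations.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")

    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over is training data
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Shuffling before slicing matters when the rows are stored in a meaningful order (for example sorted by label). A plain random split like this one is not appropriate for time-ordered data, where the test set should come from later in time than the training set.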
Evaluation Metrics
- Accuracy: fraction of predictions that are correct (misleading on imbalanced data, where always predicting the majority class scores high).
- Precision: of predicted positives, how many are actually positive.
- Recall (Sensitivity): of actual positives, how many were correctly predicted.
- F1 Score: harmonic mean of precision and recall.
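- AUC-ROC: area under the ROC curve; measures how well the model ranks positives above negatives across all thresholds (0.5 = random, 1.0 = perfect).
- RMSE: root mean squared error for regression tasks; in the same units as the target, and penalizes large errors more heavily.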
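As a rough sketch of how the metrics above are computed, the following plain-Python functions derive accuracy, precision, recall, and F1 from the counts of true/false positives and negatives, and RMSE from paired values. The function names and the toy labels are hypothetical; binary labels are assumed to be 0 (negative) and 1 (positive).

```python
import math


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Imbalanced toy data: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
print(classification_metrics(y_true, always_negative))
```

On this toy data, a model that always predicts the negative class reaches 0.8 accuracy while its recall is 0.0, which is the reason accuracy alone is a poor guide on imbalanced data.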
Common Algorithms
- Linear/Logistic Regression: simple, interpretable baseline models.
- Decision Trees / Random Forests: tree-based models that capture non-linear relationships; a random forest averages many trees to reduce overfitting.
- K-Means: unsupervised clustering into K groups.
- Neural Networks: multi-layer models for complex patterns.
- XGBoost: gradient-boosted decision trees; a strong performer on tabular data and a frequent choice in tabular data competitions.
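To make one entry from the list above concrete, here is a minimal sketch of K-Means using Lloyd's algorithm (alternate between assigning points to the nearest centroid and moving each centroid to the mean of its points). The function name `kmeans`, the fixed iteration count, the random initialization, and the 2D toy points are assumptions for illustration; library implementations add better initialization and convergence checks.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2D (x, y) tuples into k groups.

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid that points[i] belongs to.
    """
    rng = random.Random(seed)
    # Initialize centroids as k distinct points picked at random.
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)

    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance is enough for comparison).
        assignments = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                              + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, assignments


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

The cluster indices themselves are arbitrary: which group is labeled 0 and which is labeled 1 depends on the random initialization, so only the grouping is meaningful.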