Comprehensive Technical Machine Learning Topics
Category: Technical
Machine Learning
- Algorithms
- Linear Regression
- Derivation of Normal Form Equations / Gradient
- L1 / L2 / Elastic Net Regularization
- Bayesian Linear Regression
- Logistic Regression
- Derivation from probability theory
- Derivation of Gradient, Hessian
- Multiclass vs. Binary logistic regression
- L1 / L2 / Elastic Net Regularization
- Bayesian Logistic Regression
- Laplace Approximation
- Discriminant Functions
- Linear Discriminant Analysis
- Fisher’s linear discriminant
- The Perceptron Algorithm
- Neural Networks
- Chain rule, Backprop derivation
- Feedforward Neural Networks
- Layer Types
- Activations
- Softmax
- Convolutional Neural Networks
- Convolution Operation
- Pooling
- Assumptions & Properties
- Invariance to location of features (Equivariant to translation)
- Weight Sharing for efficiency
- Sparse weights / Sparse connectivity
- Data Locality (Infinitely strong prior on local interactions mattering)
- Recurrent Neural Networks
- Structure
- Word2Vec / Embeddings
- Autoencoders
- Batch Normalization
- Regularization
- Early Stopping
- Weight Decay
- Dropout
- Data Augmentation
- Bayesian Neural Networks
- Decision Trees
- Random Forests
- Properties around bias-variance tradeoff
- Ensemble Modeling
- Extremely Randomized Trees (Extra-Trees)
- Gradient Boosting
- Nature of Boosting
- Feature Importance
- Regularization
- Pruning
- Bootstrap
- Randomness
- Decision Trees for Regression
- Maximum Margin Classifiers
- Support Vector Machine
- Kernel Trick
- For Regression
- Optimizers
- Quadratic Programming
- Pegasos
- Naive Bayes
- KNN
- Splines and Piecewise Polynomials
- MARS
- Principal Component Analysis
- Maximum variance formulation
- Minimum error formulation
- Probabilistic PCA
- Kernel PCA
- Clustering
- t
- Generative vs. Discriminative Models
- Gaussian Processes
- Variational Inference
- MCMC
- Markov Chains
- Metropolis-Hastings
- Gibbs Sampling
- Graphical Models
- Bayesian Networks
- Markov Random Fields
- Inference over Graphical Models
- Optimizers
- Gradient Descent
- Momentum
- Nesterov Momentum
- RMSProp
- Adadelta
- Adagrad
- Adam
- Newton’s Method
- Newton-Raphson
- Trust Region Methods
- Trust Newton
- BFGS
- L-BFGS
- Iteratively Reweighted Least Squares
- Conjugate Gradients
- Coordinate Descent
- Line Search
- Expectation Maximization
- Lloyd’s Algorithm
- Evolutionary Strategies
- Finite Differences
- Convex Optimization
1. Linear Programming
- Simplex Method
- Branch, Bound and Cut
- Interior-Point Methods
2. Quadratic Programming
1. Karush-Kuhn-Tucker (KKT) System
- Forward-Backward Algorithm
- Matrix Factorization
1. LU Decomposition
2. QR Factorization
3. Cholesky Decomposition
4. Singular Value Decomposition
5. Non-Negative Matrix Factorization?
- MCMC
1. Metropolis-Hastings
2. Gibbs Sampling
- CART
- Constrained Optimization
1. Lagrange Multiplier Constrained Optimization
2. Penalty Methods
- Automatic Differentiation
- Closed Form Solutions
1. Normal Form Equations
- Linear Regression
- Evaluation
- Loss Functions
- KL Divergence
- Cross Entropy
- Negative Log Likelihood
- Validation
- Cross Validation
- Leave one out
- Situations where valuable
- Proper validation methodology
- Regression
- RMSE
- MAE
- Median
- R²
- Visualization (Especially of large errors)
- Classification
- ROC Curve
- Confusion Matrix
- F1 Score
- Heat Map
- Overall accuracy rate
- Kappa Statistic
- Sensitivity
- Specificity
- AUC
- Visualization (Especially of errors)
- Testing
- t-test
- F-test
- A/B Testing
- Conceptual
- Bias-Variance Tradeoff
- Curse of Dimensionality
- Parametric vs. Non-Parametric Algorithms
- Softmax and its properties
- Classification Class Imbalance
- Model Tuning (Tune Parameters For Sensitivity)
- Alternate Cutoffs (Using ROC Curve)
- Adjusting Prior Probability
- Unequal Class Weights
- Down Sampling
- Up Sampling
- Alter Cost Function
- Dynamic Structure (Cascade of classifiers)
- Hyperparameter Tuning
- Cross Validation
- Bootstrap
- Grid Search
- Random Search
- Bayesian Optimization
- Procedural Data Science
- Preprocessing
- Exploratory Data Analysis
- Feature Evaluation
- Coefficients in Linear Models
- Random Forest Importances (variance for regression, information gain for classification)
- Pearson Correlation with Outcome
- Maximal Information Coefficient (MIC)
- Distance Correlation (code)
- Model with/without feature
- Randomly shuffle the feature between data points, check difference in model quality
- Lasso Automatic Selection
- Mean Decrease Accuracy (code)
- Stability Selection
- Recursive Feature Elimination
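The shuffle-based check listed under Feature Evaluation ("Randomly shuffle the feature between data points, check difference in model quality") can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the helper name `permutation_importance`, the toy data, the least-squares stand-in model, and the choice of mean squared error as the quality measure are all assumptions made here.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one feature across data points, breaking its link to y
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((predict(X_perm) - y) ** 2) - baseline)
        importances[j] = np.mean(increases)
    return importances

# Toy data (made up for illustration): y depends strongly on column 0,
# weakly on column 1, and not at all on column 2.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + data_rng.normal(scale=0.1, size=500)

# Ordinary least squares fit as a stand-in for "the model"
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(A):
    return A @ w

scores = permutation_importance(predict, X, y)
print(scores)  # larger value = shuffling that feature hurt the model more
```

In practice the scores would normally be computed on held-out data rather than the training set, so that they reflect what the model relies on when generalizing.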
Computer Science
- Data Structures
- Hash Table
- Linked List
- Graphs
- Adjacency List
- Adjacency Matrix
- Pointers and Objects
- Heap
- Fibonacci Heaps
- Priority Queue
- Binary Tree
- Binary Search Tree
- Balanced Binary Search Tree
- Red Black Trees
- AVL Trees
- Queue
- Stack
- Deque
- Arrays
- Disjoint Set
- Algorithms (Non-ML)
- Graph Algorithms
- Shortest Path
- Dijkstra
- Floyd-Warshall
- Bellman-Ford
- A*
- Search
- BFS
- DFS
- Topological Sort
- Strongly Connected Components
- Minimum Spanning Tree
- Kruskal
- Prim
- Min-Cut / Max-Flow
- Ford-Fulkerson
- Maximum bipartite matching
- Recursion
- Fibonacci
- Dynamic Programming / Recursion
- Knapsack
- Traveling Salesman Problem
- Longest Common Subsequence
- Rod Cutting
- Matrix-chain multiplication
- Optimal Binary Search Trees
- Divide and Conquer
- Maximum-subarray
- Strassen's Algorithm
- Greedy
- Huffman Codes
- Matroids
- Task-scheduling
- Sorting
- O(n log(n))
- MergeSort
- Quicksort
- Heapsort
- O(n)
- Radix Sort
- Bucket Sort
- O(n^2)
- Insertion Sort
- Selection Sort
- Bubble Sort
- Shell Sort
- Multithreaded
- Multithreaded Matrix Multiplication
- Multithreaded MergeSort
- Linear Programming
- Simplex
- Branch, Bound and Cut
- Fast Fourier Transform
- String Matching
- Rabin-Karp
- Knuth-Morris-Pratt
- NP Completeness
- Testing
- Programming Languages
- Functions
- Passing arguments by value / reference
- Main: Handling command-line options
- Return types and the return statement
- Overloading (Differences in the input parameters determine the function called)
- Polymorphism (Different behavior depending on class / type)
- Default Arguments
- Types
- Variables
- Val vs. Var
- Expressions
- Order of Evaluation
- Logical and Relational Operators
- Assignment
- Increment / Decrement Operators
- Conditionals
- Type Conversions
- Implicit / Explicit Conversions
- Scope
- Constants
- Pointers, Arrays, References
- Compilation
- Namespaces
- Error Handling
- Regular Expressions
- Iterators
- Predicates
- Resource Management
- Garbage Collection vs. Reference Counting
- Concurrency
- Tasks and Threads
- Passing Arguments
- Sharing Data
- Waiting for Events
- Communication Tasks
- Object Oriented Programming
- Classes (C++)
- Concrete Types
- Abstract Types
- Virtual Functions (Polymorphism)
- Class Hierarchies
- Copy and Move
- Constructor
- Objects (Instances of classes, determining their type)
- Mixins
- Inheritance
- Data structure framing of programming rather than logic / action based framing
- Immutable State
- Distributed Computing
- Map-Reduce
- In-Memory Compute
- How to parallelize algorithms
- Memory Workings & Optimization
- Pointers
- Bits
- Bit Manipulation
- System Design
Mathematics
- Differentiation
- Limits & Limit Rule
- Partial Differentiation
- Chain Rule over several variables
- Chain Rule
- Product Rule
- Quotient Rule
- Logarithmic Differentiation
- Gradient Computations
- Jacobian
- Hessian
- Newton’s Method
- Convexity
- Critical Points
- Lagrange Multipliers
- Integration
- U-substitution (inverse chain rule)
- Integration by parts (inverse product rule)
- Multiple Integration
- Functions
- Exponential Functions
- Exponential Manipulation Rules
- Logarithm Functions
- Logarithm Manipulation Rules
- Series
- Convergence / Divergence
- Special Series
- Power Series
- Taylor Series
- Functions of Several Variables
- Vector Functions
- Calculus over Vector Functions
- Sequences and Series
- Taylor Series Approximation
- Summation Manipulation
- Linear Algebra
- Vector Norms
- Projection
- Important Matrices
- Diagonal Matrices
- Positive Semi-definite Matrices
- Conjugate Matrices
- Triangular matrices
- Symmetric Matrices
- Orthogonal Matrices
- Inversion
- Trace
- Matrix Factorization
- LU Decomposition
- QR Factorization
- Cholesky Decomposition
- Singular Value Decomposition
- Non-Negative Matrix Factorization
- Gram-Schmidt
- Matrix Multiplication
- Vector Spaces
- Linear Independence
- Basis
- Linear Transformations
- Determinants
- Eigenvalues
- Eigenvectors
- Positive Definiteness, tests
- Pseudoinverses
- Cross Product
Probability Theory
- Expectation, Mean, Variance
- Conditional Probability
- Bayes Rule
- Sum and Product Rule
- Independence
- Covariance, Correlation
- Probability Mass Function, Probability Density Function, Cumulative Distribution Function
- Distributions
- Discrete (Probability Masses)
- Binomial
- Bernoulli
- Multinomial
- Poisson
- Continuous (Probability Densities)
- Gaussian
- Conditional Gaussian
- Marginal Gaussian
- Mixtures of Gaussians
- Student’s t-distribution
- Beta
- Exponential Family
- Maximum likelihood for exponentials
- Conjugate priors
- Noninformative priors
- Information Theory
- Cross Entropy
- Mutual Information
- Limit Theorems
- Weak Law of Large Numbers
- Strong Law of Large Numbers
- Central Limit Theorem
- Bayesian Statistical Inference
- Bayesian inference and the posterior distribution
- Point Estimation
- Hypothesis Testing
- Maximum a Posteriori (MAP) Rule
- Classical Statistical Inference
- Binary Hypothesis Testing
- Significance Testing
- Moment Generating Functions
Source: Original Google Doc