17-08-28 Interesting Facts in Machine Learning (Logistic Regression)
Category: Idea Lists (Upon Request)
1. Two major ways to do multiclass classification:
- Softmax loss (multinomial logistic regression)
- One-vs-all with the binary (logistic) function
- Naming -
- “Logistic” regression, after the sigmoid (logistic) function
- “Softmax” regression, after the softmax function
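A minimal NumPy sketch (the logits are hypothetical) contrasting the two: softmax yields one coupled distribution over all classes that sums to 1, while independent one-vs-all sigmoids do not.

```python
import numpy as np

# Hypothetical raw scores (logits) from a 3-class linear model, for one example.
logits = np.array([2.0, 1.0, 0.1])

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Softmax ("multinomial") approach: one coupled distribution over classes.
p_softmax = softmax(logits)

# One-vs-all approach: an independent sigmoid per class.
p_ovr = sigmoid(logits)

print(p_softmax, p_softmax.sum())  # sums to 1
print(p_ovr, p_ovr.sum())          # does NOT sum to 1; needs renormalizing
```

Both rank the classes the same way here; the difference is that softmax probabilities are directly comparable across classes, while one-vs-all scores must be renormalized.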
- No closed-form solution, despite convexity
- Many, many optimizers:
- Newton / Newton-CG
- BFGS
- L-BFGS
- IRLS
- Trust Region Conjugate Gradient
- Gradient Descent
- GD + Line Search
- Stochastic Average Gradient
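To see why so many optimizers coexist, a sketch on synthetic data (the dataset, step sizes, and small L2 penalty are all made up for illustration) comparing plain gradient descent with Newton's method, which for logistic regression is equivalent to IRLS. Convexity means both land on the same optimum; Newton just gets there in far fewer iterations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic binary problem (hypothetical data).
X = rng.normal(size=(200, 2))
w_true = np.array([1.5, -2.0])
y = (X @ w_true + rng.normal(scale=0.5, size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lam = 1e-2  # small L2 penalty keeps the optimum finite and unique

# Plain gradient descent on the regularized negative log-likelihood.
w_gd = np.zeros(2)
for _ in range(5000):
    grad = X.T @ (sigmoid(X @ w_gd) - y) / len(y) + lam * w_gd
    w_gd -= 0.5 * grad

# Newton's method (equivalently IRLS): uses the Hessian, needs few steps.
w_nt = np.zeros(2)
for _ in range(20):
    p = sigmoid(X @ w_nt)
    grad = X.T @ (p - y) / len(y) + lam * w_nt
    H = (X.T * (p * (1 - p))) @ X / len(y) + lam * np.eye(2)
    w_nt -= np.linalg.solve(H, grad)

print(w_gd, w_nt)  # both optimizers reach the same unique optimum
```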
- Bayesian treatments are difficult (the logistic likelihood has no convenient conjugate prior)
- Discriminative: learns P(Y|X) directly, rather than first modeling the joint P(Y, X) and then conditioning on X (the generative approach)
- Without regularization, the weights can grow arbitrarily large (they diverge on linearly separable data), damaging generalization. Penalties matter more here than in the linear regression setting.
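A sketch of the divergence on a tiny, perfectly separable toy dataset (made up for illustration): with no penalty the weight keeps growing the longer you train, since pushing it toward infinity keeps lowering the loss, while a small L2 penalty pins it to a finite optimum.

```python
import numpy as np

# Perfectly separable 1-D data (hypothetical).
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(lam, steps=20000, lr=0.1):
    """Gradient descent on the (optionally L2-penalized) logistic loss."""
    w = np.zeros(1)
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y) + lam * w
        w -= lr * grad
    return w[0]

w_unreg = fit(lam=0.0)  # keeps growing the longer you train
w_l2 = fit(lam=0.1)     # converges to a finite value

print(w_unreg, w_l2)
```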
- You can get better generalization with a stochastic solver [https://arxiv.org/pdf/1708.05070.pdf]
- The reason scaling can still be important is the optimizer - the problem is convex, so you will reach the same solution either way, but poorly scaled features make it badly conditioned and slow to solve
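A sketch of the conditioning point, with two hypothetical features on very different scales: the condition number of X^T X, which governs how slowly first-order solvers converge, is huge until the columns are standardized, even though standardizing does not change what the optimum is.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical features on wildly different scales (think age vs. income).
X = rng.normal(size=(500, 2)) * np.array([1.0, 1000.0])

# Condition number of X^T X: large values mean slow, ill-conditioned solves.
raw_cond = np.linalg.cond(X.T @ X)

# Standardizing each column fixes the conditioning without changing the
# convex problem's optimum, only how fast solvers reach it.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
scaled_cond = np.linalg.cond(Xs.T @ Xs)

print(raw_cond, scaled_cond)  # raw is orders of magnitude worse conditioned
```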
- Linear models generalize more strongly than almost every alternative on unstructured data (trees and neural networks overfit more easily)
- Every relationship between your feature and the label should be as close to linear as possible
- You can use the Box-Cox transform to automatically get close to linear
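A sketch using SciPy's `scipy.stats.boxcox` on a made-up right-skewed feature: it fits the power (lambda) that makes the feature most nearly normal, which in turn tends to make its relationship with the label more linear. Note it requires strictly positive inputs.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# A heavily right-skewed hypothetical feature (log-normal, e.g. income).
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# Box-Cox jointly returns the transformed feature and the fitted lambda.
x_t, lam = stats.boxcox(x)

print(stats.skew(x), stats.skew(x_t), lam)  # skew is much closer to 0 after
```

For log-normal data like this, the fitted lambda comes out near 0, which corresponds to a plain log transform.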
Source: Original Google Doc