Supervised Learning!

Some Notational Conventions

n: number of training examples
m: number of features
x: input feature(s)
y: output/target feature
\theta: parameters
h_{\theta}\left(x\right): the predictor function

A tuple \left(x, y\right) is a single training example. We use superscript parenthesis notation to index examples, so \left(x^{(i)}, y^{(i)}\right) denotes the i-th training example. We typically write h\left(x\right) for the predictor; its parameters are the \theta_{j}.

New Concepts

Linear Regression
least-squares error
gradient descent
gradient descent for least-squares error
variants:
    sum over the whole dataset each step: batch gradient descent
    pick one sample and update on it: stochastic gradient descent
    pick a small batch of samples and update on them: mini-batch gradient descent
a primer on Vector Calculus
    trace
Normal Equation
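The three gradient descent variants listed above differ only in how many examples feed each update. A minimal sketch with NumPy (not part of the original notes; the function names and the toy learning rates are illustrative assumptions), using the least-squares gradient for linear regression:

```python
import numpy as np

def predict(theta, X):
    # h_theta(x) = theta^T x, applied to every row of X
    return X @ theta

def batch_gradient_step(theta, X, y, alpha):
    # Batch gradient descent: average the least-squares
    # gradient over the entire dataset, then take one step.
    n = len(y)
    grad = X.T @ (predict(theta, X) - y) / n
    return theta - alpha * grad

def sgd_step(theta, X, y, alpha, rng):
    # Stochastic gradient descent: update on one random example.
    i = rng.integers(len(y))
    xi, yi = X[i], y[i]
    return theta - alpha * (xi @ theta - yi) * xi

def minibatch_step(theta, X, y, alpha, rng, batch_size=2):
    # Mini-batch gradient descent: update on a small random subset.
    idx = rng.choice(len(y), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return theta - alpha * Xb.T @ (predict(theta, Xb) - yb) / batch_size
```

On a convex least-squares objective all three converge for a small enough step size alpha; the stochastic and mini-batch variants trade per-step cost for noisier updates.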
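The Normal Equation listed above solves the least-squares problem in closed form instead of iteratively. A minimal sketch with NumPy (the toy data is illustrative, not from the notes), assuming X^T X is invertible, i.e. X has full column rank:

```python
import numpy as np

# Design matrix with a bias column of ones; targets follow y = 1 + 2x exactly.
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Normal equation: theta = (X^T X)^{-1} X^T y.
# Solving the linear system is preferred over forming the inverse explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
```

Because the toy targets are an exact linear function of the inputs, theta recovers the intercept 1 and slope 2 exactly (up to floating-point error).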