$\newcommand{\ones}{\mathbf 1}$
Algorithms for unconstrained optimization
In descent methods, the particular choice of search direction does not matter so much.
True.
Incorrect.
False.
Correct!
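The sketch below shows why the direction matters, using NumPy. All of the data are illustrative choices for this example: the quadratic $f(x) = \tfrac12 x^TPx$ with $P = \mathbf{diag}(1, \gamma)$ and $\gamma = 10$, the starting point $x^{(0)} = (\gamma, 1)$, and the stopping tolerance. Both methods take the best step along their own direction, so the direction is the only thing that differs.

```python
import numpy as np

# Illustrative ill-conditioned quadratic f(x) = 0.5 * x^T P x, minimizer x* = 0.
gamma = 10.0
P = np.diag([1.0, gamma])

def grad(x):
    return P @ x

def gradient_descent(x, tol=1e-8, max_iter=10_000):
    """Gradient method with exact line search (closed form for a quadratic)."""
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            return x, k
        dx = -g
        t = (g @ g) / (dx @ P @ dx)  # exact minimizer of t -> f(x + t*dx)
        x = x + t * dx
    return x, max_iter

def newton(x, tol=1e-8, max_iter=100):
    """Newton's method with unit step (exact on a quadratic)."""
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            return x, k
        x = x - np.linalg.solve(P, g)  # Newton step: -P^{-1} grad f(x)
    return x, max_iter

x0 = np.array([gamma, 1.0])
_, k_gd = gradient_descent(x0)
_, k_nt = newton(x0)
print(k_gd, k_nt)
```

Newton's method minimizes any strictly convex quadratic in one step. The gradient method needs more iterations as $\gamma$, the condition number of $P$, grows.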
In descent methods, the particular choice of line search does not matter so much.
True.
Correct!
False.
Incorrect.
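You can probe this claim on a toy quadratic by keeping the search direction fixed at $-\nabla f(x)$ and swapping only the line search. The NumPy sketch below runs the gradient method once with an exact line search and once with backtracking, so the two iteration counts can be compared. The problem data and the backtracking parameters $\alpha = 0.3$, $\beta = 0.7$ are illustrative assumptions.

```python
import numpy as np

# Illustrative quadratic f(x) = 0.5 * x^T P x.
P = np.diag([1.0, 10.0])

def f(x):
    return 0.5 * x @ P @ x

def grad(x):
    return P @ x

def exact_step(x, dx):
    # Minimizer of t -> f(x + t*dx) for a quadratic.
    return -(grad(x) @ dx) / (dx @ P @ dx)

def backtracking_step(x, dx, alpha=0.3, beta=0.7):
    # Shrink t until the sufficient-decrease condition holds.
    t = 1.0
    while f(x + t * dx) > f(x) + alpha * t * (grad(x) @ dx):
        t *= beta
    return t

def gradient_method(x, line_search, tol=1e-8, max_iter=10_000):
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            return x, k
        dx = -g
        x = x + line_search(x, dx) * dx
    return x, max_iter

x0 = np.array([10.0, 1.0])
x_exact, k_exact = gradient_method(x0, exact_step)
x_bt, k_bt = gradient_method(x0, backtracking_step)
print(k_exact, k_bt)
```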
When the gradient descent method is started from a point near the solution, it will converge very quickly.
True.
Incorrect.
False.
Correct!
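The gradient method converges only linearly. Each iteration shrinks $f(x) - p^\star$ by roughly a constant factor, however close the iterate already is. The sketch below illustrates this on an illustrative quadratic $f(x) = \tfrac12 x^TPx$ (minimizer $x^\star = 0$) with an exact line search. For this problem the iteration is homogeneous in $x$, so a start about a million times closer to $x^\star$ needs exactly as many iterations to reach the same relative reduction.

```python
import numpy as np

P = np.diag([1.0, 10.0])  # illustrative ill-conditioned quadratic, minimizer x* = 0

def f(x):
    return 0.5 * x @ P @ x

def iterations_to_reduce(x, factor=1e-8, max_iter=10_000):
    """Gradient method with exact line search; count iterations until
    f(x) <= factor * f(x0), i.e. a fixed relative improvement."""
    f0 = f(x)
    for k in range(max_iter):
        if f(x) <= factor * f0:
            return k
        g = P @ x
        t = (g @ g) / (g @ P @ g)  # exact line search along -g
        x = x - t * g
    return max_iter

far = np.array([10.0, 1.0])
near = far * 2.0**-20  # ~1e6 times closer; a power of 2 keeps the scaling exact in floating point
print(iterations_to_reduce(far), iterations_to_reduce(near))
```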
Newton's method with step size $h=1$ always works.
True.
Incorrect.
False.
Correct!
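Here is a standard one-dimensional illustration, not taken from the quiz. The function $f(x) = \sqrt{1 + x^2}$ is smooth and strictly convex with minimizer $x^\star = 0$. Its pure Newton update simplifies to $x^+ = x - f'(x)/f''(x) = -x^3$, so with step size $h = 1$ the iterates diverge whenever $|x^{(0)}| > 1$. A minimal Python sketch:

```python
import math

# f(x) = sqrt(1 + x^2): smooth, strictly convex, minimized at x* = 0.
def fprime(x):
    return x / math.sqrt(1 + x * x)

def fsecond(x):
    return (1 + x * x) ** -1.5

def pure_newton(x, steps):
    """Newton's method with fixed step size h = 1 (no line search)."""
    xs = [x]
    for _ in range(steps):
        x = x - fprime(x) / fsecond(x)
        xs.append(x)
    return xs

print(pure_newton(2.0, 4))  # approximately 2, -8, 512, ... (iterates are -x^3, diverging)
print(pure_newton(0.5, 4))  # converges toward 0 since |x0| < 1
```

The usual remedy is a backtracking line search (damped Newton method).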
When Newton's method is started from a point near the solution, it will converge very quickly.
True.
Correct!
False.
Incorrect.
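Near the solution, Newton's method converges quadratically: the error is roughly squared at each step. Take the illustrative function $f(x) = x - \log x$, with minimizer $x^\star = 1$. Its pure Newton update is $x^+ = 2x - x^2$, so the error obeys $x^+ - 1 = -(x - 1)^2$ exactly. A short Python check:

```python
import math

# Illustrative f(x) = x - log(x) on x > 0, minimized at x* = 1.
def newton_step(x):
    g = 1 - 1 / x    # f'(x)
    h = 1 / x**2     # f''(x)
    return x - g / h  # pure Newton step (h = 1)

x = 1.5
errors = [x - 1]
for _ in range(4):
    x = newton_step(x)
    errors.append(x - 1)
print(errors)  # up to rounding, each error is minus the square of the previous one
```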
Using Newton's method to minimize $f(Ty)$ over $y$, where $x = Ty$ and $T$ is nonsingular, instead of minimizing $f(x)$ over $x$, can greatly improve the convergence speed when $T$ is chosen appropriately.
True.
Incorrect.
False.
Correct!
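Newton's method is affine invariant. Let $y^{(k)}$ be the Newton iterates for $g(y) = f(Ty)$, started at $y^{(0)} = T^{-1}x^{(0)}$. Then $x^{(k)} = Ty^{(k)}$ are exactly the Newton iterates for $f$. The reason is that $\nabla g(y) = T^T\nabla f(Ty)$ and $\nabla^2 g(y) = T^T\nabla^2 f(Ty)\,T$, which give $\Delta y_{\rm nt} = T^{-1}\Delta x_{\rm nt}$. So no choice of $T$ changes the convergence. The NumPy sketch below checks this numerically. Both the function $f(x) = \sum_i e^{x_i} - b^Tx$ and the random $T$ are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
b = np.array([1.0, 2.0, 3.0])

# Illustrative convex f(x) = sum(exp(x_i)) - b^T x, minimized at x* = log(b).
def grad_f(x):
    return np.exp(x) - b

def hess_f(x):
    return np.diag(np.exp(x))

# Random change of variables x = T y (shifted by 3I; checked to be nonsingular).
T = rng.standard_normal((n, n)) + 3 * np.eye(n)
assert abs(np.linalg.det(T)) > 1e-6

def newton_x(x):
    return x - np.linalg.solve(hess_f(x), grad_f(x))

def newton_y(y):
    # g(y) = f(T y): grad g = T^T grad f(Ty), hess g = T^T hess f(Ty) T
    x = T @ y
    g = T.T @ grad_f(x)
    H = T.T @ hess_f(x) @ T
    return y - np.linalg.solve(H, g)

x = np.zeros(n)
y = np.linalg.solve(T, x)  # matching starting points: T y0 = x0
for _ in range(5):
    x, y = newton_x(x), newton_y(y)
    print(np.max(np.abs(T @ y - x)))  # discrepancy stays at rounding level
```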
If $f$ is self-concordant, its Hessian is Lipschitz continuous.
True.
Incorrect.
False.
Correct!
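One counterexample is $f(x) = -\log x$ on $x > 0$, which the quiz identifies as self-concordant further down. Its second derivative is not Lipschitz continuous on $(0, \infty)$:

\[
f''(x) = \frac{1}{x^2}, \qquad
\frac{|f''(x) - f''(2x)|}{|x - 2x|} = \frac{1/x^2 - 1/(4x^2)}{x} = \frac{3}{4x^3} \to \infty \quad \text{as } x \to 0^+,
\]

so no single Lipschitz constant works near the boundary of the domain.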
If the Hessian of $f$ is Lipschitz continuous, then $f$ is self-concordant.
True.
Incorrect.
False.
Correct!
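For a one-dimensional counterexample, consider softplus, $f(x) = \log(1 + e^x)$, which is the scalar logistic loss. With $\sigma(x) = 1/(1 + e^{-x})$ we have $f'' = \sigma(1 - \sigma) \le 1/4$ and $f''' = f''\,(1 - 2\sigma)$, so $|f'''| \le 1/4$ and $f''$ is Lipschitz on $\mathbf{R}$. However, the ratio $|f'''|/(f'')^{3/2}$ is unbounded, because $f'' \to 0$ while $|1 - 2\sigma| \to 1$ as $|x| \to \infty$. A Python sketch that checks both facts numerically:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Illustrative example: f(x) = log(1 + exp(x)) (softplus / scalar logistic loss).
def f2(x):
    # f''(x) = sigma(x) * (1 - sigma(x)); using 1 - sigma(x) = sigma(-x) avoids cancellation
    return sigmoid(x) * sigmoid(-x)

def f3(x):
    # f'''(x) = f''(x) * (1 - 2 sigma(x)) = f''(x) * (sigma(-x) - sigma(x))
    return f2(x) * (sigmoid(-x) - sigmoid(x))

grid = [i / 100 for i in range(-3000, 3001)]
# |f'''| is bounded (by 1/4), so f'' is Lipschitz on all of R ...
print(max(abs(f3(x)) for x in grid))
# ... but self-concordance needs |f'''(x)| <= 2 f''(x)^{3/2}, which fails for large |x|:
ratio = abs(f3(10.0)) / f2(10.0) ** 1.5
print(ratio)  # far larger than 2
```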
Newton's method should only be used to minimize self-concordant functions.
True.
Incorrect.
False.
Correct!
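Newton's method with a backtracking line search is routinely applied to functions that are not self-concordant. Self-concordance gives a cleaner convergence analysis; it is not a requirement for the method to work. The Python sketch below uses the illustrative function $f(x) = e^x - 2x$, which is not self-concordant on $\mathbf{R}$ because the condition fails for $x < -2\log 2$. The run starts at $x^{(0)} = -5$, where the condition does not hold. The line search parameters and the tolerance are assumptions.

```python
import math

# Illustrative test function f(x) = exp(x) - 2x, minimized at x* = log 2.
# Not self-concordant on R: |f'''| <= 2 f''^{3/2} fails for x < -2 log 2.
def f(x):
    return math.exp(x) - 2 * x

def damped_newton(x, alpha=0.25, beta=0.5, eps=1e-18, max_iter=100):
    """Newton's method with backtracking line search (parameters are illustrative)."""
    for k in range(max_iter):
        g = math.exp(x) - 2  # f'(x)
        h = math.exp(x)      # f''(x)
        dx = -g / h          # Newton step
        lam2 = g * g / h     # squared Newton decrement
        if lam2 / 2 <= eps:
            return x, k
        t = 1.0
        while f(x + t * dx) > f(x) + alpha * t * g * dx:
            t *= beta
        x = x + t * dx
    return x, max_iter

x_opt, iters = damped_newton(-5.0)
print(x_opt, iters, math.log(2))
```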
$f(x) = \exp x$ is self-concordant.
True.
Incorrect.
False.
Correct!
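Checking the definition directly: a function $f$ on $\mathbf{R}$ is self-concordant if $|f'''(x)| \le 2 f''(x)^{3/2}$ for all $x$. For $f(x) = e^x$ we have $f''(x) = f'''(x) = e^x$, so

\[
|f'''(x)| \le 2 f''(x)^{3/2}
\iff e^x \le 2 e^{3x/2}
\iff e^{-x/2} \le 2
\iff x \ge -2\log 2,
\]

and the inequality fails for every $x < -2\log 2$.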
$f(x) = -\log x$ is self-concordant.
True.
Correct!
False.
Incorrect.
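For $f(x) = -\log x$ on $x > 0$, the defining inequality $|f'''(x)| \le 2 f''(x)^{3/2}$ holds with equality:

\[
f''(x) = \frac{1}{x^2}, \qquad f'''(x) = -\frac{2}{x^3}, \qquad
|f'''(x)| = \frac{2}{x^3} = 2\left(\frac{1}{x^2}\right)^{3/2} = 2 f''(x)^{3/2}.
\]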
Consider the problem of minimizing \[ f(x) = (c^Tx)^4 + \sum_{i=1}^n w_i \exp x_i, \] over $x \in \mathbf{R}^n$, where $w \succ 0$.
Newton's method would probably require fewer iterations than the gradient method, but each iteration would be much more costly.
True.
Incorrect.
False.
Correct!
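The statement is false because the Hessian has special structure. It is $\nabla^2 f(x) = \mathbf{diag}(w_1 e^{x_1}, \ldots, w_n e^{x_n}) + 12 (c^Tx)^2 cc^T$, a positive diagonal matrix plus a rank-one term. The Newton system can therefore be solved in $O(n)$ operations, the same order as computing the gradient. The NumPy sketch below does this with the Sherman-Morrison formula and checks the result against a dense solve. The function name `newton_step` and the random test data are hypothetical.

```python
import numpy as np

def newton_step(x, c, w):
    """Newton step for f(x) = (c^T x)^4 + sum_i w_i exp(x_i) in O(n) time.

    The Hessian is diag(d) + a c c^T with d = w * exp(x) and a = 12 (c^T x)^2,
    so the Newton system is solved with the Sherman-Morrison formula
    instead of a dense O(n^3) factorization.
    """
    s = c @ x
    d = w * np.exp(x)     # diagonal part of the Hessian (positive since w > 0)
    a = 12 * s**2         # weight of the rank-one term
    g = 4 * s**3 * c + d  # gradient
    # (D + a c c^T)^{-1} g = D^{-1} g - a D^{-1} c (c^T D^{-1} g) / (1 + a c^T D^{-1} c)
    Dinv_g = g / d
    Dinv_c = c / d
    Hinv_g = Dinv_g - a * Dinv_c * (c @ Dinv_g) / (1 + a * (c @ Dinv_c))
    return -Hinv_g

# Check against a dense solve on a small random instance.
rng = np.random.default_rng(0)
n = 5
c = rng.standard_normal(n)
w = rng.uniform(0.5, 2.0, n)  # w > 0 componentwise
x = rng.standard_normal(n)

H = np.diag(w * np.exp(x)) + 12 * (c @ x) ** 2 * np.outer(c, c)
g = 4 * (c @ x) ** 3 * c + w * np.exp(x)
print(np.allclose(newton_step(x, c, w), -np.linalg.solve(H, g)))
```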
Newton's method is seldom used in machine learning because
common loss functions are not self-concordant
Incorrect.
While this is true, it is not the reason Newton's method isn't used.
Newton's method does not work well on noisy data
Incorrect.
This statement doesn't even make sense.
machine learning researchers don't really understand linear algebra
Incorrect.
It is known that at least some machine learning researchers do know linear algebra.
it is generally not practical to form or store the Hessian in such problems, due to large problem size
Correct!
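Some rough arithmetic behind the correct answer, assuming a hypothetical model with $n = 10^6$ parameters and a dense Hessian stored in double precision. The Hessian has $n^2 = 10^{12}$ entries, while the gradient has only $n$. Factoring a dense $n \times n$ matrix also takes on the order of $n^3 = 10^{18}$ flops.

```python
# Hypothetical model size: one million parameters, dense float64 storage.
n = 10**6
bytes_per_entry = 8
hessian_bytes = n * n * bytes_per_entry
gradient_bytes = n * bytes_per_entry
print(hessian_bytes / 1e12, "TB for the Hessian")   # 8.0 TB
print(gradient_bytes / 1e6, "MB for the gradient")  # 8.0 MB
```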