Why convex optimization

Stephen Boyd's convex optimization lectures (available on YouTube) really address how critical optimization is to ML, and how ML simplifies things by using convex approximations that work with gradient descent.

Why study optimization? Why convex optimization? I think Tim has a good answer on why optimization. But is the world convex? Why are we so obsessed with convexity? Consider this metaphor: a policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys over in the park, but he is searching here because this is where the light is. Searching for the keys in the dark is hard to impossible, so you adapt the problem into one you know how to solve.

If you work on a problem with non-convex algorithms and come up with a solution that will cost 3 million dollars, and I work on a similar problem with convex optimization and then use my convex answer to find a solution to the non-convex problem that costs 2 million dollars, I've found a better answer. Comparing convex analysis to the streetlight effect is just wrong. I would advise you to refer to the introductory textbook Convex Optimization by Boyd and Vandenberghe to learn more on the topic.
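A minimal sketch of that "use the convex answer to attack the non-convex problem" idea, on a toy 0/1 selection problem: the binary constraint makes the problem non-convex, but relaxing it to the interval [0, 1] gives a convex (linear) program whose optimum bounds the true one and whose solution can seed a rounding heuristic. The problem data are invented for illustration, and scipy is assumed to be available.

```python
import numpy as np
from scipy.optimize import linprog

# Toy non-convex problem: pick x in {0,1}^3 to minimize c.x subject to A x <= b.
# (Data invented for illustration.)
c = np.array([3.0, -2.0, -1.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([2.0])

# Convex relaxation: allow x in [0,1]^3 instead of {0,1}^3 -> a linear program.
relaxed = linprog(c, A_ub=A, b_ub=b, bounds=[(0, 1)] * 3, method="highs")
print("relaxed optimum (lower bound on the binary problem):", relaxed.fun)

# Use the convex solution as a starting point: round it and check feasibility.
x_rounded = np.round(relaxed.x)
if np.all(A @ x_rounded <= b):
    print("rounded binary solution:", x_rounded, "cost:", c @ x_rounded)
```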

But it is wrong to say that machine learning and optimization are "fundamentally different".

ML algorithms use optimization to minimize loss functions and find the optimal parameters given the data and the objective. When you are tuning your hyperparameters, you are looking for the optimal combination of them. In each of these cases you are maximizing or minimizing something to achieve your goal, so you are using some kind of optimization. A significant number of machine learning problems boil down to optimization problems. Especially in the context of convex optimization (which is what the OP is asking about), the optimal solution can be easily found: for example, gradient descent with a suitably decaying learning rate is guaranteed to converge to the optimum of a convex function.
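A minimal sketch of that guarantee; the quadratic objective and the 1/t step-size schedule below are just illustrative choices.

```python
import numpy as np

# Convex objective: f(x) = (x - 3)^2, with gradient f'(x) = 2 (x - 3).
def grad(x):
    return 2.0 * (x - 3.0)

x = 10.0
for t in range(1, 201):
    lr = 1.0 / t          # decaying learning rate
    x -= lr * grad(x)

print(x)  # approaches the unique (and therefore global) minimizer x* = 3
```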

The big problem is that many problems in machine learning are non-convex. From the point of view of statistical learning, the three main questions for regression and classification are: (1) What is the function family from which you draw the approximator? (2) What is the criterion by which you choose a function? (3) What is the method for finding the best function? To operate in some constructive way on (1), it is not so obvious how mathematical optimization can help. To operate in some constructive way on (2), it is clear that the criterion defines the objective.

To operate in some constructive way on (3), you need mathematical optimization. Non-differentiability is not a problem. And there are around 50 generalizations of convex functions, of which two of the more useful in applications are quasiconvex and log-concave functions.
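For reference, the two generalizations mentioned above have the following standard definitions (stated for a function f on a convex domain):

```latex
% Quasiconvexity: every sublevel set is convex; equivalently
f(\lambda x + (1-\lambda) y) \le \max\{f(x), f(y)\}
\quad \text{for all } x, y \text{ and } \lambda \in [0, 1].

% Log-concavity: f > 0 and \log f is concave; equivalently
f(\lambda x + (1-\lambda) y) \ge f(x)^{\lambda} \, f(y)^{1-\lambda}
\quad \text{for all } x, y \text{ and } \lambda \in [0, 1].
```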

A non-convex function "curves up and down" -- it is neither convex nor concave. A familiar example is the sine function. If the bounds on the variables restrict the domain of the objective and constraints to a region where the functions are convex, then the overall problem is convex. Because of their desirable properties, convex optimization problems can be solved with a variety of methods. But Interior Point or Barrier methods are especially appropriate for convex problems, because they treat linear, quadratic, conic, and smooth nonlinear functions in essentially the same way -- they create and use a smooth convex nonlinear barrier function for the constraints, even for LP problems.

These methods make it practical to solve convex problems up to very large size, and they are especially effective on second-order (quadratic and SOCP) problems, where the Hessians of the problem functions are constant.

Both theoretical results and practical experience show that Interior Point methods require a relatively small number of iterations (typically fewer than 50) to reach an optimal solution, almost independent of the number of variables and constraints (though the computational effort per iteration rises with the number of variables and constraints).

Interior Point methods have also benefited, more than other methods, from hardware advances -- instruction caching, pipelining, and other changes in processor architecture. All Frontline Systems Solvers are effective on convex problems with the appropriate types of problem functions (linear, quadratic, conic, or nonlinear).
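A minimal sketch of the barrier idea described above, on a one-dimensional toy problem (the objective, constraint, and schedule for the barrier parameter are invented for illustration): to minimize x^2 subject to x >= 1, repeatedly minimize the smooth convex function x^2 - (1/t) log(x - 1) for increasing t.

```python
import numpy as np

def barrier_newton(t, x, iters=50):
    """Minimize f_t(x) = x**2 - (1/t) * log(x - 1) with 1-D Newton steps."""
    for _ in range(iters):
        g = 2 * x - 1.0 / (t * (x - 1.0))        # gradient of f_t
        h = 2 + 1.0 / (t * (x - 1.0) ** 2)       # second derivative, always > 0
        x = x - g / h                            # Newton step
        x = max(x, 1.0 + 1e-9)                   # stay strictly feasible
    return x

x = 2.0  # strictly feasible starting point
for t in [1, 10, 100, 1000, 10000]:              # tighten the barrier
    x = barrier_newton(t, x)
print(x)  # approaches the constrained optimum x* = 1
```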

The objective function is subject to equality constraints and inequality constraints. Inequality constraints indicate that the solution should lie in some range, whereas an equality constraint requires it to lie exactly on the set where the constraint holds.
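For reference, such a problem is commonly written in the standard form used by Boyd and Vandenberghe (with convex f_0, ..., f_m and affine equality constraints):

```latex
\begin{aligned}
\text{minimize}   \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \dots, m, \\
                        & a_j^{\top} x = b_j, \quad j = 1, \dots, p.
\end{aligned}
```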

Convex optimization can also be used to tune an algorithm itself, increasing the speed at which the algorithm converges to the solution. It can also be used to solve linear systems of equations approximately, by minimizing a residual, rather than computing an exact answer to the system.
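For example, instead of solving A x = b directly, one can minimize the convex residual ||A x - b||^2 with gradient descent. The matrix, right-hand side, and step size below are illustrative; the fixed step size assumes it is small enough for this particular A.

```python
import numpy as np

# Pose the linear system A x = b as the convex problem: minimize ||A x - b||^2.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.zeros(2)
lr = 0.05                           # step size, small enough for this A
for _ in range(2000):
    grad = 2 * A.T @ (A @ x - b)    # gradient of the squared residual
    x -= lr * grad

print(x)                     # close to the exact solution [2, 3]
print(np.linalg.solve(A, b)) # direct solve, for comparison
```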

To solve convex optimization problems, techniques familiar from machine learning, such as gradient descent, are used. Convexity plays an important role in convex optimization.

It ensures that any local minimum is also a global minimum, and for many convex problems the objective is also smooth, with well-defined derivatives, which enables the use of gradient descent. Some examples of convex functions are linear, quadratic, absolute value, logistic-loss, and exponential functions, among others. Just as important as convex functions are convex sets.
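A quick (and of course non-exhaustive) numerical spot-check of the defining inequality f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) for a few of these functions; the sampling range and tolerance are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def looks_convex(f, trials=10000):
    """Spot-check the convexity inequality on random points (not a proof)."""
    x, y = rng.uniform(-5, 5, trials), rng.uniform(-5, 5, trials)
    lam = rng.uniform(0, 1, trials)
    lhs = f(lam * x + (1 - lam) * y)
    rhs = lam * f(x) + (1 - lam) * f(y)
    return np.all(lhs <= rhs + 1e-9)

print(looks_convex(lambda z: 2 * z + 1))            # linear         -> True
print(looks_convex(lambda z: z ** 2))               # quadratic      -> True
print(looks_convex(np.abs))                         # absolute value -> True
print(looks_convex(lambda z: np.log1p(np.exp(z))))  # logistic loss  -> True
print(looks_convex(np.sin))                         # sine           -> False
```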

A convex set is a set that contains, along with any two of its points, the entire line segment between them -- that is, all convex combinations of its points. The feasible region of a convex optimization problem must be such a set. For a convex problem, you could simply stop at a local minimum, knowing that you were already at a global minimum point.
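A small sketch of that local-equals-global property: gradient descent from several random starting points on a convex function always ends at the same minimizer, while a non-convex function can trap different runs in different local minima. Both objectives below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def gd(grad, x0, lr=0.01, steps=5000):
    """Plain gradient descent from a single starting point."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

convex_grad    = lambda x: 2 * (x - 1)        # gradient of f(x) = (x - 1)^2
nonconvex_grad = lambda x: 4 * x**3 - 8 * x   # gradient of f(x) = x^4 - 4 x^2

for s in rng.uniform(-3, 3, 5):
    # Convex run always reports 1.0; non-convex run reports -1.414 or 1.414
    # depending on which basin the starting point falls into.
    print(round(float(gd(convex_grad, s)), 3),
          round(float(gd(nonconvex_grad, s)), 3))
```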

Why are convex problems easy to optimize?

One reason: many methods work with a local quadratic approximation of the objective, and for a twice-differentiable convex function the Hessian of that approximation is always positive semidefinite. If the Hessian is indefinite, then either the local quadratic approximation is a good local approximation to the objective function, which therefore looks locally like a saddle surface -- so using this quadratic approximation would suggest moving towards a saddle point, which is likely to be in the wrong direction -- or the local quadratic approximation is forced to have a minimum by construction, in which case it is likely to be a poor approximation to the original objective function.
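A tiny numerical illustration of the first case, using the quadratic f(x, y) = x^2 - y^2, which is its own local quadratic approximation and has an indefinite Hessian:

```python
import numpy as np

# f(x, y) = x^2 - y^2 has Hessian diag(2, -2): eigenvalues of both signs.
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])
print(np.linalg.eigvalsh(H))          # [-2.  2.] -> indefinite: a saddle surface

p = np.array([1.0, 1.0])              # current point
g = H @ p                             # gradient of the quadratic at p
newton_step = -np.linalg.solve(H, g)  # full Newton step
print(p + newton_step)                # lands on [0, 0], the saddle point,
                                      # even though f decreases without bound in y
```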


