Although many problems encountered in statistics and machine learning are not convex, convexity is important to study in order to understand the generic behaviour of an optimization procedure.
1 Convex sets
A set $C$ is convex if every segment with endpoints in $C$ is contained in $C$. This can be formalized as the following definition, illustrated in Figure 2.
Definition 1.
A set $C \subseteq E$ is convex if
$$\forall x, y \in C, \ \forall \theta \in [0, 1], \quad \theta x + (1 - \theta) y \in C.$$
In most of these lecture notes, the set $E$ will be the Euclidean space $\mathbb{R}^d$, but remark that this notion only requires that $E$ is a vector space over the real numbers $\mathbb{R}$, possibly of infinite dimension.
Example 1.
• The convex sets of $\mathbb{R}$ are exactly the intervals of $\mathbb{R}$.
• Any affine hyperplane $H = \{x \in E : \varphi(x) = c\}$, where $E$ is a real vector space, $\varphi : E \to \mathbb{R}$ a linear functional and $c \in \mathbb{R}$, is convex.
• Any half-space $\{x \in E : \varphi(x) \le c\}$ is also convex.
• The unit simplex of $\mathbb{R}^d$ defined as
$$\Delta_d = \Big\{ x \in \mathbb{R}^d : x_i \ge 0 \text{ for all } i, \ \sum_{i=1}^d x_i = 1 \Big\}$$
is convex (see the numerical check after this list).
• The nonnegative orthant $\mathbb{R}^d_+ = \{ x \in \mathbb{R}^d : x_i \ge 0 \text{ for all } i \}$ is convex.
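Membership claims like these are easy to sanity-check numerically. The following sketch (not part of the original notes; it uses NumPy, and the helper name `in_simplex` is ours) samples random pairs of points of the unit simplex and verifies that their convex combinations stay inside.

```python
import numpy as np

rng = np.random.default_rng(0)

def in_simplex(x, tol=1e-12):
    """Check membership in the unit simplex {x : x_i >= 0, sum_i x_i = 1}."""
    return np.all(x >= -tol) and abs(x.sum() - 1.0) <= tol

# Sample random pairs of simplex points and check that every convex
# combination theta*x + (1-theta)*y stays in the simplex.
d = 5
for _ in range(1000):
    x = rng.dirichlet(np.ones(d))   # uniform draw from the simplex
    y = rng.dirichlet(np.ones(d))
    theta = rng.uniform()
    assert in_simplex(theta * x + (1 - theta) * y)
print("all convex combinations stayed in the simplex")
```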
The following proposition reviews some important algebraic properties of convex sets.
Proposition 1.
• Arbitrary intersection. Let $(C_i)_{i \in I}$ be convex sets of $E$, where $I$ is an arbitrary index set. Then $\bigcap_{i \in I} C_i$ is convex (see the sketch after this list).
• Cartesian product. Let $C_1, \ldots, C_n$ be sets of $E$. Then $C_1 \times \cdots \times C_n$ is convex if, and only if, every $C_i$ is convex.
• Sum. Let $C_1$, $C_2$ be two convex sets. Then the (Minkowski) sum $C_1 + C_2 = \{ x_1 + x_2 : x_1 \in C_1, \ x_2 \in C_2 \}$ is convex.
• Affine map. Let $T : E \to F$ be an affine map and $C \subseteq E$ a convex set. Then the image $T(C)$ is convex.
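As a quick illustration of the intersection property (a sketch under our own setup, not from the notes): a polyhedron $\{x : Ax \le b\}$ is an intersection of half-spaces, each convex by Example 1, so it must pass a sampled convexity test.

```python
import numpy as np

rng = np.random.default_rng(1)

# Intersection of half-spaces {x : <a_i, x> <= b_i}: each half-space is
# convex, so by Proposition 1 their intersection (a polyhedron) is convex.
A = rng.normal(size=(8, 3))
b = rng.uniform(1.0, 2.0, size=8)   # 0 is feasible since b > 0

def in_polyhedron(x, tol=1e-9):
    return np.all(A @ x <= b + tol)

# Rejection-sample feasible points, then test convex combinations.
pts = [x for x in rng.normal(size=(20000, 3)) if in_polyhedron(x)]
for _ in range(1000):
    i, j = rng.integers(len(pts), size=2)
    theta = rng.uniform()
    assert in_polyhedron(theta * pts[i] + (1 - theta) * pts[j])
print(f"checked convex combinations of {len(pts)} feasible points")
```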
2 Convex functions
Intuitively, a convex function $f$ is a function such that "the line joining $(x, f(x))$ and $(y, f(y))$ lies above the graph of $f$ between $x$ and $y$". Formally, this definition makes sense using the epigraph of $f$.
Definition 2.
Let $E$ be a vector space and $f : E \to \mathbb{R} \cup \{+\infty\}$. The epigraph of $f$ is the subset of $E \times \mathbb{R}$ defined by
$$\operatorname{epi} f = \{ (x, t) \in E \times \mathbb{R} : f(x) \le t \}.$$
Equipped with this notion, we can define convex functions using the definition of convex sets applied to the epigraph.
Definition 3.
A function $f : E \to \mathbb{R} \cup \{+\infty\}$ is a convex function if, and only if, its epigraph is convex.
It is possible to make this definition explicit as follows:
$$\forall x, y, \ \forall \theta \in [0, 1], \quad f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y). \tag{1}$$
Following the convention introduced in (Hiriart-Urruty and Lemaréchal, 1996), we will denote the set of all convex functions on $\mathbb{R}^d$ as $\operatorname{Conv}(\mathbb{R}^d)$, and the set of closed – that is, with closed epigraph – convex functions on $\mathbb{R}^d$ as $\overline{\operatorname{Conv}}(\mathbb{R}^d)$.
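Inequality (1) can be tested numerically on a concrete function. Below is a minimal sketch (ours, not from the notes) using the log-sum-exp function, a standard example of a convex function; the sampled points and the tolerance are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Numerical check of (1) for f(x) = log(sum(exp(x))), the convex
# "log-sum-exp" function; any counterexample would trigger the assert.
def f(x):
    return np.log(np.sum(np.exp(x)))

for _ in range(1000):
    x, y = rng.normal(size=(2, 4))
    theta = rng.uniform()
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    assert lhs <= rhs + 1e-9
print("inequality (1) held on all sampled triples (x, y, theta)")
```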
2.1 Local and global minima of a convex function
For an (un)constrained minimization problem on a convex set $C$ with a function $f : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$,
$$\min_{x \in C} f(x), \tag{2}$$
we say that $x \in C$ is
• a local solution of (2), if there exists a neighborhood $V$ of $x$ such that for all $y \in V \cap C$, $f(x) \le f(y)$;
• a (global) solution if for all $y \in C$, $f(x) \le f(y)$.
A fundamental result is that, when $f$ is convex, every local solution of (2) is a global solution, and the set of (global) solutions is a convex set.
Theorem 1.
Assume that $f$ is a convex function and $C$ is a closed convex set. Then, every local solution of (2) is a global solution, and the set of solutions $\operatorname{argmin}_{x \in C} f(x)$ is convex.
Proof.
Part 1. By contradiction. Consider a local solution $x$ and assume that $x$ is not a global solution. In particular, there exists $y \in C$ such that $f(y) < f(x)$. By convexity, for any $\theta \in (0, 1]$,
$$f((1 - \theta) x + \theta y) \le (1 - \theta) f(x) + \theta f(y) < f(x).$$
Given any open neighborhood $V$ of $x$, there exists $\theta$ small enough such that $(1 - \theta) x + \theta y \in V$, which contradicts the local optimality of $x$.
Part 2. Take two global solutions and study the segment joining them: let $x$ and $y$ be two solutions and $\theta \in [0, 1]$. We have $f(x) = f(y) = \min_C f$, and in turn,
$$\min_C f \le f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y) = \min_C f,$$
where the first inequality comes from the definition of the minimum, and the second from convexity. Hence $\theta x + (1 - \theta) y$ is also a solution. ∎
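Theorem 1 is what makes local descent methods meaningful for convex problems. As an illustration (a sketch with assumptions of ours: a least-squares objective and SciPy's general-purpose `minimize`), running a local solver from many random initializations should always return the same optimal value.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Least squares f(x) = 0.5*||Ax - b||^2 is convex, so by Theorem 1 every
# local solution found by a local method is a global one: starting the
# solver from many random points should always return the same minimum.
A = rng.normal(size=(10, 4))
b = rng.normal(size=10)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)

values = [minimize(f, rng.normal(size=4)).fun for _ in range(20)]
print("spread of the 20 minima found:", np.ptp(values))  # ~0 up to solver tolerance
```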
2.2 Differential properties of a convex function
A differentiable function is convex if, and only if, it lies above all its tangents, see Figure 4.
Proposition 2.
Let $f$ be a differentiable function on an open set $\Omega$, and $C$ a convex subset of $\Omega$. Then $f$ is convex on $C$ if, and only if,
$$\forall x, y \in C, \quad f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle. \tag{3}$$
Proof.
Assume $f$ is convex on $C$. Let $x, y \in C$ and $\theta \in (0, 1]$. By definition of convexity, we have
$$f(x + \theta (y - x)) = f((1 - \theta) x + \theta y) \le f(x) + \theta (f(y) - f(x)).$$
Dividing by $\theta$, and letting $\theta$ go to 0, we obtain (3). Conversely, assume (3) holds, and let $x, y \in C$, $\theta \in [0, 1]$ and $z = \theta x + (1 - \theta) y \in C$. Applying (3) at $z$ with $x$ and with $y$, and taking the convex combination of the two inequalities, we get
$$\theta f(x) + (1 - \theta) f(y) \ge f(z) + \langle \nabla f(z), \theta x + (1 - \theta) y - z \rangle = f(z),$$
which is exactly (1). ∎
A similar criterion can be derived for strictly convex functions. Remark that (the unconstrained case of) Theorem 1 is a direct consequence of Proposition 2 for the class of differentiable convex functions: a local minimum $x^\star$ satisfies $\nabla f(x^\star) = 0$, and thus (3) gives $f(y) \ge f(x^\star)$ for all $y$.
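The tangent inequality (3) can also be checked numerically. The sketch below (ours; a least-squares objective with its closed-form gradient, random sampled pairs) verifies it.

```python
import numpy as np

rng = np.random.default_rng(4)

# Check the tangent inequality (3) for the convex least-squares function:
# f(y) >= f(x) + <grad f(x), y - x> at randomly sampled pairs (x, y).
A = rng.normal(size=(10, 4))
b = rng.normal(size=10)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)

for _ in range(1000):
    x, y = rng.normal(size=(2, 4))
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-8
print("inequality (3) held on all sampled pairs")
```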
3 Strongly convex functions
A strongly convex function satisfies a strengthened version of Jensen's inequality (1).
Definition 4.
A function $f$ is strongly convex of modulus $\mu > 0$ if its domain is convex and
$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y) - \frac{\mu}{2} \theta (1 - \theta) \|x - y\|^2$$
for all $x, y \in \operatorname{dom} f$ and $\theta \in [0, 1]$.
Another way to define strongly convex functions is to say that $f$ is $\mu$-strongly convex if the function
$$x \mapsto f(x) - \frac{\mu}{2} \|x\|^2$$
is a convex function.
A strongly convex function defined on the whole space has a unique minimizer.
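To make the alternative definition concrete, here is a sketch (ours, Euclidean norm) for the least-squares function $f(x) = \frac{1}{2} \|Ax - b\|^2$, which is $\mu$-strongly convex with $\mu = \lambda_{\min}(A^\top A)$ when $A$ has full column rank: we check by sampling that $f - \frac{\mu}{2}\|\cdot\|^2$ still satisfies the convexity inequality (1).

```python
import numpy as np

rng = np.random.default_rng(5)

# For f(x) = 0.5*||Ax - b||^2 with A of full column rank, f is mu-strongly
# convex with mu = lambda_min(A^T A): we verify the equivalent definition
# that g(x) = f(x) - (mu/2)*||x||^2 is still convex.
A = rng.normal(size=(10, 4))
b = rng.normal(size=10)
mu = np.linalg.eigvalsh(A.T @ A)[0]   # smallest eigenvalue
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
g = lambda x: f(x) - 0.5 * mu * np.sum(x ** 2)

for _ in range(1000):
    x, y = rng.normal(size=(2, 4))
    theta = rng.uniform()
    assert g(theta * x + (1 - theta) * y) <= theta * g(x) + (1 - theta) * g(y) + 1e-8
print("g = f - (mu/2)||.||^2 passed the sampled convexity checks")
```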
3.1 Differentiability, strong convexity and quadratic lower bound
A function is strongly convex if, and only if, it lies strictly above its tangents, in the sense that one can "fit" a quadratic between each tangent and the function itself. This is a key property to derive lower bounds on $f$.
Proposition 3.
Let $f$ be a differentiable and $\mu$-strongly convex function. Then, for all $x, y$,
$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\mu}{2} \|y - x\|^2.$$
Note that if $f$ is $C^2$, $f$ is $\mu$-strongly convex if, and only if, its Hessian satisfies $\nabla^2 f(x) \succeq \mu I$, i.e., $\nabla^2 f(x) - \mu I$ is positive semidefinite.
An easy consequence is that the suboptimality gap in function values is lower bounded by the squared distance to the minimizer:
$$f(x) - f(x^\star) \ge \frac{\mu}{2} \|x - x^\star\|^2, \tag{4}$$
where $x^\star$ is the minimizer of $f$.
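A numerical check of (4) on the same least-squares example (a sketch of ours; `x_star` is computed with `numpy.linalg.lstsq`):

```python
import numpy as np

rng = np.random.default_rng(6)

# Check the lower bound (4): f(x) - f(x*) >= (mu/2)*||x - x*||^2, with
# x* the least-squares solution and mu = lambda_min(A^T A).
A = rng.normal(size=(10, 4))
b = rng.normal(size=10)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
mu = np.linalg.eigvalsh(A.T @ A)[0]

for _ in range(1000):
    x = rng.normal(size=4)
    assert f(x) - f(x_star) >= 0.5 * mu * np.sum((x - x_star) ** 2) - 1e-8
print("lower bound (4) held on all sampled points")
```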
4 Lipschitz continuity of the gradient
Definition 5.
A differentiable function $f$ is said to be $L$-smooth if $\nabla f$ is $L$-Lipschitz, i.e.,
$$\forall x, y, \quad \|\nabla f(x) - \nabla f(y)\|_* \le L \|x - y\|.$$
Here $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$.
4.1 Quadratic upper bound
$L$-smoothness is important in optimization because it ensures a quadratic upper bound on the function.
Proposition 4.
Let $f$ be a differentiable $L$-smooth function. Then, for all $x, y$,
$$\langle \nabla f(x) - \nabla f(y), x - y \rangle \le L \|x - y\|^2. \tag{5}$$
Moreover, (5) is equivalent to the quadratic upper bound
$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|y - x\|^2. \tag{6}$$
Proof.
The fact that $L$-smoothness implies (5) is a direct application of the (generalized) Cauchy-Schwarz inequality:
$$\langle \nabla f(x) - \nabla f(y), x - y \rangle \le \|\nabla f(x) - \nabla f(y)\|_* \|x - y\| \le L \|x - y\|^2.$$
For the equivalence of (5) and (6), let us prove the two directions.
Given $x, y$, define the real function $\phi(t) = f(x + t (y - x))$ for $t \in [0, 1]$, so that $\phi'(t) = \langle \nabla f(x + t (y - x)), y - x \rangle$. Assuming (5) holds, then for any $t \in (0, 1]$,
$$\phi'(t) - \phi'(0) = \langle \nabla f(x + t (y - x)) - \nabla f(x), y - x \rangle \le L t \|y - x\|^2.$$
Now,
$$f(y) = \phi(1) = \phi(0) + \int_0^1 \phi'(t) \, dt \le f(x) + \langle \nabla f(x), y - x \rangle + \int_0^1 L t \|y - x\|^2 \, dt,$$
which is exactly (6). Reciprocally, if (6) holds, it is sufficient to apply it twice (to the pairs $(x, y)$ and $(y, x)$) and sum the two inequalities to obtain (5). ∎
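For the least-squares function, the gradient is $L$-Lipschitz with $L = \lambda_{\max}(A^\top A)$ for the Euclidean norm (which is self-dual). The sketch below (ours) checks the quadratic upper bound (6) on random pairs.

```python
import numpy as np

rng = np.random.default_rng(7)

# Check the quadratic upper bound (6) for least squares, whose gradient is
# L-Lipschitz with L = lambda_max(A^T A) (Euclidean norm, so ||.||_* = ||.||).
A = rng.normal(size=(10, 4))
b = rng.normal(size=10)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.eigvalsh(A.T @ A)[-1]   # largest eigenvalue

for _ in range(1000):
    x, y = rng.normal(size=(2, 4))
    upper = f(x) + grad(x) @ (y - x) + 0.5 * L * np.sum((y - x) ** 2)
    assert f(y) <= upper + 1e-8
print("quadratic upper bound (6) held on all sampled pairs")
```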
This result has strong implications for the minima of $f$.
Proposition 5.
Let $f$ be an $L$-smooth function with full domain, and assume $x^\star$ is a minimizer of $f$. Then, for all $x$,
$$\frac{1}{2 L} \|\nabla f(x)\|_*^2 \le f(x) - f(x^\star) \le \frac{L}{2} \|x - x^\star\|^2.$$
Proof.
The two sides are proved separately.
rhs: apply (6) to the pair $(x^\star, x)$: since $\nabla f(x^\star) = 0$, it gives $f(x) \le f(x^\star) + \frac{L}{2} \|x - x^\star\|^2$.
lhs: minimize the quadratic upper bound (6) in $y$:
$$f(x^\star) = \inf_y f(y) \le \inf_{t \ge 0, \, \|d\| = 1} \Big\{ f(x) + t \langle \nabla f(x), d \rangle + \frac{L}{2} t^2 \Big\},$$
where the last expression is obtained by the change of variable $y = x + t d$, with $t \ge 0$ and $\|d\| = 1$. Factorizing the trinomial in $t$ leads to:
$$f(x) + t \langle \nabla f(x), d \rangle + \frac{L}{2} t^2 = f(x) + \frac{L}{2} \Big( t + \frac{\langle \nabla f(x), d \rangle}{L} \Big)^2 - \frac{\langle \nabla f(x), d \rangle^2}{2 L}.$$
Hence,
$$f(x^\star) \le f(x) - \sup_{\|d\| = 1} \frac{\langle \nabla f(x), d \rangle^2}{2 L}.$$
Minimizing the infimum boils down to maximizing $|\langle \nabla f(x), d \rangle|$ over the unit sphere, which equals $\|\nabla f(x)\|_*$ by definition of the dual norm, hence the result. ∎
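Both sides of Proposition 5 can be verified on the same least-squares example (a sketch of ours, Euclidean norm):

```python
import numpy as np

rng = np.random.default_rng(8)

# Check Proposition 5: (1/2L)*||grad f(x)||^2 <= f(x) - f(x*) <= (L/2)*||x - x*||^2.
A = rng.normal(size=(10, 4))
b = rng.normal(size=10)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.eigvalsh(A.T @ A)[-1]
x_star = np.linalg.lstsq(A, b, rcond=None)[0]

for _ in range(1000):
    x = rng.normal(size=4)
    gap = f(x) - f(x_star)
    assert np.sum(grad(x) ** 2) / (2 * L) - 1e-8 <= gap
    assert gap <= 0.5 * L * np.sum((x - x_star) ** 2) + 1e-8
print("both sides of Proposition 5 held on all sampled points")
```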
4.2 Co-coercivity of the gradient of an $L$-smooth function
Convex $L$-smooth functions have co-coercive gradients. This result is sometimes coined the Baillon-Haddad theorem (Baillon and Haddad, 1977), but one shall note that the original contribution is much more general than its application to the gradient.
Proposition 6.
Let $f$ be a full-domain convex $L$-smooth function. Then, $\nabla f$ is $\frac{1}{L}$-co-coercive, i.e., for all $x, y$,
$$\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge \frac{1}{L} \|\nabla f(x) - \nabla f(y)\|_*^2.$$
Proof.
Let $\phi_x(z) = f(z) - \langle \nabla f(x), z \rangle$ and $\phi_y(z) = f(z) - \langle \nabla f(y), z \rangle$. The two are convex.
Since $f$ is $L$-smooth, $\phi_x$ and $\phi_y$ are also $L$-smooth.
Canceling the gradient of $\phi_x$ shows that $x$ minimizes $\phi_x$: indeed $\nabla \phi_x(z) = \nabla f(z) - \nabla f(x)$ vanishes at $z = x$, and $\phi_x$ is convex. Hence, using the quadratic upper bound (through the left-hand side of Proposition 5) applied to $\phi_x$, we have
$$\phi_x(y) - \phi_x(x) \ge \frac{1}{2 L} \|\nabla \phi_x(y)\|_*^2, \quad \text{i.e.,} \quad f(y) - f(x) - \langle \nabla f(x), y - x \rangle \ge \frac{1}{2 L} \|\nabla f(y) - \nabla f(x)\|_*^2.$$
Doing exactly the same thing for $\phi_y$ leads to a similar bound, and summing the two gives the result. ∎
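A sampled check of the $\frac{1}{L}$-co-coercivity (a sketch of ours, on the least-squares gradient, Euclidean norm):

```python
import numpy as np

rng = np.random.default_rng(9)

# Check the 1/L-co-coercivity of the gradient (Proposition 6) on least squares.
A = rng.normal(size=(10, 4))
b = rng.normal(size=10)
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.eigvalsh(A.T @ A)[-1]

for _ in range(1000):
    x, y = rng.normal(size=(2, 4))
    g = grad(x) - grad(y)
    assert g @ (x - y) >= np.sum(g ** 2) / L - 1e-8
print("co-coercivity held on all sampled pairs")
```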
It turns out that smoothness, co-coercivity and the quadratic upper bound are equivalent properties for full-domain convex functions.
Proposition 7.
Let $f$ be a differentiable convex function with $\operatorname{dom} f = \mathbb{R}^d$. Then the three following properties are equivalent:
1. $f$ is $L$-smooth.
2. The quadratic upper bound (6) holds.
3. $\nabla f$ is $\frac{1}{L}$-co-coercive.
Proof.
The implication 1. $\Rightarrow$ 2. is Proposition 4, and 2. $\Rightarrow$ 3. is Proposition 6, whose proof only uses the quadratic upper bound. The only one left to prove is 3. $\Rightarrow$ 1., which follows from the (generalized) Cauchy-Schwarz inequality: co-coercivity gives $\frac{1}{L} \|\nabla f(x) - \nabla f(y)\|_*^2 \le \|\nabla f(x) - \nabla f(y)\|_* \|x - y\|$, hence $\|\nabla f(x) - \nabla f(y)\|_* \le L \|x - y\|$. ∎
For the special case of the Euclidean norm (which is its own dual), we have:
Proposition 8.
Let $f$ be a differentiable convex function with $\operatorname{dom} f = \mathbb{R}^d$, $L$-smooth with respect to the Euclidean norm. Then,
1. $x \mapsto L x - \nabla f(x)$ is monotone, i.e., $\langle (L x - \nabla f(x)) - (L y - \nabla f(y)), x - y \rangle \ge 0$ for all $x, y$;
2. $x \mapsto \frac{L}{2} \|x\|^2 - f(x)$ is convex;
3. if $f$ is twice differentiable, then $\nabla^2 f(x) \preceq L I$ for all $x$.
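As an illustration (a sketch of ours), the log-sum-exp function is $1$-smooth for the Euclidean norm: its Hessian is $\operatorname{diag}(p) - p p^\top$ with $p$ the softmax of $x$, whose eigenvalues lie in $[0, 1]$. The code below checks items 1. and 3. by sampling.

```python
import numpy as np

rng = np.random.default_rng(10)

# Euclidean case (Proposition 8) on log-sum-exp, which is 1-smooth: its
# Hessian diag(p) - p p^T (p the softmax of x) has eigenvalues in [0, L] with
# L = 1, and x |-> L*x - grad f(x) is monotone.
def grad(x):
    p = np.exp(x - x.max()); p /= p.sum()   # softmax, numerically stabilized
    return p

def hessian(x):
    p = grad(x)
    return np.diag(p) - np.outer(p, p)

L = 1.0
for _ in range(1000):
    x, y = rng.normal(size=(2, 4))
    assert np.linalg.eigvalsh(hessian(x))[-1] <= L + 1e-12
    assert ((L * x - grad(x)) - (L * y - grad(y))) @ (x - y) >= -1e-8
print("Hessian bound and monotonicity of L*id - grad f held on all samples")
```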
When $f$ is both $L$-smooth and $\mu$-strongly convex, we have the following "strengthened" co-coercivity.
Proposition 9.
Let $f$ be an $L$-smooth and $\mu$-strongly convex function. For any $x, y$, we have
$$\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge \frac{\mu L}{\mu + L} \|x - y\|^2 + \frac{1}{\mu + L} \|\nabla f(x) - \nabla f(y)\|^2.$$
Proof.
Consider the convex function $\phi(x) = f(x) - \frac{\mu}{2} \|x\|^2$. Its gradient reads $\nabla \phi(x) = \nabla f(x) - \mu x$. Using the $L$-smoothness of $f$, we have that
$$\frac{L - \mu}{2} \|x\|^2 - \phi(x) = \frac{L}{2} \|x\|^2 - f(x) \quad \text{is convex (Proposition 8).}$$
Hence, $\phi$ is $(L - \mu)$-smooth (Proposition 7).
If $L = \mu$, then nothing is left to prove: $\nabla \phi$ is constant, so $\nabla f(x) - \nabla f(y) = \mu (x - y)$, and the claimed inequality holds with equality. Otherwise, the $\frac{1}{L - \mu}$-co-coercivity of $\nabla \phi$ (Proposition 6) gives us
$$\langle \nabla \phi(x) - \nabla \phi(y), x - y \rangle \ge \frac{1}{L - \mu} \|\nabla \phi(x) - \nabla \phi(y)\|^2.$$
Making the expression of $\nabla \phi$ explicit and rearranging allows us to conclude. ∎
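Finally, a sampled check of Proposition 9 on the strongly convex least-squares example (a sketch of ours; $\mu$ and $L$ are the extreme eigenvalues of $A^\top A$):

```python
import numpy as np

rng = np.random.default_rng(11)

# Check the strengthened co-coercivity of Proposition 9 on least squares,
# with mu = lambda_min(A^T A) and L = lambda_max(A^T A).
A = rng.normal(size=(10, 4))
b = rng.normal(size=10)
grad = lambda x: A.T @ (A @ x - b)
eigs = np.linalg.eigvalsh(A.T @ A)
mu, L = eigs[0], eigs[-1]

for _ in range(1000):
    x, y = rng.normal(size=(2, 4))
    g, d = grad(x) - grad(y), x - y
    bound = mu * L / (mu + L) * (d @ d) + 1.0 / (mu + L) * (g @ g)
    assert g @ d >= bound - 1e-8
print("strengthened co-coercivity held on all sampled pairs")
```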
Baillon, Jean-Bernard and Haddad, Georges (1977). Quelques propriétés des opérateurs angle-bornés et n-cycliquement monotones, Israel Journal of Mathematics, 26.
Hiriart-Urruty, Jean-Baptiste and Lemaréchal, Claude (1996). Convex Analysis and Minimization Algorithms I: Fundamentals, Springer.