A first look at convex analysis
February 7th, 2024
This content is available as a PDF here.
Despite the fact that many problems encountered in statistics and machine learning are not convex, convexity is an important notion to study in order to understand the generic behaviour of an optimization procedure.
1 Convex set
A set $C \subseteq \mathbb{R}^n$ is said to be convex if, for all $x, y \in C$ and all $\theta \in [0, 1]$, the point $\theta x + (1 - \theta) y$ belongs to $C$.
In most of these lecture notes, the ambient set is $\mathbb{R}^n$, equipped with the Euclidean inner product $\langle \cdot, \cdot \rangle$ and norm $\|\cdot\|$.
The following proposition reviews some important algebraic properties of convex sets.
Proposition 1 (Operations preserving convexity).
• Arbitrary intersection. Let $(C_i)_{i \in I}$ be a family of convex sets, where $I$ is an arbitrary index set. Then $\bigcap_{i \in I} C_i$ is convex.
• Cartesian product. Let $C_1, \dots, C_m$ be sets of $\mathbb{R}^{n_1}, \dots, \mathbb{R}^{n_m}$ respectively. Then $C_1 \times \dots \times C_m$ is convex if, and only if, every $C_i$ is convex.
• Sum. Let $C_1, C_2$ be two convex sets. Then the (Minkowski) sum $C_1 + C_2 = \{x_1 + x_2 : x_1 \in C_1,\ x_2 \in C_2\}$ is convex.
• Affine map. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be an affine map and $C \subseteq \mathbb{R}^n$ a convex set. Then the image $A(C)$ is convex.
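As a quick numerical sanity check of these closure properties, here is a minimal sketch (assuming NumPy, and taking the intersection of two Euclidean balls as an illustrative convex set): convex combinations of points of the intersection should never leave it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two convex sets: Euclidean balls B(c, r) = {x : ||x - c|| <= r}.
def in_ball(x, center, radius):
    return np.linalg.norm(x - center) <= radius + 1e-12

c1, r1 = np.array([0.0, 0.0]), 1.5
c2, r2 = np.array([1.0, 0.0]), 1.5

def in_intersection(x):
    return in_ball(x, c1, r1) and in_ball(x, c2, r2)

# Sample points of the intersection by rejection sampling.
points = []
while len(points) < 200:
    x = rng.uniform(-2, 3, size=2)
    if in_intersection(x):
        points.append(x)

# Convex combinations of points of the intersection stay in the intersection.
for _ in range(1000):
    x, y = points[rng.integers(len(points))], points[rng.integers(len(points))]
    theta = rng.uniform()
    assert in_intersection(theta * x + (1 - theta) * y)

print("all sampled convex combinations remained in the intersection")
```

The same rejection-based test can be adapted to Minkowski sums and affine images of sampled convex sets.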
2 Convex functions
The intuitive definition of a convex function is a function such that “the line joining any two points of its graph lies above the graph”. To formalize this, recall that the epigraph of a function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is the set $\operatorname{epi} f = \{(x, t) \in \mathbb{R}^n \times \mathbb{R} : f(x) \le t\}$.
Figure 3: (Left) Epigraph of a concave function. (Right) Epigraph of a convex function.
Equipped with this notion, we can define convex functions using the definition of convex sets applied to the epigraph: $f$ is convex if, and only if, $\operatorname{epi} f$ is a convex set.
It is possible to make this definition explicit as follows: $f$ is convex if, and only if, for all $x, y$ and all $\theta \in [0, 1]$,
$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y).$$
Following the convention introduced in (Hiriart-Urruty, Jean-Baptiste and Lemarechal, Claude, 1996), we will denote the set of all convex functions of $\mathbb{R}^n$ by $\operatorname{Conv}(\mathbb{R}^n)$.
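The explicit inequality is easy to probe numerically; the sketch below assumes NumPy, and the function $f(x) = \log(1 + e^{\langle a, x \rangle}) + \|x\|^2$ is only an illustrative choice of convex function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Example convex function on R^2: f(x) = log(1 + exp(<a, x>)) + ||x||^2.
a = np.array([1.0, -2.0])
def f(x):
    return np.logaddexp(0.0, a @ x) + x @ x

# Check f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y) on random points.
for _ in range(10_000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    theta = rng.uniform()
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    assert lhs <= rhs + 1e-10

print("convexity inequality verified on random samples")
```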
Local and global minima of a convex function
For an (un)constrained minimization problem of a function $f$ over a convex set $C \subseteq \mathbb{R}^n$,
$$\min_{x \in C} f(x), \qquad (2)$$
we say that a point $x^\star \in C$ is
• a local solution of (2) if there exists a neighborhood $\mathcal{N}$ of $x^\star$ such that for all $x \in \mathcal{N} \cap C$, $f(x^\star) \le f(x)$;
• a (global) solution of (2) if for all $x \in C$, $f(x^\star) \le f(x)$.
A fundamental result (Theorem 1) is that, when $f$ is convex, every local solution of (2) is a global solution; moreover, the set of solutions of (2) is convex.

Proof. Part 1. By contradiction. Consider a local solution $x^\star$ and assume that $x^\star$ is not a global solution. In particular, there exists $y \in C$ such that $f(y) < f(x^\star)$. By convexity, for every $\theta \in (0, 1]$,
$$f\big((1 - \theta) x^\star + \theta y\big) \le (1 - \theta) f(x^\star) + \theta f(y) < f(x^\star).$$
Given any open neighborhood $\mathcal{N}$ of $x^\star$, there exists $\theta$ small enough such that $(1 - \theta) x^\star + \theta y \in \mathcal{N} \cap C$, which contradicts the local optimality of $x^\star$.

Part 2. Take two global solutions and study the segment joining them: let $x_1^\star$ and $x_2^\star$ be two solutions. We have $f(x_1^\star) = f(x_2^\star)$, and in turn, for every $\theta \in [0, 1]$,
$$f(x_1^\star) \le f\big(\theta x_1^\star + (1 - \theta) x_2^\star\big) \le \theta f(x_1^\star) + (1 - \theta) f(x_2^\star) = f(x_1^\star),$$
where the first inequality comes from the global optimality of $x_1^\star$, the last equality from the equality of the function values at $x_1^\star$ and $x_2^\star$, and the second inequality by convexity. Hence every point of the segment is also a solution, and the set of solutions is convex. □
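As an illustration, the sketch below (assuming NumPy, plain gradient descent, and the illustrative objective $f(x) = \log(1 + e^{x_1}) - x_1/2 + (x_2 - 1)^2$) runs the same method from several starting points and checks that all limit values agree with the global one.

```python
import numpy as np

# Illustrative convex objective with minimizer (0, 1) and minimal value log(2).
def f(x):
    return np.logaddexp(0.0, x[0]) - 0.5 * x[0] + (x[1] - 1.0) ** 2

def grad_f(x):
    return np.array([1.0 / (1.0 + np.exp(-x[0])) - 0.5, 2.0 * (x[1] - 1.0)])

def gradient_descent(x0, step=0.1, iters=5000):
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x -= step * grad_f(x)
    return x

# All starting points lead to the same (global) value, close to log(2) ~ 0.693.
for x0 in ([5.0, 5.0], [-5.0, -3.0], [0.0, 10.0]):
    print(x0, "->", f(gradient_descent(x0)))
```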
Differentiability properties of a convex function
A differentiable function is convex if, and only if, it is above all its tangents, see Figure 4: $f$ is convex if, and only if, for all $x, y$,
$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle.$$
A similar criterion can be derived for strictly convex functions. Remark that Theorem 1 is a direct consequence of this characterization for the class of differentiable convex functions: a local minimum $x^\star$ satisfies $\nabla f(x^\star) = 0$, so the inequality above gives $f(y) \ge f(x^\star)$ for all $y$, i.e., $x^\star$ is a global minimum.
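The tangent inequality can also be checked numerically on random pairs of points; the sketch below assumes NumPy and the illustrative function $f(x) = \log(1 + e^{\langle a, x \rangle}) + \tfrac{1}{2}\|x\|^2$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Differentiable convex function f(x) = log(1 + exp(<a, x>)) + 0.5 * ||x||^2.
a = np.array([1.0, -2.0, 0.5])
def f(x):
    return np.logaddexp(0.0, a @ x) + 0.5 * x @ x

def grad_f(x):
    return a / (1.0 + np.exp(-(a @ x))) + x

# First-order characterization: f lies above all its tangents.
for _ in range(10_000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-10

print("tangent inequality verified on random samples")
```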
3 Strongly convex functions
A $\mu$-strongly convex function ($\mu > 0$) satisfies a strengthened version of Jensen's inequality: for all $x, y$ and all $\theta \in [0, 1]$,
$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y) - \frac{\mu}{2} \theta (1 - \theta) \|x - y\|^2.$$
Another way to define a $\mu$-strongly convex function $f$ is to say that
$$x \mapsto f(x) - \frac{\mu}{2} \|x\|^2$$
is a convex function.
A strongly convex function defined on the whole space $\mathbb{R}^n$ admits a unique (global) minimizer.
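Both definitions are easy to probe numerically. The sketch below (assuming NumPy) uses the illustrative quadratic $f(x) = \tfrac{1}{2} x^\top A x$ with a symmetric positive definite $A$, for which $\mu$ can be taken equal to the smallest eigenvalue of $A$.

```python
import numpy as np

rng = np.random.default_rng(2)

# f(x) = 0.5 * x^T A x with A symmetric positive definite is
# mu-strongly convex with mu = lambda_min(A).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
mu = np.linalg.eigvalsh(A).min()

def f(x):
    return 0.5 * x @ A @ x

# Strengthened Jensen inequality of mu-strong convexity.
for _ in range(10_000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    t = rng.uniform()
    lhs = f(t * x + (1 - t) * y)
    rhs = t * f(x) + (1 - t) * f(y) - 0.5 * mu * t * (1 - t) * np.sum((x - y) ** 2)
    assert lhs <= rhs + 1e-10

# Equivalent view: f - (mu/2) * ||.||^2 is convex (here a PSD quadratic).
print("eigenvalues of A - mu*I:", np.linalg.eigvalsh(A - mu * np.eye(2)))
```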
Differentiability, strong convexity and quadratic lower bound
A differentiable function $f$ is $\mu$-strongly convex if, and only if, for all $x, y$,
$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\mu}{2} \|y - x\|^2.$$
Note that if $\mu = 0$, we recover the tangent characterization of differentiable convex functions.
An easy consequence is that it is possible to lower-bound the suboptimality gap by the distance to the minimizer:
$$f(y) - f(x^\star) \ge \frac{\mu}{2} \|y - x^\star\|^2 \quad \text{for all } y,$$
where $x^\star$ denotes the minimizer of $f$; this follows by taking $x = x^\star$ in the previous inequality and using $\nabla f(x^\star) = 0$.
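For a strongly convex function with a closed-form minimizer, both the quadratic lower bound and the resulting suboptimality bound can be verified directly; the sketch below assumes NumPy and the illustrative quadratic $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$, whose minimizer is $x^\star = A^{-1} b$.

```python
import numpy as np

rng = np.random.default_rng(3)

# mu-strongly convex quadratic f(x) = 0.5 * x^T A x - b^T x, mu = lambda_min(A).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
mu = np.linalg.eigvalsh(A).min()

f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b
x_star = np.linalg.solve(A, b)          # unique minimizer

for _ in range(10_000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    # Quadratic lower bound.
    assert f(y) >= f(x) + grad_f(x) @ (y - x) + 0.5 * mu * np.sum((y - x) ** 2) - 1e-10
    # Suboptimality gap controlled by the distance to the minimizer.
    assert f(y) - f(x_star) >= 0.5 * mu * np.sum((y - x_star) ** 2) - 1e-10

print("strong-convexity lower bounds verified on random samples")
```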
4 Lipschitz continuity of the gradient
Quadratic upper bound
Proposition 4. Let $f$ be an $L$-smooth function, that is, a differentiable function whose gradient is $L$-Lipschitz: $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y$. Then, for all $x, y$,
$$\langle \nabla f(x) - \nabla f(y), x - y \rangle \le L \|x - y\|^2. \qquad (5)$$
Moreover, if $f$ is convex, then (5) is equivalent to the quadratic upper bound: for all $x, y$,
$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|y - x\|^2. \qquad (6)$$
Proof. The fact that $L$-smoothness implies (5) is a direct application of the (generalized) Cauchy-Schwarz inequality:
$$\langle \nabla f(x) - \nabla f(y), x - y \rangle \le \|\nabla f(x) - \nabla f(y)\| \, \|x - y\| \le L \|x - y\|^2.$$
For the equivalence of (5) and (6), let us prove the two directions.
Given $x, y$, define the real function $\varphi(t) = f(x + t (y - x))$, whose derivative is $\varphi'(t) = \langle \nabla f(x + t (y - x)), y - x \rangle$. Assuming (5) holds, applying it to the pair $(x + t (y - x), x)$ gives, for all $t \in (0, 1]$, $\varphi'(t) - \varphi'(0) \le L t \|y - x\|^2$. Now,
$$f(y) - f(x) - \langle \nabla f(x), y - x \rangle = \int_0^1 \big( \varphi'(t) - \varphi'(0) \big) \, \mathrm{d}t \le \int_0^1 L t \|y - x\|^2 \, \mathrm{d}t = \frac{L}{2} \|y - x\|^2,$$
which is (6).
Reciprocally, if (6) holds, it is sufficient to apply it twice (to the pairs $(x, y)$ and $(y, x)$) and sum the two inequalities, which after simplification gives exactly (5). □
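Both inequalities of Proposition 4 can be tested numerically. The sketch below assumes NumPy and the illustrative convex $L$-smooth function $f(x) = \log(1 + e^{\langle a, x \rangle})$, for which $L = \|a\|^2 / 4$ is a valid smoothness constant.

```python
import numpy as np

rng = np.random.default_rng(4)

# Convex L-smooth function: f(x) = log(1 + exp(<a, x>)), with L = ||a||^2 / 4.
a = np.array([1.0, -2.0, 0.5])
L = 0.25 * a @ a

f = lambda x: np.logaddexp(0.0, a @ x)
grad_f = lambda x: a / (1.0 + np.exp(-(a @ x)))

for _ in range(10_000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    g_diff = grad_f(x) - grad_f(y)
    # (5): <grad f(x) - grad f(y), x - y> <= L * ||x - y||^2.
    assert g_diff @ (x - y) <= L * np.sum((x - y) ** 2) + 1e-10
    # (6): quadratic upper bound.
    assert f(y) <= f(x) + grad_f(x) @ (y - x) + 0.5 * L * np.sum((y - x) ** 2) + 1e-10

print("inequalities (5) and (6) verified on random samples")
```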
This result (Proposition 4) has a strong implication on the minima of $f$.

Proposition 5. Let $f$ be a convex $L$-smooth function and $x^\star$ a minimizer of $f$. Then, for all $x$,
$$\frac{1}{2L} \|\nabla f(x)\|^2 \le f(x) - f(x^\star) \le \frac{L}{2} \|x - x^\star\|^2.$$

Proof. The two sides are proved separately.

rhs: Apply the quadratic upper bound (6) to the pair $(x^\star, x)$ and use $\nabla f(x^\star) = 0$:
$$f(x) \le f(x^\star) + \langle \nabla f(x^\star), x - x^\star \rangle + \frac{L}{2} \|x - x^\star\|^2 = f(x^\star) + \frac{L}{2} \|x - x^\star\|^2.$$

lhs: Minimize the quadratic upper bound for fixed $x$:
$$f(x^\star) = \inf_y f(y) \le \inf_y \left( f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|y - x\|^2 \right) \le \inf_{t \ge 0} \left( f(x) - \left( t - \frac{L t^2}{2} \right) \|\nabla f(x)\|^2 \right),$$
where the last inequality is obtained by the change of variable $y = x - t \nabla f(x)$, $t \ge 0$, and by factorizing the trinomial in $t$. Hence, computing the infimum boils down to maximizing $t \mapsto t - \frac{L t^2}{2}$, which is attained at $t = 1/L$ with value $\frac{1}{2L}$, hence the result. □
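The two bounds of Proposition 5 sandwich the suboptimality gap between a gradient-norm term and a distance term; here is a minimal numerical check, assuming NumPy and an illustrative quadratic whose minimizer is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(5)

# Convex L-smooth quadratic f(x) = 0.5 * x^T A x - b^T x, with L = lambda_max(A)
# and minimizer x_star = A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
L = np.linalg.eigvalsh(A).max()
x_star = np.linalg.solve(A, b)

f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b

for _ in range(10_000):
    x = rng.normal(size=2) * 5
    gap = f(x) - f(x_star)
    assert np.sum(grad_f(x) ** 2) / (2 * L) <= gap + 1e-10
    assert gap <= 0.5 * L * np.sum((x - x_star) ** 2) + 1e-10

print("bounds of Proposition 5 verified on random samples")
```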
Co-coercivity of the gradient of an $L$-smooth function
Convex $L$-smooth functions have the following remarkable property, known as co-coercivity of the gradient.

Proposition 6. Let $f$ be a convex $L$-smooth function. Then $\nabla f$ is $\frac{1}{L}$-co-coercive, that is, for all $x, y$,
$$\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge \frac{1}{L} \|\nabla f(x) - \nabla f(y)\|^2.$$
Proof. Let $f_x(z) = f(z) - \langle \nabla f(x), z \rangle$ and $f_y(z) = f(z) - \langle \nabla f(y), z \rangle$. The two are convex (sums of a convex function and a linear one). Since $f$ is $L$-smooth, $f_x$ and $f_y$ are also $L$-smooth (adding a linear term does not change the Lipschitz constant of the gradient).

Canceling the gradient of $f_x$, namely $\nabla f_x(z) = \nabla f(z) - \nabla f(x)$, shows that $x$ minimizes $f_x$. Hence, using the quadratic upper bound (through the left-hand inequality of Proposition 5 applied to $f_x$), we have
$$f_x(x) \le f_x(y) - \frac{1}{2L} \|\nabla f_x(y)\|^2 = f_x(y) - \frac{1}{2L} \|\nabla f(y) - \nabla f(x)\|^2.$$
Doing exactly the same thing for $f_y$ leads to a similar bound with the roles of $x$ and $y$ exchanged, and summing the two gives the result. □
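Co-coercivity is again a simple inequality to probe on random points; the sketch below assumes NumPy and the illustrative function $f(x) = \log(1 + e^{\langle a, x \rangle}) + \tfrac{1}{2}\|x\|^2$, which is convex and $L$-smooth with $L = \|a\|^2/4 + 1$.

```python
import numpy as np

rng = np.random.default_rng(6)

# Convex L-smooth function: f(x) = log(1 + exp(<a, x>)) + 0.5 * ||x||^2,
# so L = ||a||^2 / 4 + 1 is a valid smoothness constant.
a = np.array([1.0, -2.0, 0.5])
L = 0.25 * a @ a + 1.0

grad_f = lambda x: a / (1.0 + np.exp(-(a @ x))) + x

for _ in range(10_000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    g_diff = grad_f(x) - grad_f(y)
    # Co-coercivity: <grad f(x) - grad f(y), x - y> >= (1/L) ||grad f(x) - grad f(y)||^2.
    assert g_diff @ (x - y) >= np.sum(g_diff ** 2) / L - 1e-10

print("1/L-co-coercivity verified on random samples")
```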
It turns out that smoothness, co-coercivity, and the quadratic upper bound are equivalent properties for full-domain convex functions.
Proposition 7. Let $f$ be a differentiable convex function with $\operatorname{dom} f = \mathbb{R}^n$. Then the three following properties are equivalent:
1. $f$ is $L$-smooth.
2. The quadratic upper bound (6) holds.
3. $\nabla f$ is $\frac{1}{L}$-co-coercive.
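Convexity is essential in Proposition 7: a non-convex function can be $L$-smooth while its gradient fails to be co-coercive. A minimal counterexample sketch (assuming NumPy) is $f = \sin$, which is $1$-smooth since $|f''| \le 1$:

```python
import numpy as np

# f(x) = sin(x) is 1-smooth (|f''| <= 1) but not convex, and its gradient
# cos(x) is NOT 1-co-coercive: the inner product below is even negative.
x, y = 0.0, np.pi
g_diff = np.cos(x) - np.cos(y)          # = 2
inner = g_diff * (x - y)                # = -2*pi < 0
lower = g_diff ** 2 / 1.0               # = 4
print(inner, ">=", lower, "?", inner >= lower)   # False: co-coercivity fails
```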
For the special case of the gradient of a convex function, the equivalence between Lipschitz continuity and co-coercivity is known as the Baillon-Haddad theorem (Baillon, Jean-Bernard and Haddad, Georges, 1977).
When $f$ is both $\mu$-strongly convex and $L$-smooth, the two properties combine into a strengthened co-coercivity inequality: for all $x, y$,
$$\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge \frac{\mu L}{\mu + L} \|x - y\|^2 + \frac{1}{\mu + L} \|\nabla f(x) - \nabla f(y)\|^2.$$
Proof. Consider the convex function $h(x) = f(x) - \frac{\mu}{2} \|x\|^2$. Its gradient reads $\nabla h(x) = \nabla f(x) - \mu x$. Using the $L$-smoothness of $f$ through the quadratic upper bound (6), we have that, for all $x, y$,
$$h(y) \le h(x) + \langle \nabla h(x), y - x \rangle + \frac{L - \mu}{2} \|y - x\|^2.$$
Hence, by Proposition 7, $h$ is $(L - \mu)$-smooth.

If $L = \mu$, then nothing is left to prove: $\nabla h$ is constant, so $\nabla f(x) - \nabla f(y) = \mu (x - y)$ and the claimed inequality holds with equality. Otherwise, the $\frac{1}{L - \mu}$-co-coercivity of $\nabla h$ (Proposition 6) gives us
$$\langle \nabla h(x) - \nabla h(y), x - y \rangle \ge \frac{1}{L - \mu} \|\nabla h(x) - \nabla h(y)\|^2.$$
Making the expression of $\nabla h$ explicit, expanding the squared norm, and rearranging the terms allows us to conclude. □
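This combined inequality can be checked numerically as well; the sketch below assumes NumPy and an illustrative quadratic, for which $\mu$ and $L$ are the extreme eigenvalues of its Hessian.

```python
import numpy as np

rng = np.random.default_rng(7)

# Quadratic f(x) = 0.5 * x^T A x: mu = lambda_min(A), L = lambda_max(A).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
eig = np.linalg.eigvalsh(A)
mu, L = eig.min(), eig.max()

grad_f = lambda x: A @ x

for _ in range(10_000):
    x, y = rng.normal(size=2) * 3, rng.normal(size=2) * 3
    g_diff = grad_f(x) - grad_f(y)
    lhs = g_diff @ (x - y)
    rhs = (mu * L / (mu + L)) * np.sum((x - y) ** 2) + np.sum(g_diff ** 2) / (mu + L)
    assert lhs >= rhs - 1e-10

print("strongly convex + smooth co-coercivity bound verified on random samples")
```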
References
Baillon, Jean-Bernard and Haddad, Georges (1977). Quelques propriétés des opérateurs angle-bornés et n-cycliquement monotones. Israel Journal of Mathematics.
Hiriart-Urruty, Jean-Baptiste and Lemarechal, Claude (1996). Convex Analysis and Minimization Algorithms I: Fundamentals, Springer.