A first look at convex analysis

February 7th, 2024



(image)

Figure 1: Definition of concavity by Archimedes

Although many problems encountered in statistics and machine learning are not convex, convexity is an important notion to study in order to understand the generic behaviour of an optimization procedure.

1 Convex sets

A set C ⊆ X is convex if every segment between two of its points is contained in C. This can be formalized as the following definition, illustrated in Figure 2.

(image)

Figure 2: Left: convex set. Right: not a convex set.

  • Definition 1. A set C ⊆ X is convex if

    ∀(x, y) ∈ C², ∀λ ∈ [0, 1], λx + (1 − λ)y ∈ C.

In most of these lecture notes, the set X will be the Euclidean space R^p, but remark that this notion only requires X to be a vector space over the real numbers R, possibly of infinite dimension.

  • Example 1.

    • The convex sets of R are exactly the intervals of R.

    • Any affine hyperplane H = {x ∈ X : φ(x) = α}, where X is a real vector space, φ a linear functional and α ∈ R, is convex.

    • Any half-space H = {x ∈ X : φ(x) ≤ α} is also convex.

    • The unit simplex of R^d defined as

    Δ_{d−1} = {x ∈ R^d : x_i ≥ 0 for all i and Σ_i x_i = 1}

    is convex.

    • The nonnegative orthant R^d_{≥0} = {x ∈ R^d : x_i ≥ 0 for all i} is convex.
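
As a quick numeric illustration of Definition 1, the sketch below samples random points of the unit simplex of R^3 and verifies that their convex combinations stay inside it (the helper names `in_simplex` and `sample_simplex` are illustrative, not from the notes):

```python
# Check that the unit simplex of R^3 is convex (Definition 1) by sampling:
# convex combinations of simplex points must remain in the simplex.
import random

def in_simplex(x, tol=1e-9):
    """Membership test: x_i >= 0 and sum_i x_i = 1, up to tolerance."""
    return all(xi >= -tol for xi in x) and abs(sum(x) - 1.0) <= tol

def sample_simplex(d):
    """Sample a simplex point by normalizing positive weights."""
    w = [random.random() + 1e-12 for _ in range(d)]
    s = sum(w)
    return [wi / s for wi in w]

random.seed(0)
for _ in range(1000):
    x, y = sample_simplex(3), sample_simplex(3)
    lam = random.random()
    z = [lam * xi + (1 - lam) * yi for xi, yi in zip(x, y)]
    assert in_simplex(z)  # the convex combination stays in the simplex
print("all sampled convex combinations stayed in the simplex")
```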

The following proposition reviews some important algebraic properties of convex sets.

  • Proposition 1.

    • Arbitrary intersection. Let (C_i)_{i∈I} be convex sets, where I is an arbitrary index set. Then ∩_{i∈I} C_i is convex.

    • Cartesian product. Let C_1 ⊆ R^{n_1}, …, C_n ⊆ R^{n_n}. Then C_1 × … × C_n is convex if, and only if, every C_i is convex.

    • Sum. Let C_1, C_2 be two convex sets. Then the (Minkowski) sum C_1 + C_2 = {x_1 + x_2 : x_1 ∈ C_1, x_2 ∈ C_2} is convex.

    • Affine map. Let A : R^n → R^m be an affine map and C ⊆ R^n a convex set. Then the image A(C) is convex.
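
The Minkowski-sum item can be illustrated numerically: for the intervals C1 = [0, 1] and C2 = [2, 3], the sum C1 + C2 is the interval [2, 4], and convex combinations of sampled points stay inside it (a minimal sketch; the choice of intervals is arbitrary):

```python
# Illustrate the "Sum" item of Proposition 1 for two intervals of R:
# C1 = [0, 1], C2 = [2, 3], so C1 + C2 = [2, 4].
import random

random.seed(1)

def sample_sum():
    """Sample a point of the Minkowski sum C1 + C2."""
    return random.uniform(0.0, 1.0) + random.uniform(2.0, 3.0)

for _ in range(1000):
    a, b, lam = sample_sum(), sample_sum(), random.random()
    z = lam * a + (1 - lam) * b
    assert 2.0 <= z <= 4.0  # convex combinations stay in the sum
print("the Minkowski sum [2, 4] is closed under convex combinations")
```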

2 Convex functions

The intuitive definition of a convex function is a function such that “the line segment joining (x, f(x)) and (y, f(y)) lies above the graph of f between x and y”. Formally, this definition is made precise using the notion of the epigraph of f.

  • Definition 2. Let f : R^n → R ∪ {+∞} with f ≢ +∞. The epigraph of f is the subset of R^n × R defined by

    epi f := {(x, t) ∈ R^n × R : t ≥ f(x)}.

(image)

Figure 3: (Left) Epigraph of a concave function. (Right) Epigraph of a convex function.

Equipped with this notion, we can define convex functions using the definition of convex sets applied to the epigraph.

  • Definition 3. A function f : R^p → R ∪ {+∞} such that f ≢ +∞ is a convex function iff its epigraph epi f is convex.

It is possible to make this definition explicit as follows:

(1) ∀x, y ∈ dom f, ∀λ ∈ [0, 1], f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).

Following the convention introduced in (Hiriart-Urruty and Lemaréchal, 1996), we will denote the set of all convex functions of R^p by Conv R^p, and the set of closed – that is, with closed epigraph – convex functions of R^p by Conv̄ R^p.
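
Inequality (1) can be checked numerically on a grid for a concrete convex function such as exp (a sanity check, not a proof; the helper `convex_gap` is illustrative):

```python
# Check Jensen's inequality (1) on a grid for the convex function exp.
import math

def convex_gap(f, x, y, lam):
    """lam*f(x) + (1-lam)*f(y) - f(lam*x + (1-lam)*y); nonnegative iff (1) holds."""
    return lam * f(x) + (1 - lam) * f(y) - f(lam * x + (1 - lam) * y)

xs = [i / 4 - 2.0 for i in range(17)]   # grid on [-2, 2]
lams = [j / 10 for j in range(11)]      # lambdas in [0, 1]
gaps = [convex_gap(math.exp, x, y, lam) for x in xs for y in xs for lam in lams]
assert min(gaps) >= -1e-12              # (1) held at every grid triple
print("smallest Jensen gap on the grid:", min(gaps))
```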

Local and global minima of a convex function

For an (un)constrained minimization problem on a convex set C ⊆ dom f with a function f : C → R,

(2) min_{x ∈ C} f(x),

we say that x⋆ is

  • a local solution of (2) if there exists a neighborhood O of x⋆ such that for all x ∈ C ∩ O, f(x⋆) ≤ f(x);

  • a (global) solution if for all x ∈ C, f(x⋆) ≤ f(x).

A fundamental result is that, when f is convex, every local solution of (2) is a global solution, and the set of (global) solutions is a convex set.

  • Theorem 1. Assume that f is a convex function and C is a convex closed set. Then,

    • Any local solution of (2) is a global solution;

    • The set of global solutions of (2) is convex.

  • Proof. Part 1. By contradiction. Consider a local solution x⋆ and assume that x⋆ is not a global solution. In particular, there exists z ∈ C such that f(z) < f(x⋆). By convexity,

    f((1 − t)x⋆ + tz) ≤ (1 − t)f(x⋆) + tf(z) < f(x⋆), ∀t ∈ (0, 1].

    Given any open neighborhood O of x⋆, there exists t > 0 small enough that (1 − t)x⋆ + tz ∈ C ∩ O, a point with a strictly smaller value than f(x⋆) (contradiction).

    Part 2. Take two global solutions and study the segment joining them: let x⋆ and z⋆ be two solutions. We have f(x⋆) = f(z⋆), and in turn, for all t ∈ [0, 1],

    f(x⋆) ≤ f((1 − t)x⋆ + tz⋆) ≤ (1 − t)f(x⋆) + tf(z⋆) = f(x⋆),

    where the first inequality comes from the global optimality of x⋆, and the second from convexity. Hence every point of the segment is a global solution.

Differential properties of a convex function

A differentiable function is convex if, and only if, its graph lies above all its tangents; see Figure 4.

(image)

Figure 4: A differentiable convex function and its tangent. The curve always lies above its tangent, drawn in red.

  • Proposition 2. Let f be a differentiable function on an open set Ω ⊆ R^p, and C ⊆ Ω a convex subset of Ω. Then f is convex on C if, and only if,

    (3) ∀x, x̄ ∈ C, f(x) ≥ f(x̄) + ⟨∇f(x̄), x − x̄⟩.

  • Proof. Assume f is convex on C. Let x, x̄ ∈ C and λ ∈ (0, 1). By definition of convexity, we have

    f(x̄ + λ(x − x̄)) − f(x̄) ≤ λ(f(x) − f(x̄)).

    Dividing by λ, and letting λ go to 0, we obtain (3).

    Reciprocally, let x, y ∈ C, α ∈ (0, 1) and define z = αx + (1 − α)y ∈ C. Applying (3) twice, we have

    f(x) ≥ f(z) + ⟨∇f(z), x − z⟩ and f(y) ≥ f(z) + ⟨∇f(z), y − z⟩.

    Using a convex combination of these two lines, we get

    αf(x) + (1 − α)f(y) ≥ f(z) + ⟨∇f(z), αx + (1 − α)y − z⟩ = f(z),

    which is exactly (1).

A similar criterion can be derived for strictly convex functions. Remark that Theorem 1 is a direct consequence of Proposition 2 for the class of differentiable convex functions: a local minimum x⋆ satisfies ∇f(x⋆) = 0, and thus (3) gives f(x) ≥ f(x⋆) for all x ∈ C.
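
The tangent lower bound (3) can be verified on a grid for the convex function f(x) = x^4, whose derivative 4x^3 is available in closed form (an illustrative 1-D sketch):

```python
# Check the tangent lower bound (3) for f(x) = x**4, with f'(x) = 4*x**3.
def f(x):
    return x ** 4

def grad_f(x):
    return 4 * x ** 3

pts = [i / 5 - 2.0 for i in range(21)]  # grid on [-2, 2]
for x in pts:
    for xb in pts:
        # f(x) >= f(xb) + f'(xb) * (x - xb), up to rounding
        assert f(x) + 1e-9 >= f(xb) + grad_f(xb) * (x - xb)
print("tangent lower bound (3) held at every grid pair")
```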

3 Strongly convex functions

A strongly convex function satisfies a strengthened version of Jensen's inequality.

  • Definition 4. A function f : R^n → R ∪ {+∞} is strongly convex of modulus μ > 0 if dom f is convex and

    f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) − (μ/2) λ(1 − λ) ‖x − y‖²,

    for all x, y ∈ dom f and λ ∈ [0, 1].

Another way to define strong convexity is to say that f is μ-strongly convex if the function

x ↦ f(x) − (μ/2)‖x‖²

is a convex function.

A strongly convex function defined on the whole space R^d has a unique minimizer.
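
For the quadratic f(x) = (a/2)x², Definition 4 holds with modulus μ = a, and in fact with equality; the sketch below checks this numerically (the choice a = 3 is arbitrary):

```python
# For f(x) = 0.5 * a * x**2, Definition 4 holds with modulus mu = a,
# and in fact with equality.
import random

random.seed(2)
a = 3.0

def f(x):
    return 0.5 * a * x * x

for _ in range(1000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    lam = random.random()
    lhs = f(lam * x + (1 - lam) * y)
    rhs = lam * f(x) + (1 - lam) * f(y) - 0.5 * a * lam * (1 - lam) * (x - y) ** 2
    assert abs(lhs - rhs) <= 1e-9  # the quadratic saturates Definition 4
print("Definition 4 held with equality for the quadratic")
```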

Differentiability, strong convexity and quadratic lower bound

A function f is strongly convex if, and only if, it is strictly above its tangents and one can “fit” a quadratic form between the tangent and the function itself. This is a key property used to derive lower bounds on f.

(image)

Figure 5: Strong convexity and differentiability. There is space for a quadratic form between the curve y=f(x) and the tangent.

  • Proposition 3. Let f be a differentiable and μ-strongly convex function. Then,

    ∀x, x̄ ∈ dom f, f(x) ≥ f(x̄) + ⟨∇f(x̄), x − x̄⟩ + (μ/2)‖x − x̄‖².

Note that if f is C², then f is μ-strongly convex if, and only if, its Hessian satisfies ∇²f(x) ⪰ μ Id for all x, i.e., ∇²f(x) − μ Id is positive semidefinite.

An easy consequence is that a lower bound on the distance to the minimizer yields a lower bound on the function values:

(4) f(x) ≥ f(x⋆) + (μ/2)‖x − x⋆‖²,

where x⋆ is the minimizer of f.
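
The lower bound (4) can be checked for a strongly convex quadratic whose minimizer is known in closed form (an illustrative example with μ = 2 and minimizer x_star = 1):

```python
# Check the lower bound (4) for f(x) = 0.5 * mu * (x - x_star)**2 + 7,
# which is mu-strongly convex with known minimizer x_star.
mu = 2.0
x_star = 1.0

def f(x):
    return 0.5 * mu * (x - x_star) ** 2 + 7.0

for i in range(-50, 51):
    x = i / 10
    assert f(x) + 1e-12 >= f(x_star) + 0.5 * mu * (x - x_star) ** 2
print("lower bound (4) held on the grid")
```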

4 Lipschitz continuity of the gradient

  • Definition 5. A differentiable function f : R^n → R is said to be L-smooth if ∇f is L-Lipschitz, i.e.,

    ∀x, y ∈ dom f, ‖∇f(x) − ∇f(y)‖∗ ≤ L‖x − y‖.

    Here ‖·‖∗ is the dual norm of ‖·‖.

Quadratic upper bound

L-smoothness is important in optimization because it ensures a quadratic upper bound of the function.

  • Proposition 4. Let f be an L-smooth function. Then, for all x, y ∈ dom f,

    (5) ⟨∇f(x) − ∇f(y), x − y⟩ ≤ L‖x − y‖².

    Moreover, if dom f is convex, then (5) is equivalent to

    (6) f(y) ≤ f(x) + ⟨∇f(x), y − x⟩ + (L/2)‖x − y‖², ∀x, y ∈ dom f.

  • Proof. The fact that L-smoothness implies (5) is a direct application of the (generalized) Cauchy–Schwarz inequality. For the equivalence of (5) and (6), let us prove the two directions.

    Given x, y ∈ dom f, define the real function g(t) = f(x + t(y − x)), whose derivative is g′(t) = ⟨∇f(x + t(y − x)), y − x⟩. Assuming (5) holds, then

    g′(t) − g′(0) = ⟨∇f(x + t(y − x)) − ∇f(x), y − x⟩ ≤ tL‖x − y‖².

    Now,

    f(y) = g(1) = g(0) + ∫₀¹ g′(t) dt ≤ g(0) + g′(0) + (L/2)‖x − y‖² = f(x) + ⟨∇f(x), y − x⟩ + (L/2)‖x − y‖².

    Reciprocally, if (6) holds, it is sufficient to apply it twice (to (x, y) and to (y, x)) and sum the two inequalities.
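
The quadratic upper bound (6) can be tested on the softplus function f(x) = log(1 + eˣ), which is L-smooth with L = 1/4 since f″(x) = σ(x)(1 − σ(x)) ≤ 1/4, where σ is the sigmoid (a numeric sketch on a grid):

```python
# Check the quadratic upper bound (6) for softplus f(x) = log(1 + exp(x)),
# which is L-smooth with L = 1/4; its gradient is the sigmoid.
import math

def f(x):
    return math.log1p(math.exp(x))

def grad(x):
    return 1.0 / (1.0 + math.exp(-x))  # sigmoid = derivative of softplus

L = 0.25
pts = [i / 5 - 4.0 for i in range(41)]  # grid on [-4, 4]
for x in pts:
    for y in pts:
        upper = f(x) + grad(x) * (y - x) + 0.5 * L * (x - y) ** 2
        assert f(y) <= upper + 1e-9
print("quadratic upper bound (6) held at every grid pair")
```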

This result has a strong implication on the minima of f.

  • Proposition 5. Let f be an L-smooth function with full domain, and assume x⋆ is a minimizer of f. Then,

    (1/(2L))‖∇f(z)‖² ≤ f(z) − f(x⋆) ≤ (L/2)‖z − x⋆‖², ∀z ∈ R^n.

  • Proof. The two sides are proved separately.

    rhs: apply (6) to the pair (x⋆, z), using ∇f(x⋆) = 0.

    lhs: minimize the quadratic upper bound at x = z:

    f(x⋆) = min_{y∈R^n} f(y) ≤ inf_y (f(z) + ⟨∇f(z), y − z⟩ + (L/2)‖y − z‖²) = inf_{‖v‖=1} inf_t (f(z) + t⟨∇f(z), v⟩ + Lt²/2),

    where the last equality is obtained by the change of variables v = w/‖w‖, t = ‖w‖ with w = y − z. Completing the square in t leads to:

    f(z) + t⟨∇f(z), v⟩ + Lt²/2 = (L/2)(t + (1/L)⟨∇f(z), v⟩)² − (1/(2L))⟨∇f(z), v⟩² + f(z).

    Hence,

    min_{y∈R^n} f(y) ≤ inf_{‖v‖=1} (f(z) − (1/(2L))⟨∇f(z), v⟩²).

    Taking the infimum boils down to maximizing ⟨∇f(z), v⟩², which is attained with v = ∇f(z)/‖∇f(z)‖, hence the result.
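
Proposition 5 can be checked on the quadratic f(z) = ½(z₁² + 4z₂²), for which ∇f(z) = (z₁, 4z₂), the minimizer is x⋆ = 0, and L = 4 for the Euclidean norm (an illustrative sketch):

```python
# Check Proposition 5 for f(z) = 0.5 * (z1**2 + 4 * z2**2): the gradient
# is (z1, 4*z2), the minimizer is (0, 0) with value 0, and L = 4.
import random

random.seed(3)
L = 4.0

def f(z):
    return 0.5 * (z[0] ** 2 + 4.0 * z[1] ** 2)

def grad(z):
    return (z[0], 4.0 * z[1])

for _ in range(1000):
    z = (random.uniform(-5, 5), random.uniform(-5, 5))
    g = grad(z)
    grad_sq = g[0] ** 2 + g[1] ** 2
    dist_sq = z[0] ** 2 + z[1] ** 2
    assert grad_sq / (2 * L) <= f(z) + 1e-9   # left inequality
    assert f(z) <= 0.5 * L * dist_sq + 1e-9   # right inequality
print("both inequalities of Proposition 5 held on all samples")
```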

Co-coercivity of the gradient of an L-smooth function

Convex L-smooth functions have co-coercive gradients. This result is sometimes coined the Baillon–Haddad theorem (Baillon and Haddad, 1977), but one shall note that the original contribution is much more general than its application to the gradient.

  • Proposition 6. Let f be a full-domain convex L-smooth function. Then, ∇f is 1/L-co-coercive, i.e.,

    ∀x, y ∈ R^n, ⟨∇f(x) − ∇f(y), x − y⟩ ≥ (1/L)‖∇f(x) − ∇f(y)‖².

  • Proof. Let f_x(z) = f(z) − ⟨∇f(x), z⟩ and f_y(z) = f(z) − ⟨∇f(y), z⟩. Both are convex.

    Since f is L-smooth, f_x and f_y are also L-smooth.

    Canceling the gradient ∇f_x(z) = ∇f(z) − ∇f(x) shows that z = x minimizes f_x. Hence, using the left-hand side of Proposition 5, we have

    f(y) − f(x) − ⟨∇f(x), y − x⟩ = f_x(y) − f_x(x) ≥ (1/(2L))‖∇f_x(y)‖² = (1/(2L))‖∇f(y) − ∇f(x)‖².

    Doing exactly the same thing for f_y leads to a similar bound, and summing the two gives the result.
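
Proposition 6 can be illustrated with the softplus function f(x) = log(1 + eˣ), whose gradient is the sigmoid and which is L-smooth with L = 1/4 (a numeric sanity check):

```python
# Check the 1/L-co-coercivity of Proposition 6 for softplus, whose
# gradient is the sigmoid and which is L-smooth with L = 1/4.
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

L = 0.25
random.seed(4)
for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    gx, gy = sigmoid(x), sigmoid(y)
    # <grad f(x) - grad f(y), x - y> >= (1/L) |grad f(x) - grad f(y)|^2
    assert (gx - gy) * (x - y) + 1e-9 >= (gx - gy) ** 2 / L
print("co-coercivity held on all samples")
```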

It turns out that smoothness, co-coercivity and the quadratic upper bound are equivalent properties for full-domain convex functions.

  • Proposition 7. Let f be a differentiable convex function with dom f = R^n. Then the three following properties are equivalent:

    • 1. f is L-smooth.

    • 2. The quadratic upper bound (6) holds.

    • 3. ∇f is 1/L-co-coercive.

  • Proof. 1. ⟹ 2. is Proposition 4 and 2. ⟹ 3. is the proof of Proposition 6. The only implication left to prove is 3. ⟹ 1., which follows from the (generalized) Cauchy–Schwarz inequality.

For the special case of the ℓ2 norm, we have:

  • Proposition 8. Let f be a differentiable convex function with dom f = R^n, and L-smooth with respect to the Euclidean norm. Then,

    • 1. L Id − ∇f is monotone, i.e.,

    ∀x, y ∈ R^n, ⟨∇f(x) − ∇f(y), x − y⟩ ≤ L‖x − y‖₂².

    • 2. (L/2)‖x‖₂² − f(x) is convex.

    • 3. If f is twice differentiable, then,

    λ_max(∇²f(x)) ≤ L, ∀x ∈ R^n.

When f is both L-smooth and μ-strongly convex, we have the following “strengthened” co-coercivity.

  • Proposition 9. Let f be an L-smooth and μ-strongly convex function. For any x, y ∈ R^n, we have

    ⟨∇f(x) − ∇f(y), x − y⟩ ≥ (μL/(μ + L))‖x − y‖² + (1/(μ + L))‖∇f(x) − ∇f(y)‖².

  • Proof. Consider the convex function ϕ(x) = f(x) − (μ/2)‖x‖². Its gradient reads ∇ϕ(x) = ∇f(x) − μx. Using the L-smoothness of f, we have that

    ⟨∇ϕ(x) − ∇ϕ(y), x − y⟩ = ⟨∇f(x) − ∇f(y), x − y⟩ − μ⟨x − y, x − y⟩ ≤ (L − μ)‖x − y‖².

    Hence, ϕ is (L − μ)-smooth.

    If L = μ, then nothing is left to prove since f is μ-strongly convex. Otherwise, the 1/(L − μ)-co-coercivity of ∇ϕ (Proposition 6) gives us

    ⟨∇ϕ(x) − ∇ϕ(y), x − y⟩ ≥ (1/(L − μ))‖∇ϕ(x) − ∇ϕ(y)‖².

    Expanding the expression of ∇ϕ(x) allows us to conclude.
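
Proposition 9 can be checked on the quadratic f(z) = ½(μz₁² + Lz₂²) with μ = 1 and L = 4; for this function the inequality actually holds with equality (an illustrative choice of constants):

```python
# Check Proposition 9 for f(z) = 0.5 * (mu * z1**2 + L * z2**2) with
# mu = 1 and L = 4; for this quadratic the bound is tight.
import random

random.seed(5)
mu, L = 1.0, 4.0

def grad(z):
    return (mu * z[0], L * z[1])

for _ in range(1000):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    y = (random.uniform(-5, 5), random.uniform(-5, 5))
    gx, gy = grad(x), grad(y)
    d = (x[0] - y[0], x[1] - y[1])
    dg = (gx[0] - gy[0], gx[1] - gy[1])
    lhs = dg[0] * d[0] + dg[1] * d[1]
    rhs = (mu * L / (mu + L)) * (d[0] ** 2 + d[1] ** 2) \
        + (1.0 / (mu + L)) * (dg[0] ** 2 + dg[1] ** 2)
    assert lhs + 1e-9 >= rhs  # strengthened co-coercivity
print("strengthened co-coercivity held on all samples")
```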

Baillon, Jean-Bernard and Haddad, Georges (1977). Quelques propriétés des opérateurs angle-bornés et n-cycliquement monotones.

Hiriart-Urruty, Jean-Baptiste and Lemaréchal, Claude (1996). Convex Analysis and Minimization Algorithms I: Fundamentals, Springer.