The logarithmically averaged and non-logarithmically averaged Chowla conjectures

Let {\lambda: {\bf N} \rightarrow \{-1,1\}} be the Liouville function, thus {\lambda(n)} is defined to equal {+1} when {n} is the product of an even number of primes, and {-1} when {n} is the product of an odd number of primes. The Chowla conjecture asserts that {\lambda} has the statistics of a random sign pattern, in the sense that

\displaystyle  \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0 \ \ \ \ \ (1)

for all {k \geq 1} and all distinct natural numbers {h_1,\dots,h_k}, where we use the averaging notation

\displaystyle  \mathbb{E}_{n \leq N} f(n) := \frac{1}{N} \sum_{n \leq N} f(n).

For {k=1}, this conjecture is equivalent to the prime number theorem (as discussed in this previous blog post), but the conjecture remains open for any {k \geq 2}.

In recent years, it has been realised that one can make more progress on this conjecture if one works instead with the logarithmically averaged version

\displaystyle  \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} \lambda(n+h_1) \dots \lambda(n+h_k) = 0 \ \ \ \ \ (2)

of the conjecture, where we use the logarithmic averaging notation

\displaystyle  \mathbb{E}_{n \leq N}^{\log} f(n) := \frac{\sum_{n \leq N} \frac{f(n)}{n}}{\sum_{n \leq N} \frac{1}{n}}.

Using the summation by parts (or telescoping series) identity

\displaystyle  \sum_{n \leq N} \frac{f(n)}{n} = \sum_{M < N} \frac{1}{M(M+1)} (\sum_{n \leq M} f(n)) + \frac{1}{N} \sum_{n \leq N} f(n) \ \ \ \ \ (3)

it is not difficult to show that the Chowla conjecture (1) for a given {k,h_1,\dots,h_k} implies the logarithmically averaged conjecture (2). However, the converse implication is not at all clear. For instance, for {k=1}, we have already mentioned that the Chowla conjecture

\displaystyle  \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N} \lambda(n) = 0

is equivalent to the prime number theorem; but the logarithmically averaged analogue

\displaystyle  \lim_{N \rightarrow \infty} \mathbb{E}^{\log}_{n \leq N} \lambda(n) = 0

is significantly easier to show (a proof with the Liouville function {\lambda} replaced by the closely related Möbius function {\mu} is given in <a href="

\displaystyle  \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N} \lambda(n) = 0

“>this previous blog post). And indeed, significantly more is now known for the logarithmically averaged Chowla conjecture; in this paper of mine I had proven (2) for {k=2}, and in this recent paper with Joni Teravainen, we proved the conjecture for all odd {k} (with a different proof also given here).

In view of this emerging consensus that the logarithmically averaged Chowla conjecture was easier than the ordinary Chowla conjecture, it was thus somewhat of a surprise for me to read a recent paper of Gomilko, Kwietniak, and Lemanczyk who (among other things) established the following statement:

Theorem 1 Assume that the logarithmically averaged Chowla conjecture (2) is true for all {k}. Then there exists a sequence {N_i} going to infinity such that the Chowla conjecture (1) is true for all {k} along that sequence, that is to say

\displaystyle  \lim_{N_i \rightarrow \infty} \mathbb{E}_{n \leq N_i} \lambda(n+h_1) \dots \lambda(n+h_k) = 0

for all {k} and all distinct {h_1,\dots,h_k}.


This implication does not use any special properties of the Liouville function (other than that they are bounded), and in fact proceeds by ergodic theoretic methods, focusing in particular on the ergodic decomposition of invariant measures of a shift into ergodic measures. Ergodic methods have proven remarkably fruitful in understanding these sorts of number theoretic and combinatorial problems, as could already be seen by the ergodic theoretic proof of Szemerédi’s theorem by Furstenberg, and more recently by the work of Frantzikinakis and Host on Sarnak’s conjecture. (My first paper with Teravainen also uses ergodic theory tools.) Indeed, many other results in the subject were first discovered using ergodic theory methods.

On the other hand, many results in this subject that were first proven ergodic theoretically have since been reproven by more combinatorial means; my second paper with Teravainen is an instance of this. As it turns out, one can also prove Theorem 1 by a standard combinatorial (or probabilistic) technique known as the second moment method. In fact, one can prove slightly more:

Theorem 2 Let {k} be a natural number. Assume that the logarithmically averaged Chowla conjecture (2) is true for {2k}. Then there exists a set {{\mathcal N}} of natural numbers of logarithmic density {1} (that is, {\lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} 1_{n \in {\mathcal N}} = 1}) such that

\displaystyle  \lim_{N \rightarrow \infty: N \in {\mathcal N}} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0

for any distinct {h_1,\dots,h_k}.


It is not difficult to deduce Theorem 1 from Theorem 2 using a diagonalisation argument. Unfortunately, the known cases of the logarithmically averaged Chowla conjecture ({k=2} and odd {k}) are currently insufficient to use Theorem 2 for any purpose other than to reprove what is already known to be true from the prime number theorem. (Indeed, the even cases of Chowla, in either logarithmically averaged or non-logarithmically averaged forms, seem to be far more powerful than the odd cases; see Remark 1.7 of this paper of myself and Teravainen for a related observation in this direction.)

We now sketch the proof of Theorem 2. For any distinct {h_1,\dots,h_k}, we take a large number {H} and consider the limiting the second moment

\displaystyle  \limsup_{N \rightarrow \infty} \mathop{\bf E}_{n \leq N}^{\log} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2.

We can expand this as

\displaystyle  \limsup_{N \rightarrow \infty} \mathop{\bf E}_{m,m&apos; \leq H} \mathop{\bf E}_{n \leq N}^{\log} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)

\displaystyle  \lambda(n+m&apos;+h_1) \dots \lambda(n+m&apos;+h_k).

If all the {m+h_1,\dots,m+h_k,m&apos;+h_1,\dots,m&apos;+h_k} are distinct, the hypothesis (2) tells us that the inner averages goes to zero as {N \rightarrow \infty}. The remaining averages are {O(1)}, and there are {O( k^2 )} of these averages. We conclude that

\displaystyle  \limsup_{N \rightarrow \infty} \mathop{\bf E}_{n \leq N}^{\log} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k^2 / H.

By Markov’s inequality (and (3)), we conclude that for any fixed {h_1,\dots,h_k, H}, there exists a set {{\mathcal N}_{h_1,\dots,h_k,H}} of upper logarithmic density at least {1-k/H^{1/2}}, thus

\displaystyle  \limsup_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} 1_{n \in {\mathcal N}_{h_1,\dots,h_k,H}} \geq 1 - k/H^{1/2}

such that

\displaystyle  \mathop{\bf E}_{n \leq N} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k / H^{1/2}.

By deleting at most finitely many elements, we may assume that {{\mathcal N}_{h_1,\dots,h_k,H}} consists only of elements of size at least {H^2} (say).

For any {H_0}, if we let {{\mathcal N}_{h_1,\dots,h_k, \geq H_0}} be the union of {{\mathcal N}_{h_1,\dots,h_k, H}} for {H \geq H_0}, then {{\mathcal N}_{h_1,\dots,h_k, \geq H_0}} has logarithmic density {1}. By a diagonalisation argument (using the fact that the set of tuples {(h_1,\dots,h_k)} is countable), we can then find a set {{\mathcal N}} of natural numbers of logarithmic density {1}, such that for every {h_1,\dots,h_k,H_0}, every sufficiently large element of {{\mathcal N}} lies in {{\mathcal N}_{h_1,\dots,h_k,\geq H_0}}. Thus for every sufficiently large {N} in {{\mathcal N}}, one has

\displaystyle  \mathop{\bf E}_{n \leq N} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k / H^{1/2}.

for some {H \geq H_0} with {N \geq H^2}. By Cauchy-Schwarz, this implies that

\displaystyle  \mathop{\bf E}_{n \leq N} \mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k) \ll k^{1/2} / H^{1/4};

interchanging the sums and using {N \geq H^2} and {H \geq H_0}, this implies that

\displaystyle  \mathop{\bf E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) \ll k^{1/2} / H^{1/4} \leq k^{1/2} / H_0^{1/4}.

We conclude on taking {H_0} to infinity that

\displaystyle  \lim_{N \rightarrow \infty; N \in {\mathcal N}} \mathop{\bf E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0

as required.

Filed under: expository, math.CO, math.DS, math.NT, math.PR Tagged: Chowla conjecture, second moment method

Heat flow and zeroes of polynomials

Let {P(z) = z^n + a_{n-1} z^{n-1} + \dots + a_0} be a monic polynomial of degree {n} with complex coefficients. Then by the fundamental theorem of algebra, we can factor {P} as

\displaystyle  P(z) = (z-z_1) \dots (z-z_n) \ \ \ \ \ (1)

for some complex zeroes {z_1,\dots,z_n} (possibly with repetition).

Now suppose we evolve {P} with respect to time by heat flow, creating a function {P(t,z)} of two variables with given initial data {P(0,z) = P(z)} for which

\displaystyle  \partial_t P(t,z) = \partial_{zz} P(t,z). \ \ \ \ \ (2)

On the space of polynomials of degree at most {n}, the operator {\partial_{zz}} is nilpotent, and one can solve this equation explicitly both forwards and backwards in time by the Taylor series

\displaystyle  P(t,z) = \sum_{j=0}^\infty \frac{t^j}{j!} \partial_z^{2j} P(0,z).

For instance, if one starts with a quadratic {P(0,z) = z^2 + bz + c}, then the polynomial evolves by the formula

\displaystyle  P(t,z) = z^2 + bz + (c+2t).

As the polynomial {P(t)} evolves in time, the zeroes {z_1(t),\dots,z_n(t)} evolve also. Assuming for sake of discussion that the zeroes are simple, the inverse function theorem tells us that the zeroes will (locally, at least) evolve smoothly in time. What are the dynamics of this evolution?

For instance, in the quadratic case, the quadratic formula tells us that the zeroes are

\displaystyle  z_1(t) = \frac{-b + \sqrt{b^2 - 4(c+2t)}}{2}


\displaystyle  z_2(t) = \frac{-b - \sqrt{b^2 - 4(c+2t)}}{2}

after arbitrarily choosing a branch of the square root. If {b,c} are real and the discriminant {b^2 - 4c} is initially positive, we see that we start with two real zeroes centred around {-b/2}, which then approach each other until time {t = \frac{b^2-4c}{8}}, at which point the roots collide and then move off from each other in an imaginary direction.

In the general case, we can obtain the equations of motion by implicitly differentiating the defining equation

\displaystyle  P( t, z_i(t) ) = 0

in time using (2) to obtain

\displaystyle  \partial_{zz} P( t, z_i(t) ) + \partial_t z_i(t) \partial_z P(t,z_i(t)) = 0.

To simplify notation we drop the explicit dependence on time, thus

\displaystyle  \partial_{zz} P(z_i) + (\partial_t z_i) \partial_z P(z_i)= 0.

From (1) and the product rule, we see that

\displaystyle  \partial_z P( z_i ) = \prod_{j:j \neq i} (z_i - z_j)


\displaystyle  \partial_{zz} P( z_i ) = 2 \sum_{k:k \neq i} \prod_{j:j \neq i,k} (z_i - z_j)

(where all indices are understood to range over {1,\dots,n}) leading to the equations of motion

\displaystyle  \partial_t z_i = \sum_{k:k \neq i} \frac{2}{z_k - z_i}, \ \ \ \ \ (3)

at least when one avoids those times in which there is a repeated zero. In the case when the zeroes {z_i} are real, each term {\frac{2}{z_k-z_i}} represents a (first-order) attraction in the dynamics between {z_i} and {z_k}, but the dynamics are more complicated for complex zeroes (e.g. purely imaginary zeroes will experience repulsion rather than attraction, as one already sees in the quadratic example). Curiously, this system resembles that of Dyson brownian motion (except with the brownian motion part removed, and time reversed). I learned of the connection between the ODE (3) and the heat equation from this paper of Csordas, Smith, and Varga, but perhaps it has been mentioned in earlier literature as well.

One interesting consequence of these equations is that if the zeroes are real at some time, then they will stay real as long as the zeroes do not collide. Let us now restrict attention to the case of real simple zeroes, in which case we will rename the zeroes as {x_i} instead of {z_i}, and order them as {x_1 < \dots < x_n}. The evolution

\displaystyle  \partial_t x_i = \sum_{k:k \neq i} \frac{2}{x_k - x_i}

can now be thought of as reverse gradient flow for the “entropy”

\displaystyle  H := -\sum_{i,j: i \neq j} \log |x_i - x_j|,

(which is also essentially the logarithm of the discriminant of the polynomial) since we have

\displaystyle  \partial_t x_i = \frac{\partial H}{\partial x_i}.

In particular, we have the monotonicity formula

\displaystyle  \partial_t H = 4E

where {E} is the “energy”

\displaystyle  E := \frac{1}{4} \sum_i (\frac{\partial H}{\partial x_i})^2

\displaystyle  = \sum_i (\sum_{k:k \neq i} \frac{1}{x_k-x_i})^2

\displaystyle  = \sum_{i,k: i \neq k} \frac{1}{(x_k-x_i)^2} + 2 \sum_{i,j,k: i,j,k \hbox{ distinct}} \frac{1}{(x_k-x_i)(x_j-x_i)}

\displaystyle  = \sum_{i,k: i \neq k} \frac{1}{(x_k-x_i)^2}

where in the last line we use the antisymmetrisation identity

\displaystyle  \frac{1}{(x_k-x_i)(x_j-x_i)} + \frac{1}{(x_i-x_j)(x_k-x_j)} + \frac{1}{(x_j-x_k)(x_i-x_k)} = 0.

Among other things, this shows that as one goes backwards in time, the entropy decreases, and so no collisions can occur to the past, only in the future, which is of course consistent with the attractive nature of the dynamics. As {H} is a convex function of the positions {x_1,\dots,x_n}, one expects {H} to also evolve in a convex manner in time, that is to say the energy {E} should be increasing. This is indeed the case:

Exercise 1 Show that

\displaystyle  \partial_t E = 2 \sum_{i,j: i \neq j} (\frac{2}{(x_i-x_j)^2} - \sum_{k: i,j,k \hbox{ distinct}} \frac{1}{(x_k-x_i)(x_k-x_j)})^2.


Symmetric polynomials of the zeroes are polynomial functions of the coefficients and should thus evolve in a polynomial fashion. One can compute this explicitly in simple cases. For instance, the center of mass is an invariant:

\displaystyle  \partial_t \frac{1}{n} \sum_i x_i = 0.

The variance decreases linearly:

Exercise 2 Establish the virial identity

\displaystyle  \partial_t \sum_{i,j} (x_i-x_j)^2 = - 4n^2(n-1).


As the variance (which is proportional to {\sum_{i,j} (x_i-x_j)^2}) cannot become negative, this identity shows that “finite time blowup” must occur – that the zeroes must collide at or before the time {\frac{1}{4n^2(n-1)} \sum_{i,j} (x_i-x_j)^2}.

Exercise 3 Show that the Stieltjes transform

\displaystyle  s(t,z) = \sum_i \frac{1}{x_i - z}

solves the viscous Burgers equation

\displaystyle  \partial_t s = \partial_{zz} s - 2 s \partial_z s,

either by using the original heat equation (2) and the identity {s = - \partial_z P / P}, or else by using the equations of motion (3). This relation between the Burgers equation and the heat equation is known as the Cole-Hopf transformation.


The paper of Csordas, Smith, and Varga mentioned previously gives some other bounds on the lifespan of the dynamics; roughly speaking, they show that if there is one pair of zeroes that are much closer to each other than to the other zeroes then they must collide in a short amount of time (unless there is a collision occuring even earlier at some other location). Their argument extends also to situations where there are an infinite number of zeroes, which they apply to get new results on Newman’s conjecture in analytic number theory. I would be curious to know of further places in the literature where this dynamics has been studied.

Filed under: expository, math.AP, math.CV, math.DS Tagged: entropy, heat flow, polynomials, zeroes