
Artin-Schreier Extensions

Recall

Throughout, let $K$ be a field of characteristic $p\ne 0$ and $E/K$ a cyclic extension of order $p^{m-1}$ with $m >1$. The algebraic closure $\overline K^\mathrm{a}$ and the separable algebraic closure $\overline K^{\mathrm{s}}$ are fixed once and for all. We write $\mathbf{F}_p$ for the finite field with $p$ elements.

For proposition 2 in the post, let $G$ be the Galois group of $\overline K^\mathrm{s}/K$ (that is, the projective limit of $\mathrm{Gal}(K'/K)$, with $K'$ running over all finite separable extensions of $K$; see this post for the definition of projective limit). The reader is expected to know how to induce a long exact sequence from a short exact sequence, for example from this post.

In this post (the reader is urged to make sure that he or she has understood the concept of characters and, more importantly, Hilbert's theorem 90), we showed that if $[E:K]=p$, then $E=K(x)$ where $x$ is a zero of a polynomial of the form $X^p-X-\alpha$ with $\alpha \in K$. In this belated post, we want to show that, whenever we deal with an extension of order $p^{m-1}$, we run into a polynomial of the form $X^p-X-\alpha$. The theory behind this is called Artin-Schreier theory, which has its own (highly non-trivial) nature.

Artin-Schreier extensions

Definition 1. An Artin-Schreier polynomial $A_\alpha(X) \in K[X]$ is of the form

$$A_\alpha(X)=X^p-X-\alpha, \qquad \alpha \in K.$$

An immediate property of Artin-Schreier polynomials that one should notice is the equation

$$A_\alpha(X+Y)=A_\alpha(X)+A_\alpha(Y)-A_\alpha(0).$$

To see this, one should notice that for $x,y \in K$ we have $(x+y)^p=x^p+y^p$.
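
The only input needed for this identity is the "freshman's dream" $(x+y)^p=x^p+y^p$, which in turn rests on $p$ dividing the binomial coefficients $\binom{p}{k}$ for $0<k<p$. A quick numerical sanity check (the sampled primes are an arbitrary choice for this sketch):

```python
# Check that p | C(p, k) for 0 < k < p, the fact behind (x+y)^p = x^p + y^p
# in characteristic p.
from math import comb

for p in (2, 3, 5, 7, 11):
    assert all(comb(p, k) % p == 0 for k in range(1, p))

# Granting this, A_a(X+Y) = (X+Y)^p - (X+Y) - a expands to
# (X^p - X - a) + (Y^p - Y - a) + a = A_a(X) + A_a(Y) - A_a(0).
print("C(p, k) = 0 mod p for all 0 < k < p and each sampled prime p")
```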

With this equation we can easily show that

Proposition 1. If $A_\alpha(X)$ has a root in $K$, then all of its roots are in $K$. Otherwise, $A_\alpha(X)$ is irreducible over $K$. In the latter case, if $x$ is a root of $A_\alpha(X)$, then $K(x)/K$ is a cyclic extension of degree $p$.

Proof. We suppose that $x \in K$ is a root of $A_\alpha(X)$. Then

$$A_\alpha(x+1)=(x+1)^p-(x+1)-\alpha=x^p+1-x-1-\alpha=A_\alpha(x)=0.$$

Therefore, by induction, we see easily that $x, x+1, \cdots, x+p-1$ are roots of $A_\alpha(X)$, all of which are in $K$.

Now we suppose that $A_\alpha(X)$ has no root in $K$. Let $x \in \overline K$ be a root of $A_\alpha(X)$. Then in $\overline K[X]$, the polynomial can be written in the form

$$A_\alpha(X)=\prod_{j=0}^{p-1}(X-x-j)$$

because, again due to the equation $A_\alpha(X+Y)=A_\alpha(X)+A_\alpha(Y)-A_\alpha(0)$, we can see that $x,x+1,\dots,x+p-1$ are roots of $A_\alpha$.

By contradiction we suppose that $A_\alpha$ is reducible, say $A_\alpha(X)=f(X)g(X)$ where $1 \le d=\deg f < p$ and $f,g \in K[X]$. It follows that

$$f(X)=\prod_{j=1}^{d}(X-x-n_j)$$

where $\{n_1,\dots,n_d\} \subset \{0,1,\dots,p-1\}$. If we expand the polynomial above, we see

$$f(X)=X^d-\left(dx+\sum_{j=1}^{d}n_j\right)X^{d-1}+\cdots.$$

Therefore $dx+\sum_{j=1}^{d}n_j \in K$, hence $dx \in K$; since $1 \le d < p$, $d$ is invertible in $K$, so $x \in K$, which is absurd. Therefore we see that $A_\alpha$ is irreducible.

To see that $K(x)/K$ is Galois, we first notice that this extension is normal: $K(x)$ contains all roots of $A_\alpha(X)$. The extension is separable because the roots of $A_\alpha(X)$, namely $x,x+1,\dots,x+p-1$, are pairwise distinct, i.e. $A_\alpha(X)$ has no multiple roots.

Finally, to see why the Galois group of $K(x)/K$ is cyclic, we consider the action of the Galois group $G$ on the roots of $A_\alpha(X)$. Since $A_\alpha(X)$ is irreducible, $G$ acts transitively on its roots, so there exists $\sigma \in G$ such that $\sigma(x)=x+1$. We see easily that $\sigma^j(x)=x+j$, so $\sigma$ has order $p$ and generates $G$. $\square$
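
Proposition 1 can be probed by brute force in a small field beyond the prime field. A sketch in $\mathbf{F}_9=\mathbf{F}_3[t]/(t^2+1)$, modeled ad hoc as pairs $(c_0,c_1)=c_0+c_1t$ (the model is an illustration chosen for this post, not canonical):

```python
# "All or nothing": over F_9, X^p - X - a has either p roots or none,
# depending on whether a lies in the image of x -> x^p - x.
from collections import Counter

p = 3

def mul(u, v):
    # (u0 + u1*t)(v0 + v1*t) reduced with t^2 = -1
    return ((u[0]*v[0] - u[1]*v[1]) % p, (u[0]*v[1] + u[1]*v[0]) % p)

def wp(x):
    # the Artin-Schreier map x -> x^p - x in F_9
    xp = x
    for _ in range(p - 1):
        xp = mul(xp, x)
    return ((xp[0] - x[0]) % p, (xp[1] - x[1]) % p)

field = [(c0, c1) for c0 in range(p) for c1 in range(p)]
fibers = Counter(wp(x) for x in field)

# Every attained value a gives exactly p roots of X^p - X - a; any other
# value gives none (and the polynomial is then irreducible, being of degree p
# with no root).
assert all(n == p for n in fibers.values())
assert len(fibers) == len(field) // p
print("values attained:", len(fibers), "- each with exactly", p, "roots")
```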

The correspondence between extensions of degree $p$ and polynomials of the form $X^p-X-\alpha$ inspires us to consider them in a distinguished manner.

Definition 2. The field extension $E/K$ is called an Artin-Schreier extension if $E=K(x)$ for some $x \in E \setminus K$ such that $x^p-x\in K$.

Consider the map $\wp:\overline K^\mathrm{s} \to \overline K^\mathrm{s}$ defined by $u \mapsto u^p-u$. We certainly want to find the deep relation between Artin-Schreier extensions of a given field $K$ and the map $\wp$. A key piece of information can be found through the following correspondence.

Proposition 2. There is an isomorphism $\operatorname{Hom}(G,\mathbf{F}_p) \cong K/\wp(K)$.

Proof. We first notice that $\wp$ is a $G$-homomorphism, that is, it commutes with the action of $G$ on $\overline K^\mathrm{s}$. Indeed, for any $x \in \overline K^\mathrm{s}$ and $g \in G$, we have

$$g(\wp(x))=g(x^p-x)=g(x)^p-g(x)=\wp(g(x)).$$

On the other hand, $\wp$ is surjective. Indeed, for any $a \in \overline{K}^\mathrm{s}$, the equation $X^p-X=a$ always has a solution in $\overline K^\mathrm{s}$ because the polynomial $X^p-X-a$ is separable.

We can also see that the kernel of $\wp$ is $\mathbf{F}_p$: the kernel consists of the roots of $X^p-X$, which are exactly the $p$ elements of $\mathbf{F}_p$. Therefore we have obtained a short exact sequence

$$0 \to \mathbf{F}_p \xrightarrow{\;\iota\;} \overline K^\mathrm{s} \xrightarrow{\;\wp\;} \overline K^\mathrm{s} \to 0$$

where $\iota$ is the embedding. Taking the long exact sequence of cohomology and noticing that, by Hilbert's Theorem 90 (additive form), $H^1(G,\overline{K}^\mathrm{s})=0$, we obtain another exact sequence

$$K \xrightarrow{\;\wp\;} K \longrightarrow \operatorname{Hom}(G,\mathbf{F}_p) \to 0$$

where the first arrow is induced by $\wp$ and the second is the connecting homomorphism (note that $G$ acts trivially on $\mathbf{F}_p$, so $H^1(G,\mathbf{F}_p)=\operatorname{Hom}(G,\mathbf{F}_p)$, the group of continuous homomorphisms). Therefore we have $\operatorname{Hom}(G,\mathbf{F}_p) \cong K/\wp(K)$. One can explicitly exhibit a surjective map $K \to \operatorname{Hom}(G,\mathbf{F}_p)$ with kernel $\wp(K)$ that defines the isomorphism: for $c \in K$, one solves $x^p-x=c$ in $\overline K^\mathrm{s}$, and $\varphi_c:g\mapsto g(x)-x$ is the desired map. The key ingredient of the verification is the (infinite) Galois correspondence, but otherwise the verification is rather tedious. We remark that for any $\varphi \in \operatorname{Hom}(G,\mathbf{F}_p)\setminus\{0\}$, putting $H=\ker\varphi$, the extension $K^H/K$ is an Artin-Schreier extension with Galois group $G/H$, and on the other hand $H=\mathrm{Gal}(\overline K^\mathrm{s}/K^H)$. $\square$
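
For a finite field the proposition can be sized up directly. Take $K=\mathbf{F}_8$: here $G\cong\widehat{\mathbb{Z}}$, so the continuous homomorphisms $G\to\mathbf{F}_2$ form a group of order $2$, and correspondingly $\wp(K)$ should have index $2$ in $K$. A sketch using an ad-hoc bit-level model $\mathbf{F}_8=\mathbf{F}_2[x]/(x^3+x+1)$ (the model is an assumption made for this illustration):

```python
# Elements of F_8 are 3-bit integers; addition is XOR, multiplication is
# carry-less polynomial multiplication reduced mod x^3 + x + 1.

def mul(a, b):
    r = 0
    for i in range(3):
        if (b >> i) & 1:
            r ^= a << i
    for i in (4, 3):                 # reduce: x^3 = x + 1 (0b1011)
        if (r >> i) & 1:
            r ^= 0b1011 << (i - 3)
    return r

# wp(u) = u^2 - u = u^2 + u in characteristic 2.
wp_image = {mul(u, u) ^ u for u in range(8)}
kernel = {u for u in range(8) if mul(u, u) ^ u == 0}

assert kernel == {0, 1}              # ker(wp) = F_2, as in the proof
assert len(wp_image) == 4            # wp(K) has index 2 in K
print("|K / wp(K)| =", 8 // len(wp_image))
```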

“Artin-Schreier of higher order”

We conclude this post by showing that, under a certain condition, one can find an Artin-Schreier extension $L/E$ such that $L/K$ is cyclic of order $p^m$.

Lemma 1. Let $\beta \in E$ be an element such that $\operatorname{Tr}_K^E(\beta)=1$. Then there exists $\alpha \in E$ such that $\sigma(\alpha)-\alpha = \beta^p-\beta$, where $\sigma$ is a generator of $\operatorname{Gal}(E/K)$.

Proof. Notice that $\operatorname{Tr}_K^E(\beta^p)=\operatorname{Tr}_K^E(\beta)^p=1$, which implies that $\operatorname{Tr}_K^E(\beta^p-\beta)=0$. By Hilbert’s theorem 90, such $\alpha$ exists. $\square$
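
Lemma 1 can be checked by brute force in the smallest case $K=\mathbf{F}_2$, $E=\mathbf{F}_4$, $\sigma=$ Frobenius. Elements of $\mathbf{F}_4=\mathbf{F}_2[x]/(x^2+x+1)$ are modeled as pairs $(c_0,c_1)=c_0+c_1x$; the model and the exhaustive search are an ad-hoc sketch, not part of the post's argument:

```python
# Find beta with Tr(beta) = 1, then search for alpha in E with
# sigma(alpha) - alpha = beta^p - beta  (p = 2, so minus is plus).

def mul(u, v):
    a0, a1 = u
    b0, b1 = v
    # reduce modulo x^2 = x + 1
    return ((a0*b0 + a1*b1) % 2, (a0*b1 + a1*b0 + a1*b1) % 2)

def add(u, v):
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def frob(u):            # sigma: the generator of Gal(F_4/F_2), u -> u^2
    return mul(u, u)

def trace(u):           # Tr(u) = u + sigma(u); lands in F_2 = {(0,0), (1,0)}
    return add(u, frob(u))

E = [(c0, c1) for c0 in range(2) for c1 in range(2)]

beta = next(u for u in E if trace(u) == (1, 0))      # Tr(beta) = 1
target = add(frob(beta), beta)                       # beta^p - beta

# Lemma 1 predicts some alpha in E solving sigma(alpha) - alpha = target.
alpha = next(u for u in E if add(frob(u), u) == target)
print("beta =", beta, "alpha =", alpha)
```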

Lemma 2. The polynomial $f(X)=X^p-X-\alpha$ is irreducible over $E$; that is, if $\theta$ is a root of $f$, then $E(\theta)$ is an Artin-Schreier extension of $E$.

Proof. By contradiction, we suppose that $\theta \in E$. Applying $\sigma$ to the relation $\theta^p-\theta=\alpha$ gives

$$\sigma(\theta)^p-\sigma(\theta)=\sigma(\alpha)=\alpha+\beta^p-\beta,$$

which implies that

$$(\sigma\theta-\theta-\beta)^p-(\sigma\theta-\theta-\beta)=0.$$

It follows that $\sigma\theta-\theta-\beta$ is a root of $g(X)=X^p-X$. This implies that $\sigma\theta-\theta-\beta\in\mathbf{F}_p \subset K$ and therefore

$$\sigma\theta-\theta=\beta+c, \qquad c \in \mathbf{F}_p.$$

However, since the trace is invariant under composition with $\sigma$, we have $\operatorname{Tr}_K^E(\sigma\theta-\theta)=0$, and finally

$$0=\operatorname{Tr}_K^E(\beta+c)=\operatorname{Tr}_K^E(\beta)+p^{m-1}c=1,$$
which is absurd. $\square$

Proposition 3. The field extension $K(\theta)/K$ is Galois and cyclic of degree $p^m$, and its Galois group is generated by an extension $\sigma^\ast$ of $\sigma$ such that

$$\sigma^\ast(\theta)=\theta+\beta.$$

Proof. First of all we show that $K(\theta)=E(\theta)$. Indeed, since $K \subset E$, we have $K(\theta) \subset E(\theta)$. On the other hand, since $\theta \not\in E$, we have $K \subset E \subsetneq K(\theta)$. Therefore $p=[E(\theta):E]=[E(\theta):K(\theta)]\,[K(\theta):E]$ with $[K(\theta):E]>1$, which forces $E(\theta)$ to be exactly $K(\theta)$.

Let $h(X)$ be the minimal polynomial of $\theta$ over $K$; it has degree $[K(\theta):K]=p^m$. We now give an explicit expression of $h$. Notice that since $f(X)$ is the minimal polynomial of $\theta$ over $E$, of degree $p$, we must have $f(X)\mid h(X)$. For any $k$, applying $\sigma^k$ gives $f^{\sigma^k}(X)\mid h^{\sigma^k}(X)$ too. However, since $\sigma$ fixes $K$, we must have $h^{\sigma^k}(X)=h(X)$, from which it follows that $f^{\sigma^k}(X)\mid h(X)$ for all $0 \le k \le p^{m-1}-1$. Since the degree of each $f^{\sigma^k}(X)$ is $p$, we obtain

$$h(X)=\prod_{k=0}^{p^{m-1}-1}f^{\sigma^k}(X).$$

Knowing that $\theta$ is a root of $f$, we see that $\theta+\beta$ is a root of $f^{\sigma}(X)$ because

$$f^{\sigma}(\theta+\beta)=(\theta+\beta)^p-(\theta+\beta)-\sigma(\alpha)=(\theta^p-\theta-\alpha)+(\beta^p-\beta)-(\sigma(\alpha)-\alpha)=0,$$

and by induction we see that for $0 \le k \le p^{m-1}-1$, $f^{\sigma^k}(X)$ has the root

$$\theta+\beta+\sigma(\beta)+\cdots+\sigma^{k-1}(\beta).$$

By Artin-Schreier, all roots of $f^{\sigma^k}(X)$ lie in $E(\theta)$, and therefore $h(X)$ splits in $E(\theta)$. Since $E(\theta)/E$ and $E/K$ are separable, $E(\theta)/K$ is separable as well, which means that $E(\theta)=K(\theta)$ is Galois over $K$.

To see why $K(\theta)/K$ is cyclic, we consider a homomorphism $\sigma^\ast$ of $K(\theta)$ such that $\sigma^{\ast}|_E=\sigma$ and $\sigma^\ast(\theta)=\theta+\beta$. It follows that $\sigma^\ast \in \operatorname{Gal}(K(\theta)/K)$ because its restriction to $K$, which is the restriction of $\sigma$ to $K$, is the identity. We then see that for all $0 \le n \le p^{m}$, one has

$$(\sigma^\ast)^n(\theta)=\theta+\beta+\sigma(\beta)+\cdots+\sigma^{n-1}(\beta).$$

In particular,

$$(\sigma^\ast)^{p^{m-1}}(\theta)=\theta+\operatorname{Tr}_K^E(\beta)=\theta+1,$$

from which it follows that $(\sigma^\ast)^{p^{m-1}}$ has order $p$, which implies that $\sigma^\ast$ has order $p^m$, thus the Galois group is generated by $\sigma^\ast$. $\square$

References

  • Jean-Pierre Serre, Local Fields, Chapter X.
  • Serge Lang, Algebra, Chapter VI.

Equivalent Conditions of Regular Local Rings of Dimension 1

Introduction

Regular local rings are important objects in modern algebra, number theory and algebraic geometry. Therefore it would be way too ambitious to try to briefly justify the motivation for studying regular local rings. In this post, we collect equivalent conditions for being a regular local ring of dimension $1$ and prove them. There are plenty of equivalent conditions, and it is difficult to find a book that collects as many of them as possible, let alone gives detailed proofs. The reader is also encouraged to prove the conditions himself, after knowing that the most important tool in the proof is Nakayama's lemma.

Discrete valuation ring

The reader may have come across the definition of discrete valuation rings without knowing the motivation. Indeed, one way to interpret discrete valuation rings is to see them as "Taylor expansions". The analogy after the definition may explain why.

Definition 1. Let $F$ be a field. A surjective function $v:F \to \mathbb{Z}\cup\{\infty\}$ is called a discrete valuation if, for all $\alpha,\beta \in F$,

  1. $v(\alpha)=\infty \iff \alpha = 0$;
  2. $v(\alpha\beta)=v(\alpha)+v(\beta)$;
  3. $v(\alpha+\beta)\ge\min(v(\alpha),v(\beta))$.

The ring $R_v=\{\alpha \in F:v(\alpha) \ge 0\}$ is called a discrete valuation ring. It is a local ring with maximal ideal $\mathfrak{m}_v=\{\alpha \in F:v(\alpha) > 0\}$.

We should not compare $R_v$ with a polynomial ring, as no polynomial ring (in at least one variable) is local. Let $t \in \mathfrak{m}_v$ be an element such that $v(t)=1$. We will show that $\mathfrak{m}_v = (t)$. Indeed, for any $u \in \mathfrak{m}_v\setminus\{0\}$, we see that

$$v(ut^{-1})=v(u)-v(t)=v(u)-1 \ge 0,$$

and as a result we can write $u=(ut^{-1})t$. Looking further, suppose that $v(u)=m$. Then $\alpha = ut^{-m} \in R_v$ is a unit, and thus we have $u=\alpha t^m$. In other words, every non-zero element can be expressed as a unit times a power of $t$.
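
A concrete instance of this is the localization $\mathbb{Z}_{(p)}$ with the $p$-adic valuation: $t=p$ is a uniformizer and every non-zero element factors as a unit times $p^{v}$. A minimal sketch with $p=5$ (the sample values are arbitrary choices):

```python
# Z_(5) as a discrete valuation ring: v is the 5-adic valuation and
# every nonzero q factors as (unit) * 5^v(q).
from fractions import Fraction

def v_p(q, p):
    """p-adic valuation of a non-zero rational q."""
    q = Fraction(q)
    num, den, n = q.numerator, q.denominator, 0
    while num % p == 0:
        num //= p
        n += 1
    while den % p == 0:
        den //= p
        n -= 1
    return n

p = 5
q = Fraction(50, 3)            # lies in Z_(5): the denominator is prime to 5
m = v_p(q, p)                  # v(50/3) = 2
unit = q / Fraction(p) ** m    # the unit part 2/3, invertible in Z_(5)

assert q == unit * Fraction(p) ** m
assert v_p(unit, p) == 0       # units are exactly the elements of valuation 0
print("v =", m, ", unit =", unit)
```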

The analogy, or even example, to bring up here is the order of vanishing at the origin of (rational) functions over $\mathbb{R}$. For a rational function $F(x)=f(x)/g(x)$, define $v(F)$ to be the unique integer $m$ such that $\lim_{x\to 0}\frac{F(x)}{x^m}$ is non-zero and finite. The order of the zero polynomial depends on the context, and in our context we make it infinite: no matter how big $m$ is, we never reach a point where $\lim_{x \to 0}\frac{0}{x^m}$ is non-zero and finite. The discrete valuation ring in our story is then the ring of rational functions that behave like a monomial of non-negative degree near the origin, and the generator of the maximal ideal is the identity map $x \mapsto x$. In short, one way of imagining a discrete valuation ring is as a space of "smooth" functions at a point, with the valuation measuring the order of vanishing, i.e. the degree of approximation by $0$.
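
For polynomials the limit definition amounts to reading off the first non-zero coefficient, so the valuation of $f/g$ at the origin is computable directly. A small sketch (coefficient-list representation is an assumption of this illustration):

```python
# Order of vanishing at 0, with polynomials given as coefficient lists
# [a0, a1, a2, ...] meaning a0 + a1*x + a2*x^2 + ...

def ord0(coeffs):
    """Index of the first non-zero coefficient; the zero polynomial gets
    infinity, matching the convention in the text."""
    for i, a in enumerate(coeffs):
        if a != 0:
            return i
    return float('inf')

def v(f, g):
    """Discrete valuation of the rational function f/g at the origin."""
    return ord0(f) - ord0(g)

# F(x) = (x^2 + x^3) / (2x) vanishes to order 1 at the origin: v = 2 - 1.
assert v([0, 0, 1, 1], [0, 2]) == 1
# A unit of the valuation ring: (1 + x)/(2 + x) has v = 0.
assert v([1, 1], [2, 1]) == 0
print("valuations computed")
```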

Regular local ring of dimension 1

For a ring $R$, we use $\dim(R)$ to denote the Krull dimension and for a vector space $V$ over a field $K$, $\dim_K(V)$ is used to denote the dimension of $V$ as a vector space over $K$.

Theorem 2. Let $R$ be a commutative noetherian local ring with unit and maximal ideal $\mathfrak{m}$ with the residue field $\kappa=R/\mathfrak{m}$. Then the following conditions are equivalent.

  1. $R$ is a discrete valuation ring in its field of fractions;
  2. $\dim_\kappa(\mathfrak{m}/\mathfrak{m}^2)=\dim(R)=1$, i.e., $R$ is a regular local ring of dimension $1$;
  3. $R$ is a unique factorization domain of Krull dimension $1$;
  4. $\mathfrak{m}$ is a principal ideal and $\dim(R)=1$;
  5. $R$ is a principal ideal domain which is not a field;
  6. $R$ is an integrally closed domain of Krull dimension $1$.

(N.B. We have to assume the axiom of choice throughout, otherwise parts of this make no sense. In fact, without the axiom of choice, it is unprovable that a principal ideal domain has a maximal ideal, or that such a ring has a prime element when it is not a field. See this article for more details.)

Proof. Suppose first that $R$ is a discrete valuation ring with a discrete valuation $v$. Then $\mathfrak{m}=\{a\in R:v(a)>0\}$ is the maximal ideal of $R$, generated by any element $t \in \mathfrak{m}$ with $v(t)=1$. Let $\mathfrak{a}$ be another non-zero ideal of $R$ and let $k=\min v(\mathfrak{a})$. There is an element $x \in \mathfrak{a}$ such that $v(x)=k$, and we can write $x=ut^k$ where $u$ is a unit of $R$. Any other element $y \in \mathfrak{a}$ has $\ell=v(y)\ge k$, so $y=wt^{\ell}=wu^{-1}t^{\ell-k}ut^{k}=wu^{-1}t^{\ell-k}x$ for some unit $w$. In other words, we have $\mathfrak{a}=(x)=(t^k)$ for some $k \ge 1$. When $k>1$, the ideal $(t^k)$ is not prime, let alone maximal. So we have shown that when $R$ is a discrete valuation ring, the maximal ideal $\mathfrak{m}$ is principal, the Krull dimension of $R$ is $1$, and $R$ is principal but not a field, because the maximal ideal is not zero.

This is to say, we have $1 \implies 4,5$. Since a principal ideal domain is also a unique factorization domain, we also get $3$. Besides, we have shown that in all six scenarios the ring $R$ is of Krull dimension $1$. Therefore from now on we assume a priori that $R$ is a commutative noetherian local ring of Krull dimension $1$. This condition implies that the maximal ideal $\mathfrak{m}$ is not nilpotent, because $\mathfrak{m}$ is nilpotent if and only if $\dim(R)=0$ (hint: Nakayama's lemma; consider the possibility that $\mathfrak{m}^n=\mathfrak{m}^{n+1}$).

Now assume that $\mathfrak{m}$ is principal and we write $\mathfrak{m}=(t)$ for some $t\in\mathfrak{m}$. For any $a \in R \setminus \{0\}$, if $a$ is invertible, then we can write $a=at^{0}$. Otherwise we have $a\in\mathfrak{m}$ and therefore $a=a_1t$ for some $a_1 \in R\setminus\{0\}$. We show that there exists a unique $n \ge 0$ such that $a = a_n t^n$ where $a_n$ is a unit in $R$.

When $a$ is a unit, as shown above, there is nothing to prove. Therefore, to reach a contradiction, we suppose that such $n$ does not exist when $a$ is not a unit. Then by induction, for each $j>0$, there exists $a_j \in R\setminus\{0\}$ such that $a=a_jt^j$, which means that $a \in (t^j)=\mathfrak{m}^j$ for all $j$. By Krull’s intersection theorem, we have $\bigcap_{j=1}^{\infty}\mathfrak{m}^j=\{0\}$ (this is a consequence of Nakayama’s lemma and Artin-Rees lemma), and therefore $a=0$, which is absurd. Therefore the desired $n$ always exists.

Next we show that such $n$ is unique. Suppose that $a = a_m t^m=a_nt^n$ where $a_m,a_n \in R^\times$ and without loss of generality we assume that $m \ge n$. Then $a - a = (a_mt^{m-n}-a_n)t^n=0$. Since $t$ is not nilpotent, we must have $a_mt^{m-n}-a_n=0$. In this case we must have $m=n$ and $a_m=a_n$ because otherwise $a_mt^{m-n}$ would not be a unit in $R$.

Therefore for all $a\in R \setminus\{0\}$ we can write, uniquely, $a = ut^{v(a)}$ where $u$ is a unit and $v(a) \ge 0$ is an integer. Since $t$ is not nilpotent, we see that $R$ is an integral domain, and it is a discrete valuation ring in its field of fractions. Besides, $R$ is a principal ideal domain, because any non-zero ideal $\mathfrak{a} \subset \mathfrak{m}$ is generated by an element $a \in \mathfrak{a}$ with $v(a)=\min v(\mathfrak{a})$.

Next we study the dimension of $\mathfrak{m}/\mathfrak{m}^2$ over $\kappa$, where $\mathfrak{m}=(t)$. Notice that $\dim_\kappa \mathfrak{m}/\mathfrak{m}^2\ge 1$: otherwise $\mathfrak{m}=\mathfrak{m}^2$, hence $\mathfrak{m}=0$ by Nakayama's lemma, contradicting $\dim(R)=1$. We show that $\dim_\kappa\mathfrak{m}/\mathfrak{m}^2 <2$ under the assumption of 4. Let $u,v\in \mathfrak{m}/\mathfrak{m}^2$ be two non-zero vectors; we show that they are linearly dependent over $\kappa$. Suppose that $u \equiv rt$ and $v \equiv st \pmod{\mathfrak{m}^2}$. Then $r,s \not\in \mathfrak{m}$, because otherwise $u=0$ or $v=0$. If we choose $\alpha = -\frac{s}{r}\pmod{\mathfrak{m}}$, we see that $\alpha u \equiv -st\equiv -v \pmod{\mathfrak{m}^2}$, as desired.

To conclude, we have shown that $4 \implies 1,2,5$.

Moving on, we assume 5 and see what we can get. First of all, every principal ideal domain is a unique factorization domain, so we get $3$ (the axiom of choice is indispensable here). Besides, since every ideal is principal, in particular the maximal ideal is principal, so we get $4$. To conclude, we get $5 \implies 3,4$.

Finally we need to study the points 2, 3 and 6. To begin with, we assume 3. Then by an elementary verification we see that $R$ is integrally closed (see ProofWiki; every unique factorization domain is integrally closed). Next we show that $\mathfrak{m}$ is principal. In a unique factorization domain, every prime ideal of height $1$ is principal: a height-$1$ prime $\mathfrak{p}$ contains a prime element $t$ (factor any non-zero element of $\mathfrak{p}$ into primes; one factor must lie in $\mathfrak{p}$), and then $0 \subsetneq (t) \subset \mathfrak{p}$ with $(t)$ prime forces $\mathfrak{p}=(t)$. Since $R$ is local of Krull dimension $1$, the maximal ideal $\mathfrak{m}$ itself is a prime of height $1$, hence principal. This shows that $3 \implies 4,6$.

Next we assume 2. By proposition 2 of this old post, the dimension of $\mathfrak{m}/\mathfrak{m}^2$ over $\kappa$ is exactly the minimal number of generators of $\mathfrak{m}$; since this dimension is $1$, the ideal $\mathfrak{m}$ is principal. Therefore we obtain $2 \implies 4$.

For the last part we assume 6, i.e. that $R$ is integrally closed. Choose an arbitrary non-zero non-unit $a \in R$ (one exists because $\mathfrak{m}\ne 0$) and consider the ring $\widetilde{R}=R/aR$; if it is a field then $\mathfrak{m}=(a)$ and we are already done, so suppose it is not. Then $\widetilde{R}$ is of Krull dimension $0$, therefore its maximal ideal $\widetilde{\mathfrak{m}}=\mathfrak{m}/aR$ is nilpotent. There exists $n>0$ such that $\widetilde{\mathfrak{m}}^n\ne 0$ but $\widetilde{\mathfrak{m}}^{n+1}=0$, which means that $\mathfrak{m}^n \not\subset (a)$ but $\mathfrak{m}^{n+1} \subset (a)$. Choose $b\in \mathfrak{m}^n \setminus (a)$. Then we claim that $\mathfrak{m}=(x)$, where $x=a/b \in K(R)$, the field of fractions of $R$. To see this, notice that $x^{-1}\mathfrak{m} \subset R$: indeed $b\mathfrak{m} \subset \mathfrak{m}^{n+1} \subset (a)$, so every element of $b\mathfrak{m}$ is of the form $ua$ with $u \in R$, and consequently every element of $\frac{b}{a}\mathfrak{m}$ lies in $R$. Therefore $x^{-1}\mathfrak{m}$ can be considered as an ideal of $R$. However, we also have $x^{-1}\mathfrak{m} \not\subset \mathfrak{m}$: otherwise $\mathfrak{m}$, being a finitely generated faithful $R$-module stable under multiplication by $x^{-1}$, would make $x^{-1}$ integral over $R$; then $x^{-1}=b/a$ would lie in $R$, giving $b \in (a)$, a contradiction. Hence we must have $x^{-1}\mathfrak{m}=R$, which implies that $\mathfrak{m}=(x)$. Therefore we obtain $6 \implies 4$.

We have established all necessary implications to obtain the equivalences. $\square$


A Separable Extension Is Solvable by Radicals Iff It Is Solvable

Introduction

Polynomials are of great interest in various fields, such as analysis, geometry and algebra. Given a polynomial, we try to extract as much information as possible. For example, we certainly want to find its roots. However, this is not always realistic. The Abel-Ruffini theorem states that it is impossible to solve general polynomials of degree $\ge 5$ by radicals. For example, one can always solve the polynomial $x^n-1=0$ for arbitrary $n$, but solving $x^5-x-1=0$ over $\mathbb{Q}$ by radicals is not possible. Galois showed that the crux of solvability lies in the structure of the Galois group, depending on whether it is solvable group-theoretically.

In this post, we will explore the theory of solvability in the modern sense, considering extensions of arbitrary characteristic rather than solely number fields over $\mathbb{Q}$.

Solvable Extensions

Definition 1. Let $E/k$ be a separable and finite field extension, and $K$ the smallest Galois extension of $k$ containing $E$. We say $E/k$ is solvable if $G(K/k)$ (the Galois group of $K$ over $k$) is solvable.

Throughout we will deal with separable extensions, because without this assumption one would be dealing with normal extensions instead of Galois extensions, although one would arrive at a similar result.

Proposition 1. Let $E/k$ be a separable extension. Then $E/k$ is solvable if and only if there exists a solvable Galois extension $L/k$ such that $k \subset E \subset L$.

Proof. If $E/k$ is solvable, it suffices to take $L$ to be the smallest Galois extension of $k$ containing $E$. Conversely, suppose $L/k$ is solvable and Galois with $k \subset E \subset L$. Let $K$ be the smallest Galois extension of $k$ containing $E$, so that $k \subset E \subset K \subset L$. We see that $G(K/k) \cong G(L/k)/G(L/K)$ is a homomorphic image of $G(L/k)$, so it has to be solvable. $\square$

Next we introduce an important concept concerning field extensions.

Definition 2. Let $\mathcal{C}$ be a certain class of extension fields $F \subset E$. We say that $\mathcal{C}$ is distinguished if it satisfies the following conditions:

  1. Let $k \subset F \subset E$ be a tower of fields. The extension $k \subset E$ is in $\mathcal{C}$ if and only if $k \subset F$ is in $\mathcal{C}$ and $F \subset E$ is in $\mathcal{C}$.
  2. If $k \subset E$ is in $\mathcal{C}$ and if $F$ is any given extension of $k$, and $E,F$ are both contained in some field, then $F \subset EF$ is in $\mathcal{C}$ too. Here $EF$ is the compositum of $E$ and $F$, i.e. the smallest field that contains both $E$ and $F$.
  3. If $k \subset F$ and $k \subset E$ are in $\mathcal{C}$ and $F,E$ are subfields of a common field, then $k \subset FE$ is in $\mathcal{C}$.

When dealing with several extensions at the same time, it can be a great idea to consider the class of extensions they belong to. For example, the class of Galois extensions is not distinguished because it fails condition 1: in a tower $k \subset F \subset E$ with $E/k$ Galois, the step $F/k$ need not be normal. That is why we need the fundamental theorem of Galois theory, a.k.a. the Galois correspondence: not all intermediate subfields are Galois over the base. The class of separable extensions, however, is distinguished. We introduce this concept because:

Proposition 2. Solvable extensions form a distinguished class of extensions. (N.B. these extensions are finite and separable by default.)

Proof. We verify all three conditions mentioned in definition 2. To make our proof easier however, we first verify 2.

Step 1. Let $E/k$ be solvable. Let $F$ be a field containing $k$ and assume $E, F$ are subfields of some algebraically closed field. We need to show that $EF/F$ is solvable. By proposition 1, there is a Galois solvable extension $K/k$ such that $K \supset E \supset k$. Then $KF$ is Galois over $F$ and $G(KF/F)$ is a subgroup of $G(K/k)$. Therefore $KF/F$ is a Galois solvable extension and we have $KF \supset EF \supset F$, which implies that $EF/F$ is solvable.

Step 2. Consider a tower of extensions $E \supset F \supset k$. Assume now $E/k$ is solvable. Then there exists a Galois solvable extension $K$ containing $E$, which implies that $F/k$ is solvable because $K \supset F$. We see $E/F$ is also solvable because $EF=E$ and we are back to step 1.

Conversely, assume that $E/F$ is solvable and $F/k$ is solvable. We will find a solvable extension $M/k$ containing $E$. Let $K/k$ be a Galois solvable extension such that $K \supset F$; then $EK/K$ is solvable by step 1. Let $L$ be a Galois solvable extension of $K$ containing $EK$. If $\sigma$ is any embedding of $L$ over $k$ in a given algebraic closure, then $\sigma K = K$ and hence $\sigma L$ is a solvable extension of $K$. [This sentence deserves some explanation. Notice that $L/k$ is not necessarily Galois, therefore $\sigma$ is not necessarily an automorphism of $L$, and $\sigma L \ne L$ in general. However, since $K/k$ is Galois, the restriction of $\sigma$ to $K$ is an automorphism of $K$, and therefore $\sigma K = K$. The extension $\sigma L / \sigma K$ is solvable because $\sigma L$ is isomorphic to $L$ and $\sigma K = K$.]

We let $M$ be the compositum of all extensions $\sigma L$ for all embeddings $\sigma$ of $L$ over $k$. Then $M/k$ is Galois and so is $M/K$ [note: this is the property of normal extension; besides, $M/k$ is finite]. We have $G(M/K) \subset \prod_{\sigma}G(\sigma L/K)$ which is a product of solvable groups. Therefore $G(M/K)$ is solvable, meaning $M/K$ is a solvable extension. We have a surjective homomorphism $G(M/k) \to G(K/k)$ (given by $\sigma \mapsto \sigma|_K$) and therefore $G(M/k)$ has a normal subgroup whose factor group is solvable, meaning $G(M/k)$ is solvable. Since $E \subset M$, we are done.

Step 3. If $F/k$ and $E/k$ are solvable and $E,F$ are subfields of a common field, we need to show that $EF$ is solvable over $k$. By step 1, $EF/F$ is solvable. By step 2, $EF/k$ is solvable. $\square$

Solvable By Radicals

Definition 3. Let $F/k$ be a finite and separable extension. We say $F/k$ is solvable by radicals if there exists a finite extension $E$ of $k$ containing $F$ and admitting a tower decomposition

$$k=E_0 \subset E_1 \subset \cdots \subset E_r = E$$

such that each step $E_{i+1}/E_i$ is one of the following types:

  1. It is obtained by adjoining a root of unity.
  2. It is obtained by adjoining a root of a polynomial $X^n-a$ with $a \in E_i$ and $n$ prime to the characteristic.
  3. It is obtained by adjoining a root of an equation $X^p-X-a$ with $a \in E_i$ if $p$ is the characteristic $>0$.

For example, $\mathbb{Q}(\sqrt{-2})/\mathbb{Q}$ is solvable by radicals. We consider the polynomial $f(x)=x^2-2x+3$, whose roots are $x_1=1-\sqrt{-2}$ and $x_2=1+\sqrt{-2}$. Let us, however, see the question in the sense of field theory. Notice that

$$f(x)=x^2-2x+3=(x-1)^2+2.$$

Therefore $f(x)=0$ is equivalent to $(x-1)^2=-2$. Then $x-1=\sqrt{-2}$ and $x-1=-\sqrt{-2}$ in $\mathbb{Q}(\sqrt{-2})$ are two equations that make perfect sense. Thus we obtain our desired roots. The field gives us the liberty of basic arithmetic, and the radical extension gives us the method to look for a radical root.
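
The completed-square computation is easy to confirm numerically (a throwaway check using floating-point complex arithmetic, so equality is tested up to rounding):

```python
# Verify that 1 +/- sqrt(-2) are the roots of f(x) = x^2 - 2x + 3.
import cmath

sqrt_m2 = cmath.sqrt(-2)                 # a square root of -2
roots = [1 - sqrt_m2, 1 + sqrt_m2]

def f(x):
    return x * x - 2 * x + 3

for r in roots:
    assert abs(f(r)) < 1e-12             # f vanishes at both candidates
print("both candidates are roots of f")
```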

It is immediate that the class of extensions solvable by radicals is a distinguished class.

In general, we are adjoining an "$n$-th root of something". However, when the characteristic of the field is not zero, there are some complications. For example, talking about the $p$-th root of an element in a field of characteristic $p>0$ will not work. Therefore we need to take good care of that. The second and third types are nods to Kummer theory and Artin-Schreier theory respectively, which are deduced from the multiplicative and additive forms of Hilbert's theorem 90. We pause to recall the respective theorems.


Let $K/k$ be a cyclic extension of degree $n$, that is, $K/k$ is Galois and $G(K/k)$ is cyclic. Suppose $G(K/k)$ is generated by $\sigma$. Then we have the celebrated “Theorem 90”:

Theorem 1 (Hilbert’s theorem 90, multiplicative form). Notation being above, let $\beta \in K$. The norm $N_{k}^{K}(\beta)=1$ if and only if there exists an element $\alpha \ne 0$ in $K$ such that $\beta = \alpha/\sigma\alpha$.

To prove this, we need Artin's theorem on the linear independence of characters. With Theorem 1 in hand, one sees that the second type of step in a radical tower yields a cyclic extension:

Theorem 2. Let $k$ be a field, $n$ an integer $>0$ prime to the characteristic of $k$, and assume that there is a primitive $n$-th root of unity in $k$.

  1. Let $K$ be a cyclic extension of degree $n$. Then there exists $\alpha \in K$ such that $K = k(\alpha)$ and $\alpha$ satisfies an equation $X^n-a=0$ for some $a \in k$.
  2. Conversely, let $a \in k$. Let $\alpha$ be a root of $X^n-a$. Then $k(\alpha)$ is cyclic over $k$ of degree $d|n$, and $\alpha^d$ is an element of $k$.
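
Theorem 2 can be probed in a toy case where everything is computable by hand: $k=\mathbf{F}_7$, $n=3$ (note $7 \equiv 1 \pmod 3$, so $k$ contains primitive cube roots of unity). The particular choices $\zeta=2$ and $a=3$ are assumptions of this sketch:

```python
# k = F_7, n = 3: zeta = 2 is a primitive cube root of unity, cubing is
# 3-to-1 on F_7^*, and X^3 - 3 has no root in F_7 (hence is irreducible,
# being of degree 3 with no linear factor).
p, n = 7, 3
zeta = 2

assert pow(zeta, n, p) == 1 and zeta != 1      # primitive 3rd root of unity

cubes = {pow(x, n, p) for x in range(1, p)}    # image of the cubing map
assert len(cubes) == (p - 1) // n              # an index-3 subgroup of F_7^*

a = 3
assert a not in cubes                          # X^3 - 3 is irreducible over F_7

print("X^3 - 3 is irreducible over F_7; a root generates a cyclic cubic extension")
```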

All in all, theorem 2 states that an $n$-th root of $a$ yields a cyclic extension. However, we cannot drop the assumption that $n$ is prime to the characteristic of $k$. When this is not the case, we use the Artin-Schreier theorem.

Theorem 3 (Hilbert’s theorem 90, additive form). Let $K/k$ be a cyclic extension of degree $n$. Let $\sigma$ be the generator of $G(K/k)$. Let $\beta \in K$. The trace $\mathrm{Tr}_k^K(\beta)=0$ if and only if there exists an element $\alpha \in K$ such that $\beta = \alpha-\sigma\alpha$.

This theorem requires another application of the independence of characters.

Theorem 4 (Artin-Schreier). Let $k$ be a field of characteristic $p$.

  1. Let $K$ be a cyclic extension of $k$ of degree $p$. Then there exists $\alpha \in K$ such that $K=k(\alpha)$ and $\alpha$ satisfies an equation $X^p-X-a=0$ with some $a \in k$.
  2. Conversely, given $a \in k$, the polynomial $f(X)=X^p-X-a$ either has one root in $k$, in which case all its roots are in $k$, or it is irreducible. In the latter case, if $\alpha$ is a root then $k(\alpha)$ is cyclic of degree $p$ over $k$.

In other words, instead of looking at the $p$-th root of unity in a field of characteristic $p$, we look at the root of $X^p-X-a$, which still yields a cyclic extension.


Now we are ready for the core theorem of this post.

Theorem 5. Let $E$ be a finite separable extension of $k$. Then $E$ is solvable by radicals if and only if $E/k$ is solvable.

Proof. First of all we assume that $E/k$ is solvable. Then there exists a finite Galois solvable extension of $k$ containing $E$, which we call $K$. Let $m$ be the product of all primes $l$ such that $l \ne \operatorname{char}k$ but $l\mid[K:k]$. Let $F=k(\zeta)$ where $\zeta$ is a primitive $m$-th root of unity. Then $F/k$ is abelian and is solvable by radicals by definition.

Since solvable extensions form a distinguished class, we see that $KF/F$ is solvable. There is a tower of subfields between $F$ and $KF$ such that each step is cyclic of prime order: every solvable group admits a tower of subgroups with cyclic quotients of prime order, and we can use the Galois correspondence. By theorems 2 and 4, $KF/F$ is solvable by radicals, because extensions of prime order are exactly the ones described by these two theorems. It follows that $E/k$ is solvable by radicals: $KF/F$ is solvable by radicals and $F/k$ is solvable by radicals $\implies$ $KF/k$ is solvable by radicals $\implies$ $E/k$ is solvable by radicals, because $KF \supset E \supset k$.


The elaboration of the "if" part is as follows. In order to prove that $E/k$ is solvable by radicals, we exhibit a much bigger field $KF$ containing $E$ such that $KF/k$ is solvable by radicals. First of all there exists a finite Galois solvable extension $K/k$ containing $E$. Next we define a cyclotomic extension $F/k$ with the following intentions:

  1. $F/k$ should be solvable by radicals.
  2. $F$ contains enough primitive roots of unity, so that we can use theorem 2 freely.

To reach these two goals, we put $F=k(\zeta)$, where $\zeta$ is a primitive $m$-th root of unity and $m$ is the product of the distinct primes dividing $[K:k]$, omitting the characteristic of $k$ when necessary. This choice certainly ensures that $F/k$ is solvable by radicals. For the second goal, we need to take a look at the subfields between $F$ and $KF$. Let $k = K_0 \subset K_1 \subset \dots \subset K_n = K$ be a tower of field extensions such that every step $K_{i+1}/K_i$ is of prime degree [this is possible due to the solvability of $K/k$]. These primes can only be factors of $[K:k]$. In the lifted tower $F=K_0F \subset K_1F \subset \dots \subset K_nF=KF$ we do not introduce new primes. Why do we consider prime factors of $[K:k]$? Say $[K_{i+1}F:K_iF] = \ell$ is a prime number. If $\ell=\operatorname{char}k$ then we can use theorem 4. Otherwise we still have $\ell\mid[K:k]$, so we use theorem 2. However, this theorem requires a primitive $\ell$-th root of unity to lie in $K_{i}F$. Our choice of $m$ and $\zeta$ guarantees this, because $\ell\mid m$ and therefore a primitive $\ell$-th root of unity exists in $F$. We could make $m$ bigger, but there is no necessity. The "only if" part does nearly the same thing, with the chain of reasoning reversed.


Conversely, assume that $E/k$ is solvable by radicals. For any embedding $\sigma$ of $E$ in $E^{\mathrm{a}}$ over $k$, the extension $\sigma E/k$ is also solvable by radicals. Hence the smallest Galois extension $K$ of $k$ containing $E$, being the composite of $E$ and its conjugates, is solvable by radicals. Let $m$ be the product of all primes, other than the characteristic, dividing the degree $[K:k]$, and again let $F=k(\zeta)$ where $\zeta$ is a primitive $m$-th root of unity. It suffices to prove that $KF$ is solvable over $F$: it then follows that $KF$ is solvable over $k$, and hence $G(K/k)$ is solvable because it is a homomorphic image of $G(KF/k)$. But $KF/F$ can be decomposed into a tower of extensions in which each step is of prime degree and of the type described in theorem 2 or theorem 4, and the corresponding roots of unity lie in the field $F$. Hence $KF/F$ is solvable, proving the theorem. $\square$


Picard's Little Theorem and Twice-Punctured Plane

Introduction

Let $f:\mathbb{C} \to \mathbb{C}$ be a holomorphic function. By Liouville’s theorem, if $f(\mathbb{C})$ is bounded, then $f$ has to be a constant function. However, there is a much stronger result: if $\mathbb{C} \setminus f(\mathbb{C})$ contains at least $2$ points, then $f$ is constant. In other words, if $f$ is non-constant, then the equation $f(z)=a$ has a solution for every $a \in \mathbb{C}$ with at most one exception. To think about this: if $f$ is a non-constant polynomial, then $f(z)=a$ always has a solution (the fundamental theorem of algebra); if, for example, $f(z)=\exp(z)$, then $f(z)=a$ has no solution only if $a=0$.

The proof will not be easy. It cannot be done within a few lines of obvious observations, whether by elementary or advanced approaches. In this post we will follow the latter, by studying the twice-punctured plane $\mathbb{C} \setminus\{0,1\}$. To be specific, without loss of generality we can assume that $0$ and $1$ are not in the range of $f$, so that $f(\mathbb{C}) \subset \mathbb{C}\setminus\{0,1\}$. Next we use advanced tools to study $\mathbb{C}\setminus\{0,1\}$, in order to reduce the question to Liouville’s theorem by constructing a bounded holomorphic function related to $f$.

We will find a holomorphic covering map $\lambda:\mathfrak{H} \to \mathbb{C}\setminus\{0,1\}$ and then replace $\mathfrak{H}$ with the unit disc $D$ using the Cayley transform $z \mapsto \frac{z-i}{z+i}$. Then the aforementioned $f$ will be lifted to a holomorphic function $F:\mathbb{C} \to D$, which has to be constant due to Liouville’s theorem, and as a result $f$ is constant.

With these being said, we need analytic continuation theory to establish the desired $\lambda$, and on top of that, (algebraic) topology will be needed to justify the function $F$.

Analytic Continuation

For a concrete example of analytic continuation, I recommend this post on the Riemann $\zeta$ function. In this post, however, we focus only on its basic language, so that later content can be explained in terms of analytic continuation.

Our continuation is always established “piece by piece”, which is the reason we formulate continuation in the following sense.

Definition 1. A function element is an ordered pair $(f,D)$ where $D$ is an open disc and $f \in H(D)$. Two function elements $(f_1,D_1)$ and $(f_2,D_2)$ are direct continuations of each other if $D_1 \cap D_2 \ne \varnothing$ and $f_1=f_2$ on $D_1 \cap D_2$. In this case we write

$$(f_1,D_1) \sim (f_2,D_2).$$

The notion of function element may ring a bell for readers familiar with sheaves and stalks. Indeed some authors do formulate analytic continuation in this language; see for example Principles of Complex Analysis by Serge Lvovski.

The $\sim$ relation is by definition reflexive and symmetric, but not transitive. To see this, let $\omega$ be a primitive cube root of unity, and let $D_0, D_1,D_2$ be the open discs of radius $1$ with centres $\omega^0,\omega^1,\omega^2$. Since each $D_i$ is simply connected, we can pick $f_i \in H(D_i)$ such that $f_i^2(z)=z$, with $(f_0,D_0) \sim (f_1,D_1)$ and $(f_1,D_1) \sim (f_2,D_2)$, yet on $D_0 \cap D_2$ one has $f_2 =-f_0 \ne f_0$. There is nothing mysterious here: we are merely rephrasing the fact that a square root function cannot be defined on a region containing $0$.
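The sign flip above can be watched numerically. The following sketch (a hypothetical step-by-step continuation that always picks the nearest square root, a crude stand-in for continuation along a chain of discs) follows $\sqrt{z}$ once around the unit circle and comes back on the other branch:

```python
import cmath

def continue_sqrt(path, start_value):
    # at each step pick the square root closest to the previous value
    value = start_value
    for z in path:
        r = cmath.sqrt(z)                      # principal square root
        value = r if abs(r - value) <= abs(-r - value) else -r
    return value

n = 200
loop = [cmath.exp(2j * cmath.pi * k / n) for k in range(n + 1)]  # once around 0
end = continue_sqrt(loop, 1.0)
assert abs(end - (-1.0)) < 1e-6                # we return on the other branch
```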

Definition 2. A chain is a finite sequence $\mathscr{C}$ of discs $(D_0,D_1,\dots,D_n)$ such that $D_{i-1} \cap D_i \ne \varnothing$ for $i=1,\dots,n$. If $(f_0,D_0)$ is given and if there exist function elements $(f_i,D_i)$ such that $(f_{i-1},D_{i-1}) \sim (f_i,D_i)$ for $i=1,\dots,n$, then $(f_n,D_n)$ is said to be the analytic continuation of $(f_0,D_0)$ along $\mathscr{C}$.

A chain $\mathscr{C}=(D_0,\dots,D_n)$ is said to cover a curve $\gamma$ with parameter interval $[0,1]$ if there are numbers $0=s_0<s_1<\dots<s_n=1$ such that $\gamma(0)$ is the centre of $D_0$, $\gamma(1)$ is the centre of $D_n$, and

$$\gamma([s_i,s_{i+1}]) \subset D_i, \qquad 0 \le i \le n-1.$$

If $(f_0,D_0)$ can be continued along this $\mathscr{C}$ to $(f_n,D_n)$, we call $(f_n,D_n)$ an analytic continuation of $(f_0,D_0)$ along $\gamma$; $(f_0,D_0)$ is then said to admit an analytic continuation along $\gamma$.

Either way, it is not necessarily the case that $(f_0,D_0) \sim (f_n,D_n)$. However, uniqueness of $(f_n,D_n)$ is always guaranteed. We will sketch the proof of uniqueness.

Lemma 1. Suppose that $D_0 \cap D_1 \cap D_2 \ne \varnothing$, $(f_0,D_0) \sim (f_1,D_1)$ and $(f_1,D_1) \sim (f_2,D_2)$; then $(f_0,D_0) \sim (f_2,D_2)$.

Proof. By assumption, $f_0=f_1$ in $D_0 \cap D_1$ and $f_1=f_2$ in $D_1 \cap D_2$. It follows that $f_0=f_2$ in $D_0 \cap D_1 \cap D_2$, which is open and non-empty. Since $f_0$ and $f_2$ are holomorphic in the open connected set $D_0 \cap D_2$, and $f_0-f_2$ vanishes on a set with a limit point there, the identity theorem forces $f_0-f_2$ to be $0$ everywhere on $D_0 \cap D_2$. $\square$

Theorem 1. If $(f,D)$ is a function element and $\gamma$ is a curve which starts at the centre of $D$, then $(f,D)$ admits at most one analytic continuation along $\gamma$.

Sketch of the proof. Let $\mathscr{C}_1=(A_0,A_1,\dots,A_m)$ and $\mathscr{C}_2=(B_0,B_1,\dots,B_n)$ be two chains that cover $\gamma$. If $(f,D)$ can be analytically continued along $\mathscr{C}_1$ to a function element $(g_m,A_m)$ and along $\mathscr{C}_2$ to $(h_n,B_n)$, then $g_m=h_n$ in $A_m \cap B_n$.

We are also given partitions $0=s_0<s_1<\dots<s_m=s_{m+1}=1$ and $0=t_0<t_1<\dots<t_n=t_{n+1}=1$ such that

$$\gamma([s_i,s_{i+1}]) \subset A_i, \qquad \gamma([t_j,t_{j+1}]) \subset B_j \qquad (0 \le i \le m,\ 0 \le j \le n),$$

and function elements $(g_i,A_i) \sim (g_{i+1},A_{i+1})$ and $(h_j,B_j) \sim (h_{j+1},B_{j+1})$ for $0 \le i \le m-1$ and $0 \le j \le n-1$ with $g_0=h_0=f$. The proof is established by showing that the continuation is compatible with intersecting intervals, where lemma 1 enters naturally. To be specific: if $0 \le i \le m$, $0 \le j \le n$, and $[s_i,s_{i+1}] \cap [t_j,t_{j+1}] \ne \varnothing$, then $(g_i, A_i) \sim (h_j,B_j)$.

The Monodromy Theorem

The monodromy theorem asserts that on a simply connected region $\Omega$, for a function element $(f,D)$ with $D \subset \Omega$, we can extend it to all of $\Omega$ provided $(f,D)$ can be continued along all curves in $\Omega$ starting at the centre of $D$. To prove this we need homotopy properties of analytic continuation and of simply connected spaces.

Definition 1. A simply connected space is a path connected topological space $X$ with trivial fundamental group $\pi_1(X,x_0)=\{e\}$ for all $x_0 \in X$.

The following fact is intuitive and will be used in the monodromy theorem.

Lemma 2. Let $X$ be a simply connected space and let $\gamma_1$ and $\gamma_2$ be two curves $[0,1] \to X$ with $\gamma_1(0)=\gamma_2(0)$ and $\gamma_1(1)=\gamma_2(1)$. Then $\gamma_1$ and $\gamma_2$ are homotopic (with endpoints fixed).

Proof. Let $\gamma_2^{-1}$ be the curve defined by $\gamma_2^{-1}(t)=\gamma_2(1-t)$. Then $\gamma_1\gamma_2^{-1}$ is a closed curve based at $\gamma_1(0)$, and since $X$ is simply connected,

$$[\gamma_1][\gamma_2]^{-1} = [\gamma_1\gamma_2^{-1}] = e,$$

where $e$ is the identity of $\pi_1(X,\gamma_1(0))$. Hence $[\gamma_1]=[\gamma_2]$. $\square$

Next we prove the two-point version of the monodromy theorem.

Monodromy theorem (two-point version). Let $\alpha,\beta$ be two points of $\mathbb{C}$ and let $(f,D)$ be a function element where $D$ is centred at $\alpha$. Let $\{\gamma_t\}$ be a homotopy class of curves, indexed by a map $H(s,t):[0,1] \times [0,1] \to \mathbb{C}$, all with origin $\alpha$ and terminal point $\beta$. If $(f,D)$ admits an analytic continuation along each $\gamma_t$ to an element $(g_t,D_t)$, then $g_1=g_0$.

In brief, analytic continuation is constant along a homotopy class. By being indexed by $H(s,t)$ we mean that $\gamma_t(s)=H(s,t)$. We will need the uniform continuity of $H(s,t)$.

Proof. Fix $t \in [0,1]$. By definition, there is a chain $\mathscr{C}=(A_0,\dots,A_n)$ which covers $\gamma_t$, with $A_0=D$, such that $(g_t,D_t)$ is obtained by continuation of $(f,D)$ along $\mathscr{C}$. There are numbers $0=s_0<\dots<s_n=1$ such that

$$\gamma_t([s_i,s_{i+1}]) \subset A_i, \qquad 0 \le i \le n-1.$$

For each $i$, define

$$E_i = \gamma_t([s_i,s_{i+1}]), \qquad d_i = \operatorname{dist}(E_i,\ \mathbb{C}\setminus A_i).$$

Each $d_i$ makes sense and is positive, because $E_i$ is compact and $A_i$ is open. Now pick any $\varepsilon \in (0,\min_i d_i)$. Since $H(s,t)$ is uniformly continuous, there exists a $\delta>0$ such that

$$|H(s,t)-H(s,u)|<\varepsilon \quad \text{for all } s \in [0,1] \text{ whenever } |t-u|<\delta.$$

We claim that $\mathscr{C}$ also covers $\gamma_u$ whenever $|t-u|<\delta$. Indeed, pick any $s \in [s_i,s_{i+1}]$. Then $\gamma_u(s) \in A_i$ because

$$|\gamma_u(s)-\gamma_t(s)|<\varepsilon<d_i \quad \text{and} \quad \gamma_t(s) \in E_i.$$

Therefore, by theorem 1, $g_t=g_u$. Thus for every $t \in [0,1]$ there is an open interval $I_t$ containing $t$ such that $g_u=g_t$ for all $u \in [0,1] \cap I_t$. Since $[0,1]$ is compact, finitely many of the $I_t$ cover $[0,1]$, and since $[0,1]$ is connected, after a finite number of steps we reach $g_0=g_1$. $\square$
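As an illustration (our own numerical sketch, not part of the proof), continuing $\sqrt{z}$ from $1$ to $4$ along two homotopic paths in the right half plane, a simply connected region avoiding $0$, ends with the same value:

```python
import cmath
import math

def continue_sqrt(path, start_value):
    # pick, at each step, the square root nearest to the previous value
    value = start_value
    for z in path:
        r = cmath.sqrt(z)
        value = r if abs(r - value) <= abs(-r - value) else -r
    return value

n = 300
path1 = [1 + 3 * k / n for k in range(n + 1)]             # straight segment 1 -> 4
path2 = [1 + 3 * k / n + 2j * math.sin(math.pi * k / n)   # a bump through the
         for k in range(n + 1)]                           # upper right half plane
assert abs(continue_sqrt(path1, 1.0) - 2.0) < 1e-6        # both end at +2,
assert abs(continue_sqrt(path2, 1.0) - 2.0) < 1e-6        # not -2
```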

Monodromy theorem. Suppose $\Omega$ is a simply connected open subset of the plane, $(f,D)$ is a function element with $D \subset \Omega$, and $(f,D)$ can be analytically continued along every curve in $\Omega$ that starts at the centre of $D$. Then there exists $g \in H(\Omega)$ such that $g(z)=f(z)$ for all $z \in D$.

Proof. Let $\gamma_0$ and $\gamma_1$ be two curves in $\Omega$ from the centre $\alpha$ of $D$ to some point $\beta \in \Omega$. The two-point monodromy theorem and lemma 2 ensure that these two curves lead to the same element $(g_\beta,D_\beta)$, where $D_\beta \subset \Omega$ is a disc centred at $\beta$. If $D_{\beta_1}$ intersects $D_\beta$, then $(g_{\beta_1},D_{\beta_1})$ can be obtained by continuing $(f,D)$ to $\beta$, then along the segment connecting $\beta$ and $\beta_1$. By the definition of analytic continuation, $g_{\beta_1}=g_\beta$ in $D_{\beta_1} \cap D_\beta$. Therefore the definition

$$g(z) = g_\beta(z), \qquad z \in D_\beta,\ \beta \in \Omega,$$
is a consistent definition and gives the desired holomorphic extension of $f$. $\square$

Modular Function

Let $\mathfrak{H}$ be the open upper half plane. We will find a function $\lambda \in H(\mathfrak{H})$ whose image is $E=\mathbb{C} \setminus\{0,1\}$ and which is in fact a (holomorphic) covering map onto $E$. The function $\lambda$ is called a modular function.

As usual, consider the action of $G=SL(2,\mathbb{Z})$ on $\mathfrak{H}$ given by

$$\varphi(z)=\frac{az+b}{cz+d}, \qquad \varphi=\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in G.$$

Definition 2. A modular function is a holomorphic (or meromorphic) function $f$ on $\mathfrak{H}$ which is invariant under a non-trivial subgroup $\Gamma$ of $G$. That is, for any $\varphi \in \Gamma$, one has $f \circ \varphi=f$.

In this section, we consider the subgroup

$$\Gamma = \left\{\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in G : a \equiv d \equiv 1,\ b \equiv c \equiv 0 \pmod 2\right\}.$$

It has a fundamental domain

$$Q = \{z = x+iy : y>0,\ -1 \le x < 1,\ |2z+1| \ge 1,\ |2z-1| > 1\}.$$

Basically, $Q$ is bounded by the two vertical lines $x=-1$ and $x=1$, and by two semicircles of diameter $1$ centred at $x=-\frac{1}{2}$ and $x=\frac{1}{2}$; only the left part of the boundary belongs to $Q$. The term fundamental domain is justified by the following theorem.

Theorem 4. Let $\Gamma$ and $Q$ be as above.

(a) Let $\varphi_1,\varphi_2$ be two distinct elements of $\Gamma$, then $\varphi_1(Q) \cap \varphi_2(Q) = \varnothing$.

(b) $\bigcup_{\varphi \in \Gamma}\varphi(Q)=\mathfrak{H}$.

(c) $\Gamma$ is generated by the two elements

$$\sigma = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}, \qquad \tau = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix},$$

that is, $\sigma(z)=\dfrac{z}{2z+1}$ and $\tau(z)=z+2$.

Sketch of the proof. Let $\Gamma_1$ be the subgroup of $\Gamma$ generated by $\sigma$ and $\tau$, and show (b’):

$$\bigcup_{\varphi \in \Gamma_1}\varphi(Q)=\mathfrak{H}.$$

Then (a) and (b’) together imply that $\Gamma_1=\Gamma$, and (b) is proved. To prove (a), one replaces $\varphi_1$ with the identity element and discusses the relationship between $c$ and $d$ for $\varphi_2=\begin{pmatrix}a & b \\ c & d \end{pmatrix}$. To prove (b’), one needs to notice that

For $w \in \mathfrak{H}$, pick $\varphi_0 \in \Gamma_1$ that maximises $\Im\varphi_0(w)$; it then suffices to show that $z=\varphi_0(w) \in \Sigma$ and therefore $w \in \Sigma$.
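Assuming the standard description of $\Gamma$ (entries $a,d$ odd and $b,c$ even, as in Rudin) with generators $\sigma=\begin{pmatrix}1&0\\2&1\end{pmatrix}$ and $\tau=\begin{pmatrix}1&2\\0&1\end{pmatrix}$, one can at least check by machine that words in the generators never leave this parity class:

```python
import random

def mul(p, q):
    (a, b), (c, d) = p
    (e, f), (g, h) = q
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def in_gamma(m):
    # a, d odd and b, c even, with determinant 1
    (a, b), (c, d) = m
    return (a * d - b * c == 1 and a % 2 == 1 and d % 2 == 1
            and b % 2 == 0 and c % 2 == 0)

sigma, tau = ((1, 0), (2, 1)), ((1, 2), (0, 1))
sigma_inv, tau_inv = ((1, 0), (-2, 1)), ((1, -2), (0, 1))

random.seed(0)
m = ((1, 0), (0, 1))
for _ in range(50):                 # a random word in the generators
    m = mul(m, random.choice([sigma, tau, sigma_inv, tau_inv]))
    assert in_gamma(m)              # the parity condition is never violated
```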


We are now allowed to introduce the modular function.

Theorem 5. Notation being above, there exists a function $\lambda \in H(\mathfrak{H})$ such that

(a) $\lambda \circ \varphi = \lambda$ for every $\varphi \in \Gamma$.

(b) $\lambda$ is one-to-one on $Q$.

(c) $\lambda(\mathfrak{H})=\lambda(Q)=E=\mathbb{C}\setminus\{0,1\}$.

(d) $\lambda$ has the real axis as its natural boundary. That is, $\lambda$ has no holomorphic extension to any region that properly contains $\mathfrak{H}$.

Proof. Consider the right half of $Q$:

$$Q_0 = \{z = x+iy : y>0,\ 0 < x < 1,\ |2z-1| > 1\}.$$

This is a simply connected region with simple boundary. There is a continuous function $h$ which is one-to-one on $\overline{Q}_0$ and is holomorphic in $Q_0$ such that $h(Q_0)=\mathfrak{H}$, $h(0)=0$, $h(1)=1$ and $h(\infty)=\infty$. This is a consequence of conformal mapping theory.

The Schwarz reflection principle extends $h$ to a continuous function on $\overline{Q}$ which is a conformal mapping of $Q^\circ$ (the interior of $Q$) onto the plane minus the non-negative real axis, by the formula

$$h(-\bar z) = \overline{h(z)}.$$

Note the extended $h$ is one-to-one on $Q$, and $h(Q)$ is $E$ defined in (c).

On the boundary of $Q$, the function $h$ is real. In particular,

and that

We now define

$$\lambda(z) = h(\varphi^{-1}(z))$$

for $\varphi \in \Gamma$ and $z \in \varphi(Q)$. This definition makes sense because for each $z \in \mathfrak{H}$ there is one and only one $\varphi \in \Gamma$ such that $z \in \varphi(Q)$. Properties (a), (b) and (c) follow immediately.

Notice $\lambda$ is continuous on

and therefore on an open set $V$ containing $Q$. Morera’s theorem then shows that $\lambda$ is holomorphic in $V$. Since $\mathfrak{H}$ is covered by the union of the sets $\varphi(V)$ for $\varphi \in \Gamma$, and since $\lambda \circ \varphi = \lambda$, we conclude that $\lambda \in H(\mathfrak{H})$.

Finally, the set of all numbers $\varphi(0)=b/d$ is dense on the real axis. If $\lambda$ could be analytically continued to a region which properly contains $\mathfrak{H}$, the zeros of $\lambda$ would have a limit point in this region, which is impossible since $\lambda$ is not constant. $\square$

We are now ready for the pièce de résistance of this post.

Picard’s Little Theorem

Theorem (Picard). If $f$ is an entire function and if there are two distinct complex numbers $\alpha$ and $\beta$ that are not in the range of $f$, then $f$ is constant.

The proof is established by considering an analytic continuation of a function $g$ associated with $f$. The continuation originates at the origin and is justified by the monodromy theorem. Then, via the Cayley transform, we find that a function associated with $g$ has bounded range and hence $g$ is constant by Liouville’s theorem, and so is $f$.

Proof. First of all notice that, without loss of generality, we may assume $\alpha=0$ and $\beta=1$; otherwise we replace $f$ with $(f-\alpha)/(\beta-\alpha)$. That said, the range of $f$ lies in the region $E$ of theorem 5. There is a disc $A_0$ centred at $0$ such that $f(A_0)$ lies in a disc $D_0 \subset E$.

For every disc $D \subset E$, there is an associated region $V \subset \mathfrak{H}$ such that $\lambda$ in theorem 5 is one-to-one on $V$ and $\lambda(V)=D$; each such $V$ intersects at most two of the domains $\varphi(Q)$. Corresponding to each choice of $V$, there is a function $\psi \in H(D)$ such that $\psi(\lambda(z))=z$ for all $z \in V$.

Now let $\psi_0 \in H(D_0)$ be the function such that $\psi_0(\lambda(z))=z$ as above. Define $g(z)=\psi_0(f(z))$ for $z \in A_0$. We claim that $g(z)$ can be analytically continued to an entire function.

If $D_1$ is another disc in $E$ with $D_0 \cap D_1 \ne \varnothing$, we can choose a corresponding $V_1$ so that $V_0 \cap V_1 \ne \varnothing$. Then $(\psi_0,D_0)$ and $(\psi_1,D_1)$ are direct analytic continuations of each other. We can repeat this procedure to find a direct analytic continuation $(\psi_{i+1},D_{i+1})$ of $(\psi_i,D_i)$ with $V_{i+1} \cap V_i \ne \varnothing$. Note $\psi_i(D_i) \subset V_i \subset \mathfrak{H}$ for all $i$.

Let $\gamma$ be a curve in the plane which starts at $0$. The range of $f \circ \gamma$ is a compact subset of $E$, so $\gamma$ can be covered by a chain of discs, say $A_0,\dots,A_n$, such that each $f(A_i)$ lies in a disc $D_i \subset E$. By considering the function elements $(\psi_{i},D_i)$ and composing with $f$ on each $A_i$ (this is safe because $f$ is entire), we get an analytic continuation of $(g,A_0)$ along the chain $(A_0,\dots,A_n)$. Note $\psi_i \circ f(A_i) \subset \psi_i(D_i) \subset \mathfrak{H}$ again.

Since $\gamma$ is arbitrary, $(g,A_0)$ can be analytically continued along every curve in the plane, and the monodromy theorem implies that $g$ extends to an entire function. This proves the claim given before.

Note the extended $g$ has range lying inside $\mathfrak{H}$ on every such $A_i$, so $g(\mathbb{C}) \subset \mathfrak{H}$. It follows that

$$h(z) = \frac{g(z)-i}{g(z)+i}$$

has range in the unit disc. By Liouville’s theorem, $h$ is a constant function. Thus $g$ is constant too.

Now we move back to $f$ by looking at $A_0$. Since $g=\psi_0 \circ f$ is constant on $A_0$ and $\psi_0$ is one-to-one on $f(A_0)$, and $A_0$ is non-empty and open, $f(A_0)$ has to be a singleton. Thus $f$ is constant on $A_0$, and representing $f$ as a power series on a disc inside $A_0$, we see the entire function $f$ has to be constant. $\square$

Note we have also shown that the range of a non-constant entire function cannot omit a half-plane; but this weaker statement is immediate from the theorem anyway, since a half-plane certainly contains two points.

Reference

  • Walter Rudin, Real and Complex Analysis.
  • Tammo tom Dieck, Algebraic Topology.

SL(2,R) As a Topological Space and Topological Group

Introduction

There are a lot of important linear algebraic groups that are widely used in mathematics, physics and industry. Some of them have nice visualisations: for example, it is widely known that $SU(2) \cong S^3$ and $SO(3) \cong \mathbb{RP}^3$. The group $SL(2,\mathbb{R})$ is no less important than these, but a visualisation of it is not so easy to find. In this post we show that

$$SL(2,\mathbb{R}) \cong S^1 \times D,$$

where $D$ is the open unit disk. In other words, $SL(2,\mathbb{R})$ can be considered as a donut: not the shell ($S^1 \times S^1$) but the “content” or “flesh” of it. More formally, it is the interior of a solid torus.

The underlying theory is the Iwasawa decomposition, but to access it one needs Lie group and Lie algebra theory, which involves differential geometry and certainly goes beyond the scope of this post. Interested readers can refer to Lie Groups Beyond an Introduction, chapter 6, for the Iwasawa decomposition.

Immediate topological consequences

Before we establish the homeomorphism

$$SL(2,\mathbb{R}) \cong S^1 \times D,$$

we first see what we can derive from it.

  • Is $SL(2,\mathbb{R})$ compact?

No. Since $D$ is not compact, $S^1 \times D$ cannot be compact.

  • What is the fundamental group of $SL(2,\mathbb{R})$?

Notice there is a (strong) deformation retraction of $S^1 \times D$ onto $S^1$. Therefore $\pi_1(SL(2,\mathbb{R})) \cong \pi_1(S^1)=\mathbb{Z}$.

  • Connectedness of $SL(2,\mathbb{R})$?

It is connected because $S^1$ and $D$ are connected. It is not simply connected because the fundamental group is not trivial.

  • What is the dimension of $SL(2,\mathbb{R})$ as a manifold?

The dimension is $3$: $S^1 \times D$ is the product of a $1$-manifold and a $2$-manifold.

The Iwasawa decomposition

If we jumped directly to the conclusion without mentioning Lie theory, the decomposition would seem to come from nowhere. So, instead of defining the groups $K$, $A$ and $N$ that will appear later and then checking that there is no discrepancy, we deduce the decomposition without Lie theory, by considering the action of $SL(2,\mathbb{R})$ on the upper half plane: a group action is likely to expose more information about the group.

Consider the group action of $SL(2,\mathbb{R})$ on the upper half plane

$$\mathfrak{H} = \{x+iy \in \mathbb{C} : y > 0\}$$

given by

$$\sigma(z) = \frac{az+b}{cz+d}, \qquad \sigma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2,\mathbb{R}).$$

Up to an explosion of calculation, one can indeed verify that this is a group action and in particular

$$\Im(\sigma(z)) = \frac{\Im z}{|cz+d|^2} > 0.$$
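Rather than suffering the explosion of calculation by hand, one can spot-check the action law and the imaginary-part identity $\Im(\sigma(z))=\Im z/|cz+d|^2$ numerically (a sketch; the sample matrices of determinant $1$ are arbitrary):

```python
# Moebius transformations by matrices of determinant 1: composition and
# preservation of the upper half plane, checked on sample data.
def apply(m, z):
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)

def mul(p, q):
    (a, b), (c, d) = p
    (e, f), (g, h) = q
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

sigma = ((2.0, 1.0), (3.0, 2.0))   # det = 1 (sample matrix)
rho = ((1.0, -1.0), (1.0, 0.0))    # det = 1 (sample matrix)
z = 0.3 + 0.7j

# group action: (sigma * rho)(z) = sigma(rho(z))
assert abs(apply(mul(sigma, rho), z) - apply(sigma, apply(rho, z))) < 1e-9
# Im(sigma(z)) = Im(z) / |c z + d|^2, so the upper half plane is preserved
c, d = sigma[1]
assert abs(apply(sigma, z).imag - z.imag / abs(c * z + d) ** 2) < 1e-9
```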

As one may guess, it is not wise to continue without investigating the action first, or we will be lost in calculation. We first show that this action is transitive, by showing that for any $z=x+yi \in \mathfrak{H}$ there is some $\sigma \in SL(2,\mathbb{R})$ such that $\sigma(z)=i$:

$$\frac{az+b}{cz+d} = i \iff \begin{cases} ax+b = -cy, \\ ay = cx+d. \end{cases}$$

Let us play around with this linear system.

We can put $c=0$ and $a=\frac{1}{\sqrt{y}}$, so that $b=-\frac{x}{\sqrt{y}}$ and $d=\sqrt{y}$. That is,

$$\sigma = \begin{pmatrix} \frac{1}{\sqrt{y}} & -\frac{x}{\sqrt{y}} \\ 0 & \sqrt{y} \end{pmatrix}, \qquad \sigma(z) = \frac{z-x}{y} = i.$$
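A quick numerical check of this matrix (the sample values of $x$ and $y$ are arbitrary):

```python
import math

# with c = 0, a = 1/sqrt(y), b = -x/sqrt(y), d = sqrt(y), the matrix
# has determinant 1 and sends z = x + yi to i
def apply(m, z):
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)

x, y = 1.7, 0.4                     # an arbitrary point x + yi with y > 0
m = ((1 / math.sqrt(y), -x / math.sqrt(y)), (0.0, math.sqrt(y)))
(a, b), (c, d) = m
assert abs(a * d - b * c - 1) < 1e-12      # m lies in SL(2, R)
assert abs(apply(m, complex(x, y)) - 1j) < 1e-12
```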

We have therefore proved:

The action of $SL(2,\mathbb{R})$ on $\mathfrak{H}$ is transitive.

Proof. For any $z,z’ \in \mathfrak{H}$, there exist $\sigma$ and $\sigma’$ such that $\sigma(z)=i$ and $\sigma’(z’)=i$. Then $\sigma’^{-1}(\sigma(z))=z’$, i.e. $\sigma’^{-1}\sigma$ sends $z$ to $z’$. $\square$

By working around $i$ on $\mathfrak{H}$ we can save ourselves a lot of trouble. It is then desirable to find the stabiliser of $i$.

The stabiliser of $i \in \mathfrak{H}$ is $SO(2) \cong S^1$.

Proof. Suppose $\sigma=\begin{pmatrix} a & b \\c & d \end{pmatrix}$ stabilises $i$. Then first of all we have

$$\frac{ai+b}{ci+d} = i, \quad \text{i.e.} \quad b+ai = -c+di.$$

Then

$$a=d, \qquad b=-c.$$

It follows from $\det\sigma = ad-bc = 1$ that

$$a^2+b^2=1.$$

Therefore $\sigma \in O(2) \cap SL(2) = SO(2)$ as expected. $\square$

With these being said, the action of $SL(2,\mathbb{R})$ on $\mathfrak{H}$ consists of $SO(2)$, which moves nothing at $i$, and the rest, which actually moves things. In other words, $SL(2,\mathbb{R})/SO(2) \cong \mathfrak{H}$ as a $2$-manifold. We now pick out the effective part of the group action. For $\sigma \in SL(2,\mathbb{R})$, suppose $\sigma(i)=x+iy$; then, as we will see, the point $x+iy$ is also attained by an upper triangular matrix.

Let $B$ be the group of upper triangular matrices in $SL(2,\mathbb{R})$ with positive diagonal entries. Then it is the elements of $B$ that actually move things. According to this classification, we have obtained a decomposition

$$SL(2,\mathbb{R}) = B \cdot SO(2).$$

The matrix multiplication map $B \times SO(2) \to SL(2,\mathbb{R})$ is surjective.

Proof. Notice that every element of $B$ can be written in the form

$$\lambda_{x,y} = \begin{pmatrix} \sqrt{y} & \frac{x}{\sqrt{y}} \\ 0 & \frac{1}{\sqrt{y}} \end{pmatrix}, \qquad x \in \mathbb{R},\ y>0,$$

and that $\lambda_{x,y}(i) = x+iy$.

For any $\sigma \in SL(2,\mathbb{R})$, suppose $\sigma(i)=x+iy$, then $\sigma(i)=\lambda_{x,y}(i)$, therefore $\lambda_{x,y}^{-1}\sigma(i)=i$, i.e. $\lambda_{x,y}^{-1}\sigma \in SO(2)$, i.e. $\lambda_{x,y}^{-1}\sigma$ is a stabiliser of $i$. The product $\sigma = \lambda_{x,y}(\lambda_{x,y}^{-1}\sigma)$ always lies in the image of $B \times SO(2)$. $\square$
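The proof is effectively an algorithm. The following sketch decomposes a sample matrix as $\lambda_{x,y}\cdot k$ with $k \in SO(2)$, writing $\lambda_{x,y}$ for the upper triangular matrix used above:

```python
import math

def apply(m, z):
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)

def mul(p, q):
    (a, b), (c, d) = p
    (e, f), (g, h) = q
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def inv(m):                          # inverse in SL(2, R): det = 1
    (a, b), (c, d) = m
    return ((d, -b), (-c, a))

sigma = ((2.0, 1.0), (3.0, 2.0))     # sample matrix with det = 1
w = apply(sigma, 1j)                 # sigma(i) = x + iy
x, y = w.real, w.imag
lam = ((math.sqrt(y), x / math.sqrt(y)), (0.0, 1 / math.sqrt(y)))
k = mul(inv(lam), sigma)             # fixes i, hence should be a rotation
(a, b), (c, d) = k
assert abs(a * a + c * c - 1) < 1e-9 and abs(b * b + d * d - 1) < 1e-9
assert abs(a * b + c * d) < 1e-9     # columns orthonormal: k in SO(2)
assert all(abs(mul(lam, k)[i][j] - sigma[i][j]) < 1e-9
           for i in range(2) for j in range(2))   # sigma = lam * k
```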

We can decompose $B$ further:

Let $N$ be the group of upper triangular matrices in $SL(2,\mathbb{R})$ with $1$ on the diagonal, and let $A$ be the group of diagonal matrices in $SL(2,\mathbb{R})$ with positive entries. Then $B=NA$. Let $K=SO(2) \subset SL(2,\mathbb{R})$; we have obtained the so-called Iwasawa decomposition:

$$SL(2,\mathbb{R}) = NAK.$$

The multiplication map

$$N \times A \times K \to SL(2,\mathbb{R}), \qquad (n,a,k) \mapsto nak,$$

is a diffeomorphism onto.

Proof. It only remains to show injectivity. Suppose $n_1a_1k_1=n_2a_2k_2$. Applying both sides to $i$ we obtain $n_1a_1(i)=n_2a_2(i)$. Suppose

$$n_j = \begin{pmatrix} 1 & x_j \\ 0 & 1 \end{pmatrix}, \qquad a_j = \begin{pmatrix} \sqrt{y_j} & 0 \\ 0 & \frac{1}{\sqrt{y_j}} \end{pmatrix}, \qquad j = 1,2.$$

Then we have $n_1a_1(i)=x_1+y_1i=n_2a_2(i)=x_2+y_2i$. It follows that $x_1=x_2$ and $y_1=y_2$, i.e. $n_1=n_2$ and $a_1=a_2$ and therefore $k_1=k_2$. $\square$

By investigating $N$ and $A$ further we obtain

The group $SL(2,\mathbb{R})$ is homeomorphic to $S^1 \times D$.

Proof. Notice that $N$ is homeomorphic to $\mathbb{R}$ and $A$ is homeomorphic to $\mathbb{R}_{>0}\cong \mathbb{R}$. $\square$
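The homeomorphism can be made explicit: send $\sigma$ to the rotation angle of its $K$-part together with the Cayley image of $\sigma(i) \in \mathfrak{H}$ in $D$. The sketch below (our own parametrisation, one of several possible) checks the round trip on a sample matrix:

```python
import math

def apply(m, z):
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)

def mul(p, q):
    (a, b), (c, d) = p
    (e, f), (g, h) = q
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def decompose(sigma):
    w = apply(sigma, 1j)                     # sigma(i) in the upper half plane
    x, y = w.real, w.imag
    lam_inv = ((1 / math.sqrt(y), -x / math.sqrt(y)), (0.0, math.sqrt(y)))
    k = mul(lam_inv, sigma)                  # a rotation matrix
    theta = math.atan2(k[1][0], k[0][0])
    return theta, (w - 1j) / (w + 1j)        # Cayley transform into D

sigma = ((2.0, 1.0), (3.0, 2.0))             # sample matrix with det = 1
theta, p = decompose(sigma)
assert abs(p) < 1                            # lands in the open unit disc

# round trip: rebuild sigma from (theta, p)
w = 1j * (1 + p) / (1 - p)
x, y = w.real, w.imag
lam = ((math.sqrt(y), x / math.sqrt(y)), (0.0, 1 / math.sqrt(y)))
rot = ((math.cos(theta), -math.sin(theta)), (math.sin(theta), math.cos(theta)))
rebuilt = mul(lam, rot)
assert all(abs(rebuilt[i][j] - sigma[i][j]) < 1e-9
           for i in range(2) for j in range(2))
```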

Notice the order of $N,A,K$ does not matter very much: $NAK,KAN,ANK,KNA$ all give the same kind of decomposition. This is because $AN=NA$, and for $nak \in SL(2,\mathbb{R})$ we have $(nak)^{-1}=k^{-1}a^{-1}n^{-1} \in KAN$; since inversion is a bijection of $SL(2,\mathbb{R})$, surjectivity of $NAK$ implies that of $KAN$.

Immediate group-theoretical consequences

With the full Iwasawa decomposition in mind, we can scratch the surface of the rather complicated $SL(2,\mathbb{R})$.

The only continuous homomorphism of $SL(2,\mathbb{R})$ to $\mathbb{R}$ is trivial.

Proof. Let $f:SL(2,\mathbb{R}) \to \mathbb{R}$ be such a map. We have $f(kan)=f(k)+f(a)+f(n)$. We need to show that $f(k)=f(a)=f(n)=0$.

First of all, since $K$ is a compact subgroup of $SL(2,\mathbb{R})$, its image in $\mathbb{R}$ has to be a compact subgroup, namely $\{0\}$, so $f$ vanishes on $K$. On the other hand, $f$ on $A$ and $N$ can be described more explicitly. For $A$, the map $\begin{pmatrix}r & 0 \\ 0 & \frac{1}{r} \end{pmatrix} \mapsto r \mapsto \log{r}$ yields an isomorphism of $A$ with $\mathbb{R}$, both algebraically and topologically. For $N$, we immediately have an isomorphism $\begin{pmatrix}1 & x \\ 0 & 1\end{pmatrix} \mapsto x$. Therefore $f$ on $A$ and $N$ can be realised as $u\log{r}$ and $vx$ for some $u,v \in \mathbb{R}$. We use conjugation to determine $u$ and $v$. Notice that

$$\begin{pmatrix} r & 0 \\ 0 & \frac{1}{r} \end{pmatrix}\begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}\begin{pmatrix} \frac{1}{r} & 0 \\ 0 & r \end{pmatrix} = \begin{pmatrix} 1 & r^2x \\ 0 & 1 \end{pmatrix};$$

applying $f$ on both sides (the image of $f$ is abelian, so $f$ is invariant under conjugation), we have

$$vx = vr^2x \quad \text{for all } r \ne 0,\ x \in \mathbb{R},$$

hence $v=0$.

For $u$, we consider the conjugate relation

$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} r & 0 \\ 0 & \frac{1}{r} \end{pmatrix}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} \frac{1}{r} & 0 \\ 0 & r \end{pmatrix},$$

where the conjugating matrix lies in $K$.

Applying $f$ on both sides we obtain

$$u\log r = u\log\frac{1}{r} = -u\log r \quad \text{for all } r>0,$$

hence $u=0$.

This proves the triviality of $f$. $\square$

Let $f:SL(2,\mathbb{R}) \to GL(n,\mathbb{R})$ be a continuous homomorphism, then $f(SL(2,\mathbb{R})) \subset SL(n,\mathbb{R})$.

Proof. Consider the sequence of group homomorphisms

$$SL(2,\mathbb{R}) \xrightarrow{\ f\ } GL(n,\mathbb{R}) \xrightarrow{\ \det\ } \mathbb{R}^\times.$$

Since $SL(2,\mathbb{R})$ is connected, $\det\circ f(SL(2,\mathbb{R}))$ is a connected subgroup of $\mathbb{R}^\times$ containing $1$, thus lying in $\mathbb{R}_{>0}$. We can then modify the sequence a little bit:

$$SL(2,\mathbb{R}) \xrightarrow{\ f\ } GL(n,\mathbb{R}) \xrightarrow{\ \det\ } \mathbb{R}_{>0} \xrightarrow{\ \log\ } \mathbb{R}.$$

The map $\log \circ \det \circ f$ is a continuous homomorphism from $SL(2,\mathbb{R})$ to $\mathbb{R}$, hence trivial, and therefore

$$\det(f(\sigma)) = 1 \quad \text{for all } \sigma \in SL(2,\mathbb{R}).$$

This proves our assertion. $\square$

There is still a lot we can do without much Lie theory, using Haar measure theory. The reader is advised to try this exercise set to see, for example, that the “volume” of $SL(2,\mathbb{R})/SL(2,\mathbb{Z})$ is $\zeta(2)$. In the references / further reading section the reader will also find a way to show that $SL(2,\mathbb{Z})\backslash SL(2,\mathbb{R})/SO(2,\mathbb{R})$ has volume $\frac{\pi}{3}$.

References / Further Reading


Important Posts of This Blog

This post collects the top 5 most popular posts according to Google Search Console.

irreducible representations of so(3)…

The group $SO(3)$ is one of the most “realistic” Lie groups, as it describes all 3D rotations in the real world. In the post Irreducible Representations of SO(3) and the Laplacian, we compute all of its irreducible representations, using the theory of the Laplacian and harmonic polynomials. This is indeed not an easy job, as it shows the “hard” side of linear algebra.

fourier transform of sinx/x…

The Fourier transforms of $\frac{\sin x}{x}$ and $\left(\frac{\sin{x}}{x}\right)^2$ are important but not easy to compute. In this post The Fourier transform of sinx/x and (sinx/x)^2 and more we did the computation by extensively using contour integration. Along the journey, we also review important concepts in complex analysis.

fréchet derivative…

The Fréchet derivative generalises the concept of derivative to topological vector spaces of arbitrary dimension. Most importantly, the derivative should oftentimes be understood as a linear operator instead of a number or a matrix, as is shown in the post A Brief Introduction to Fréchet Derivative.

fourier transform of e^-ax^2…

In this post The Fourier Transform of exp(-cx^2) and Its Convolution, we compute the Fourier transform of $\exp(-cx^2)$ in two ways, via a differential equation and via the Gaussian integral. We also find the convolution quite easy to compute if we utilise the Fourier transform.


Artin's Theorem of Induced Characters

Introduction

When studying a linear space with some known subspaces, we are interested in the contribution of these subspaces, by studying their sum or (inner) direct sum if possible. This philosophy can be applied to many other fields.

In the context of representation theory, say we are given a finite group $G$ with a subgroup $H$; we want to know how a character of $H$ is related to a character of $G$, through induction. Next we state the content of this post more formally.

Let $G$ be a finite group with distinct irreducible characters $\chi_1,\dots,\chi_h$. A class function $f$ on $G$ is a character if and only if it is a linear combination of the $\chi_i$’s with non-negative integer coefficients. We denote the set of characters by $R^+(G)$. However, $R^+(G)$ lacks a satisfying algebraic structure; for example, one is not even allowed to freely do subtraction. For this reason, we extend the coefficients to all integers, by defining

$$R(G) = \bigoplus_{i=1}^{h}\mathbb{Z}\chi_i.$$

An element of $R(G)$ is called a virtual character, because when some coefficient of a $\chi_i$ is negative it is not a character in the usual sense. Note that $R(G)$ is a finitely generated free abelian group, hence we are free to do subtraction in the normal sense.

Besides, since the product of two characters is still a character, we see that $R(G)$ is a (commutative) ring. To be precise, it is a subring of the ring $F_\mathbb{C}(G)$ of class functions of $G$ over $\mathbb{C}$. Furthermore, we actually have $F_\mathbb{C}(G) \cong \mathbb{C} \otimes R(G)$.

Let $H$ be a subgroup of $G$. Then the operations of restriction and induction define homomorphisms $\mathrm{Res}:R(G) \to R(H)$ and $\mathrm{Ind}:R(H) \to R(G)$. Extending the Frobenius reciprocity linearly, we still find that $\mathrm{Res}$ and $\mathrm{Ind}$ are adjoint to each other. We also notice that the image of $\mathrm{Ind}:R(H) \to R(G)$ is an ideal of $R(G)$. This is because, for any $\varphi \in R(H)$ and $\psi \in R(G)$, one has the projection formula

$$\mathrm{Ind}(\varphi)\cdot\psi = \mathrm{Ind}\big(\varphi \cdot \mathrm{Res}(\psi)\big).$$

But being an ideal should not be the end of our story. We want to know what happens if we consider more than one subgroup. For example, since every group is the union of its cyclic subgroups, what if we consider all cyclic subgroups of $G$? We are also interested in how all these ideals work together. This is where Artin’s theorem comes in.

Artin’s Theorem - Statement and a Concrete Example

Artin’s Theorem. Let $X$ be a family of subgroups of a finite group $G$. Let $\mathrm{Ind}:\oplus_{H \in X}R(H) \to R(G)$ be the homomorphism defined by the family of $\mathrm{Ind}_H^G$, $H \in X$. Then the following statements are equivalent:

(i) $G$ is the union of the conjugates of all $H \in X$. Equivalently, for any $\sigma \in G$, there is some $H \in X$ such that $H$ contains a conjugate of $\sigma$.

(ii) The cokernel of $\mathrm{Ind}:\bigoplus_{H \in X}R(H) \to R(G)$ is finite.

Example. Put $G=D_4$, the dihedral group consisting of the rotations ($\sigma$) and flips ($\tau$) of the square. We write

$$D_4 = \{1,\sigma,\sigma^2,\sigma^3,\tau,\tau\sigma,\tau\sigma^2,\tau\sigma^3\}, \qquad \sigma^4=\tau^2=1,\ \tau\sigma\tau^{-1}=\sigma^{-1}.$$

In this example we take $X=\{\langle\sigma\rangle,\langle\tau\rangle,\langle\tau\sigma\rangle\}$. First of all we put down the character table of $G$:

The character tables of the elements of $X$ are not difficult to write down, as they are character tables of cyclic groups.

Instead of writing something like $\mathrm{Ind}_{\langle\sigma\rangle}^{D_4}\chi_1^\sigma=\chi_1+\chi_4$ manually for all characters, we put all of them in an induction-restriction table:

which yields a matrix naturally:

How to read the induction-restriction table? For example, the first column consists of the numbers $\langle \mathrm{Ind}_{\langle\sigma\rangle}^{D_4}\chi_1^\sigma,\chi_j\rangle$. Since $\mathrm{Ind}_{\langle\sigma\rangle}^{D_4}\chi_1^\sigma=\chi_1+\chi_4$, the column becomes $(1,0,0,1,0)$. On the other hand, the rows are given by inner products with restrictions: for example, since $\langle\chi_4^\sigma,\mathrm{Res}_{\langle\sigma\rangle}^{D_4}\chi_5\rangle=1$, we get $T_{54}=1$. The induction and restriction tables coincide up to a transpose, which is another way to illustrate Frobenius reciprocity.
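Induction and Frobenius reciprocity are easy to verify by machine. The sketch below realises $D_4$ as permutations of the square's vertices (the labelling $0,\dots,3$ is our own choice) and checks that inducing the trivial character of $\langle\sigma\rangle$ gives a character of degree $[D_4:\langle\sigma\rangle]=2$ satisfying Frobenius reciprocity:

```python
# D4 as permutations of vertices 0, 1, 2, 3: sigma = rotation, tau = a flip
def compose(p, q):
    return tuple(p[q[i]] for i in range(4))   # (p o q)(i) = p(q(i))

e = (0, 1, 2, 3)
sigma = (1, 2, 3, 0)
tau = (3, 2, 1, 0)

def generate(gens):
    group, frontier = {e}, {e}
    while frontier:
        frontier = {compose(g, h) for g in frontier for h in gens} - group
        group |= frontier
    return group

G = generate({sigma, tau})
H = generate({sigma})                         # the rotation subgroup
assert len(G) == 8 and len(H) == 4

def inverse(p):
    q = [0] * 4
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

def induce(chi, H, G):
    # Ind_H^G(chi)(g) = (1/|H|) * sum of chi(x^-1 g x) over x with x^-1 g x in H
    def ind(g):
        total = 0
        for x in G:
            y = compose(compose(inverse(x), g), x)
            if y in H:
                total += chi(y)
        return total / len(H)
    return ind

ind = induce(lambda g: 1, H, G)               # induce the trivial character
assert ind(e) == 2                            # degree [G : H] = 2
# Frobenius reciprocity: <Ind 1_H, 1_G>_G = <1_H, Res 1_G>_H = 1
lhs = sum(ind(g) for g in G) / len(G)
assert abs(lhs - 1) < 1e-9
```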

We obtain the induction map explicitly:

where the basis of $R(D_4)$ is $\chi_1,\dots,\chi_5$ and the basis of $R(\langle\sigma\rangle) \oplus R(\langle\tau\rangle) \oplus R(\langle\tau\sigma\rangle)$ is given by the second row of the induction-restriction table. By doing Gaussian elimination on the rows and columns of $T$ over $\mathbb{Z}$ (i.e. changing the bases of $\mathbb{Z}^5$ and $\mathbb{Z}^8$; in effect computing a Smith normal form), the matrix $T$ is reduced to the form

The image of $U$ is $\mathbb{Z} \oplus \mathbb{Z} \oplus \mathbb{Z} \oplus \mathbb{Z} \oplus 2\mathbb{Z}$, hence the cokernel of the induction map is

which is certainly finite. One can also verify that $X$ satisfies (i).
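
Condition (i) for this example can also be checked by brute force, realising $D_4$ as permutations of the four vertices of the square; a small sketch (the labels and helper names are ours):

```python
from itertools import product

def compose(f, g):
    """(f∘g)(i) = f(g(i)), permutations of {0,1,2,3} stored as tuples."""
    return tuple(f[g[i]] for i in range(4))

def inverse(f):
    inv = [0] * 4
    for i, j in enumerate(f):
        inv[j] = i
    return tuple(inv)

def generate(gens):
    """Subgroup generated by a list of permutations."""
    e = (0, 1, 2, 3)
    elems = {e}
    changed = True
    while changed:
        changed = False
        for a, b in product(list(elems) + list(gens), repeat=2):
            c = compose(a, b)
            if c not in elems:
                elems.add(c)
                changed = True
    return elems

sigma = (1, 2, 3, 0)          # rotation by 90 degrees
tau   = (1, 0, 3, 2)          # a flip
G = generate([sigma, tau])    # the dihedral group D_4, of order 8

subgroups = [generate([sigma]), generate([tau]), generate([compose(tau, sigma)])]
covered = {compose(g, compose(h, inverse(g))) for g in G
           for H in subgroups for h in H}
assert covered == G   # G is the union of the conjugates of <σ>, <τ>, <τσ>
```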

Proof of Artin’s Theorem

(i) => (ii)

Consider the exact sequence

To show that $\mathrm{coker}(\mathrm{Ind})$ is finite (it is a finitely generated abelian group to begin with), it suffices to show that the induction map becomes surjective after tensoring with $\mathbb{Q}$, in other words, that

$$\mathbb{Q}\otimes\mathrm{Ind}:\mathbb{Q}\otimes\left(\bigoplus_{H \in X}R(H)\right) \to \mathbb{Q}\otimes R(G)$$

is a surjective map, i.e. it has trivial cokernel. This is equivalent to the surjectivity of the $\mathbb{C}$-linear map

By Frobenius reciprocity, this is on the other hand equivalent to the injectivity of

Notice that $\mathbb{C} \otimes R(G)$ is the space of class functions of $G$. Let $f$ be a class function of $G$ whose restriction to each $H$ is $0$. By (i), every $\sigma \in G$ is conjugate to an element of some $H \in X$; since $f$ is constant on conjugacy classes, $f(\sigma)=0$. Therefore $f$ is $0$ everywhere.

(ii) => (i)

Let $S$ be the union of the conjugates of the subgroups $H \in X$. Every element in the image of the induction map is of the form $g=\sum_{H \in X}\mathrm{Ind}_H^G(f_H)$, and by the formula for induced characters such a $g$ always vanishes on $G \setminus S$. If (ii) holds, then

$$\mathbb{C}\otimes\mathrm{Ind}:\mathbb{C}\otimes\left(\bigoplus_{H \in X}R(H)\right) \to \mathbb{C}\otimes R(G)$$

is a surjective map. Therefore every class function of $G$, i.e. every element of $\mathbb{C} \otimes R(G)$, vanishes on $G \setminus S$, which forces $G \setminus S$ to be empty, i.e. $G=S$.

References / Further Reading


Projective Representations of SO(3)

Introduction

In another post we gave an exposition of irreducible representations of $SO(3)$, where we found ourselves studying harmonic polynomials on a sphere. In this post, we study another category of representations of $SO(3)$ that has its own significance in physics: projective representations. The results will be written as direct sums of irreducible representations of $SU(2)$, so the reader is advised to review the corresponding post. We recall that

Every irreducible unitary representation of $SU(2)$ is of the form $V_n$, the space of homogeneous polynomials of degree $n$ in two complex variables, on which $SU(2)$ acts by $(g \cdot P)(v)=P(g^{-1}v)$.

Representation theory has a billion applications in physics. The group $SO(3)$ acts as the group of orientation-preserving orthogonal symmetries in $\mathbb{R}^3$ in an obvious way. The invariance under this action justifies the principle that physical reactions such as those between elementary particles should not depend on the observer’s vantage point.

Nevertheless, applications of representation theory in physics do not end at finite dimensional vector spaces. Putting infinite dimensional vector spaces aside, we sometimes also need a class of vectors in lieu of a single vector. For example, given a wavefunction $\psi$, we know $|\psi|^2$ has an interpretation as a probability density. But then for any $\lambda \in S^1$, we see $|\lambda\psi|^2=|\psi|^2$, therefore $\lambda\psi$ and $\psi$ should be equivalent in a sense. By considering these equivalence classes, we find ourselves considering the projective space. Hence it makes sense to consider projective representations

where $G$ is compact. In this post we will assume $G=SO(3)$ and see how far we can go.

Simplification of Arguments

We begin with a simple group-theoretic lemma:

Lemma 1. One has

$$PGL(n,\mathbb{C}) = GL(n,\mathbb{C})/\mathbb{C}^\ast \cong SL(n,\mathbb{C})/C_n,$$

where $C_n$ is the group of $n$th roots of unity, embedded into $SL(n,\mathbb{C})$ via the map $\xi \mapsto \xi I$.

Proof. Consider the canonical map

$$p: SL(n,\mathbb{C}) \to GL(n,\mathbb{C})/\mathbb{C}^\ast, \quad A \mapsto A\mathbb{C}^\ast.$$

This map is surjective: for any $B\mathbb{C}^\ast \in GL(n,\mathbb{C})/\mathbb{C}^\ast$, pick $\lambda \in \mathbb{C}^\ast$ such that $\lambda^n=\det B$; then $\lambda^{-1}B \in SL(n,\mathbb{C})$ is a preimage of $B\mathbb{C}^\ast$.

On the other hand, we see $\ker p$ consists of the scalar matrices in $SL(n,\mathbb{C})$. If $\lambda I \in SL(n,\mathbb{C})$, then $\det(\lambda I)=\lambda^n=1$, so $\ker p$ can be identified with $C_n$, proving the isomorphism. $\square$
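
The surjectivity step is a concrete rescaling; a minimal numerical sketch for $n=2$ (plain Python, helper names are ours):

```python
import cmath

def det2(B):
    (a, b), (c, d) = B
    return a * d - b * c

def normalize_to_sl(B):
    """Scale an invertible 2x2 complex matrix into SL(2, C):
    pick lam with lam^2 = det(B); then det(B / lam) = 1."""
    lam = cmath.sqrt(det2(B))
    return [[x / lam for x in row] for row in B]

B = [[2 + 1j, 0], [3, 1 - 1j]]
A = normalize_to_sl(B)
# A and -A are the two preimages of the class of B, differing by C_2
assert abs(det2(A) - 1) < 1e-12
```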

Therefore, when studying a projective representation $G \to PGL(n,\mathbb{C})$, we are quickly reduced to the special linear group, which is much simpler. Besides, the group of $n$th roots of unity is much simpler than the group of nonzero complex numbers.

However, our simplification has not yet reached its end. We will see next that the special linear group can be further reduced to the special unitary group. Recall that a linear matrix representation of a compact Lie group is similar to a unitary one. The following lemma is the projective analogue.

Lemma 2. Let $G$ be a compact Lie group. Every homomorphism $\varphi:G \to PGL(n,\mathbb{C})=SL(n,\mathbb{C})/C_n$ is conjugate to a homomorphism whose image lies in $SU(n)/C_n$.

Proof. Consider the fibre product $H$ of $G$ and $SL(n,\mathbb{C})$ over $PGL(n,\mathbb{C})$:

Here, $p$ is the canonical projection of $SL(n,\mathbb{C}) \to SL(n,\mathbb{C})/C_n$. It suffices to show that $\tilde\varphi$ is similar to a unitary representation. Explicitly, one has

with $\tilde\varphi:(g,A) \mapsto A$ and $\tilde{p}:(g,A) \mapsto g$. Since $G$ is compact and $\tilde{p}$ has finite kernel $C_n$, one sees that $H$ is a compact Lie group. Therefore the matrix representation $\tilde\varphi:H \to SL(n,\mathbb{C})$ is similar to a homomorphism $H \to SU(n)$, from which the lemma follows. $\square$

Therefore we are reduced to considering homomorphisms

for the sake of this post. But we are not done yet. Having to deal with a quotient group is not satisfactory anyway.

Since $SU(n)$ is simply connected (see this video), the projections $SU(n) \to SU(n)/C_n$ are universal coverings. In particular, when $n=2$, we see $SU(2) \to SU(2)/C_2 = SO(3)$ is our well-known universal covering. If we lift $\varphi$ to universal coverings, we see ourselves dealing with $SU(2) \to SU(n)$. To be precise, we have the following commutative diagram (universal cover is a functor):

Dealing with $\tilde\varphi$ is much simpler. Physicists are thus more interested in unitary representations of the group of unit quaternions $SU(2) = \operatorname{Spin}(3)$ than in those of $SO(3)$, even though the latter looks more natural.
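
The covering $SU(2) \to SO(3)$ can be made concrete with unit quaternions: the standard quaternion-to-rotation formula is quadratic in the quaternion, so $q$ and $-q$ give the same rotation. A sketch:

```python
import math

def rotation(q):
    """Rotation matrix of a unit quaternion (w, x, y, z): the covering SU(2) -> SO(3)."""
    w, x, y, z = q
    return [
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ]

# a unit quaternion: rotation by 2*pi/3 about the axis (1,1,1)/sqrt(3)
t = math.pi / 3
s = math.sin(t) / math.sqrt(3)
q = (math.cos(t), s, s, s)
minus_q = tuple(-c for c in q)

# every entry is quadratic in q, so q and -q give the same rotation
assert rotation(q) == rotation(minus_q)
```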

Discovering Projective Representations

Now we are interested in finding all unitary representations that can be pushed down to a projective representation of $SO(3)$. We have two questions:

Question 1. Does it suffice to consider maps of the form $\tilde\varphi:SU(2)\to SU(n)$?

The answer is yes. Notice that every homomorphism $f:SU(2) \to U(1)$ has to be trivial: since $U(1)$ is abelian, $\ker f$ would be a normal subgroup with abelian quotient, but the only proper normal subgroups of $SU(2)$ are $\{I\}$ and $C_2=\{\pm I\}$, and neither $SU(2)$ nor $SU(2)/C_2 \cong SO(3)$ is abelian. Hence $\ker f=SU(2)$.

Also recall the exact sequence

$$1 \to SU(n) \to U(n) \xrightarrow{\ \det\ } U(1) \to 1.$$

Let $g:SU(2) \to U(n)$ be any homomorphism, and consider the canonical projection $\pi:U(n) \to \frac{U(n)}{SU(n)}=U(1)$. We see $\pi \circ g$ sends every element of $SU(2)$ to $1$, meaning the image of $SU(2)$ in $U(n)$ must be in $SU(n)$. Therefore, by considering maps of the form $SU(2) \to SU(n)$, we are not missing anything. $\square$

Question 2. What should be considered in order to determine whether $\tilde\varphi:SU(2) \to SU(n)$ can be pushed down into a morphism $\varphi:SO(3) \to SU(n)/C_n$?

The answer is, one should consider the element $-I$. Let $p:SU(2) \to SO(3)$ be the universal covering, and let $p_n:SU(n) \to SU(n)/C_n$ be the corresponding universal covering. For $\tilde\varphi:SU(2) \to SU(n)$, we want to know when there will be a homomorphism $\varphi:SO(3) \to SU(n)/C_n$ such that $p_n \circ \tilde\varphi = \varphi \circ p$.

Notice that $p(-I)=I$; therefore, should $\varphi$ exist, one has $p_n \circ \tilde\varphi(-I)=e$, the identity of the group $SU(n)/C_n$, because one should have $\varphi(I)=e$. Hence $\tilde\varphi(-I) \in \ker p_n$, so $\tilde\varphi(-I)$ can be identified with an $n$th root of unity. Since $\tilde\varphi(-I)\tilde\varphi(-I)=\tilde\varphi(I)$, we see $\tilde\varphi(-I)$ should also be a square root of $1$. That is, $\tilde\varphi(-I)$ is either $\operatorname{id}$ or $-\operatorname{id}$. We discuss these two cases in question 3 below.

On the other hand, if $\tilde\varphi(-I)=\pm\operatorname{id}$, then one can verify that $p_n \circ \tilde\varphi \circ p^{-1}$ can be well-defined. Therefore $\tilde\varphi$ can be pushed down into a morphism of $SO(3)$ if and only if $\tilde\varphi(-I)=\pm\operatorname{id}$. $\square$

Question 3. Let $W=\bigoplus_n k_n V_n$ be a representation of $SU(2)$. What will happen if it can be pushed down to a projective representation of $SO(3)$?

Let $\tilde\varphi:SU(2) \to SU(n)$ be the homomorphism corresponding to $W$. We have seen that $\tilde\varphi$ can be pushed down to $SO(3)$ if and only if $\tilde\varphi(-I)=\pm\operatorname{id}$.

If $\tilde\varphi(-I)=\operatorname{id}$, then all the indices $n$ appearing in $W$ have to be even, because $-I$ acts on the homogeneous polynomials of degree $n$ by $(-1)^n$, which is not the identity when $n$ is odd. If $\tilde\varphi(-I)=-\operatorname{id}$, then all the indices have to be odd, for the same reason.

To be more explicit, $W=\bigoplus_n k_{2n}V_{2n}$ or $W=\bigoplus_{n}k_{2n+1}V_{2n+1}$.

Theorem 1. The projective representations of $SO(3)$ are given, up to conjugation, by the representations of $SU(2)$ of the form

$$W=\bigoplus_n k_{2n}V_{2n} \quad\text{or}\quad W=\bigoplus_n k_{2n+1}V_{2n+1},$$

depending on whether $-I$ acts by $\operatorname{id}$ or $-\operatorname{id}$.

In brief, when thinking about projective representations of $SO(3)$, one thinks about polynomials in two variables whose terms are either all even or all odd.

When studying $\tilde\varphi:SU(2) \to SU(n)$, we saw that $\tilde\varphi(-I)$ can be identified both as an $n$th root of unity and as a square root of unity. When $n$ is odd, however, $-\operatorname{id}$ does not lie in $SU(n)$ (its determinant is $(-1)^n=-1$), so $-I$ cannot act as $-\operatorname{id}$. Unexpectedly, number theory plays a small role here.


The Quadratic Reciprocity Law

Introduction

Historically, thanks to Gauss, the quadratic reciprocity law marked the beginning of algebraic number theory, so it deserves a good dose of attention. However, throwing the bare definition at the beginner would not work well.

We consider the equation

$$x^2+by=a, \qquad a,b,x,y \in \mathbb{Z},$$

one of the simplest non-trivial multi-variable Diophantine equations imaginable. Trying to search for all solutions by brute force without any precaution is not wise. Therefore we consider reductions first. In order that $x^2+by=a$ has a solution, it is necessary that

$$x^2 \equiv a \pmod{b}$$

has a solution.
Then the Chinese remainder theorem inspires us to first look into the case when $b$ is a prime. The case $b=2$ is excluded, because modulo $2$ we can only tell whether $x$ is odd or even.

Therefore we study the equation $x^2=a$ in the finite field of order $p$, where $p \ne 2$. We give a very straightforward characterisation, which is seemingly stupid. For $a \in \mathbf{F}_p^\ast$, define

$$\left(\frac{a}{p}\right)=\begin{cases}1 & \text{if } a \in \mathbf{F}_p^{\ast 2}, \\ -1 & \text{otherwise}.\end{cases}$$

It is also convenient to define $\left(\frac{0}{p}\right)=0$.

This post will start with an equivalent form that is easier to compute (although less intuitive). Then we will demonstrate how to do basic computations with it, and finally we will try to view it from the viewpoint of algebraic number theory.

Elementary Observations

Basic Computation

We begin with a simplified formula for the Legendre symbol.

Proposition 1. $\left(\frac{a}{p}\right) = a^\frac{p-1}{2}$ for $a \in \mathbf{F}_p^\ast$.

N.B. The power on the right hand side is taken in the corresponding finite field. For example, $\left(\frac{2}{3}\right)=2=-1$ in $\mathbf{F}_3$. By abuse of language, we identify the integers $1$ and $-1$ with their canonical images in the finite field.
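
Proposition 1 makes the symbol easy to compute in practice; a quick sketch comparing Euler's criterion with the brute-force definition (helper names are ours):

```python
def legendre(a, p):
    """Legendre symbol via Euler's criterion: a^((p-1)/2) mod p."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def legendre_naive(a, p):
    """Brute-force definition: is a a nonzero square mod p?"""
    a %= p
    if a == 0:
        return 0
    return 1 if any(x * x % p == a for x in range(1, p)) else -1

for p in (3, 5, 7, 11, 13):
    assert all(legendre(a, p) == legendre_naive(a, p) for a in range(p))
```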

Proof. Notice that $\left(\frac{a}{p}\right)=1$ if and only if $a \in \mathbf{F}_p^{\ast 2}$. The rest comes from the following lemma, which deserves to be stated separately in greater generality. $\square$

Lemma 1. Let $p$ be a prime (it can be $2$ this time) and $K$ a finite field of order $q=p^n$ for some $n>0$. Then

  1. If $p=2$, then all elements of $K$ are squares.

  2. If $p \ne 2$, then the squares $K^{\ast 2}$ of $K^\ast$ form a subgroup of index $2$ in $K^*$; it is the kernel of the map $p:x \mapsto x^{(q-1)/2}$ from $K^\ast $ to $\{-1,1\}$.

    To be precise, one has an exact sequence of cyclic groups:

Proof. The first case is a restatement of the fact that the Frobenius endomorphism of a finite field is an automorphism (see nlab). For the second case, let $\overline{K}$ be an algebraic closure of $K$. If $x \in K^\ast$, let $y \in \overline{K}$ be a square root of $x$, i.e. such that $y^2=x$. We have

$$p(x)=x^{(q-1)/2}=y^{q-1}.$$

Since $x \in K^{\ast 2}$ if and only if $y \in K^\ast$, which is equivalent to $y^{q-1}=p(x)=1$, one has $\ker p = K^{\ast 2}$. The rest follows from elementary calculation. $\square$

The reader needs to recall or study the basic structure of finite fields: a finite field is always of prime power order; all finite fields of order $p^n$ are isomorphic, uniquely determined as a subfield of an algebraic closure of $\mathbf{F}_p$, being the splitting field of the polynomial $X^{p^n}-X$; besides, the multiplicative group of a finite field is cyclic.

From proposition 1 it follows that

Corollary 1. For any prime number $p \ne 2$,

  1. The Legendre symbol is multiplicative, i.e. $\left(\frac{ab}{p}\right)=\left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$.
  2. $\left(\frac{1}{p}\right)=1$
  3. $\left(\frac{-1}{p}\right)=(-1)^{\varepsilon(p)}$ where $\varepsilon(p)=\frac{p-1}{2} \pmod{2}$.

The harder thing to compute is the Legendre symbol when $a=2$.

Proposition 2. One has $\left(\frac{2}{p}\right)=(-1)^{\omega(p)}$ where $\omega(p)=\frac{p^2-1}{8}\pmod{2}$.

We want to find a square root of $2$, i.e. an element $y$ satisfying $y^2=2$, so that computing $2^{(p-1)/2}$ becomes computing $y^{p-1}$. This is not an easy job, and we do not expect to find such an element inside the field. For example, $\left(\frac{2}{3}\right)=2=-1$ and $\left(\frac{2}{5}\right)=4=-1$, meaning there is no such $y$ in $\mathbf{F}_3$ or $\mathbf{F}_5$. However, there is an easy way to generate a $2$. Consider $y=\alpha+\alpha^{-1}$; then $y^2=2+\alpha^2+\alpha^{-2}$. If we have $\alpha^2+\alpha^{-2}=0$ then we are done. To find such an $\alpha$, notice that $\alpha^2+\alpha^{-2}=0$ implies that $\alpha^4+1=0$. Therefore $\alpha^8=1$. It suffices to use a primitive $8$th root of unity.

Proof. Let $\alpha$ be a primitive $8$th root of unity in an algebraic closure $\Omega$ of $\mathbf{F}_p$. Then $y=\alpha+\alpha^{-1}$ verifies $y^2=2$. Since $\Omega$ has characteristic $p$, we have

Observe that if $p \equiv 1 \pmod{8}$, then $y^p=\alpha+\alpha^{-1}=y$ (we used the fact that $\alpha$ is an $8$th root of unity). Therefore $y^{p-1}=\left(\frac{2}{p}\right)=1$. This inspires us to determine $y^{p-1}$ through the relation between $p$ and $8$. As $p$ is odd, there are four possibilities: $p\equiv 1,3,5,7 \pmod{8}$.

If $p \equiv 7 \pmod{8}$, i.e. $p \equiv -1 \pmod{8}$, we still have $y^p=\alpha^{-1}+\alpha=y$. Therefore $\left(\frac{2}{p}\right)=1$ whenever $p \equiv \pm 1 \pmod{8}$. This discovery inspires us to study $p \equiv \pm 5 \pmod{8}$ together. When this is the case, one finds $y^p=\alpha^5+\alpha^{-5}$. Since $\alpha^4=\alpha^{-4}=-1$ (the primitivity of $\alpha$ matters here), $y^p$ becomes $-(\alpha+\alpha^{-1})=-y$. Cancelling $y$ on both sides, we obtain $y^{p-1}=\left(\frac{2}{p}\right)=-1$. To conclude,

It remains to justify the $\omega$ function as above. We need to find a function $\omega(p)$ such that $\omega(p) \equiv 0 \pmod 2$ when $p \equiv \pm 1 \pmod 8$ and $\omega(p) \equiv 1 \pmod 2$ when $p \equiv \pm 5 \pmod 8$. If we square $p$, we can ignore the difference of the signs:

$$(8k \pm 1)^2-1 = 64k^2 \pm 16k \equiv 0 \pmod{16}, \qquad (8k \pm 5)^2-1 = 64k^2 \pm 80k + 24 \equiv 8 \pmod{16}.$$

Therefore, whether $(p^2-1)/8$ is odd or even is completely determined by the remainder of $p$ modulo $8$. We therefore put $\omega(p)=(p^2-1)/8$ and this concludes our proof. $\square$

To conclude in simpler form, we have

  • $1$ is always a square in a finite field.
  • $-1$ is a square in $\mathbf{F}_p$ if and only if $\frac{p-1}{2}$ is even, i.e. $p \equiv 1 \pmod{4}$.
  • $2$ is a square in $\mathbf{F}_p$ if and only if $\frac{p^2-1}{8}$ is even, i.e. $p \equiv \pm 1 \pmod{8}$.
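
These supplementary laws can be verified numerically; a small sketch (the helpers are ours):

```python
def legendre(a, p):
    """Legendre symbol via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return 0 if r == 0 else (1 if r == 1 else -1)

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

for p in (p for p in range(3, 500, 2) if is_prime(p)):
    assert legendre(-1, p) == (-1) ** ((p - 1) // 2)       # epsilon(p)
    assert legendre(2, p) == (-1) ** ((p * p - 1) // 8)    # omega(p)
    assert (legendre(2, p) == 1) == (p % 8 in (1, 7))      # p ≡ ±1 (mod 8)
```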

Gauss’s Quadratic Reciprocity Law

The Legendre symbol says a lot of things, but you do not want to compute, for example $\left(\frac{37}{53}\right)$ by hand in the basic way as above. However, granted the following law, things are much easier.

Proposition 3 (Gauss’s Quadratic Reciprocity Law). For two distinct odd prime numbers $p$ and $\ell$, the following identity holds:

$$\left(\frac{p}{\ell}\right)\left(\frac{\ell}{p}\right)=(-1)^{\frac{p-1}{2}\cdot\frac{\ell-1}{2}}.$$

Alternatively,

Instead of computing $37^{(53-1)/2}$ modulo $53$, we obtain the value of $\left(\frac{37}{53}\right)$ in a much easier way: since $37 \equiv 1 \pmod 4$,

$$\left(\frac{37}{53}\right)=\left(\frac{53}{37}\right)=\left(\frac{16}{37}\right)=\left(\frac{4^2}{37}\right)=1.$$

In other words, there exists a solution of the equation $x^2+53y=37$.
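
This computation is easy to confirm by machine; a sketch (the `legendre` helper is ours):

```python
def legendre(a, p):
    """Legendre symbol via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return 0 if r == 0 else (1 if r == 1 else -1)

# reciprocity chain: 37 ≡ 1 (mod 4), so (37/53) = (53/37) = (16/37) = 1
assert legendre(37, 53) == legendre(53, 37) == legendre(16, 37) == 1

# hence x^2 + 53y = 37 has a solution
x = next(x for x in range(53) if x * x % 53 == 37)
assert (37 - x * x) % 53 == 0
```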

The proof is carried out using Gauss sums. It looks contrived, but one can see a lot of important tricks. We will use corollary 1 frequently.

Proof. Again, let $\Omega$ be an algebraic closure of $\mathbf{F}_p$, and let $\omega \in \Omega$ be a primitive $\ell$-th root of unity. If $x \in \mathbf{F}_\ell$, then $\omega^x$ is well-defined. Thus it is legitimate to write the “Gauss sum”

$$y=\sum_{t \in \mathbf{F}_\ell}\left(\frac{t}{\ell}\right)\omega^t.$$

Following the inspiration of what we have done in proposition 2, we study $y^2$ and $y^{p-1}$ again. The second one is quick.

Claim 1. $y^{p-1}=\left(\frac{p}{\ell}\right)$.

To show claim $1$, we notice that, as $\Omega$ is of characteristic $p$, we have

and therefore

Claim 2. $y^2 = \left(\frac{-1}{\ell}\right)\ell$ (by abuse of language, $\ell$ (the one outside the Legendre symbol) is used to denote the image of $\ell$ in the field $\mathbf{F}_p$.)

Notice that

Terms where $t=0$ are ignored safely. Then we notice that

For this reason we put

It follows that

It remains to compute the coefficients $C_u$. We see

When $u \ne 0$, as $t$ runs over $\mathbf{F}_\ell^\ast$, the term $s=1-ut^{-1}$ runs over all of $\mathbf{F}_\ell$ except $1$. Therefore

since $[\mathbf{F}_\ell^\ast:\mathbf{F}_\ell^{\ast 2}]=2$ (read: exactly half of the elements of $\mathbf{F}_\ell^\ast$ are squares, the rest are not). Therefore

Recall that $1-\omega^\ell=(1-\omega)(1+\omega+\dots+\omega^{\ell-1})=0$. As $\omega$ is a primitive $\ell$-th root of unity, we see $\omega \ne 1$ and therefore $1+\omega+\dots+\omega^{\ell-1}=0$. The result follows.

Finally, the reciprocity follows because

We invite the reader to expand the identity above using corollary 1 and see the result. $\square$
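
The identity $y^2=\left(\frac{-1}{\ell}\right)\ell$ of claim 2 also holds for the classical Gauss sum over $\mathbb{C}$, which one can check numerically (a sketch; `legendre` and `gauss_sum` are our names):

```python
import cmath

def legendre(a, p):
    """Legendre symbol via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return 0 if r == 0 else (1 if r == 1 else -1)

def gauss_sum(l):
    """Complex Gauss sum: sum over t of (t/l) exp(2 pi i t / l)."""
    return sum(legendre(t, l) * cmath.exp(2j * cmath.pi * t / l)
               for t in range(1, l))

for l in (3, 5, 7, 11, 13):
    y = gauss_sum(l)
    assert abs(y * y - legendre(-1, l) * l) < 1e-8   # y^2 = (-1/l) * l
```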

Observation from Algebraic Number Theory

In this section we introduce some observation from a point of view of algebraic number theory without complete proofs.

Let $p$ be an odd prime, and $\zeta_p$ a primitive $p$-th root of unity. We have seen that the Gauss sum

$$y=\sum_{t \in \mathbf{F}_p}\left(\frac{t}{p}\right)\zeta_p^t$$

satisfies the relation

$$y^2 = \left(\frac{-1}{p}\right)p.$$

Therefore the field $\mathbb{Q}(\sqrt{p})$ is contained in $\mathbb{Q}(\zeta_p)$ or $\mathbb{Q}(\zeta_p,i)$, depending on the sign of $\left(\frac{-1}{p}\right)$. The first one is a cyclotomic extension of $\mathbb{Q}$ by definition. The second one is not, but is a finite abelian extension of $\mathbb{Q}$. However, every finite abelian extension of $\mathbb{Q}$ is a subfield of a cyclotomic field. See this note. To conclude,

Every field of the form $\mathbb{Q}(\sqrt{p})$ is a subfield of $\mathbb{Q}(\zeta_m)$ for some $m>1$.

Solving the equation $x^2 \equiv a \pmod p$ also inspires us to look at the quadratic field $K=\mathbb{Q}(\sqrt{a})$. For simplicity we assume that $a$ is squarefree. If $\left(\frac{a}{p}\right)=1$, then there exists $\alpha \in \mathbb{Z}$ such that

$$x^2-a \equiv (x-\alpha)(x+\alpha) \pmod{p}.$$

This equation is interesting because on the left hand side we have the defining polynomial of $K$, namely $x^2-a$, and it splits completely modulo $p$. The relation above actually signifies that there exist prime ideals $\mathfrak{P}_1,\mathfrak{P}_2\subset \mathfrak{o}_K$ such that

where the ramification indices $e_1=e_2=1$. This says the prime ideal $(p)$ is totally split in $\mathbb{Q}(\sqrt{a})$. Conversely, if $(p)$ is totally split in $\mathbb{Q}(\sqrt{a})$ (where $(a,p)=1$ for sure), then $\left(\frac{a}{p}\right)=1$. To conclude,

The Legendre symbol $\left(\frac{a}{p}\right)=1$ if and only if $(p)$ totally splits in $\mathbb{Q}(\sqrt{a})$.

In fact, one can have a more profound observation of number fields which will imply the quadratic reciprocity law:

Let $\ell$ and $p$ be two distinct odd primes and $S_\ell=\left(\frac{-1}{\ell}\right)\ell$; then $(p)$ is totally split in $\mathbb{Q}(\sqrt{S_\ell})$ if and only if $(p)$ splits into an even number of prime ideals in $\mathbb{Q}(\zeta_\ell)$.

Besides, you may want to know about Artin reciprocity, which generalises Gauss's reciprocity, but that is a quite advanced topic (class field theory). This also shows the significance of the quadratic reciprocity law.

References

  • Jean-Pierre Serre, A Course in Arithmetic
  • Jürgen Neukirch, Algebraic Number Theory
  • Serge Lang, Algebraic Number Theory

The Pontryagin Dual group of Q_p

Introduction

Let $G$ be a locally compact abelian group (for example, $\mathbb{R}$, $\mathbb{Z}$, $\mathbb{T}$, $\mathbb{Q}_p$). Then every irreducible unitary representation $\pi:G \to U(\mathcal{H}_\pi)$ is one dimensional, where $\mathcal{H}_\pi$ is a non-zero Hilbert space, in which case we take it to be $\mathbb{C}$. It follows that $\pi(x)(z)=\xi(x)z$ for all $z \in \mathbb{C}$, where $\xi \in \operatorname{Hom}(G,\mathbb{T})$, viewing $\mathbb{T}$ as the unit circle in the complex plane. Such homomorphisms are called (unitary) characters, and we denote the set of all characters of $G$ by $\widehat{G}$, calling it the Pontryagin dual group. This should ring a bell from the representation theory of finite groups. For convenience, instead of $\xi(x)$, we often write $\langle x,\xi \rangle$; in this notation the character property reads $\langle x,\xi\rangle\langle y,\xi \rangle=\langle x+y ,\xi\rangle$.

Some easily accessible examples are:

  • $\widehat{\mathbb{R}} \cong \mathbb{R}$, with $\langle x,\xi \rangle = e^{2\pi i \xi x}$.
  • $\widehat{\mathbb{T}} \cong \mathbb{Z}$, with $\langle z, n \rangle = z^n$.
  • $\widehat{\mathbb Z} \cong \mathbb{T}$, with $\langle n,z \rangle = z^n$.
  • $\widehat{\mathbb{Z}/k\mathbb{Z}} \cong \mathbb{Z}/k\mathbb{Z}$, with $\langle m,n\rangle =e^{2\pi i m n / k}$.
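
The last pairing can be checked directly; a small sketch verifying that the characters $n \mapsto e^{2\pi imn/k}$ of $\mathbb{Z}/k\mathbb{Z}$ are orthonormal (helper names are ours):

```python
import cmath

def character(m, k):
    """The character n -> exp(2 pi i m n / k) of Z/kZ."""
    return lambda n: cmath.exp(2j * cmath.pi * m * n / k)

k = 6
for m1 in range(k):
    for m2 in range(k):
        # normalized inner product of two characters
        inner = sum(character(m1, k)(n) * character(m2, k)(n).conjugate()
                    for n in range(k)) / k
        assert abs(inner - (1 if m1 == m2 else 0)) < 1e-9
```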

The Dual of p-adic Field

But we want to show that

$$\widehat{\mathbb{Q}}_p \cong \mathbb{Q}_p.$$

The proof is broken down into several steps. It should be clear that $\mathbb{Q}_p$ is a topological group with respect to addition.

Step 1 - Find the simplest character

Every $p$-adic number $x \in \mathbb{Q}_p$ can be written in the form

$$x=\sum_{j=m}^{\infty}x_jp^j,$$

where $m \in \mathbb{Z}$ and $x_j \in \{0,1,\dots,p-1\}$ for all $j$. We define

$$\langle x,\xi_1\rangle=\exp\left(2\pi i\sum_{j=m}^{\infty}x_jp^j\right)$$

and claim that $\xi_1$ is a character. Notice that the right hand side is always well-defined, because all factors with $j \ge 0$ contribute nothing, as $\exp(2\pi i x_jp^j)=1$ whenever $j \ge 0$. That is to say, the right hand side can be understood as a finite product: when $m \ge 0$, i.e. $x \in \mathbb{Z}_p$, the pairing $\langle x, \xi_1 \rangle = 1$; when $m<0$ however, $\langle x,\xi_1 \rangle = \exp\left( 2\pi i \sum_{j=m}^{-1}x_jp^j\right)$. Therefore it is legitimate to write

From this it follows immediately that

$$\langle x+y,\xi_1\rangle = \langle x,\xi_1\rangle\langle y,\xi_1\rangle.$$

The function $\xi_1$ is continuous because it is locally constant: being trivial on $\mathbb{Z}_p$, it is constant on each coset $x+\mathbb{Z}_p$. Therefore it is safe to say that $\xi_1$ is a character with kernel $\mathbb{Z}_p$.
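
For readers who want to experiment, here is a sketch of $\xi_1$ restricted to the dense subring $\mathbb{Z}[1/p] \subset \mathbb{Q}_p$, where only finitely many digits of negative index occur (the name `xi1` is ours):

```python
import cmath
from fractions import Fraction

def xi1(x, p):
    """<x, xi_1> for a rational x whose denominator is a power of p.

    Only the digits of negative index contribute: xi_1 factors through
    the p-adic fractional part (num mod p^k) / p^k."""
    num, den = x.numerator, x.denominator
    k = 0
    while den % p == 0:
        den //= p
        k += 1
    assert den == 1, "denominator must be a power of p"
    frac = (num % p**k) / p**k
    return cmath.exp(2j * cmath.pi * frac)

p = 5
x, y = Fraction(7, 25), Fraction(3, 5)
# xi_1 is a character: xi_1(x + y) = xi_1(x) xi_1(y)
assert abs(xi1(x + y, p) - xi1(x, p) * xi1(y, p)) < 1e-12
# and it is trivial on the integers (they lie in Z_p)
assert abs(xi1(Fraction(4), p) - 1) < 1e-12
```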

A first thought would be to generate all characters out of $\xi_1$, getting something like $\xi_p$ or $\xi_{1+p+p^2+\dots}$. But that approach might lead us to a nightmare of subscripts. Instead, we try to discover as many characters as possible. For any $y \in \mathbb{Q}_p$, we define

In other words, $\xi_y$ is defined by $x \mapsto \langle xy,\xi_1\rangle$. Since multiplication is continuous, we see immediately that $\xi_y$ is a character, not much more complicated than $\xi_1$. We will show that these are all the characters we need. To do this, we need to characterise all characters. Notice that different $\xi_y$ can have the same image while their kernels differ, so we try to characterise the characters by characterising their kernels.

Step 2 - Study the kernels of characters

For the $\xi_y$ above, we notice that $\langle x,\xi_y\rangle=1$ if and only if $xy \in \ker\xi_1=\mathbb{Z}_p$, i.e. $|xy|_p \le 1$. Therefore

$$\ker\xi_y=\{x \in \mathbb{Q}_p : |x|_p \le |y|_p^{-1}\}.$$

We expect that all characters are of the form $\xi_y$, so their kernels should look like $\ker\xi_y$. Notice that for fixed $y \ne 0$ we have $|y|_p=p^m$ for some $m \in \mathbb{Z}$, and as a result $\ker\xi_y = \overline{B}(0,p^{-m})$. For this reason we have the following (weaker) statement.

Lemma 1. If $\xi \in \widehat{\mathbb{Q}}_p$, there exists an integer $k$ such that $\overline{B}(0,p^{-k}) \subset \ker\xi$.

Proof. Since $\xi$ is continuous and $\langle 0,\xi\rangle=1$, there exists $k$ such that $\overline{B}(0,p^{-k}) \subset \xi^{-1}(\{z \in \mathbb{T}:|z-1| < 1\})$ (the set on the right hand side is open and contains $0$). But $\overline{B}(0,p^{-k})$ is a group (as $|\cdot|_p$ is non-Archimedean), therefore its image is a subgroup of $\mathbb{T}$ contained in $\{z:|z-1|<1\}$, which can only be $\{1\}$. $\square$

We cannot say yet that the kernel of $\xi$ is exactly of the form $\overline{B}(0,p^{-k})$, but we have a way to formalise things now. If $\overline{B}(0,p^{-k}) \subset \ker\xi$ for all $k$, then $\xi=1$ is the unit in $\widehat{\mathbb{Q}}_p$. Otherwise, for each $\xi$, there is a smallest $k_0$ such that $\overline{B}(0,p^{-k_0})\subset \ker\xi$ but $\overline{B}(0,p^{-k}) \not\subset \ker\xi$ whenever $k<k_0$. Put another way, we have $\langle p^{k_0-1},\xi\rangle \ne 1$ but $\langle p^k,\xi\rangle=1$ whenever $k \ge k_0$. As one may guess, this $k_0$ reflects the “size” of $\xi$. For convenience we study the case $k_0=0$ first.

Lemma 2 (“Fourier series”). Suppose for given $\xi \in \widehat{\mathbb{Q}}_p$, $\langle 1,\xi \rangle = 1$ but $\langle p^{-1},\xi \rangle \ne 1$. There is a sequence $(c_j)$ taking values in $\{0,1,\dots,p-1\}$ such that $\langle p^{-k},\xi \rangle=\exp\left(2\pi i\sum_1^k c_{k-j}p^{-j}\right)$ for all $k=1,2,\dots$. In particular, $c_0 \ne 0$.

Proof. Put $\omega_k=\langle p^{-k},\xi\rangle$. Then $\omega_0=1$ but $\omega_k \ne 1$ for all $k \ge 1$. Since

$$\omega_{k+1}^p=\langle p \cdot p^{-(k+1)},\xi\rangle = \langle p^{-k},\xi\rangle=\omega_k,$$

each $\omega_{k+1}$ is a $p$-th root of $\omega_{k}$, and in particular $\omega_1$ is a $p$-th root of unity. There exists $c_0 \in \{1,\dots,p-1\}$ such that

$$\omega_1=\exp\left(\frac{2\pi i c_0}{p}\right),$$

and the overall formula for $\omega_k$ follows from induction. $\square$

One would guess that for the corresponding $k_0$, the “size” of $\xi$ should be $p^{k_0}$. This looks promising, but pursuing it directly would be tedious. For now we still only study the case $k_0=0$.

Lemma 3. Notation being in lemma 2, there exists $y \in \mathbb{Q}_p$ with $|y|_p=1$ such that $\xi = \xi_y$.

Proof. From lemma 2 we obtain a series $y=\sum_{j=0}^{\infty}c_jp^j$ with $c_0 \ne 0$. Then in particular $|y|_p=1$. By expanding the term, we see

It follows that $\langle x,\xi \rangle = \langle x,\xi_y \rangle$ for all $x \in \mathbb{Q}_p$. $\square$

Now we are ready to conclude our observation of the dual group.

Step 3 - Realise the dual group

Theorem. The map $\Lambda:y \mapsto \xi_y$ is an isomorphism of topological groups. Hence $\mathbb{Q}_p \cong \widehat{\mathbb{Q}}_p$.

Proof. First we show that $\Lambda$ is a group isomorphism. If $\xi_y=1$, then

Hence the map $\Lambda$ is injective. To show that $\Lambda$ is surjective, fix $\xi \in \widehat{\mathbb{Q}}_p$. By the comment below lemma 1, there is a smallest integer $k$ such that $\langle p^k,\xi \rangle = 1$. Then one considers the character $\eta$ defined by

$$\langle x,\eta\rangle = \langle p^k x,\xi\rangle.$$

It satisfies the condition in lemma 3, therefore there exists $z \in \mathbb{Q}_p$ such that $\eta=\xi_z$, and it follows that $\xi=\xi_{p^{-k}z}$.

Next we show that $\Lambda$ is a homeomorphism. Observe the sets

$$N(\ell,k)=\{\xi \in \widehat{\mathbb{Q}}_p : |\langle x,\xi\rangle - 1|<1/\ell \text{ whenever } |x|_p \le p^k\},$$

ranging over $\ell \ge 1$ and $k \in \mathbb{Z}$. These sets constitute a local base at $1$ for $\widehat{\mathbb{Q}}_p$. We need to show that it corresponds to a local base of $\mathbb{Q}_p$ under the map $\Lambda$:

$$\Lambda^{-1}(N(\ell,k)) = \overline{B}(0,p^{-k}).$$

The image of the set $\{x:|x|_p \le p^k\}$ under $\xi_1$ is $\{1\}$ if $k \le 0$ and is the group of $p^k$-th roots of unity if $k>0$, and hence is contained in $\{z:|z-1|<\ell^{-1}\}$ if and only if $k \le 0$. It follows that $\xi_y \in N(\ell,k)$ if and only if $|y|_p \le p^{-k}$, i.e., $y \in \overline{B}(0,p^{-k})$. We are done. $\square$


The abc Theorem of Polynomials

Let $K$ be an algebraically closed field of characteristic $0$. Instead of studying the polynomial ring $K[X]$ as a whole, we pay a little more attention to each polynomial. A reasonable thing to do is to count the number of distinct zeros. We define

$$n_0(f) = \text{the number of distinct zeros of } f.$$

For example, if $f(X)=(X-1)^{100}$, we have $n_0(f)=1$. It seems we are diving into calculus, but actually there is still a lot of algebra.

The abc of Polynomials

Theorem 1 (Mason-Stothers). Let $a(X),b(X),c(X) \in K[X]$ be polynomials, not all constant, such that $(a,b,c)=1$ and $a+b=c$. Then

$$\max\{\deg a,\deg b,\deg c\} \le n_0(abc)-1.$$

Proof. Putting $f=a/c$ and $g=b/c$, we have

$$f+g=1.$$

This implies


We interrupt the proof here for some good reasons. Rational functions of the form $f'/f$ remind us of the chain rule applied to $\log{x}$: in the context of calculus, we have $\left(\log{f(x)}\right)'=f'(x)/f(x)$. On the ring $K[x]$, we define $D:K[x] \to K[x]$ to be the formal derivative morphism. Then this endomorphism extends to $K(x)$ by

$$D\left(\frac{f}{g}\right)=\frac{D(f)g-fD(g)}{g^2}.$$

On $K(x)^\ast$ (read: the multiplicative group of the rational function field $K(x)$), we define the logarithmic derivative

$$L(f)=\frac{D(f)}{f}.$$

It follows that

$$L(fg)=\frac{D(fg)}{fg}=\frac{D(f)g+fD(g)}{fg}=L(f)+L(g).$$

Also observe that, just as in calculus, if $f$ is a constant function, then $D(f)=0$. Now we write

Then it follows that

Now we can get back to the proof.


Proof (continued). Since $K$ is algebraically closed,

We see, for example

Therefore

Likewise

Combining both, we obtain

Next, multiplying $f'/f$ and $g'/g$ by

which has degree $n_0(abc)$ (since $(a,b,c)=1$, these three polynomials share no common root). Both $N_0f'/f$ and $N_0g'/g$ are polynomials of degree at most $n_0(abc)-1$ (this is because $\deg h'=\deg h-1$ for non-constant $h \in K[X]$, while $f$ and $g$ are non-constant (why?); we assume $\operatorname{char} K=0$ for this reason).

Next we observe the degrees of $a,b$ and $c$. Since $a+b=c$, we actually have $\deg c \le \max\{\deg a,\deg b\}$. Therefore $\max\{\deg a,\deg b,\deg c\}=\max\{\deg a,\deg b\}$. From the relation

and the assumption that $(a,b)=1$, one can find a polynomial $h \in K[X]$ such that

Taking the degrees of both sides, we see

This proves the theorem. $\square$
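
One can test the theorem on the polynomial Pythagorean triple $(X^2-1)^2+(2X)^2=(X^2+1)^2$, where the bound is attained. Below is a self-contained sketch over $\mathbb{Q}$, computing $n_0$ via $n_0(f)=\deg f-\deg\gcd(f,f')$, which is valid in characteristic $0$ (all names are ours):

```python
from fractions import Fraction

def trim(p):
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

def deriv(p):
    return trim([i * c for i, c in enumerate(p)][1:])

def polymod(p, q):
    p = p[:]
    while len(p) >= len(q):
        f = p[-1] / q[-1]
        s = len(p) - len(q)
        for i, c in enumerate(q):
            p[s + i] -= f * c
        trim(p)
    return p

def polygcd(p, q):
    while q:
        p, q = q, polymod(p, q)
    return p

def n0(p):
    """Number of distinct roots: deg p - deg gcd(p, p')."""
    return (len(p) - 1) - (len(polygcd(p, deriv(p))) - 1)

P = lambda *cs: trim([Fraction(c) for c in cs])   # coefficients, constant term first
a = mul(P(-1, 0, 1), P(-1, 0, 1))   # (X^2 - 1)^2
b = mul(P(0, 2), P(0, 2))           # (2X)^2
c = mul(P(1, 0, 1), P(1, 0, 1))     # (X^2 + 1)^2
assert add(a, b) == c                     # a + b = c, with (a, b, c) = 1
assert max(len(a), len(b), len(c)) - 1 == 4
assert n0(mul(mul(a, b), c)) - 1 == 4     # the bound holds, here with equality
```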

Applications

We present some applications of this theorem.

Corollary 1 (Fermat’s theorem for polynomials). Let $a(X),b(X)$ and $c(X)$ be relatively prime polynomials in $K[X]$ such that not all of them are constant, and such that

Then $n \le 2$.

Alternatively, one can argue via the curve $x^n+y^n=1$ over $K(X)$.

Proof. Since $a,b$ and $c$ are relatively prime, we also have $a^n$, $b^n$ and $c^n$ to be relatively prime. By Mason-Stothers theorem,

Exchanging the roles of $a$, $b$ and $c$, we see

It follows that

That is, $n \le 2$. $\square$

Corollary 2 (Davenport’s inequality). Let $f,g \in K[X]$ be non-constant polynomials such that $f^3-g^2 \ne 0$. Then

$$\deg(f^3-g^2) \ge \frac{1}{2}\deg f+1.$$

One may discuss separately the cases where $f$ and $g$ are or are not coprime, applying the Mason-Stothers theorem to each, but many documents only record the proof of the coprime case, which is a shame; the case where $f$ and $g$ are not coprime can be a nightmare. Instead, for the sake of accessibility, we offer the elegant proof given by Stothers, starting with a lemma about the degree of the difference of two polynomials.

Lemma 1. Suppose $p,q \in K[X]$ are two distinct non-constant polynomials; then

$$\deg(p-q) \ge \deg p - n_0(p) - n_0(q) + 1.$$

Proof. Let $k(f)$ be the leading coefficient of a polynomial $f$. If $\deg p \ne \deg q$ or $k(p) \ne k(q)$, then $\deg(p-q)\ge \deg p \ge \deg p - n_0(p)-n_0(q)+1$ because $n_0(p) \ge 1$ and $n_0(q) \ge 1$.

Next suppose $\deg p = \deg q$ and $k(p)=k(q)$. If $(p,q)=1$, then by Mason-Stothers,

Otherwise, suppose $(p,q)=r$. Then $p/r$ and $q/r$ are coprime. Again by Mason-Stothers,

Therefore

On the other hand,

Combining all these inequalities, we obtain what we want. $\square$


Proof (of corollary 2). Put $\deg{f}=m$ and $\deg{g}=n$. If $3m \ne 2n$, then

$$\deg(f^3-g^2)=\max\{3m,2n\} \ge 3m \ge \frac{m}{2}+1,$$

because $m \ge 1$. Next we assume that $3m=2n$, in other words $m=2r$ and $n=3r$ for some positive integer $r$. By lemma 1, we can write

This proves the inequality. $\square$

One may also generalise to the case of $f^m-g^n$. But let us put down some more important remarks. First of all, Mason-Stothers was originally a generalisation of Davenport's inequality (by Stothers). I personally do not think any mortal can find the original paper of Davenport's inequality, but in [Shioda 04] there is a reproduced proof using linear algebra (lemma 3.1).

For a more geometric interpretation, one may be interested in [Zannier 95], where Riemann’s existence theorem is also discussed.

In Stothers’s paper [Stothers 81], the author discussed the condition under which equality holds. If you look carefully you will realise his theorem 1.1 is exactly the Mason-Stothers theorem.

References / Further Reading


Calculus on Fields - Heights of Polynomials, Mahler's Measure and Northcott's Theorem

Heights

Definition. For a polynomial with coefficients in a number field $K$

the height of $f$ is defined to be

where

is the Gauss norm for any place $v$.

Here, $M_K$ refers to the canonical set of pairwise non-equivalent places on $K$. See the first four pages of this document for a reference.

As one can expect, this can tell us about some complexity of a polynomial, just like how the height of an algebraic number tells us its complexity. Let us compute some examples.

Computing Heights

Let us consider the simplest one

first. Since $|x^2-1|_v=1$ for all places $v$, the height of $f$ is a sum of zeros, which is still $0$.

Next, we take care of a polynomial that involves prime numbers

We see $|g(x)|_\infty=2$, $|g(x)|_2=2^{-(-2)}=4$, $|g(x)|_3=3^{-(-1)}=3$, and the Gauss norm is $1$ for all other primes. Therefore

Put $u(x,y)=\sqrt{2}x^2 + 3\sqrt{2}xy+5y^2+7 \in \mathbb{Q}(\sqrt{2})[x,y]$; we can compute its height carefully. Notice that $|\sqrt{2}|_v=\sqrt{|2|_v}$ for all places $v$, and we therefore have
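These computations can be scripted. Here is a sketch for polynomials with rational coefficients, computing $h(f)=\sum_{v \in M_{\mathbb{Q}}}\log|f|_v$ from the Gauss norms; the sample polynomial $2+\frac13 x+\frac14 x^2$ is my own illustration, chosen so that its norms match the values $|g|_\infty=2$, $|g|_2=4$, $|g|_3=3$ computed above (the original $g$ is not reproduced here):

```python
from fractions import Fraction
from math import log

def prime_factors(n):
    """Set of primes dividing the non-zero integer n."""
    n, ps, p = abs(n), set(), 2
    while p * p <= n:
        while n % p == 0:
            ps.add(p)
            n //= p
        p += 1
    if n > 1:
        ps.add(n)
    return ps

def vp(x, p):
    """p-adic valuation of a non-zero Fraction."""
    v, n, d = 0, x.numerator, x.denominator
    while n % p == 0:
        n //= p
        v += 1
    while d % p == 0:
        d //= p
        v -= 1
    return v

def height(coeffs):
    """h(f) = sum over places v of Q of log|f|_v, with |f|_v the Gauss norm."""
    coeffs = [Fraction(c) for c in coeffs if c != 0]
    h = log(max(abs(c) for c in coeffs))  # the Archimedean place
    primes = set()
    for c in coeffs:
        primes |= prime_factors(c.numerator) | prime_factors(c.denominator)
    for p in primes:
        # |f|_p = p^(-min_i v_p(a_i))
        h -= min(vp(c, p) for c in coeffs) * log(p)
    return h

print(height([-1, 0, 1]))                            # x^2 - 1 has height 0
print(height([2, Fraction(1, 3), Fraction(1, 4)]))   # log(2 * 4 * 3) = log 24
```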

Height and Products

If $f \in K[s_1,\dots,s_n]$ and $g \in K[t_1,\dots,t_m]$ are two polynomials in different variables, then as a polynomial in $K[s_1,\dots,s_n;t_1,\dots,t_m]$, $fg$ has height $h(f)+h(g)$. This is immediately realised once we notice that the height of a polynomial is equal to the height of the vector of coefficients in appropriate projective space. The identity $h(fg)=h(f)+h(g)$ follows from the Segre embedding.

But if the variables coincide, things get different. For example, $h(x+1)=0$ but $h((x+1)^2)=h(x^2+2x+1)=\log 2$. This is because we do not have $|fg|_\infty=|f|_\infty|g|_\infty$. Nevertheless, for non-Archimedean places, things are easier.

Gauss’s lemma. If $v$ is not Archimedean, then $|fg|_v=|f|_v|g|_v$.

Proof. First of all, it suffices to prove it in the univariate case. If $f$ and $g$ have multiple variables $x_1,\dots,x_n$, let $d$ be an integer greater than the degree of $fg$. Then the Kronecker substitution

reduces our study to $K[t]$. This is because, with such a $d$, the substitution gives a univariate polynomial with the same set of coefficients.

Therefore we only need to show that $|f(t)g(t)|_v=|f(t)|_v|g(t)|_v$. Without loss of generality we assume that $|f(t)|_v=|g(t)|_v=1$. Writing $f(t)=\sum a_k t^k$ and $g(t)=\sum b_k t^k$, we have $f(t)g(t)=\sum c_jt^j$ where $c_j=\sum_{k+l=j}a_kb_l$.

We suppose that $|fg|_v<1$, i.e., $|c_j|_v<1$ for all $j$, and see what contradiction we will get. If $|a_j|_v=1$ for all $j$, then $|c_j|_v<1$ implies that $|b_k|_v<1$ for all $k$ and therefore $|g|_v<1$, a contradiction. Therefore we may assume, without loss of generality, that $|a_0|_v<1$ but $|a_1|_v=1$. Then, since

we have $|a_1b_{j-1}|_v=|b_{j-1}|_v<1$ for all $j \ge 1$. It follows that $|g(t)|_v<1$, still a contradiction. $\square$
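For $K=\mathbb{Q}$ and $v=|\cdot|_p$, Gauss's lemma can be tested mechanically. The sketch below (my own illustration; the sample polynomials are arbitrary) checks the multiplicativity of the Gauss norm in its additive form, $\min_i v_p\big((fg)_i\big)=\min_i v_p(f_i)+\min_i v_p(g_i)$:

```python
from itertools import product

def vp(n, p):
    """p-adic valuation of a non-zero integer."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

def min_vp(coeffs, p):
    """-log_p of the Gauss norm: |f|_p = p^(-min_i v_p(a_i))."""
    return min(vp(c, p) for c in coeffs if c != 0)

def mul(f, g):
    """Multiply polynomials given as coefficient sequences (low degree first)."""
    r = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            r[i + j] += a * b
    return r

# Gauss's lemma, multiplicatively: min v_p(fg) = min v_p(f) + min v_p(g).
p = 5
for f in product(range(1, 26), repeat=3):          # a sample of quadratics
    for g in [(5, 10, 3), (25, 5, 1), (2, 4, 8)]:
        assert min_vp(mul(f, g), p) == min_vp(f, p) + min_vp(g, p)
print("Gauss's lemma verified on all samples")
```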

So much for the non-Archimedean case. For Archimedean places things are more complicated, so we do not have enough space to cover them. Nevertheless, we have

Gelfond’s lemma. Let $f_1,\dots,f_m$ be complex polynomials in $n$ variables and set $f=f_1\cdots f_m$, then

where $d$ is the sum of the partial degrees of $f$, and $\ell_\infty(f)=\max_j|a_j|=|f|_\infty$.

Combining Gelfond’s lemma and Gauss’s lemma, we obtain

Mahler Measure

The Mahler measure was not actually introduced by Mahler initially. It was named after him because he successfully extended it to the multivariable case in an elegant way. We will cover the original motivation anyway.

Original Version and Lehmer’s Conjecture

Say we want to find large prime numbers. Pierce came up with an idea. Consider $p(x) \in \mathbb{Z}[x]$, which is factored into

Consider $\Delta_n=\prod_i(\alpha^n_i-1)$. Then, by some Galois theory, this is indeed an integer. So perhaps we may find some interesting integers among the factors of $\Delta_n$. Also, we expect it to grow slowly. Lehmer studied $\frac{\Delta_{n+1}}{\Delta_n}$ and observed that

So it makes sense to compare all roots of $p(x)$ with $1$. He therefore suggested the following function related to $p(x)$:

This number appears if we consider $\lim_{n \to \infty}\Delta_{n+1}/\Delta_n$.

He also asked the following question, which is now known as Lehmer’s conjecture, although in his paper he addressed it as a problem instead of a conjecture:

Is there a constant $c>1$ such that $M(p)>1 \implies M(p)\ge c$?

It remains open but we can mention some key bounds.

  • Lehmer himself found that

$$M\left(x^{10}+x^9-x^7-x^6-x^5-x^4-x^3+x+1\right)\approx 1.17628,$$

and no smaller value of $M(p)>1$ has been discovered since. It was because of this discovery that he posed his problem.

This polynomial has also led to the discovery of a large prime number $\sqrt{\Delta_{379}}=1{,}794{,}327{,}140{,}357$, although by studying $x^3-x-1$, we have found a bigger prime number $\Delta_{127}=3{,}233{,}514{,}251{,}032{,}733$.

  • Breusch (and later Smyth) discovered that if $p$ is monic, irreducible and nonreciprocal, i.e. it does not satisfy $p(x)=\pm x^{\deg p}p(1/x)$, then $M(p) \ge \theta_0 \approx 1.32472$, where $\theta_0$ is the real root of $x^3-x-1$.
  • E. Dobrowolski found that if $p(x)$ is monic, irreducible and noncyclotomic, and has degree $d$, then

$$M(p) \ge 1+c\left(\frac{\log\log d}{\log d}\right)^3$$

for some absolute constant $c>0$.
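Numerically, the Mahler measure of an integer polynomial is easy to approximate from its complex roots via $M(p)=|a_d|\prod_i\max(1,|\alpha_i|)$. Here is a sketch using NumPy, applied to Lehmer's well-known degree-$10$ polynomial (my own illustration):

```python
import numpy as np

def mahler_measure(coeffs):
    """M(p) = |a_d| * prod_i max(1, |alpha_i|), computed from the complex roots.
    Coefficients are listed from the leading one down, as numpy.roots expects."""
    roots = np.roots(coeffs)
    return abs(coeffs[0]) * float(np.prod(np.maximum(1.0, np.abs(roots))))

# Lehmer's polynomial x^10 + x^9 - x^7 - x^6 - x^5 - x^4 - x^3 + x + 1
lehmer = [1, 1, 0, -1, -1, -1, -1, -1, 0, 1, 1]
print(mahler_measure(lehmer))        # ~1.1762808, Lehmer's number
print(mahler_measure([1, 0, -1]))    # x^2 - 1: all roots on the unit circle, M = 1
```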

The General Version and Jensen’s Formula

Definition. For $f \in \mathbb{C}[x_1,\dots,x_n]$, the Mahler measure is defined to be

where $d\mu_i=\frac{1}{2\pi}d\theta_i$, i.e., $d\mu_1\cdots d\mu_n$ corresponds to the (completion of the) Haar measure on $\mathbb{T}^n$ with total measure $1$.

We see through Jensen’s formula that when $n=1$ this coincides with what we have defined before. Observe first that $M(fg)=M(f)M(g)$. Consider $f(t)=a\prod_{i=1}^{d}(t-\alpha_i)$, then

On the other hand, as an exercise in complex analysis, one can show that

Combining them, we see

Taking the logarithm we also obtain Jensen’s formula

We first give a reasonable and useful estimate of $M(f)$, which will be used to prove Northcott’s theorem.

Definition. For $f(t)=a_dt^d+\dots+a_0$, the $\ell_p$-norm of $f$ is naturally defined to be

For $p=\infty$, we have $\ell_\infty(f)=\max_j|a_j|$.

Lemma 1. Notation being as above, $M(f) \le \ell_1(f)$ and

$$\ell_\infty(f) \le \binom{d}{\lfloor d/2 \rfloor}M(f).$$

Proof. To begin with, we observe those obvious ones. First of all,

Therefore

Next, by Jensen’s inequality

However, by Parseval’s formula, the last term equals

For the remaining inequality, we use Vieta’s formula

and therefore

for all $0 \le r \le d$. Replacing $|a_{d-r}|$ with $\ell_\infty(f)$, we have finished the proof. $\square$

Before proving Northcott’s theorem, we show the connection between Mahler measure and heights.

Proposition 1. Let $\alpha \in \overline{\mathbb{Q}}$ and let $f$ be the minimal polynomial of $\alpha$ over $\mathbb{Z}$, of degree $d$. Then

$$h(\alpha)=\frac{1}{d}\log M(f)$$

and

Proof. Put $d=\deg(\alpha)$ and write

Choose a number field $K$ that contains $\alpha$ and is a Galois extension of $\mathbb{Q}$, with Galois group $G$. Then $(\sigma\alpha:\sigma \in G)$ contains every conjugate of $\alpha$ exactly $[K:\mathbb{Q}]/d$ times. Since $a_0,\dots,a_d$ are coprime, for any non-Archimedean absolute value $v \in M_K$, we must have $\max_i|a_i|_v=|f|_v=1$. Combining with Gauss’s lemma and Galois theory, we see

Now we are ready to compute the height of $\alpha$ to rediscover the Mahler measure. Notice that

We therefore obtain

The last term corresponds to what we have computed above about non-Archimedean absolute values so we break it down a little bit:

for some $u \mid \infty$, according to the product formula. On the other hand, for $v \mid \infty$,

All in all,

The second assertion follows immediately because

Northcott’s Theorem

Non-zero algebraic integers of height $0$ have all their conjugates on the unit circle, and they are actually roots of unity, by Kronecker’s theorem. However, keep in mind that algebraic integers on the unit circle are not necessarily roots of unity. See this short paper.

When it comes to algebraic integers of small heights, things may get complicated, but Northcott’s theorem assures that we will be studying a finite set.

Northcott’s Theorem. Given an integer $N>0$ and a real number $H \ge1$, there are only a finite number of algebraic integers $\alpha$ satisfying $\deg(\alpha) \le N$ and $h(\alpha) \le \log H$.

Proof. Let $\alpha$ be an algebraic integer of degree $d \le N$ and height $h(\alpha) \le \log H$. Suppose $f(t)=a_dt^d+\dots+a_0 \in \mathbb{Z}[t]$ is the minimal polynomial of $\alpha$. Then lemma 1 shows us that

On the other hand, by proposition 1,

we have actually

This gives rise to no more than $(2\lfloor (2H)^d \rfloor+1)^{d+1}$ distinct polynomials $f$, which produces at most $d(2\lfloor (2H)^d \rfloor+1)^{d+1}<\infty$ algebraic integers. Ranging through all $d \le N$ we get what we want. $\square$

We also have the Northcott property, where we do not care about degrees. A set $L$ of algebraic integers is said to satisfy the Northcott property if, for every $T>0$, the set

is finite. Such a set $L$ is said to satisfy the Bogomolov property if there exists $T>0$ such that the set

is empty. As a matter of elementary reasoning, the Northcott property implies the Bogomolov property. The case when $L$ is a field is of particular interest; see this paper.

References / Further Reading

  • Enrico Bombieri, Walter Gubler, Heights in Diophantine Geometry.

  • Michel Waldschmidt, Diophantine Approximation on Linear Algebraic Groups, Transcendence Properties of the Exponential Function in Several Variables.

  • Chris Smyth, The Mahler Measure of Algebraic Numbers: A Survey.


Hensel's Lemma - A Fair Application of Newton's Method and 'Double Induction'

Introduction

Let $F$ be a non-Archimedean local field, meaning that $F$ is complete under the metric induced by a non-Archimedean absolute value $|\cdot|$. Consider the ring of integers

and its unique prime (hence maximal) ideal

The residue field $k=\mathfrak{o}_F/\mathfrak{p}$ is finite because it is compact and discrete. For compactness, notice that $\mathfrak{o}_F$ is compact and the canonical projection $\mathfrak{o}_F \to k$ is continuous. For discreteness, notice that $\mathfrak{p}$ is open, so every point of $k$ is open.

Let $f \in \mathfrak{o}_F[x]$ be a polynomial. Hensel’s lemma states that, if $\overline{f} \in k[x]$, the reduction of $f$, has a simple root $a$ in $k$, then the root can be lifted to a root of $f$ in $\mathfrak{o}_F$ and hence $F$. This blog post is intended to offer a well-organised proof of this lemma.

To do this, we need to use Newton’s method of approximating roots of $f(x)=0$, something like

In calculus, we know that $a_n \to \zeta$ with $f(\zeta)=0$ at a rate like $A^{2^n}$ for some constant $0<A<1$ (do Walter Rudin’s exercise 5.25 of Principles of Mathematical Analysis if you are not familiar with it; I heartily recommend it). Now we will borrow Newton’s method into number theory to find roots in a non-Archimedean field, which is wildly different from $\mathbb{R}$, the playground of elementary calculus.

We will also use induction, in a form which I would like to call “double induction”. Instead of claiming that $P(n)$ is true for all $n$, we claim that $P(n)$ and $Q(n)$ are true for all $n$. When proving $P(n+1)$, we may use $Q(n)$, and vice versa.

This method is inspired by this lecture note, where actually a “quadruple induction” is used, and everything is proved altogether. Nevertheless, I would like to argue that the quadruple induction is too dense to expose the motivation and intuition of the proof. Therefore, we reduce the induction to two statements and derive the rest with more reasoning.

Hensel’s Lemma

Hensel’s Lemma. Let $F$ be a non-Archimedean local field with ring of integers $\mathfrak{o}_F=\{\alpha \in F:|\alpha| \le 1\}$ and prime ideal $\mathfrak{p}=\{\alpha \in F:|\alpha|<1\}$. Let $f \in \mathfrak{o}_F[x]$ be a polynomial whose reduction $\overline{f} \in k[x]$ has a simple root $a \in k$. Then $a$ can be lifted to some $\alpha \in \mathfrak{o}_F$ with $\alpha \equiv a \bmod \mathfrak{p}$ such that $f(\alpha)=0$.

By simple root we mean $\overline{f}(a)=0$ but $\overline{f}’(a) \ne 0$. Before we prove this lemma, we see some examples.

Examples and Applications

Square Root of 2 in 7-adic Numbers

Put $F=\mathbb{Q}_7$. Then $\mathfrak{o}_F=\mathbb{Z}_7$, $\mathfrak{p}=7\mathbb{Z}_7$ and $k=\mathbb{F}_7$. We show that the square roots of $2$ are in $F$. Note $\overline{f}(x)=x^2-2=(x-3)(x+3) \in k[x]$; we therefore have two simple roots of $\overline{f}$, namely $3$ and $-3$. Lifting to $\mathfrak{o}_F$, we obtain two roots of $f$: $\alpha_1 \equiv 3 \bmod 7\mathbb{Z}_7$ and $\alpha_2 \equiv -3 \bmod 7\mathbb{Z}_7$. For $\alpha_1$, we have

Hence we can put $\alpha=\sqrt{2}=3+7+2\cdot 7^2+6\cdot 7^3+\cdots\in\mathbb{Z}_7 \subset \mathbb{Q}_7$. Likewise $\alpha_2$ can be understood as $-\sqrt{2}$. This expansion is totally different from our understanding in $\mathbb{Q}$ or $\mathbb{R}$.
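The lift can be carried out in practice with Newton's iteration modulo growing powers of $7$; each step doubles the precision. A sketch (the digit extraction is my own illustration):

```python
def hensel_sqrt(c, p, a0, k):
    """Lift a simple root a0 of x^2 - c mod p (p odd) to a root mod p^k
    via Newton's iteration x -> x - (x^2 - c)/(2x)."""
    x, mod = a0, p
    while mod < p ** k:
        mod = mod ** 2                     # precision doubles each step
        inv = pow(2 * x, -1, mod)          # 1/f'(x) mod p^(2m); 2x is a unit
        x = (x - (x * x - c) * inv) % mod
    return x % p ** k

x = hensel_sqrt(2, 7, 3, 8)
assert x * x % 7 ** 8 == 2                 # a genuine square root of 2 mod 7^8

# First 7-adic digits of sqrt(2): 3 + 1*7 + 2*7^2 + 6*7^3 + ...
digits, y = [], x
for _ in range(4):
    digits.append(y % 7)
    y //= 7
print(digits)   # [3, 1, 2, 6]
```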

Roots of Unity

Since $k$ is a finite field, we see $k^\times$ is a cyclic group of order $q-1$ where $q=p^n=|k|$ for some prime $p$. It follows that $x^{q-1}=1$ for all $x \in k^\times$. Therefore $f(x)=x^{q-1}-1$ has $q-1$ distinct roots in $k$. By Hensel’s lemma, $F$ contains all $(q-1)$st roots of unity. It does not matter whether $F$ is isomorphic to $\mathbb{Q}_p$ or $\mathbb{F}_q((t))$.
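In $\mathbb{Z}_p$ these roots of unity can even be computed explicitly as Teichmüller lifts $\omega(a)=\lim_{n\to\infty}a^{p^n}$, since $a^{p^{n+1}} \equiv a^{p^n} \bmod p^{n+1}$ makes the sequence converge. A sketch of mine:

```python
def teichmuller(a, p, k):
    """Lift a in {1, ..., p-1} to the (p-1)-st root of unity in Z_p that is
    congruent to a mod p, computed mod p^k as the limit of a^(p^n)."""
    mod = p ** k
    x = a % mod
    for _ in range(k):           # a^(p^k) already agrees with the limit mod p^k
        x = pow(x, p, mod)
    return x

p, k = 7, 5
roots = [teichmuller(a, p, k) for a in range(1, p)]
for r in roots:
    assert pow(r, p - 1, p ** k) == 1    # each is a 6th root of unity in Z/7^5
assert len(set(roots)) == p - 1          # and they are pairwise distinct mod 7
print("found all", p - 1, "roots of unity in Z_7 (mod 7^5)")
```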

Proof of Hensel’s Lemma (with Explanation)

Pick any $a_0 \in \mathfrak{o}_F$ that is a lift of $a\mod\mathfrak{p}$. Define

then we claim that $a_n$ converges to the root we are looking for.

Step 1 - Establishing A Sequence by Newton’s Method

First of all, we need to show that $a_n \in \mathfrak{o}_F$, i.e., $|a_n| \le 1$ for all $n$. It suffices to show that $|f(a_{n-1})/f’(a_{n-1})| \le 1$. We first observe the case $n=1$.

Since $\overline{f}(a)=0$ but $\overline{f}’(a) \ne 0$, we have $f(a_0) \in \mathfrak{p}$ but $f’(a_0)\not\in\mathfrak{p}$. As a result, $|f(a_0)|<1$ but $|f’(a_0)|=1$. As a result, $|f(a_0)/f’(a_0)|<1$, which implies that $f(a_0)/f’(a_0) \in \mathfrak{o}_F$ and therefore $a_1 \in \mathfrak{o}_F$.

By Taylor’s theorem,

for some $g_n \in \mathfrak{o}_F[x]$. When $n=1$, we see $g_1(a_1) \in \mathfrak{o}_F$ and as a result $|g_1(a_1)| \le 1$. Therefore

Since $a_1 \in \mathfrak{o}_F$, we also see that $f(a_1) \in \mathfrak{o}_F$ hence its absolute value is not greater than $1$. As a result $|f(a_1)/f’(a_1)| \le 1$, which implies that $a_2 \in \mathfrak{o}_F$.

This inspires us to claim the following two statements:

(a) $|f(a_n)| < 1$ for all $n \ge 0$.

(b) $|f’(a_n)|=|f’(a_0)|=1$ for all $n \ge 0$.

We have verified (a) and (b) for $n=0$ and $n=1$. Now assume that (a) and (b) are true for $n-1$, then, for $n$, we will verify as follows.

First of all, by (a) and (b) for $n-1$, we see $a_n \in \mathfrak{o}_F$.

Consider the Taylor’s expansion

where $h_n \in \mathfrak{o}_F[x]$. It follows that $|h_n(a_n)| \le 1$. Since $|f’(a_{n-1})|=1$, by (b) we actually have

To prove (b) for $n$, we consider the Taylor’s expansion

Notice that since $a_n \in \mathfrak{o}_F$, we have $f’’(a_{n-1}),g_n(a_n) \in \mathfrak{o}_F$. By (a) and (b) for $n-1$, we see

Hence

bearing in mind that for a non-Archimedean absolute value, $|x+y|=\max\{|x|,|y|\}$ whenever $|x| \ne |y|$. Through this process we have also proved (b).

Step 2 - Validating the Convergence

We need to show that $\{a_n\}$ is a Cauchy sequence. To do this, it suffices to show that $|f(a_n)| \to 0$ sufficiently quickly. Recall that in the proof of (a) we have shown that $|f(a_n)| \le |f(a_{n-1})|^2$ for all $n$. Applying this relation inductively, we see $|f(a_n)| \le |f(a_0)|^{2^n}$. Since $|f(a_0)|<1$, it follows that $|f(a_n)| \to 0$ as $n \to \infty$.

For any $\varepsilon>0$, there exists $N>0$ such that $|f(a_n)| <\varepsilon$ for all $n \ge N$. As a result, for all $m>n>N$, we have

Therefore $\{a_n\}$ is Cauchy. Since $F$ is complete, $a_n$ converges to some $\alpha \in \mathfrak{o}_F \subset F$ such that $f(\alpha)=\lim_{n \to \infty}f(a_n)=0$.

Step 3 - Validating the Congruence

In local fields, congruence is determined by inequality. In fact, we only need to show that $|\alpha-a_0|<1$, which means that $\alpha-a_0 \in \mathfrak{p}$, and therefore $\alpha \equiv a \bmod \mathfrak{p}$ as expected. To do this, we show by induction that $|a_n-a_0|<1$. For $n=1$ we see $|a_1-a_0|=|f(a_0)/f’(a_0)|=|f(a_0)|<1$.

Suppose $|a_{n-1}-a_0|<1$ then

Therefore $|\alpha-a_0|=\lim_{n \to \infty}|a_n-a_0|<1$, from which the result follows. $\square$

Stronger Version

In fact we have not explicitly used the fact that $a$ is a simple root. We only used the fact that $|f(a_0)|<1$ but $|f’(a_0)|=1$. Moreover, what really matters here is that $|f(a_n)|$ converges to $0$ quickly enough. Therefore $1$ may be replaced by a smaller constant. For this reason we introduce a stronger version of Hensel’s lemma.

Hensel’s lemma, stronger version. Let $F$ be a non-Archimedean local field with ring of integers $\mathfrak{o}_F$. Suppose there exists $a \in \mathfrak{o}_F$ such that $|f(a)|<|f’(a)|^2$, then there exists some $b \in \mathfrak{o}_F$ such that $f(b)=0$ and $|b-a|<|f’(a)|$.

Instead of asserting $|f’(a_n)|=1$ for all $n$, we claim that $|f’(a_n)|=|f’(a_0)|$ (as it should be!). Instead of asserting $|f(a_n)|<1$, we claim that $|f(a_n)| \le \lambda^{2^n}|f’(a_0)|^2$ where $\lambda=|f(a_0)|/|f’(a_0)|^2$. The proof will be nearly the same.

For example, we can find a square root of $257$ in $\mathbb{Z}_2 \subset \mathbb{Q}_2$. The polynomial $f(x)=x^2-257$ is reduced to $\overline{f}(x)=x^2-1=(x-1)^2$ in $\mathbb{F}_2[x]$, where $1$ is not a simple root. Therefore we cannot apply the original version of Hensel’s lemma to this polynomial. Nevertheless, we see $f(1)=-256$ and $f’(1)=2$. Therefore $|f(1)|=\frac{1}{2^8}$ while $|f’(1)|=\frac{1}{2}$. We can apply Newton’s method here to find a square root of $257$ without worrying about repeated roots.
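In $\mathbb{Z}_2$ one cannot divide by $f'(a_n)=2a_n$ naively modulo powers of $2$, but the root can still be built one binary digit at a time (a sketch of my own, not the text's iteration): if $x^2\equiv 257 \pmod{2^j}$ with $x$ odd and $j\ge 3$, then $(x+2^{j-1})^2 \equiv x^2+2^j \pmod{2^{j+1}}$, so exactly one of $x$ and $x+2^{j-1}$ satisfies the congruence modulo $2^{j+1}$:

```python
def sqrt_2adic(c, k):
    """Find odd x with x^2 = c mod 2^k, assuming c = 1 mod 8 (here c = 257)."""
    x = 1                          # x^2 = 1 = c (mod 8), the base case
    for j in range(3, k):
        if (x * x - c) % 2 ** (j + 1) != 0:
            x += 2 ** (j - 1)      # (x + 2^(j-1))^2 = x^2 + 2^j x (mod 2^(j+1))
    return x

k = 30
x = sqrt_2adic(257, k)
assert x % 2 == 1
assert x * x % 2 ** k == 257       # a 2-adic square root of 257, mod 2^30
print(x)
```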

Ending Remarks

There are a lot of variants of Hensel’s lemma; for example, you can do exercise 10.9 of Atiyah-MacDonald. In fact, there are even Henselian rings and the Henselisation of a ring.

There are some other proofs of Hensel’s lemma. For example, since Newton’s method can also be understood as a contraction mapping, one can prove the lemma using properties of contraction mappings (see K. Conrad’s note).


Study Irreducible Representations of SU(2) Using Fourier Series

Introduction and Prerequisites

Representation theory is important in various branches of mathematics and physics. When studying representations of finite groups, we have quite some algebra and combinatorics. When differentiation (more precisely, smoothness) joins the party, we have Lie groups, involving calculus, linear algebra, geometry and much more. In particular, the theories around $SU(2)$ and $SO(3)$ are of great importance. On one hand, they are among the simplest non-elementary, higher-dimensional Lie groups. On the other hand, they describe rotations in $\mathbb{C}^2$ and $\mathbb{R}^3$ respectively, which is “physically realistic”. I believe students in physics have more to say.

In this post we develop a way to study irreducible representations of these two Lie groups, in a mathematician’s way. I try my best to make sure that everything is down-to-earth, and everything can be “reduced” to 19th-century (pre-modern) mathematics.

Nevertheless, the reader is assumed to be familiar with the elementary language of representation theory (and you know, there is a lot of abuse of language there), which I think is not a problem because otherwise you wouldn’t be reading this post. You need to recall eigenvalue theory in linear algebra, as well as Fourier series. We need the fact that the trigonometric system is complete; in other words, trigonometric polynomials are dense in the space of continuous functions. $\def\sym{\operatorname{Sym}}$

We will first study $SU(2)$, and a first classification of irreducible representations of $SO(3)$ follows at once. This is because we have an isomorphism

$$SO(3) \cong SU(2)/\{\pm I\}.$$
This is to say, $SU(2)$ is a “double cover” of $SO(3)$. To see this, notice that $SU(2) \cong S^3$ and $SO(3) \cong \mathbb{R}P^3$ as smooth manifolds, meanwhile $\mathbb{R}P^3 \cong S^3/\{-1,1\}$ can be considered as the definition.

Of course, by representation we mean finite dimensional and unitary representations.

Irreducible Representations of the Special Unitary Group

Indeed it seems we have nowhere to start. Instead of trying to find all of them, we will try to work on seemingly immediate representations and it turns out that they are all we are looking for.

Let $V_0$ be the trivial representation on $\mathbb{C}$ and $V_1$ the standard representation on $\mathbb{C}^2$, given by ordinary matrix multiplication. These representations are irreducible. We want to extend this family to $V_n$ for $n \ge 2$. It is natural to try to generate representations of higher dimensions from $V_1$. Here are several ways available.

  • Direct sum: $\bigoplus_{i=1}^{n}V_1$. The dimension is $2n$ and unfortunately, the representation is determined by each component so essentially there is no “new thing”.

  • Tensor product: $\bigotimes_{i=1}^{n}V_1$. The dimension is $2^n$ which is way too big.

  • Wedge product: $\bigwedge^{n}V_1$. It stops at $n=2$ and we have to deal with $u \wedge v = - v \wedge u$. This can be annoying.

  • Symmetric product: $\sym^{n}V_1$. The dimension is $n+1$ and it doesn’t stop. Besides, it can be understood as homogeneous polynomials of degree $n$ in two variables. This is a fantastic choice. Besides we have $\sym^0 V_1=V_0$ so nothing is abruptly excluded.

Spaces of Homogeneous Polynomials

Put $V_n=\sym^nV_1$, which can be understood as the space of homogeneous polynomials of degree $n$ in the variables $z_1$ and $z_2$. $V_n$ therefore has a canonical basis

$$P_k(z_1,z_2)=z_1^kz_2^{n-k}, \qquad k=0,1,\dots,n.$$

And we will make use of it later.

Definition of the representation

For each $g \in SU(2)$, we have a left action

In other words, $\rho(g)P(z)=P(zg)$ where $z=(z_1,z_2)$ and $zg$ is matrix multiplication. Each $g \in SU(2)$ has matrix representation

Then

When there is no confusion, we will write $gP(z)=P(zg)$, viewing $g$ itself as an automorphism of $V_n$. One can also replace $SU(2)$ with $GL(2,\mathbb{C})$ but we are not studying that bigger one.

Since $z \mapsto zg$ is linear and non-degenerate, $P \mapsto P(zg)$ preserves homogeneity of degree $n$, so $gP(z) \in V_n$. In other words, $V_n$ is $SU(2)$-invariant. We now have a well-defined representation. Note $V_0=\mathbb{C}$ gives the trivial representation, and $V_1=\mathbb{C}^2$ yields linear maps. Again, nothing is abruptly excluded. Even more satisfyingly, those $V_n$ are all irreducible.

Irreducibility

Proposition 1. The representations $V_n$ are irreducible.

Proof. By Schur’s lemma, we need to show that each $SU(2)$-equivariant automorphism $A$ of $V_n$ is a non-zero multiple of the identity, i.e. $A=\lambda I$ for some $\lambda \ne 0$. By definition, for each $g \in SU(2)$, we have $A\rho(g)P=\rho(g)AP$ for all $P \in V_n$. And for simplicity we write $Ag=gA$, realising $g$ as a linear transform of $V_n$, instead of an element of $SU(2)$.

The group $SU(2)$ can be complicated, but $U(1) \cong S^1$ is much simpler and can be embedded into $SU(2)$ in two ways. We show that these two ways are just enough to expose the irreducibility of $V_n$.

First of all we embed $S^1$ into $SU(2)$ by

Call the matrix on the right-hand side $g_a$. Then

for all $k$. This is to say, $P_k$ is an eigenvector corresponding to the eigenvalue $a^{2k-n}$. As $g_aA=Ag_a$, information on eigenvalues and eigenvectors can help a lot, so we dig into it first.

Since $\{P_k\}$ are linearly independent, under this basis, we have a matrix representation

but we don’t know how the eigenspaces are spanned because we may have $a^j=a^k$ for $j \ne k$. However, the number $a$ can always be chosen so that $a^{-n},a^{-n+2},\dots,a^n$ are pairwise distinct (for example, one can pick $a$ to be a primitive $m$-th root of $1$ with $m$ big enough). As a result, $g_a$ has $n+1$ distinct eigenvalues. Therefore, the $a^{2k-n}$-eigenspace can only be generated by $P_k$.

On the other hand, by definition of $A$, we have

Hence $AP_k$ lies in $a^{2k-n}$-eigenspace. Therefore we have $AP_k=c_kP_k$ for some $c_k \ne 0$. In other words, $P_k$ is the $c_k$-eigenvector of $A$. We obtain another matrix representation under the basis $\{P_k\}$

We want this matrix to be a scalar matrix. The result follows from another embedding of $U(1)$ into $SU(2)$. Note $a \in S^1$ can be determined by $t \in [0,2\pi)$, and we therefore have a matrix

Still we have $Ag_t=g_tA$. As we can see,

This follows from our observation on eigenvalues. Next, we immediately use the eigenvalue $c_n$ to obtain

This is the definition of $g_tP_n$. Comparing coefficients of the $P_k$, we must have $c_k=c_n$ for all $0 \le k \le n$; recall that $\{P_k\}$ is a basis, so coefficients are unique for a given vector. We have now obtained what we want: $A=c_n I$. $\square$

Characters and Fourier Transform

So far we have used diagonalisation of representations of $SU(2)$ but the diagonalisation of $SU(2)$ itself is not touched yet. Neither have we made use of character functions. So now we invite them to the party.

Let’s recall diagonalisation in $SU(2)$. Pick $g \in SU(2)$. First of all it is diagonalisable, being unitary. Let $\lambda_1$ and $\lambda_2$ be its two eigenvalues; then $\det g=\lambda_1\lambda_2=1$. Therefore we have

where $\lambda$ is one of the eigenvalues of $g$. Since the diagonalised matrix is still in $SU(2)$, we have $|\lambda|=1$, i.e., $\lambda \in S^1$. We therefore write $g \sim e(t) \sim e(-t)$ where

$$e(t)=\begin{pmatrix}e^{it} & 0 \\ 0 & e^{-it}\end{pmatrix}.$$
We see that $e(s) \sim e(t)$ if and only if $s = \pm t \bmod 2\pi$. By periodicity of the exponential function, $e(t)$ is in particular $2\pi$-periodic. If $f:SU(2) \to \mathbb{C}$ is a class function, then $f \circ e:\mathbb{R} \to \mathbb{C}$ is an even $2\pi$-periodic function. Conversely, given an even $2\pi$-periodic function $h:\mathbb{R} \to \mathbb{C}$, we can recover it as a class function, and the process is as follows.

Define $\Lambda:SU(2) \to S^1$ sending $g \in SU(2)$ to the eigenvalue of $g$ with non-negative imaginary part (one can also pick the non-positive one, because $h$ is even). Then $E:SU(2) \to [0,\pi]$ given by $g \mapsto \frac{1}{i}\log\Lambda(g)$ is a well-defined function, and $h \circ E:SU(2) \to \mathbb{C}$ is a class function. Besides, we have $E \circ e(t)= \pm t \bmod 2\pi$, and $e \circ E(g)$ is the diagonalisation of $g$. Therefore $h \circ E \circ e(t)=h(t)$ and $f \circ e \circ E(g)=f(g)$, as expected.

With help of this $e(t)$ and $E(t)$, we have this correspondence

Recall that the space on the right hand side has a countable uniform basis

In other words, $\{\cos{nt}\}_{n \ge 0}$ spans a dense subspace. This is the completeness of the trigonometric system. Since we only consider even functions, the $\sin{nt}$ are excluded. For a reference on completeness, one can check 4.25 of Real and Complex Analysis by W. Rudin.

For class functions, we certainly want to know about characters. Let $\chi_n$ be the character of $V_n$; then

$$\chi_n(e(t))=\sum_{k=0}^{n}e^{i(2k-n)t}.$$
When $t \in \pi\mathbb{Z}$, we have $\chi_n(e(t)) \in \mathbb{Z}$. Otherwise, as a classic exercise in calculus, we have

$$\chi_n(e(t))=\frac{\sin((n+1)t)}{\sin t}=:\kappa_n(t).$$
We have $\kappa_0(t)=1$. For $\kappa_n(t)$ with $n>0$, we have the recursion

$$\kappa_n(t)=\kappa_{n-2}(t)+2\cos(nt), \qquad \kappa_{-1}(t):=0.$$
We see $\kappa_1(t)=2\cos{t}$. By induction, every $\kappa_n(t)$ is an integer linear combination of $1,\cos{t},\dots,\cos{nt}$. Therefore $\{\kappa_n(t)\}_{n \ge 0}$ spans the same space as $\{\cos{nt}\}_{n \ge 0}$, which is dense in the space of even $2\pi$-periodic functions. Note the $\kappa_n(t)$ are linearly independent, because the leading term is $2\cos{nt}$.

The argument above shows that the $\chi_n$ span a dense subspace in the space of class functions. In other words, the $\chi_n$ form a Fourier basis for class functions. As we all know, Fourier series are powerful. Let’s see how powerful they are in the calculus of the Lie group $SU(2)$ itself.

Proposition 2. For a continuous class function $f:SU(2) \to \mathbb{C}$, we have

$$\int_{SU(2)}f(g)\,dg=\frac{1}{\pi}\int_{-\pi}^{\pi}f(e(t))\sin^2 t\,dt.$$
Proof. On one hand, since the $V_n$ are irreducible, by the fixed-point theorem of representations,

Here, for a group $G$ and a representation $V$, $V^G$ is the fixed-point set, i.e. the space of elements fixed by the action of $G$ on $V$. Since $V_n$ is irreducible, the fixed-point space can only be $0$ unless the representation itself is trivial. Now we move on and check the right-hand side.

On the right-hand side we are looking at even $2\pi$-periodic continuous functions, reflecting the denseness of the $\kappa_n(t)$. However, $\int_{-\pi}^{\pi}\kappa_2(t)\,dt=2\pi$, so these integrals do not all vanish for $n>0$. But if we multiply by $\sin^2{t}$, then $\kappa_m\kappa_n\sin^2 t$ is transformed into the form $\sin{mt}\sin{nt}$, and we are familiar with this orthogonality. More precisely,

$$\frac{1}{\pi}\int_{-\pi}^{\pi}\kappa_m(t)\kappa_n(t)\sin^2 t\,dt=\frac{1}{\pi}\int_{-\pi}^{\pi}\sin((m+1)t)\sin((n+1)t)\,dt=\delta_{mn}.$$
Since the functional $h \mapsto \frac{1}{\pi}\int_{-\pi}^{\pi}h(t)\sin^2{t}\,dt$ is continuous in the uniform topology and the $\kappa_n$ span a dense subspace, the result is now obtained. $\square$
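The orthonormality can also be checked numerically. The sketch below is my own illustration; it uses the normalisation $\frac{1}{\pi}\int_{-\pi}^{\pi}\kappa_m\kappa_n\sin^2 t\,dt=\delta_{mn}$, which follows from the identity $\kappa_n(t)\sin t=\sin((n+1)t)$:

```python
import numpy as np

def kappa(n, t):
    """chi_n(e(t)) = sum_{k=0}^n e^{i(2k-n)t}; the imaginary parts cancel in
    pairs (k <-> n-k), so this equals sin((n+1)t)/sin(t) without a pole at 0."""
    return sum(np.cos((2 * k - n) * t) for k in range(n + 1))

t = np.linspace(-np.pi, np.pi, 4097)
dt = t[1] - t[0]

def inner(m, n):
    """(1/pi) * integral over [-pi, pi] of kappa_m kappa_n sin^2 t (trapezoidal rule,
    which is essentially exact for trigonometric polynomials over a full period)."""
    vals = kappa(m, t) * kappa(n, t) * np.sin(t) ** 2
    return (vals.sum() - 0.5 * (vals[0] + vals[-1])) * dt / np.pi

for m in range(5):
    for n in range(5):
        assert abs(inner(m, n) - (1.0 if m == n else 0.0)) < 1e-8
print("the kappa_n are orthonormal for the weight sin^2(t)/pi on [-pi, pi]")
```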

Finally, surprisingly and satisfyingly enough, this denseness has actually ruled out all other possibilities for irreducible representations. In other words, our search among symmetric products is complete. We can see this through Parseval’s identity. This is the heart of this blog post.

Proposition 3. Every irreducible representation of $SU(2)$ is isomorphic to one of the $V_n$.

Proof. Suppose we had an irreducible representation whose character $\chi$ is different from all of the $\chi_n$. Then orthonormality shows that $\langle \chi,\chi_n \rangle = 0$ for all $n \ge 0$ and $\langle \chi,\chi \rangle=1$. Now let’s see why this is absurd.

Since $\{\chi_n\}_{n \ge 0}$ spans a dense subspace of the space of class functions, we actually have

Therefore

and

It is impossible for a sum of zeros to be $1$. $\square$

Irreducible Representations of the Special Orthogonal Group (First Classification)

Now we head to $SO(3)$. In fact the result follows immediately from the surjection

$$\pi:SU(2) \twoheadrightarrow SO(3).$$
We have $\ker\pi=\{-I,I\}$. Let $W$ be a representation of $SO(3)$, i.e., we have a map

Then

by $g \mapsto \rho(\pi(g))$ is an induced representation, and we write $\pi^\ast W$. If $W$ is irreducible, then $\pi^\ast W$ is also irreducible. In particular, $\pi^\ast\rho(-I)=\operatorname{id}_W$.

On the other hand, if $\vartheta:SU(2) \to GL(V)$ is an irreducible representation where $\vartheta(-I)=\operatorname{id}_V$, then we have an associated representation

given by $g\ker\pi \mapsto \vartheta(g)$. Let’s denote it by $\pi_\ast V$. Again, if $V$ is irreducible, then $\pi_\ast V$ is irreducible.

Therefore we have realised a correspondence

So it remains to determine those of $SU(2)$. Let $\rho_n:SU(2) \to GL(V_n)$ be an irreducible representation; then

$$\rho_n(-I)P(z)=P(-z)=(-1)^nP(z)$$
because $P \in \mathbb{C}[z_1,z_2]$ is homogeneous of degree $n$. Therefore $-I$ acts as the identity if and only if $n$ is even. We obtain

Proposition 4. Every irreducible representation of $SO(3)$ is of the form

where $V_{2n}$ is as constructed above.

This is, of course, just a first classification. But to introduce a classification as explicit as what we have done for $SU(2)$, there has to be another post. As a quick overview, here is the result.

Let $P_{\ell}$ be the complex vector space of homogeneous polynomials of degree $\ell$ in three variables, which can immediately be considered as functions on $\mathbb{R}^3$. This setting makes sense at once, just as what we have done for $SU(2)$. Then, in fact,

This is to say, $W_\ell$ can be understood as harmonic homogeneous polynomials in $\mathbb{R}^3$, which can also be considered to be uniquely determined on the unit sphere $S^2$.

Reference

  • Theodor Bröcker and Tammo tom Dieck, Representations of Compact Lie Groups.
  • Walter Rudin, Real and Complex Analysis, 3rd Edition.

The Fourier Transform of exp(-cx^2) and Its Convolution

For $0<c<\infty$, define
$$f_c(x)=e^{-cx^2},\qquad x\in\mathbb{R}.$$

We want to compute the Fourier transform
$$\hat{f}_c(t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f_c(x)e^{-ixt}\,dx.$$

As one can expect, the computation can be quite interesting, as $f_c(x)$ is related to the Gaussian integral in the following way:
$$\hat{f}_c(0)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-cx^2}\,dx=\frac{1}{\sqrt{2\pi}}\sqrt{\frac{\pi}{c}}=\frac{1}{\sqrt{2c}}.$$

Now we dive into this integral and see what we can get.

Computing the Fourier Transform

Let’s admit it: trying to compute the integral directly is somewhat unrealistic, so we need an alternative route. For convenience (of writing MathJax codes) we may write $\varphi(t)=\hat{f}_c(t)$.

First of all, $\hat{f}_c(t)$ is always well-defined, because
$$\int_{-\infty}^{\infty}|e^{-cx^2}e^{-ixt}|\,dx=\int_{-\infty}^{\infty}e^{-cx^2}\,dx=\sqrt{\frac{\pi}{c}}<\infty,$$

so we can compute it without worrying about anything.

Integration by Parts and Differential Equation

It may not be the first method one thinks of, but it works. An integration by parts gives

On the other hand, we have

(The well-definedness of the integral can be verified easily.) Combining both, we obtain a differential equation
$$\varphi'(t)=-\frac{t}{2c}\varphi(t).$$

This differential equation corresponds to an integral equation

And we solve it to obtain

or alternatively,

Now put the initial value back in. As we have shown above, it comes down to the Gaussian integral:
$$\varphi(0)=\hat{f}_c(0)=\frac{1}{\sqrt{2c}}.$$

Therefore
$$\hat{f}_c(t)=\frac{1}{\sqrt{2c}}\,e^{-t^2/(4c)}$$

is exactly what we want.

Before showing another method, we first have a question: can we have $\hat{f}_c=f_c$? Solving an equation in $c$ answers this question affirmatively:
$$\frac{1}{4c}=c\quad\text{and}\quad\frac{1}{\sqrt{2c}}=1\iff c=\frac{1}{2}.$$

In other words, $f_{1/2}$ is a fixed point of the Fourier transform. Within this class of functions, it is the only fixed point.
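A quick numerical check of the formula $\hat{f}_c(t)=e^{-t^2/(4c)}/\sqrt{2c}$ and of the fixed point; the normalization $\hat f(t)=(2\pi)^{-1/2}\int f(x)e^{-ixt}\,dx$ is an assumption here (it is Rudin's convention, and it is what makes $c=\tfrac12$ the fixed point):

```python
import math

def f_hat(c, t, lo=-20.0, hi=20.0, n=400_000):
    # Riemann-sum approximation of (2*pi)^(-1/2) * integral of
    # exp(-c x^2) * cos(x t) dx; the sine part vanishes by symmetry
    dx = (hi - lo) / n
    s = sum(math.exp(-c * (lo + i*dx)**2) * math.cos((lo + i*dx) * t)
            for i in range(n))
    return s * dx / math.sqrt(2 * math.pi)

for c, t in [(0.5, 1.0), (2.0, 0.3), (1.0, 2.0)]:
    expected = math.exp(-t**2 / (4*c)) / math.sqrt(2*c)
    assert abs(f_hat(c, t) - expected) < 1e-6

# c = 1/2 reproduces the function itself: the fixed point of the transform
assert abs(f_hat(0.5, 0.8) - math.exp(-0.8**2 / 2)) < 1e-6
```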

Direct Application of the Gaussian Integral

We can also make use of the Gaussian integral to get what we want.

Convolution

As a classic property of the Fourier transform, for $f,g \in L^1$, we have
$$\widehat{f\ast g}=\hat{f}\cdot\hat{g},$$

where
$$(f\ast g)(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(x-y)g(y)\,dy.$$

By the way, $f \in L^1$ means $\int_{-\infty}^{\infty}|f(x)|dx<\infty$. One can verify that $f \ast g \in L^1$ here as well.

With this result, we can compute $f_a \ast f_b$ easily. Note
$$\widehat{f_a\ast f_b}=\hat{f}_a\hat{f}_b=\frac{1}{\sqrt{2a}}\cdot\frac{1}{\sqrt{2b}}\,e^{-\left(\frac{1}{4a}+\frac{1}{4b}\right)t^2}.$$

We expect that there exist some $\gamma$ and $c$ such that $f_a \ast f_b = \gamma f_c$. In other words, we are looking for $\gamma,c \in \mathbb{R}$ such that

We should have
$$\frac{1}{4c}=\frac{1}{4a}+\frac{1}{4b}\iff c=\frac{ab}{a+b}.$$

We also have
$$\gamma\cdot\frac{1}{\sqrt{2c}}=\frac{1}{\sqrt{2a}}\cdot\frac{1}{\sqrt{2b}}\implies\gamma=\frac{1}{\sqrt{2(a+b)}}.$$

Therefore
$$f_a\ast f_b=\frac{1}{\sqrt{2(a+b)}}\,f_{ab/(a+b)},$$

where $c$ is given above. We do not even have to compute the integral of convolution explicitly.
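The values $c=ab/(a+b)$ and $\gamma=1/\sqrt{2(a+b)}$ can be checked by brute-force integration; the convolution below carries the $(2\pi)^{-1/2}$ factor, an assumed normalization matching the transform convention used above:

```python
import math

def conv(a, b, x, lo=-20.0, hi=20.0, n=400_000):
    # Riemann-sum approximation of (2*pi)^(-1/2) * integral f_a(x-y) f_b(y) dy
    dy = (hi - lo) / n
    s = sum(math.exp(-a * (x - (lo + i*dy))**2) * math.exp(-b * (lo + i*dy)**2)
            for i in range(n))
    return s * dy / math.sqrt(2 * math.pi)

a, b = 1.0, 2.0
c = a*b / (a + b)                    # exponent of the resulting Gaussian
gamma = 1 / math.sqrt(2 * (a + b))   # scalar in f_a * f_b = gamma * f_c
for x in [0.0, 0.7, -1.3]:
    assert abs(conv(a, b, x) - gamma * math.exp(-c * x**2)) < 1e-6
```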


A Detailed Proof of the Riemann Mapping Theorem

This post is intended to supply a detailed proof of the Riemann mapping theorem.

Riemann mapping theorem. Every simply connected region $\Omega \subsetneq \mathbb{C}$ is conformally equivalent to the open unit disc $U$.

Fortunately the proof can be found in many textbooks of complex analysis, but it is fairly technical, so it can be painful to read. This post can be considered as a painkiller: here you will see the proof filled in with many details. The writer still encourages the reader to reproduce the proof with their own pen and paper, and hopes that this post increases the accessibility of the theorem and its proof.

However, there is a bar. We need to assume some background in complex analysis, although it is quite basic. The minimal prerequisite is being able to answer the following questions.

  • Contour integration, Cauchy’s formula.

  • Almost uniform convergence. Let $\Omega \subset \mathbb{C}$ be open and suppose that $f_j \in H(\Omega)$ for all $j=1,2,\dots$, and $f_j \to f$ uniformly on every compact subset $K \subset \Omega$. Does $f \in H(\Omega)$? What is the uniform limit of $f’_j$? Informally, we call the phenomenon that a sequence of functions uniformly converges on every compact subset almost uniform convergence. This has nothing to do with almost everywhere in integration theory. In fact, this post does not require background in Lebesgue integration theory.

  • Open mapping theorem (complex analysis version).

  • Maximum modulus principle and some variants.

  • Rouché’s theorem. Or even more, the calculus of residues.

Preparation

Beyond the prerequisites, we still need some preparation beforehand.

Simply Connected

Definition 1. Let $X$ be a connected topological space. A closed curve is a continuous map $\gamma:[0,1] \to X$ such that $\gamma(0)=\gamma(1)$; we say $\gamma$ is null-homotopic if it is homotopic to a constant map $\gamma_0:[0,1] \to \{x\}$ with $x \in X$. We say $X$ is simply connected if every closed curve is null-homotopic.

Intuitively, if $X$ is simply connected, then $X$ contains no “hole”. For example, the unit disc $U$ is simply connected. However, $U \setminus \{0\}$ is not. On the other hand, $U \setminus [0,1)$ is still simply connected. Another satisfying result is that every convex and connected open set is simply connected; this follows from a convex-combination argument.

There are a lot of good properties of simply connected region, which will be summarised below.

Proposition 1. For a region $\Omega$ (an open and connected subset of $\mathbb{R}^2$), the following conditions are equivalent; each one implies the other eight.

  1. $\Omega$ is homeomorphic to the open unit disc $U$.
  2. $\Omega$ is simply connected.
  3. $\operatorname{Ind}_\gamma(\alpha)=0$ for every closed path $\gamma$ in $\Omega$ and $\alpha \in S^2 \setminus \Omega$, where $S^2$ is the Riemann sphere.
  4. $S^2 \setminus \Omega$ is connected.
  5. Every $f \in H(\Omega)$ can be approximated by polynomials, almost uniformly.
  6. For every $f \in H(\Omega)$ and every closed path $\gamma$ in $\Omega$, $\int_\gamma f(z)\,dz=0$.
  7. Every $f \in H(\Omega)$ has an anti-derivative; that is, there exists an $F \in H(\Omega)$ such that $F'=f$.
  8. If $f \in H(\Omega)$ and $1/f \in H(\Omega)$, then there exists a $g \in H(\Omega)$ such that $f=\exp{g}$.
  9. For such $f$, there also exists a $\varphi \in H(\Omega)$ such that $f=\varphi^2$.
Conditions 5~9 pretty much say that calculus works fine here and we need not worry about nightmare counterexamples, to some extent. Most of the implications $n \implies n+1$ are not that difficult, but some deserve a mention. That 4 implies 5 is a consequence of Runge’s theorem. In the implication of 7 to 8, one needs to use the fact that $\Omega$ is connected. When we have $f=\exp{g}$, we can put $\varphi=\exp\frac{g}{2}$, from which we obtain $f=\varphi^2$. That 9 implies 1 is partly a consequence of the Riemann mapping theorem. Indeed, if $\Omega$ is the whole plane then the homeomorphism is easy: $z \mapsto \frac{z}{1+|z|}$ is a homeomorphism of $\Omega$ onto $U$. But we need the Riemann mapping theorem for the remaining case, when $\Omega$ is a proper subset.

If you know the definition of sheaf, you will realise that $(\mathbb{C},H(\cdot))$ is indeed a sheaf. For each open subset $\Omega \subset \mathbb{C}$, $H(\Omega)$ is a ring, even more precisely, a $\mathbb{C}$-algebra. The exponential map $\exp:g \mapsto e^g$ is a sheaf morphism. However, we now see that it is surjective if and only if $\Omega$ is simply connected. I hope this can help you figure out an exercise in algebraic geometry. You know, that celebrated book by Robin Hartshorne.

Since we haven’t proved the Riemann mapping theorem, we cannot use the equivalence above yet. However, we can use 9 right away. This gives rise to Koebe’s square root trick.

Equicontinuity & Normal Family

Equicontinuity is quite an important concept. You may have seen it in differential equation, harmonic function, maybe just sequence of functions. We will use it to describe a family of functions, where almost uniform convergence can be well established.

Definition 2. Let $\mathscr{F}$ be a family of functions $(X,d) \to \mathbb{C}$ where $(X,d)$ is a metric space.

We say that $\mathscr{F}$ is equicontinuous if, to every $\varepsilon>0$, there corresponds a $\delta>0$ such that whenever $d(x,y)<\delta$, we have $|f(x)-f(y)|<\varepsilon$ for all $f \in \mathscr{F}$. In particular, by definition, all functions in $\mathscr{F}$ are uniformly continuous.

We say that $\mathscr{F}$ is pointwise bounded if, to every $x \in X$, there corresponds some $0 \le M(x) < \infty$ such that $|f(x)| \le M(x)$ for every $f \in \mathscr{F}$.

We say that $\mathscr{F}$ is uniformly bounded on each compact subset if, to each compact $K \subset X$, there corresponds a number $M(K)$ such that $|f(z)| \le M(K)$ for all $f \in \mathscr{F}$ and $z \in K$.

These concepts are talking about “a family of” continuity and boundedness. In our proof of the Riemann mapping theorem, we do not construct the map explicitly, instead, we will use these concepts above to obtain one (which is a limit) that exists. In this post we simply put $X=\Omega \subset \mathbb{C}$, a simply connected region and $d$ is the natural one.

A famous result on equicontinuity is the Arzelà-Ascoli theorem, which says that pointwise boundedness and equicontinuity yield a subsequence that converges almost uniformly.

Theorem 1 (Arzelà-Ascoli). Let $\mathscr{F}$ be a pointwise bounded and equicontinuous family of complex functions on a separable metric space $X$ (i.e., one containing a countable dense set). Then every sequence $\{f_n\}$ in $\mathscr{F}$ has a subsequence that converges uniformly on every compact subset of $X$.

Here is a self-contained proof.

Certainly it is OK to let $X$ be a subset of $\mathbb{R}$, $\mathbb{C}$ or their product. We use this in real and complex analysis for this reason. We will need this almost uniform convergence to establish our conformal map. To specify its application in complex analysis, we introduce the concept of normal family.

Definition 3. Suppose $\mathscr{F} \subset H(\Omega)$, for some region $\Omega \subset \mathbb{C}$. We call $\mathscr{F}$ a normal family if every sequence of members of $\mathscr{F}$ contains a subsequence which converges uniformly on every compact subset of $\Omega$. The limit function is not required to be in $\mathscr{F}$.

We now apply Arzelà-Ascoli to complex analysis.

Theorem 2 (Montel). Suppose $\mathscr{F} \subset H(\Omega)$ is uniformly bounded, then $\mathscr{F}$ is a normal family.

Proof. We need to show that $\mathscr{F}$ is “almost” equicontinuous; since uniform boundedness clearly implies pointwise boundedness, we can then apply Arzelà-Ascoli.

Let $\{K_n\}$ be a sequence of compact sets such that (1) $\bigcup_n K_n = \Omega$ and (2) $K_n \subset K^\circ_{n+1} \subset K_{n+1}$, the interior of $K_{n+1}$. Then for every $z \in K_n$, there exists a positive number $\delta_n$ such that
$$D(z,2\delta_n)\subset K^\circ_{n+1},$$

where $D(a,r)$ is the disc centred at $a$ with radius $r$. If such $\delta_n$ does not exist, then there exists a point $z \in K_{n}$ such that whenever $\delta>0$, $D(z,\delta) \setminus K_{n+1} \ne \varnothing$, which is to say, $z$ is a boundary point. But this is impossible because $z$ lies in the interior of $K_{n+1}$ by definition.

For such $\delta_n$, we pick $z’,z’’ \in K_n$ such that $|z’-z’’| < \delta_n$. Let $\gamma$ be the positively oriented circle with centre at $z’$ and radius $2\delta_n$, i.e. the boundary of $D(z’,2\delta_n)$. Recall that the Cauchy formula says
$$f(z)=\frac{1}{2\pi i}\int_\gamma\frac{f(\zeta)}{\zeta-z}\,d\zeta,\qquad z\in D(z',2\delta_n).$$

We will make use of this. By the formula above, we have
$$f(z')-f(z'')=\frac{z'-z''}{2\pi i}\int_\gamma\frac{f(\zeta)}{(\zeta-z')(\zeta-z'')}\,d\zeta.$$

Now we make use of our choice of $z’$, $z’’$ and $\gamma$. By definition, for $\zeta \in \gamma^\ast$ (the range of $\gamma$), we have $|\zeta-z’|=2\delta_n$. Since $|z’-z’’|<\delta_n$, we have $|\zeta-z’|=2\delta_n=|\zeta-z’’+z’’-z|\le |\zeta-z’’|+|z’’-z’|$. Therefore $|\zeta-z’’| \ge 2\delta_n-|z’’-z’|>\delta_n$. Bearing this in mind, we see
$$|f(z')-f(z'')|\le\frac{|z'-z''|}{2\pi}\cdot\frac{M(K_{n+1})}{2\delta_n^2}\cdot 4\pi\delta_n=\frac{M(K_{n+1})}{\delta_n}|z'-z''|.$$

This may look confusing, so we explain it a little more. Since $D(z’,2\delta_n) \subset K^\circ_{n+1}$, we must have $\overline{D}(z’,2\delta_n) \subset K_{n+1}$; therefore whenever $\zeta \in \gamma^\ast=\partial D(z’,2\delta_n)$, we have $|f(\zeta)| \le M(K_{n+1})$. This is where we use the hypothesis of uniform boundedness. We also have $|(\zeta-z’)(\zeta-z’’)|>2\delta_n\cdot\delta_n=2\delta_n^2$, so the norm of the integrand $\frac{f(\zeta)}{(\zeta-z’)(\zeta-z’’)}$ is bounded by $\frac{M(K_{n+1})}{2\delta_n^2}$. Taking into account the factor $\frac{1}{2\pi}$ from the formula and the length $4\pi\delta_n$ of $\gamma$, the whole expression is bounded by $\frac{M(K_{n+1})}{\delta_n}|z’-z’’|$, and the result follows.

What does this inequality imply? For $\varepsilon>0$, if we pick $\delta=\min\{\delta_n,\frac{\delta_n\varepsilon}{M(K_{n+1})}\}$, then $|f(z’)-f(z’’)|<\varepsilon$ for every $f \in \mathscr{F}$ and $|z’-z’’|<\delta$. That is, for each $K_n$, the restrictions of the members of $\mathscr{F}$ to $K_n$ form an equicontinuous family.

Now consider a sequence $\{f_j\}$ in $\mathscr{F}$. For each $n$, we apply the Arzelà-Ascoli theorem to the restriction of $\mathscr{F}$ to $K_n$, and it gives us an infinite subset $S_n \subset \mathbb{N}$ such that $f_j$ converges uniformly on $K_n$ as $j \to \infty$ with $j \in S_n$. Note we can make sure $S_n \supset S_{n+1}$, because if the subsequence converges uniformly within $S_{n+1}$ then it converges uniformly within $S_n$ as well. Pick a strictly increasing sequence $\{s_j\}$ with $s_j \in S_j$; then $\{f_{s_j}\}$ converges uniformly on every $K_n$ and therefore on every compact subset $K$ of $\Omega$. The statement is now proved. $\square$

Remarks. We have no idea what the limit is, and this happens in our proof of the Riemann mapping theorem as well.

The sequence $\{K_n\}$ can be constructed explicitly, however. In fact, for every open set $\Omega$ in the plane there is a sequence $\{K_n\}$ of compact sets such that

  • $\bigcup_n K_n=\Omega$.
  • $K_n \subset K_{n+1}^\circ$.
  • For every compact $K \subset \Omega$, there is some $n$ such that $K \subset K_n$.
  • Every component of $S^2 \setminus K_n$ contains a component of $S^2 \setminus \Omega$.

The sets can be constructed as follows and verified to satisfy the properties above. For each $n$, define
$$V_n=\{z\in\mathbb{C}:|z|>n\}\cup\{\infty\}\cup\bigcup_{\alpha\notin\Omega}D\!\left(\alpha,\tfrac{1}{n}\right).$$
Then $K_n=S^2 \setminus V_n$ is what we want.

The Schwarz Lemma

The Schwarz lemma is another important tool for our proof of the Riemann mapping theorem. We need it to establish important inequalities; this lemma as well as its variants show the rigidity of holomorphic maps. We make use of the maximum modulus theorem. For simplicity, let $H^\infty$ be the Banach space of bounded holomorphic functions on $U$, equipped with the supremum norm $|\cdot|_\infty$.

Theorem 3 (Schwarz lemma). Suppose $f:U \to \mathbb{C}$ is a holomorphic map in $H^\infty$ such that $f(0)=0$ and $|f|_\infty \le 1$, then
$$|f(z)|\le|z|\quad(z\in U)\qquad\text{and}\qquad|f'(0)|\le 1;$$

on the other hand, if $|f(z)|=|z|$ holds for some $z \in U \setminus \{0\}$, or if $|f’(0)|=1$ holds, then $f(z)=\lambda{z}$ for some complex constant $\lambda$ such that $|\lambda|=1$.

Proof. Since $f(0)=0$, $f(z)/z$ has a removable singularity at $z=0$. Hence there exists $g \in H(U)$ such that $f(z)=zg(z)$. Fix $0<r<1$. For any $z \in U$ such that $|z|<r$, we have
$$|g(z)|\le\max_{|\zeta|=r}|g(\zeta)|=\max_{|\zeta|=r}\frac{|f(\zeta)|}{|\zeta|}\le\frac{1}{r}.$$

Therefore when $r \to 1$, we see $|g(z)| \le 1$ for all $z \in U$. Therefore $|f(z)| \le |z|$ follows. On the other hand, if $|g(z)|=1$ at some point, the maximum modulus forces $g(z)$ to be a constant, say $\lambda$, from which it follows that $|\lambda|=|g(z)|=1$ and $f(z)=\lambda{z}$. $\square$
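The inequality $|f(z)| \le |z|$ can be illustrated numerically. The map below is a hedged example, not from the post: $z \mapsto z\varphi_a(z)$ with the convention $\varphi_a(z)=(z-a)/(1-\bar{a}z)$ assumed; it is holomorphic on $U$, bounded by $1$, and vanishes at $0$, so the Schwarz lemma applies.

```python
import cmath, math, random

a = 0.4 - 0.3j   # an arbitrary point of U

def f(z):
    # z times a Moebius automorphism of U: f(0) = 0 and |f| <= 1 on U
    return z * (z - a) / (1 - a.conjugate() * z)

random.seed(0)
for _ in range(1000):
    r = 0.999 * math.sqrt(random.random())        # area-uniform radius in U
    z = r * cmath.exp(1j * random.uniform(0, 2 * math.pi))
    assert abs(f(z)) <= abs(z) + 1e-12            # Schwarz: |f(z)| <= |z|
```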

There are many variants of the Schwarz lemma; we will be using the Schwarz-Pick lemma.

Definition 4. For any $\alpha \in U$, define
$$\varphi_\alpha(z)=\frac{z-\alpha}{1-\overline{\alpha}z}.$$

This family is a subfamily of the Möbius transformations, but we are not paying much attention to that right now. We need the fact that each $\varphi_\alpha$ is a one-to-one mapping which carries $S^1$ (the unit circle) onto $S^1$, $U$ onto $U$ and $\alpha$ to $0$. This requires another application of the maximum modulus theorem. A direct computation shows that
$$\varphi_\alpha'(0)=1-|\alpha|^2,\qquad\varphi_\alpha'(\alpha)=\frac{1}{1-|\alpha|^2}.$$

Theorem 4 (Schwarz-Pick lemma). Suppose $\alpha,\beta \in U$, $f \in H^\infty$ and $| f|_\infty \le 1$, $f(\alpha)=\beta$. Then
$$|f'(\alpha)|\le\frac{1-|\beta|^2}{1-|\alpha|^2}.$$

Proof. Consider
$$g=\varphi_\beta\circ f\circ\varphi_{-\alpha}.$$

We see $g \in H^\infty$ and $|g|_\infty \le 1$. What’s more important, $g(0)=\varphi_\beta \circ f(\alpha)=\varphi_\beta(\beta)=0$. By the Schwarz lemma, $|g’(0)| \le 1$. On the other hand, we see
$$g'(0)=\varphi_\beta'(\beta)\cdot f'(\alpha)\cdot\varphi_{-\alpha}'(0)=\frac{1-|\alpha|^2}{1-|\beta|^2}\,f'(\alpha),$$

and therefore
$$|f'(\alpha)|\le\frac{1-|\beta|^2}{1-|\alpha|^2}.$$

In particular, equality holds if and only if $g(z)=\lambda{z}$ for some constant $\lambda$. If this is the case, then

The story can go on but we halt here and continue our story of the Riemann mapping theorem.

The Riemann Mapping Theorem

Each $z \ne 0$ determines a direction from the origin, which can be described by
$$A(z)=\frac{z}{|z|}=e^{i\theta}.$$

Let $f:\Omega \to \mathbb{C}$ be a map. We say $f$ preserves angles at $z_0 \in \Omega$ if
$$\lim_{r\to 0^+}e^{-i\theta}A\!\left[f(z_0+re^{i\theta})-f(z_0)\right]$$

exists and is independent of $\theta$.

Conformal mappings preserve angles in a reasonable way. A function $f$ is conformal if it is holomorphic and $f’(z) \ne 0$ everywhere. We have a theorem describing this, but it is elementary enough that we do not include the proof in this post.

Theorem 5. Let $f$ map a region $\Omega$ into the plane. If $f’(z_0)$ exists at some $z_0 \in \Omega$ and $f’(z_0) \ne 0$, then $f$ preserves angles at $z_0$. Conversely, if the differential $Df$ exists and is different from $0$ at $z_0$, and if $f$ preserves angles at $z_0$, then $f’(z_0)$ exists and is different from $0$.

There is no confusion about $f’(z_0)$. By differential $Df$ we mean a linear map $L:\mathbb{R}^2 \to \mathbb{R}^2$ such that, writing $z_0=(x_0,y_0)$, we have

where $\eta(x,y) \to 0$ as $x \to 0$ and $y \to 0$. To prove this, one can assume that $z_0=f(z_0)=0$. When the differential exists, one writes

We say that two regions $\Omega_1$ and $\Omega_2$ are conformally equivalent if there is a conformal one-to-one mapping of $\Omega_1$ onto $\Omega_2$. The Riemann mapping theorem states that

Theorem 6 (Riemann mapping theorem). Every proper simply connected region $\Omega$ in the plane is conformally equivalent to the open unit disc $U$.

As a famous example, the upper half-plane $\mathbb{H}$ is conformally equivalent to $U$ by the Cayley transform.

As one may expect, this theorem asserts that the study of a simply connected region $\Omega$ can be reduced to $U$ to some extent. But a conformal equivalence is not just about homeomorphism. If $\varphi:\Omega_1 \to \Omega_2$ is a conformal one-to-one mapping, then $\varphi^{-1}:\Omega_2 \to \Omega_1$ is also a conformal mapping. In the language of algebra, such a mapping $\varphi$ induces a ring isomorphism

Therefore, the ring $H(\Omega_2)$ is algebraically the same as $H(\Omega_1)$. The Riemann mapping theorem also states that, if $\Omega$ is a simply connected region, then $H(\Omega) \cong H(U)$. From this we can exploit much more information on top of homeomorphism. One can also extend the story to $S^2$, the Riemann sphere, but that’s another story.

The Proof by Arguing A Normal Family

The proof is fairly technical. But it is a good chance to attest to our skill in complex analysis. The bread and butter of this proof is the following set:
$$\Sigma=\{\psi\in H(\Omega):\psi\text{ is one-to-one and }\psi(\Omega)\subset U\}.$$

Our aim is to prove that there is some $\psi \in \Sigma$ such that $\psi(\Omega)=U$. Note that once non-emptiness is proved, since $|\psi|<1$ uniformly, $\Sigma$ is a normal family.

Step 1 - Prove Non-emptiness Using Koebe’s Square Root Trick

Pick $w_0 \in \mathbb{C} \setminus \Omega$. Then $g(z)=z-w_0 \in H(\Omega)$ and, what is more important, $\frac{1}{g} \in H(\Omega)$. By 9 of proposition 1, there exists $\varphi \in H(\Omega)$ such that $\varphi^2(z)=g(z)$, i.e., informally, $\varphi(z)=\sqrt{z-w_0}$ in $\Omega$. If $\varphi(z_1)=\varphi(z_2)$, then $\varphi(z_1)^2=\varphi(z_2)^2$, i.e., $z_1-w_0=z_2-w_0$, and hence $z_1=z_2$. Therefore $\varphi$ is one-to-one. On the other hand, if $\varphi(z_1)=-\varphi(z_2)$, we still have $\varphi^2(z_1)=\varphi^2(z_2)$ and hence $z_1=z_2$; but then $\varphi(z_1)=-\varphi(z_1)=0$, which is impossible because $\varphi^2=g$ has no zero in $\Omega$. Hence $\varphi(\Omega)$ and $-\varphi(\Omega)$ are disjoint. This is Koebe’s square root trick.

Since $\varphi$ is an open mapping, there is an open disc $D(a,r) \subset \varphi(\Omega)$, where $a \in \varphi(\Omega)$, $a \ne 0$ and $0<r<|a|$. But by arguments above we have $-a \not\in \varphi(\Omega)$, and therefore $D(-a,r) \cap \varphi(\Omega) = \varnothing$. For this reason, we can put
$$\psi(z)=\frac{r}{\varphi(z)+a}.$$

It follows that
$$|\psi(z)|=\frac{r}{|\varphi(z)+a|}<1\qquad(z\in\Omega),$$

and therefore $\psi(\Omega) \subset U$. Since $\varphi$ is one-to-one, $\psi$ is one-to-one as well and we deduce that $\psi \in \Sigma$, this set is not empty.

Remark. You may have trouble believing that $D(-a,r) \cap \varphi(\Omega)=\varnothing$. But pick any $w \in D(-a,r) \cap \varphi(\Omega)$; then there is some $z’ \in \Omega$ such that $\varphi(z’)=w$. We also have $|-a-w|<r$, which implies $|a-(-w)|=|a+w|=|-a-w|<r$, and therefore $-w \in D(a,r) \subset \varphi(\Omega)$. So there exists some $z’’ \in \Omega$ such that $\varphi(z’’)=-w$. By the disjointness argument above, $z’=z’’$ and hence $w=-w=0$. It follows that $|a|=|-a-0|<r$, a contradiction.

Since $D(-a,r) \cap \varphi(\Omega)=\varnothing$, we have $|\varphi(z)-(-a)|>r$ for all $z \in \Omega$ and therefore $|\psi(z)|<1$ is not a problem either.

Step 2 - Enlarge the Range

If $\psi \in \Sigma$ and $\psi(\Omega) \subsetneqq U$, and $z_0 \in \Omega$, then there exists a $\psi_1 \in \Sigma$ such that $|\psi_1’(z_0)|>|\psi’(z_0)|$.

This step shows that we can “enlarge” the range in some way.

For convenience we use the Möbius transformation

Pick $\alpha \in U \setminus \psi(\Omega)$. Then $\varphi_\alpha \circ \psi \in \Sigma$ and $\varphi_\alpha \circ \psi$ has no zero in $\Omega$. Hence there is some $g \in H(\Omega)$ such that
$$g^2=\varphi_\alpha\circ\psi.$$

Since $\varphi_\alpha \circ \psi$ is one-to-one, another application of Koebe’s square root trick shows that $g$ is one-to-one. Therefore we have $g \in \Sigma$ as well. If $\psi_1=\varphi_\beta \circ g$ where $\beta=g(z_0)$, we have $\psi_1 \in \Sigma$ (one-to-one). In particular, $\psi_1(z_0)=0$.

By putting $s(z)=z^2$, we have
$$\psi=\varphi_{-\alpha}\circ s\circ\varphi_{-\beta}\circ\psi_1.$$

If we put $F(z)=\varphi_{-\alpha} \circ s \circ \varphi_{-\beta}(z)$, then the chain rule shows that
$$\psi'(z_0)=F'(\psi_1(z_0))\,\psi_1'(z_0)=F'(0)\,\psi_1'(z_0).$$

(Note we used the fact that $\psi_1(z_0)=0$.) If we can prove that $0<|F’(0)|<1$ then this step is complete. Note $F$ satisfies the condition in the Schwarz-Pick lemma and therefore

The first equality does not hold because $F$ is not of the form $\varphi_{-\sigma}(\lambda\varphi_{\eta}(z))$ for $|\lambda|=1$. On the other hand we have

Therefore $0<|F’(0)|<1$ and this step is complete.

Step 3 - Find the Function with the Largest Range, Namely the Disc

We take the contraposition of step 2:

Fix $z_0 \in \Omega$. If $h \in \Sigma$ is an element such that $|h’(z_0)| \ge |\psi’(z_0)|$ for all $\psi \in \Sigma$, then $h(\Omega)=U$.

The proof is complete once we have found such a function! To do this, we use the fact that $\Sigma$ is a normal family. Put
$$\eta=\sup\{|\psi'(z_0)|:\psi\in\Sigma\}.$$

By definition of $\eta$, there is a sequence $\{\psi_n\}$ in $\Sigma$ such that $|\psi_n’(z_0)| \to \eta$. By normality of $\Sigma$, we pick a subsequence $\varphi_n=\psi_{k_n}$ that converges uniformly on compact subsets of $\Omega$; call the uniform limit $h \in H(\Omega)$. It follows that $|h’(z_0)|=\eta$. Since $\Sigma \ne \varnothing$ and $\eta \ne 0$, $h$ cannot be constant. Since $\varphi_n(\Omega) \subset U$, we must have $h(\Omega) \subset \overline{U}$. But since $h$ is an open mapping, we conclude $h(\Omega) \subset U$.

It remains to show that $h$ is one-to-one. Fix distinct $z_1, z_2 \in \Omega$. Put $\alpha=h(z_1)$ and $\alpha_n=\varphi_n(z_1)$, then $\alpha_n \to \alpha$. Let $\overline{D}$ be a closed disc in $\Omega$ centred at $z_2$ with interior denoted by $D$ such that

  • $z_1 \not\in \overline{D}$.
  • $h-\alpha$ has no zero point on the boundary of $\overline{D}$.

We see $\varphi_n -\alpha_n$ converges to $h-\alpha$, uniformly on $\overline{D}$. These functions have no zero in $D$, since each $\varphi_n$ is one-to-one and already takes the value $\alpha_n$ at $z_1 \not\in \overline{D}$. By Rouché’s theorem, $h-\alpha$ has no zero in $D$ either, and in particular $h(z_2)-\alpha = h(z_2)-h(z_1) \ne 0$. This completes the proof. $\square$

Remark. First of all, such a $\overline{D}$ exists. This is because the zeros of $h-\alpha$ have no limit point in $\Omega$, i.e., they are discrete (when defining $\overline{D}$, we do not yet know how many there are).

Our choice of $\overline{D}$ enables us to use Rouché’s theorem (in case that went by too quickly, here is the estimate). Since $h-\alpha$ has no zero on the boundary, we have $\zeta=\inf_{z \in \partial D}|h(z)-\alpha|>0$. When $n$ is big enough, we see

The second inequality is another application of the maximum modulus theorem. Rouché’s theorem applies here naturally as well. $\square$

This proof is a reproduction of W. Rudin’s Real and Complex Analysis. For a comprehensive further reading, I highly recommend Tao’s blog post.


Examples in Galois Theory 3 - Polynomials of Prime Degree and Pairs of Nonreal Roots

Introduction

In the previous post we saw that the Galois group of a separable irreducible polynomial $f$ can be realised as a subgroup of the symmetric group, whose elements permute the roots of $f$. We worked on cubic polynomials over a field of characteristic not equal to $2$ or $3$, which certainly covers $\mathbb{Q}$. In this post we go one step further.

Let $f \in \mathbb{Q}[X]$ be an irreducible polynomial of prime degree $p$. Since it is also separable (see lemma 9.12.1 on the Stacks project), we can safely work on its Galois group $G$. One immediately asks where $G$ sits inside $\mathfrak{S}_p$. Indeed we have $G \subset \mathfrak{S}_p$; the question is, when does equality hold? There is unlikely to be an immediate answer. However, there are some interesting sufficient conditions, which will be discussed in this post.

Generators of the Symmetric Group

We present some handy results in finite group theory that will be used in the main result. One may skip this section until needed. I will collapse the proof in case one wants to treat it as an exercise.

Lemma 1. Let $p$ be a prime number. The symmetric group $\mathfrak{S}_p$ is generated by $[12 \cdots p]$ and an arbitrary transposition $[rs]$.

Proof. We prove this by presenting several sets of generators of $\mathfrak{S}_n$ where $n$ is a positive integer.

  1. It is generated by cycles. This is a really, really routine verification and sometimes this is assumed as a fact.

  2. It is generated by transpositions, i.e., $2$-cycles. It suffices to show that a cycle is a product of transpositions. Indeed, for any cycle $[i_1\dots i_k]$ in $\mathfrak{S}_n$, we have $[i_1\cdots i_k]=[i_1i_2][i_2i_3]\cdots[i_{k-1}i_k]$. This proves our statement.

  3. It is generated by transpositions of the form $[1k]$. It suffices to show that every transposition is generated as such. For any transposition $[rs]$, we have $[rs]=[1r][1s][1r]$.

  4. It is generated by adjacent transpositions, i.e. generators of the form $[k-1 ,k]$. This follows from the following identity:
$$[1,k]=[k-1,k]\,[1,k-1]\,[k-1,k],$$
which, by induction and 3, shows that every transposition is a product of adjacent ones.

  5. It is generated by two elements: $\sigma=[12]$ and $\tau=[12\cdots n]$. This follows from the following identity:
$$\tau^{k-1}\sigma\tau^{-(k-1)}=[k,k+1].$$

Now, back to the case when $n=p$ is prime. Put $\sigma=[rs]$ and $\tau=[12\cdots p]$. If $s-r=1$ then it is already proved in 5 by several conjugations. Therefore we may assume that $d=s-r>1$. From now on an integer may be read in either $\mathbb{Z}$ or $\mathbf{F}_p=\mathbb{Z}/p\mathbb{Z}$, depending on the context. Recall that $\mathbf{F}_p$ is a field. Pick the integer $w$ such that $dw=1$ in $\mathbf{F}_p$. By conjugation we see $\tau$ and $\sigma$ generate
$$\tau^{j}\sigma\tau^{-j}=[r+j,\ r+d+j],\qquad j\in\mathbf{F}_p.$$

A suitable product of the elements above yields $[1,1+wd]=[12]$ (indices read mod $p$). Therefore we are back to 5. $\square$
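Lemma 1 can be checked by brute force for small $p$; here is a pure-Python sketch for $p=5$, with the transposition $[24]$ chosen arbitrarily (permutations are represented as 0-indexed tuples):

```python
def closure(gens):
    # smallest set of permutations (as tuples) containing gens and
    # closed under composition; for finite groups this is the group itself
    group = set(gens)
    while True:
        new = {tuple(g[h[i]] for i in range(len(h)))
               for g in group for h in group} - group
        if not new:
            return group
        group |= new

p = 5
tau = tuple((i + 1) % p for i in range(p))   # the p-cycle [1 2 ... p]
sigma = (0, 3, 2, 1, 4)                      # an arbitrary transposition, [24]
assert len(closure([tau, sigma])) == 120     # = 5!, i.e. all of S_5
```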

Computing the Galois Group

We have many good reasons to study the Galois group of something. It would be great if the group can be written down explicitly. In this section we show that the group can be revealed by the number of nonreal roots.

The Simplest Case

Proposition 1. Let $f(X) \in \mathbb{Q}[X]$ be an irreducible polynomial of prime degree $p$. If $f$ has precisely two nonreal roots, then the Galois group $G$ of $f$ over $\mathbb{Q}$ is $\mathfrak{S}_p$.

Proof. Let $L$ be the splitting field of $f$. It suffices to show that $G$ contains a transposition and a $p$-cycle, which we may take to be $[12\cdots p]$. By Sylow’s theorem, $G$ has a subgroup $H$ of order $p$, which can only be cyclic. Say $H=\langle \sigma \rangle$. Suppose $\sigma$ is of cycle type $(k_1,\dots,k_r)$. Then the order of $\sigma$, which equals $p$, is the least common multiple of $k_1,\dots,k_r$, where $k_1+\dots+k_r=p$. This can only happen when $r=1$ and $k_1=p$. Therefore $\sigma$ is a $p$-cycle.

In fact, $\sigma$ can be taken to be $[12\dots p]$. Suppose an ordering of the roots of $f$ is given, with respect to which $\sigma=[i_1 i_2 \dots i_p]$. If we re-order the roots, by letting the $k$th root be the original $i_k$th root, then we can write $\sigma=[12\dots p]$. (This re-ordering is, in fact, a conjugation.)

It remains to prove that $G$ contains a transposition. Let $\alpha$ and $\beta$ be two nonreal roots of $f$. Since $\overline{\alpha}$ is also a root of $f$ (because coefficients of $f$ are real; if $\sum_{n=0}^{p}a_n\alpha^n=0$, then $\sum_{n=0}^{p}a_n\overline{\alpha}^n=\sum_{n=0}^{p}\overline{a_n\alpha^n}=\overline{0}=0$) we see $\beta=\overline{\alpha}$. Therefore complex conjugation over $\mathbb{Q}(\alpha)$ extends to $L$ as an element of order $2$, which is a transposition in $G$. This proves our assertion. $\square$

For example, consider the polynomial

With calculus one can show that it has exactly three real roots, hence exactly two nonreal roots. Eisenstein’s criterion shows that $f$ is irreducible. Therefore we may apply proposition 1: the Galois group of $f$ is $\mathfrak{S}_5$.
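The displayed polynomial is not reproduced in this copy; a standard example satisfying the hypotheses (an assumption, not necessarily the post's own choice) is $f(X)=X^5-4X+2$, irreducible by Eisenstein at $2$. The calculus argument can be traced numerically:

```python
# f(X) = X^5 - 4X + 2 (assumed example; Eisenstein at 2 gives irreducibility)
f = lambda x: x**5 - 4*x + 2

# f'(x) = 5x^4 - 4 vanishes exactly at x = +-(4/5)^(1/4), so f is increasing,
# then decreasing, then increasing; each monotone piece holds at most one root
c = (4/5) ** 0.25

assert f(-3) < 0 < f(-c)   # one real root in (-3, -c)
assert f(-c) > 0 > f(c)    # one real root in (-c, c)
assert f(c) < 0 < f(2)     # one real root in (c, 2): exactly three in total
```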

This also works fine when $p=2$ or $3$. The case $p=2$ is nothing but working around a quadratic polynomial. When $f(X)$ is irreducible of degree $3$ with two nonreal roots, its real root is irrational. Let the roots be $a+bi,a-bi,c$ where $b \ne 0$ and $c$ is irrational. We see

Therefore the Galois group is $\mathfrak{S}_3$.

“Linear” Generalisation

It would be too restrictive to consider only a single pair of nonreal roots. Also, it seems we have ignored the alternating group $\mathfrak{A}_p$ for no reason. Oz Ben-Shimol gave a nice way to work around this (see arXiv:0709.2868). The whole paper is not easy, but the result is beautiful and generalises what we said above for $p \ge 5$.

Proposition 2. Let $f \in \mathbb{Q}[X]$ be an irreducible polynomial of prime degree $p \ge 5$. Suppose that $f$ has $k>0$ pairs of nonreal roots. If $p \ge 4k+1$, then the Galois group $G$ is isomorphic to $\mathfrak{A}_p$ or $\mathfrak{S}_p$. If $k$ is odd then $G \cong \mathfrak{S}_p$.

The proof is done by showing that $\mathfrak{A}_p \subset G \subset \mathfrak{S}_p$. As the index of $\mathfrak{A}_p$ in $\mathfrak{S}_p$ is $2$, $G$ can only be one of the two. The solvability of $G$ also plays a role there.

Indeed, what we proved in “the simplest case” is nothing but the case $k=1$, and when $p \ge 5$ we clearly have $p \ge 4 \times 1 + 1$. This refines the result of A. Bialostocki and T. Shaska (see arXiv:math/0601397), where the inequality used to be
$$p \ge k(k\log{k}+2\log{k}+3).$$

When $k$ is big enough, we have $k(k\log{k}+2\log{k}+3) \ge 4k+1$. Oz Ben-Shimol’s result is a refinement because it says that $p$ does not need to be that big. He also offered a refined algorithm to compute the Galois group, which we present below. Also, computing $4k+1$ is much easier than computing $k^2\log{k}$ plus something.

Input: An irreducible polynomial f(X) over Q with prime degree p >= 5
Output: The Galois group Gal(f/Q)
begin
  r := NumberOfRealRoots(f(X));
  k := (p-r)/2;
  if k > 0 and p >= 4k+1 then
    if k is odd then
      Gal(f/Q) = S_p;
    else
      if ∆(f) is a complete square then
        Gal(f/Q) = A_p;
      else
        Gal(f/Q) = S_p;
      endif;
    endif;
  else
    ReductionMethod(f(X));
  endif;
end;

Here, $\Delta(f)$ is the discriminant of $f$. We have seen that whether $\Delta(f)$ is a perfect square matters a lot. The discussion of ReductionMethod can be found in Oz Ben-Shimol’s paper.
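The branch structure of the pseudocode can be sketched in Python. The real-root count is done numerically here, and the discriminant is supplied by the caller; NumberOfRealRoots and ReductionMethod from the paper are not reproduced, so these names and the grid-based counting are illustrative assumptions:

```python
import math

def num_real_roots(coeffs, lo=-10.0, hi=10.0, n=200_000):
    # count sign changes of the polynomial (coefficients highest-first) on a
    # fine grid; adequate when all real roots are simple and lie in (lo, hi)
    def ev(x):
        y = 0.0
        for c in coeffs:
            y = y * x + c   # Horner evaluation
        return y
    dx = (hi - lo) / n
    count, prev = 0, ev(lo)
    for i in range(1, n + 1):
        cur = ev(lo + i * dx)
        if prev * cur < 0:
            count += 1
        prev = cur
    return count

def galois_group_guess(coeffs, discriminant=None):
    # sketch of the branch structure above, for irreducible f of prime
    # degree p >= 5; returns 'S_p', 'A_p', or None (reduction method needed)
    p = len(coeffs) - 1
    k = (p - num_real_roots(coeffs)) // 2   # pairs of nonreal roots
    if k > 0 and p >= 4 * k + 1:
        if k % 2 == 1:
            return "S_p"
        s = math.isqrt(int(discriminant))   # k even: discriminant decides
        return "A_p" if s * s == int(discriminant) else "S_p"
    return None

# x^5 - 4x + 2 has three real roots, so k = 1 is odd and p = 5 >= 4k+1:
assert galois_group_guess([1, 0, 0, 0, -4, 2]) == "S_p"
```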
