Don't start from scratch

Quick description

A common mistake people make when trying to answer a mathematical question is to work from first principles: it is almost always easier to modify something you already know. This article illustrates the point with examples that range from simple arithmetic to the forefront of research.

This article could use additional contributions. Lots more examples needed, of ever-increasing sophistication.

Example 1: multiplying two-digit numbers together

Example 2: convergence of sequences

Example 3: convergence of series

Example 4: the matrix of a rotation

Example 5: the set of invertible $n\times n$ matrices is open

Example 1: multiplying two-digit numbers together

What is $16\times 17$ ? The obvious way of answering this question is to use your favourite multiplication technique to work it out. For example, you might say that it is $100+60+70+42=272$ . However, if you are keen on mathematics, then this probably won't be the best way of going about it, since you are likely to know many "nearby" facts and it is a pity to waste that knowledge. For example, if you know the squares of all numbers up to 20, or at least if you know that $16^2=256$ (which perhaps you know because you like powers of 2), then you can reason as follows: if sixteen sixteens are 256, then I get to seventeen sixteens by adding another 16, which takes me to 272.

This very simple example illustrates an important principle in mathematics: even if getting from A to C is hard, you may know how to get to a point B from which getting to C is easy. This phenomenon shows up strongly in mental arithmetic: if you are asked to multiply two two-digit numbers together in your head, and have a reasonable "web of number facts" at your disposal, then it is almost always better to use tricks like the one in the previous paragaph. (Another approach would have been to multiply 17 by 2 four times, getting the sequence 34, 68, 136, 272.)

More mental-arithmetic examples

What other "number facts" might one know? One that many people are aware of is the amusing fact that $3\times 37=111$ . This makes multiplying by $37$ pretty easy. For example, if you need to work out $23\times 37$ you can use the fact that $24\times 37=8\times 3\times 37=888$ , so $23\times 37=888-37=851$ . Or if you wanted to work out $28\times 38$ you could first work out $27\times 37$ , getting $9\times 3\times 37=999$ . Then you could add $37$ to it and deduce that $28\times 37=1036$ . And finally you could add a further $28$ to get $1064$ , which is the answer to the question asked.

Another useful principle is that $(x+y)(x-y)=x^2-y^2$ . This means that if you know your squares, then you can multiply together any two numbers that are reasonably close together. For example, $13\times 17=(15-2)\times(15+2)=15^2-2^2=225-4=221$ . What about $19\times 22$ ? We could first work out $19\times 21$ , getting $399$ (by the difference-of-squares method) and then add another $19$ to get $418$ (because it's $400-1+19=400+18).$

A less-known trick is that this principle also works "backwards", allowing us to easily compute any square using the equation $x^2 = (x+y)(x-y)+y^2$ . We just need to pick a suitable $y$ for which one of $x+y$ and $x-y$ becomes simple. For example, we have

$63^2 = (63-3)(63+3) + 3^2 = 60\cdot 66 + 9 = 3960 + 9 = 3969,$

and also

$48^2 = (48-2)(48+2) + 2^2 = 46\cdot 50 + 4 = 2304.$

In all these cases, the basic method is to get what one might think of as a "first approximation" to the calculation and then adjust it.

Example 2: convergence of sequences

Suppose that you are asked to prove completely rigorously that $\frac{(4n+3)^2}{n^2-n-8}$ tends to $16$ as $n$ tends to infinity. Some people think that the words "completely rigorously" are an instruction to go back to first principles and produce a proof that starts like this. "Let $\epsilon>0$ . We shall show that there exists an $N$ such that $\left|\frac{(4n+3)^2}{n^2-n-8}-16\right|<\epsilon$ whenever $n\geq N$ ." But that is not how to go about it. A much better way is to begin by dividing top and bottom by $n^2$ . That gives us $\frac{(4+3/n)^2}{1-1/n-8/n^2}$ . Once we have written the expression in this way, we can deduce that it converges to $16$ just from the fact that $1/n$ tends to zero and from standard theorems about limits. Those theorems are the ones that say that if $a_n\rightarrow a$ and $b_n\rightarrow b$ , then $a_n+b_n\rightarrow a+b$ , $a_n-b_n\rightarrow a-b$ , $a_nb_n\rightarrow ab$ and (if $b\ne 0$ ) $a_n/b_n\rightarrow a/b$ . So from the fact that $1/n\rightarrow 0$ we get that $1/n^2\rightarrow 0$ , so $8/n^2\rightarrow 0$ , so $1/n+8/n^2\rightarrow 0$ , so $1-1/n-8/n^2\rightarrow 1$ . Similarly, $3/n\rightarrow 0$ , so $4+3/n\rightarrow 4$ , so $(4+3/n)^2\rightarrow 16$ . And finally, that implies that $\frac{(4n+3)^2}{n^2-n-8}\rightarrow 16$ .

Example 3: convergence of series

How does one show that the series $1+\frac 14+\frac 19+\frac 1{16}+\frac 1{25}+\dots$ converges? That is, how does one show that as you keep on adding reciprocals of squares, the resulting sum doesn't go off to infinity? Convergence of infinite series is usually one of the first topics covered in a first course in real analysis, and one of the messages one needs to learn is that the best way of proving that a given series converges is usually not to start from scratch, but rather to deduce the convergence from the fact that some other series is known to converge.

In this case, the difficulty is that there doesn't seem to be a nice expression for the function $g(n)=1+\frac1{2^2}+\dots+\frac 1{n^2}$ . But perhaps we should think about that. What do we know about $g$ ? Well, from its definition we know that $g(n)-g(n-1)=\frac 1{n^2}.$ Applying the advice that one should hunt for analogies, we might recall that taking the difference $g(n)-g(n-1)$ is a bit like taking the derivative of $g$ . And if we did take the derivative of $g$ and obtained $\frac 1{n^2}$ , then $g(n)$ would have had the form $C-\frac 1n$ .

Now let us turn that on its head and see what $g(n)-g(n-1)$ is if $g(n)=C-\frac 1n$ for every $n$ . The answer is $\frac 1{n-1}-\frac 1n=\frac 1{n(n-1)}$ . And turning that on its head, we see that the series $\frac 1{1.2}+\frac 1{2.3}+\frac 1{3.4}+\dots$ can be analysed easily, because

$\frac 1{1.2}+\frac 1{2.3}+\dots+\frac 1{(n-1)n}=\left(1-\frac 12\right)+\left(\frac 12-\frac 13\right)+\dots+\left(\frac 1{n-1}-\frac 1n\right) =1-\frac 1n.$

But $\frac 1{n^2}<\frac 1{(n-1)n}$ , so from this it follows that $\frac 1{2^2}+\frac 1{3^2}+\dots+\frac 1{n^2}<1$ . Since that is true for every $n$ , the series $1+\frac 14+\frac 19+\frac 1{16}+\frac 1{25}+\dots$ converges and its limit is less than 2.

The point here is that we deduced the convergence of the series in an easy way from the convergence of a series that can be shown easily to converge.

You might object that you prefer a different proof that the series converges. For example, another well-known proof is to group the terms of the series as follows:

$1+\left(\frac 1{2^2}+\frac 1{3^2}\right)+\left(\frac 1{4^2}+\dots+\frac 1{7^2}\right)+\left(\frac 1{8^2}+\dots+\frac 1{15^2}\right)+\dots$

This is clearly less than what you get if you replace each fraction in a group by the largest fraction in that group. And what you get is

$1+\frac 2{2^2}+\frac 4{4^2}+\frac 8{8^2}+\dots=1+\frac 12+\frac 14+\frac 18+\dots=2$

So the original series converges and its limit is less than 2.

However, if you prefer this proof, that doesn't imply that you like starting from scratch. This proof also uses what you know (that the series $1+\frac 12+\frac 14+\frac 18+\dots$ converges) to help you show something you don't know (that the series $1+\frac 14+\frac 19+\dots$ converges).

Example 4: the matrix of a rotation

Here is a well-known matrix calculation: to find the matrix of a rotation through $\theta$ about a unit vector $u$ in $\mathbb{R}^3$ . The direct starting-from-scratch approach would be to work out what happens to the vectors $(1,0,0)$ , $(0,1,0)$ and $(0,0,1)$ under this rotation. However, that is quite a difficult calculation. A much better approach is to exploit the fact that we can write down the answer easily if we are lucky enough that $u$ is the vector $(0,0,1)$ . Then the rotation is just a rotation of the $xy$ -plane, so it has matrix $R=\left(\begin{matrix} \cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1\\ \end{matrix}\right)$ . Now all we have to do is come up with the matrix $S$ of a rotation that takes $(0,0,1)$ to $u$ and calculate $SRS^{-1}$ . Why? Because the corresponding linear map moves $u$ to $(0,0,1)$ , then rotates everything through $\theta$ about $(0,0,1)$ , then moves $(0,0,1)$ back to $u$ again: the total effect is a rotation through $\theta$ about $u$ .

How do we find a rotation that takes $(0,0,1)$ to $u$ ? Well, the matrix of such a rotation has $u$ as its third column. So all we need to do is find two other unit column vectors $v$ and $w$ that are orthogonal to each other and to $u$ , and the matrix formed by $v$ , $w$ and $u$ will be the matrix of a rotation that takes $(0,0,1)$ to $u$ (unless it has determinant $-1$ , in which case swap the first two columns). There is still a bit of calculation involved here, but it is pretty straightforward.

Example 5: the set of invertible $n\times n$ matrices is open

The set of invertible $n\times n$ matrices can be thought of as a subset of $\mathbb{R}^{n^2}$ , and this subset turns out to be open. In other words, if you take any invertible matrix and change its entries by a small enough amount, then the perturbed matrix will also be invertible.

How should one prove this? The obvious approach is to dive straight in. If $A=(a_{ij})_{i,j=1}^n$ is our invertible matrix then we let $\epsilon$ be a small positive constant (to be chosen later) and try to prove that any matrix $B=(b_{ij})_{i,j=1}^n$ such that $|b_{ij}-a_{ij}|<\epsilon$ for every $i$ and $j$ is also invertible. But how do we find the inverse of $B$ ?

What we have just done is work directly with the definitions of "open set" and "invertible". However, if we are prepared to use a little bit of theory, the problem becomes dramatically easier. First of all, what are some fundamental facts about open sets? One is that they are complements of closed sets. Another is that the inverse image of an open set under a continuous function is open. If we use the first of these, we transform our problem into that of showing that the set of non-invertible matrices is closed. We could then use the sequence definition of closed sets and try to prove that a limit of non-invertible matrices is non-invertible. But there doesn't seem to be an obvious way to do that – or at least not yet. If we think about the second property of open sets, then we see that one way of proving that our set is open is to find a continuous function $f$ from the set of all $n\times n$ matrices to some other space $X$ such that there is an open subset $U$ of $X$ with $f^{-1}(U)$ equal to the set of all invertible matrices.

It's not instantly obvious what such a function might be, so let us turn to the word "invertible". What important facts do we know about invertible matrices? One of them is that a matrix is invertible if and only if its rank is $n$ . Another is that it is invertible if and only if it has non-zero determinant.

At this point a lightbulb should go on. We were looking for a continuous function defined on matrices, and the determinant is a function defined on matrices. Is it continuous? Yes, since it is a polynomial function of the entries of the matrix. Can we express the set of invertible matrices as the inverse image of an open set? Yes, since they are precisely the matrices with determinant belonging to the set of non-zero real (or complex if we are talking about complex matrices) numbers, and that set is open.

So the eventual proof is extremely easy: the invertible matrices are the inverse image of the open set $\mathbb{R}\setminus\{0\}$ under the continuous function "determinant of". How did we find this proof? By not starting from scratch: instead, we spent a short time thinking about well-known facts connected with the concepts that we were trying to relate.

I hinted earlier that one could use the fact that open sets are complements of closed sets, and indeed that gives an alternative, fairly similar approach. We would like to prove that a limit of non-invertible matrices is non-invertible. Again, once we think of using determinants this becomes easy. It is enough to prove that a limit of matrices with zero determinant has zero determinant. This will certainly be the case if the determinant is continuous – which it is.