Made with Mathcad

A document from the Mathcad Library
Open this file with Mathcad to work with the calculations.

This is an excerpt from the book Computational Methods for Applied Science and Engineering: An Interactive Approach , by Mario G. Ancona, published by Rinton Press, Ó2003. Reproduced with permission of the publisher.
IV.C.7. Fast Direct Methods

The kings of efficiency for solving PDEs implement a divide-and-conquer strategy. Prime exemplars of this strategy are the fast direct methods discussed in this section, the splitting methods of Sect. IV.B.8 , the multi-grid technique of Sect. IV.C.5 , the matrix methods of Sect. IV.C.6 and, best of them all, the spectral methods of Sect. V.C [ 1 ]. The rationale behind the divide-and-conquer strategy is readily understood. If N is the number of unknowns to be solved for, then conventional methods tend to require on the order of N2 operations or worse. But if we have a way of splitting the problem in half (at no cost) then it means we can do the same problem with only (N/2)2 + (N/2)2 = N2/2 operations. Multiple splittings will compound this gain and a fully split method (i.e., with log2(N) splittings) will typically require only N@log2(N) operations. The tremendous savings involved are depicted below:
In a 2-D problem with a moderate-sized 100 x 100 mesh we have N = 104. And in a 3-D version of this problem N equals 106. Change Nmax in the plot at right to these "moderate" values and see the extraordinary gains possible with N@log2N schemes!
In this section we focus on two types of divide-and-conquer strategies, cyclic reduction and the fast Fourier transform. The finite-difference methods that result are referred to as fast direct methods because they solve elliptic/parabolic problems with a rapid non-iterative technique [ 2 ]. The main limitation of these methods is that they require special structure in the problem and its discretization for their implementation. As a result, they tend to be rather limited in their applicability, e.g., to linear, constant coefficient problems. For more general problems, iterative methods such as those treated earlier in this chapter (sometimes incorporating fast direct methods --- see below) must be used.
Cyclic reduction

Cyclic reduction requires that the elliptic problem have the analytical property of being separable [ 3 ], a strong restriction on applicability. To illustrate the technique consider solving the Poisson equation, fxx + fyy = r, on an X x Y rectangle with Dirichlet BCs. The method requires a uniform grid and we assume x = j@h with j = 0, . . ., J+1 and X = (J+1)@h and y = k@h with k = 0, . . ., K+1 and Y = (K+1)@h. Employing the usual centered differencing, at a generic interior point (j,k) we have
j,k:
Next consider the similar equations at the neighboring points (j+1,k), (j-1,k), (j,k+1) and (j,k-1):
j+1,k:
j-1,k:
j,k+1:
j,k-1:
Now, the cleverness of the method is to notice that these five equations can be combined to eliminate Fj+1,k-1, Fj+1,k, Fj+1,k+1, Fj-1,k-1, Fj-1,k and Fj-1,k+1. Specifically, forming the linear combination [j-1,k] + [j+1,k] - [j,k+1] - [j,k-1] + 4 x [j,k] yields:
=
And if we apply this trick to all columns with j even (assuming J+1 is even), then we obtain a set of equations involving F's at only the even-j points. In other words this procedure has cut in half the number of equations and unknowns in the linear system (J/2 x K unknowns)! What's more, if (J+1)/2 is even, then the process can be repeated to reduce the problem to a quarter of its original size. And if J+1 = 2m then we can perform these "reductions" m-1 times thereby
reducing the system to K equations in the K unknowns along the centerline, F(J+1)/2,k with k = 1, . . . , K. (The equations will also involve F0,k and FJ+1,k but these are known from the boundary conditions). This linear system has a bandwidth of 2m+1 and can be solved in K x m2 operations (see Sect. IV.C.1 ). After solving it, one can then work backwards to fill in all the other F's. The first such stage requires solving one linear system (with bandwidth 2m-1) for F(J+1)/4,k and one for F3(J+1)/4,k. At the next stage, four such linear systems must be solved and so on. The total operations count is roughly
Total operations ~
and since N = J x K this is roughly N@log2N operations as initially claimed.

Exercise IV.C.7.a.(e) For the case with J = K = 7 write out the sequence of linear systems involved in the cyclic reduction method explicitly. Compute the number of operations required and compare it with the foregoing formula.


As emphasized above, the speed of cyclic reduction comes directly from its divide-and-conquer strategy. Again, since there's little cost in dividing up the linear system each such divide speeds things up because solving a linear system of a given size takes at least twice as long to solve as two linear systems of half the size. Performing as many such divisions as possible (maximized if the number of mesh points in the critical direction is a power of 2) compounds the speed-up giving the N@log2N performance.
Fourier Transform Methods

Fourier transforms are standard tools for analytical work, including for solving differential equations. It turns out that they are also powerful and important in the numerical realm and this is primarily a result of the fact that there exist N@log2N divide-and-conquer algorithms for finding them. Before discussing these latter algorithms, we first summarize briefly what the Fourier transform is and how it is discretized. PRESS should be consulted for further details.
The Fourier transform of a continuous real space function r(x) is defined by [ 4 ]
with
The function r(x) and its transform R(f) are, in general, complex-valued. However, in most (but
not all) physical applications r(x) is real-valued. In this case, R(f) is still complex-valued and it gives the spectral content of r(x), i.e., the amplitudes and phases of its component waves at every frequency f. R(f) is therefore a representation of r(x) in frequency space that is closely akin to the Fourier series representation of r(x) discussed in Sect. II.C.4 . To reconstruct the real space function r(x) from its frequency domain representation R(f) one uses the inverse Fourier transform
To develop numerical estimates for the Fourier transform and its inverse we note:

@
where the rectangular rule ( Sect. II.E.1 ) has been employed for the quadrature. Note that
this formula for R(f) is based on N values of rj and that rN is not used.

  • R(f) is also a continuous function and it too must be given a discrete representation if one is to estimate the inverse transform. Clearly, the discrete representation of R(f) should be N complex numbers since it will be computed from the N possibly complex values rj. But unlike r(x), R(f) is possibly non-zero at all frequencies (as we saw in Sect. II.C.4 ) and this means the inverse transform involves an integral over an infinite domain (see the end of Sect. II.E.1 ). To handle this the standard procedure is to replace the infinite domain by a finite interval [-fc,fc] with fc suitably large. We can then discretize R as R(fn) with fn = n@Df, n = -N/2, . . . , N/2 (with N even) and fc = NDf/2 and the inverse transform is estimated as
@
@
where again the quadrature has been performed using the rectangular rule. Note that, as before, this formula is based on N values of R(fn) and that R(fN/2) is not used. And how do we choose fc (for a given N) so as to capture the frequency content of r(x) yet still keep Df
"small" for resolution purposes? The answer is given by the Sampling Theorem (see Sect. II.B.1 ) which says that for a smooth periodic function r(x) the maximum frequency captured by an evenly-spaced sampling is 1/(2h) [ 5 ]. Therefore, we choose fc = 1/(2h) which implies Df = 1/(Nh) = 1/L and fn@xj = (nDf)@(jh) = n@j/N.
We are now in a position to write down the discrete transform and its inverse. We define the discrete Fourier transform (DFT) to be Rn = R(fn)/h (in order to make it independent of h) and then have
The inverse discrete Fourier transform (IDFT) is [ 6 ]
Finally, for purposes of symmetry it is customary to re-label Rn putting the negative frequencies between n = N/2 and N-1 so that the inverse becomes
The latter form emphasizes the similarity between the transform and its inverse; as we'll soon see one can use essentially the same numerical procedure to compute both.


The sampling rj and its transform Rn are, in general, complex-valued and thus the DFT and IDFT relate one set of 2N numbers (the real and imaginary parts of the rj) to a second set of 2N numbers (the real and imaginary parts of the Rn). However, since r(x) is typically real-valued the rj are usually just N real numbers. In this case, the transform definitions imply Rn* = RN-n where the asterisk (*) denotes the complex conjugate. This says that of the 2N numbers forming the discrete transform only N are independent; these are of course derived from the N real rj.
Finally, we note that for purposes of solving differential equations (see below) other related transforms are often more useful. Of most utility is the discrete sine transform and its inverse defined by
and
Observe that the sine transform is its own inverse apart from the numerical pre-factor. The analogous discrete cosine transform is also valuable.



Numerical Example: For this example we first take r(x) (over the domain [0,L]) to be:
a) Sinusoid with frequency fa:
Later you can consider other forms for r(x) simply by disabling all the others using Format/ Properties/Calculation:
b) Gaussian function:
c) Unit step:
d) A function of your choice:
To perform the transforms we use a built-in Mathcad routine called FFT. Because of limitations in its underlying algorithm (described below), this routine requires that the number of points N be a power of 2 and we assume
or
so that the real-space discretization is
and the frequency-space discretization has
or
and
or
so that
The DFT is then computed by the Mathcad routine fft [ 7 ]:
For purposes of comparison we also compute an "exact" DFT by taking N (called Nx) to be large:
Shown below and on the next page are the coarse and fine-grained real-space samples (xj and xxj) and semi-log plots of the corresponding DFT amplitude spectra (|Rn| and |Rxn|).
You should play with this example, trying out different parameters (especially N and fa) and different functions. Also see how close the IDFT comes to recovering the rj.

Among the things you should observe are:

Exercise IV.C.7.b.(m) Study the more typical examples of aliasing error seen in the transforms of functions b) and c) and show that they too exhibit the reflection property mentioned above. Also investigate the aliasing error in the transform of function b) as xc is changed. Note that when xc is different from L/2, r(0) doesn't equal to r(L) and this discontinuity (in the periodically- extended r(x)) generates additional high-frequency content and hence more aliasing error.
Fast Fourier Transform

To perform a discrete Fourier transform on a computer one could obviously perform the various sums involved and arrive at the answer. In general, these sums would require on the order of N2 operations (N terms for each of N points) and, as we've seen, this is too much if the scheme is to
be worthwhile for large N. As indicated earlier, what makes finding transforms not just practical but highly efficient is the existence of an N@log2N algorithm known as the fast Fourier transform (FFT) [ 8 ]. And since the definitions of the DFT and IDFT are essentially identical, the same FFT algorithm (with minor changes) can be used for both. Below we describe briefly the essence of the FFT algorithm; more detail may be found in PRESS or BRIGHAM.

As discussed above, the N-point DFT is
The key to the FFT algorithm is the observation --- and here we go dividing and conquering again! --- that the DFT of length N can be divided into two DFTs each of length N/2 (presuming N is even) with the first covering the even-numbered points and the second the odd-numbered points, i.e.,
or
or
where W = exp(2pi/N) and the superscripts 0 and 1 indicate that these are transforms (of length N/2) over the even and odd points, respectively. Now, if N/2 is also even, then the (Rn)0 and (Rn)1 can themselves each be split in two so that the Rn are now decomposed into four sub-transforms, (Rn)00, (Rn)01, (Rn)10 and (Rn)11, each of length N/4. And if the original N is a power of 2 then log2N splittings can be performed and, in this case, at the lowest level each sub-transform is reduced to a single point! These fully split sub-transforms require no calculation --- they are just particular values of rj --- and each such sub-transform will have a unique "address" consisting of a sequence of log2N 0's and 1's. This address is very easily
found; in fact, it is readily shown to be the bit-reversed value of j in binary. The idea then is to find these N one-point transforms, combine them into N/2 two-point transforms, combine those into N/4 four-point transforms and so on until the desired one N-point transform is found. Since at each stage of this hierarchy there are N terms to compute (one for each n value) and there are log2N stages, the total operations count is N@log2N as claimed at the start.

Exercise IV.C.7.c.(e) For N = 8 write out the various sub-transforms explicitly and verify the statements concerning their addressing and the overall number of operations required.


An FFT code typically consists of two parts. The first part re-arranges the inputs (e.g., the rj's) so that they are in order of increasing j in bit reversed form. Thus, if N = 8 then the first entry in the re-ordered array is not j = 1 (100 in reversed-binary) but instead j = 8 (001 in reversed-binary). The second part of the FFT code then computes the sequence of sub-transforms (which are just combinations of nearest-neighbor entries in the bit-reversed format) and multiplies in the appropriate powers of W (computed using trigonometric identities). Rather than write our own Mathcad program implementing these ideas, we just follow standard practice and use black-box routines, specifically Mathcad's (as was in fact done earlier). The programming details can be found in PRESS or BRIGHAM.
Solving Differential Equations

Analytically, Fourier transforms can be useful for solving differential equations because in the frequency domain a linear ODE or PDE becomes an algebraic equation. In the discrete world an analogous thing happens with a system of linear difference equations decoupling into a set of independent equations each of which is trivially solved [ 9 ]. To illustrate let's begin with a one-dimensional problem, the two-point BVP rxx = b(x) on the interval x e [0,L]. As a first step we discretize the ODE assuming x = j@h, j = 0..N with X = N@h and using centered differences to obtain:
Then multiplying through by exp(2pijn/N) and summing over j gives
for
where Rn and Bn are the DFTs of rj and bj , respectively.
Next, noting that
=
(and an analogous expression for the rj-1 sum) we can re-write the transformed FDE as
And if r(x) satisfies periodic boundary conditions [ 10 ] then the FDE reduces to
which is trivial to solve for the transform solution Rn. The final step of course is to return to real space using the inverse transform to obtain the BVP solution rj.

Some BVPs do satisfy periodic BCs and then the above analysis is appropriate. However, more often the BCs are Dirichlet or Neumann conditions. For the Dirichlet case take the BCs to be r(0) = r(L) = 0. The appropriate transform is then the discrete sine transform (since the sines will vanish at the boundaries) and multiplying the FDE by sin(pjn/N) and summing over j gives
A similar set of manipulations to those used earlier converts this into
and the inverse sine transform then gives the BVP solution.

Exercise IV.C.7.d.(e) Starting from the FDE fill in the missing steps in the Fourier and sine transform derivations that lead to the solutions in the transform domain. In the latter case, how would you handle the case of inhomogeneous (non-zero) Dirichlet conditions? (If you have trouble, see below!)

Exercise IV.C.7.e.(m) Show how a two-point BVP with homogeneous Neumann BCs (rx(0) = rx(L) = 0) can similarly be treated using the discrete cosine transform.
Numerical Examples:

Two-Point BVP. We wish to solve:
with
where
and
For purposes of comparison, if b(x) equals a constant (bc) then an analytical solution exists:
Now, the transform approach. This BVP has Dirichlet BCs and so the discrete sine transform is appropriate. However, to apply it in the form described above we need homogeneous BCs; these are easily arranged by a change of variables from y to z with:
The dependent variable z(x) then satisfies
with
and
We discretize on a uniform grid
using centered differences to find the FDE
where
With the discrete sine transform one can readily show that the FDE transforms to
for
where Zn and Cn are the sine transforms of zj and cj, respectively.

Because Mathcad doesn't have a built-in discrete sine transform, to perform the sine transform we use Mathcad's built-in DFT routine (called fft) in the following way [ 11 ].
In order to find Cn we pad cj (calling the new array cpj) with an additional N elements that make it an odd function about j = N as as calculated and plotted at right.
The DFT of cpj is then [ 10 ]
or, using definition of cpj
Hence we have [ 12 ]
and from the transformed FDE
A semi-log plot of Cn and Zn is shown at right. High frequency content in these functions, generally arising from discontinuities in cj, will increase the aliasing error in the final result.
Since the inverse sine transform is identical to the sine transform (apart from a constant factor) we can again use Mathcad's FFT routine on an appropriately padded Z (called Zp):
The BVP solution is then obtained:
The result is plotted at right.

Play with this example changing N, y0, yL, a and b(x) and studying the effects of aliasing error. Note that yex is valid only when b(x) is a constant. For other cases just use a large N solution to estimate the "exact" result.
2-D Elliptic BVP: Consider a parallel plate capacitor in which the insulator between the plates contains a known fixed charge density r(x,y). Treating the electrostatics in 2-D the BVP to be solved for the electric potential f(x,y) is as depicted below.
In this problem we've assumed that the bottom plate is at zero volts, the top one is at V volts and the potential at x = 0 and x = X are those that would obtain if r(x,y) = 0. Implicit in the latter conditions is the assumption that r(x,y) is appreciable only far from the ends.

Since this BVP is an inhomogeneous Dirichlet problem, as before we define a new dependent variable
that satisfies the PDE
with homogeneous boundary conditions (i.e., y = 0 on all boundaries). For specificity we take r(x,y) to be
For the numerical solution we discretize this elliptic PDE as follows
The FDE is then
where
Based on the 1-D example, the discrete sine transform of this FDE is obvious:
where Yn,h and Wn,h are the transforms of yj,k and wj,k, respectively.

The needed 2-D discrete sine transform (and its inverse which is the same thing apart from a constant factor) is an obvious generalization of the one-dimensional formula. For example, the transform of wj,k is
Re-writing this as
shows that the 2-D transform is a 1-D transform (over j) of a sum of 1-D transforms (over k). Hence, a 1-D FFT routine can be used to solve 2-D problems very quickly. Moreover, this reduction generalizes just as easily to higher dimensions.

For purposes of simplicity, below we give a naive implementation of the 2-D transform in Mathcad with no effort made to minimize memory usage [ 13 ].
At right is a Mathcad program that implements the 1-D discrete sine transform using Mathcad's built-in DFT routine as discussed earlier. The parameter sign is used to specify whether the transform (sign=1) or its inverse (sign=-1) is to be found.
The 2-D transform is then found at right by assembling the requisite 1-D sine transforms.
The transform of wj,k is then computed by
and the solution to the BVP in the transform domain is
Finally, the solution in real space is obtained by applying the inverse transform [ 14 ]
or converting back to the electric potential
The solution to the BVP is plotted at right. Explore what happens as various parameters are changed (especially J and K).


Exercise IV.C.7.f.(h) A better set of boundary conditions for this problem would have fx(x=0) = fx(x=X) = 0. Show how one could handle this situation by using a cosine transform in the x-direction (but still using a sine transform in the y-direction). Implement this in Mathcad and compare the results.
Generalizations of the Fourier Transform Method

As seen in these examples the fast Fourier transform algorithm allows elliptic PDEs to be solved with extraordinary efficiency. From the foregoing it would also appear, however, that the method has a severely limited range of applicability (just like cyclic reduction). For example, it seems to need a regular geometry, power-of-two gridding, and a linear PDE (and BCs) with constant coefficients. Luckily, ways have been found to ease many of these restrictions, thus bringing the great power of Fourier methods to bear on a broader class of problems. Among these ways are:


Finally, we note that there are other ways the FFT can be exploited to gain efficiency, e.g.,

a) Cyclic reduction combined with the Fourier transform method. This is a fast direct method known as the FACR (for Fourier Analysis -- Cyclic Reduction) algorithm; see HOCKNEY.

b) Basis function expansion methods using trigonometric or Chebyshev basis functions. These techniques, known as spectral methods [ 9 ], are discussed in Sect. V.C .