Iterated Ritz Method for solving systems of linear algebraic equations

This paper describes the current state of research on a new iterative method for solving systems of linear algebraic equations. The method is suitable for extremely large systems with sparse matrices. Besides its own characteristics, it also has the feature of generality, as many iterative methods are special cases of this approach. The algorithm was developed independently and then implemented in the open-source finite element program FEAP. Various checks were conducted, especially on practical models. Although the method has been only partially studied, good results have already been obtained.


Brief theoretical introduction
In the analysis of engineering models using numerical methods, a system of algebraic equations

$$K u = f \qquad (1)$$

(often very large) usually has to be solved efficiently. If the displacement method is considered, the system of equations represents equilibrium conditions. In this case, $K$ is the stiffness matrix, $u$ is the vector of unknown displacements, and $f$ is the external load vector. If the system matrix is symmetric and positive definite, the solution is equivalent to the minimization of the quadratic form that represents the potential energy of a static system:

$$\Pi(u) = \tfrac{1}{2} u^T K u - u^T f \qquad (2)$$

The first term is the strain energy, and the second one is the work (potential) of the external load. This equation is a discretized approximation of the Lagrange energy functional of the continuous (mathematical) model of a linearly elastic body.

The surface of the constant energy level $\Pi(u) = c$ is an ellipsoid (hyperellipsoid) that can be represented by

$$(u - u_0)^T\, Q^T D^{-1} Q\, (u - u_0) = 1 \qquad (3)$$

It is a body in $n$-dimensional space, where $n$ denotes the number of unknown degrees of freedom. From the geometric viewpoint, the components of vector $u$ are point coordinates in that space, while the components of vector $u_0$ define the ellipsoid centre. The columns of the orthogonal matrix $Q$ (of order $n$) define the ellipsoid axes, i.e. the eigenvectors of $K$. The elements of the diagonal matrix $D$ are the squared semi-axis lengths $2c/\lambda_i$, where $c$ is the energy level (measured from the minimum) and $\lambda_i$ is the corresponding eigenvalue; the lengths of the ellipsoid semi-axes are therefore $\sqrt{2c/\lambda_i}$. The centre is defined at $c = 0$. The possibilities in the plane ($n = 2$) and in space ($n = 3$) are shown in Figures 1.a and 1.b. According to the principle of minimum potential energy of a stable body, the point in space with the lowest energy level is the solution of the problem, and it lies at the centre of all ellipsoids. It can be considered a degenerate ellipsoid, with the lengths of all main axes equal to zero. In numerical realization, depending on the accepted accuracy, it is however a very small ellipsoid with an energy level just slightly higher than the minimum one.
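As a quick numerical illustration of the equivalence between (1) and the minimization of (2), the following sketch (ours, not from the paper; a small random symmetric positive definite system is assumed) checks that the gradient of $\Pi$ vanishes at the solution of the linear system and that any other point has higher energy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)   # symmetric positive definite "stiffness" matrix
f = rng.standard_normal(n)    # "load" vector

def energy(u):
    """Quadratic form (2): potential energy of the static system."""
    return 0.5 * u @ K @ u - u @ f

u = np.linalg.solve(K, f)               # solution of (1)
print(np.allclose(K @ u - f, 0.0))      # True: gradient of (2) vanishes at u
perturbed = u + 0.1 * rng.standard_normal(n)
print(energy(perturbed) > energy(u))    # True: any other point has higher energy
```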

Iteration idea
An approximate solution (for which we keep the symbol $u$) can be found by successive application of the discretized Ritz method. The idea is based on the selection of linearly independent coordinate vectors $\phi_i$, in the direction of which, with appropriate scalars $a_i$, the solution increment can be expanded [1]:

$$\Delta u = \sum_{i=1}^{m} a_i \phi_i \qquad (4)$$

In the traditional realization of the procedure, the unknowns $a_i$ and the vectors $\phi_i$ are known as Ritz coefficients and Ritz vectors. The number of vectors must satisfy the inequality $1 \le m \le n$. If a rectangular matrix with $m$ columns and $n$ rows is defined in such a way that its columns are the coordinate vectors, i.e.

$$\Phi = [\phi_1\ \phi_2\ \dots\ \phi_m] \qquad (5)$$

and if the vector $a$ comprises the scalars

$$a = [a_1\ a_2\ \dots\ a_m]^T \qquad (6)$$

the solution increment can be written as

$$\Delta u = \Phi a \qquad (7)$$

Now, in the sense of an iterative process,

$$u_{i+1} = u_i + \Delta u_i \qquad (8)$$

where the indices denote two consecutive iteration steps. The energy in the $i$-th step is

$$\Pi_i = \tfrac{1}{2} u_i^T K u_i - u_i^T f \qquad (9)$$

Figure 1. Energy ellipse and ellipsoid
Then, considering (7) and (8), in the next step it is

$$\Pi_{i+1} = \tfrac{1}{2}(u_i + \Phi_i a_i)^T K (u_i + \Phi_i a_i) - (u_i + \Phi_i a_i)^T f \qquad (10)$$

By multiplying the sub-expressions in parentheses and arranging the newly created terms, we obtain

$$\Pi_{i+1} = \Pi_i + \Delta\Pi_i \qquad (11)$$

where the unbalanced load, or the residual, is given by

$$r_i = f - K u_i \qquad (12)$$

Since the energy $\Pi_i$ does not depend on the displacement increment $\Delta u_i$, and hence also not on the variables $a_i$, it can be omitted from the minimization procedure, i.e. it is sufficient to differentiate the energy increment

$$\Delta\Pi_i = \tfrac{1}{2} a_i^T \Phi_i^T K \Phi_i\, a_i - a_i^T \Phi_i^T r_i \qquad (13)$$

If the generalized Ritz stiffness matrix is introduced,

$$\bar{K}_i = \Phi_i^T K \Phi_i \qquad (14)$$

which is symmetric and, in the case of linearly independent coordinate vectors, also positive definite, and if the generalized Ritz load vector is defined as

$$\bar{r}_i = \Phi_i^T r_i \qquad (15)$$

the energy increment can be written in the abbreviated form

$$\Delta\Pi_i = \tfrac{1}{2} a_i^T \bar{K}_i a_i - a_i^T \bar{r}_i \qquad (16)$$

Minimization (differentiation) with respect to $a_i$ then yields the equation system

$$\bar{K}_i a_i = \bar{r}_i \qquad (17)$$

In the sense of the Ritz method, this system is used to approximate the initial one (1). As the approximation is usually unsatisfactory, an iterative improvement has to be made. By solving the system (17), symbolically written as

$$a_i = \bar{K}_i^{-1} \bar{r}_i \qquad (18)$$

the coefficients $a_i$ are obtained, and an approximate displacement increment $\Delta u_i$ is then calculated according to (7). The new displacement is determined according to (8). The procedure terminates when the stopping criterion is satisfied. Usually, the Euclidean norm of the residual is adopted, i.e. the iterative process is finished if

$$\|r_i\|_2 \le \varepsilon \|r_0\|_2 \qquad (19)$$

where $r_0$ is the initial residual ($r_0 = f$ if $u_0 = 0$), while $\varepsilon$ is a very small positive number. It should be noted that the current residual can also be determined from the recursive formula

$$r_{i+1} = r_i - K \Delta u_i \qquad (20)$$

obtained by using expression (12) in which, according to (8), $u_{i+1} = u_i + \Delta u_i$ is inserted, recognizing that $r_i = f - K u_i$. The method is accelerated by expression (20) because, unlike with expression (12), the total displacement does not need to be calculated in each step. In fact, it is determined only at the very end, when the stopping criterion is reached. However, if (20) is applied, the residual must occasionally be updated from the equilibrium condition (12), because of the accumulation of rounding errors. The equilibrium condition is also needed at the very beginning of the method (for $i = 0$), to calculate the initial residual if $u_0$ is not a null vector. In case of convergence problems, the maximum number of steps must be limited, in order to avoid time-consuming calculations.
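A brief numerical sketch of one step of the derivation above (our illustration, with an arbitrary small SPD system and a random subspace): the small system (17) is solved, and the recursive residual (20) is checked against the equilibrium definition (12):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 4
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)          # SPD system matrix
f = rng.standard_normal(n)
u = np.zeros(n)
r = f - K @ u                        # initial residual (12)

Phi = rng.standard_normal((n, m))    # arbitrary coordinate vectors
K_bar = Phi.T @ K @ Phi              # generalized Ritz stiffness (14)
r_bar = Phi.T @ r                    # generalized Ritz load (15)
a = np.linalg.solve(K_bar, r_bar)    # small system (17)
du = Phi @ a                         # increment (7)

u_new = u + du                       # new displacement (8)
r_direct = f - K @ u_new             # residual from equilibrium (12)
r_recursive = r - K @ du             # recursive residual (20)
print(np.allclose(r_direct, r_recursive))  # True
```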

Overrelaxation (underrelaxation) of displacement
Motivated by the known method of successive overrelaxation, the procedure can be accelerated using a relaxation factor $\Omega_i$, by which the solution increment is multiplied. Then, instead of (8), the new displacement is defined as

$$u_{i+1} = u_i + \Omega_i \Delta u_i \qquad (21)$$

In fact, in most cases the local energy minimum (within the subspace used) is not globally optimal. The convergence can often be accelerated by overrelaxation ($\Omega_i > 1$) or by underrelaxation ($\Omega_i < 1$). In both cases system (17) will not be satisfied, and the energy will not reach a minimum within the given subspace. However, although such a step is not locally optimal, it can speed up the global convergence of the method.
Unfortunately, an appropriate value of the relaxation factor is not easily selected from one step to another. It is only known that the relation $0 < \Omega_i < 2$ must hold. Although a mathematical proof of these limits exists [2], a clear interpretation can also be given. As $\Pi$ is a quadratic function, the search for a minimum along a certain direction $p_i$ (Figure 2), which usually coincides with $\Delta u_i$, can be described by the following function:

$$\Pi(\Omega_i) = \Pi_{\min} + (\Pi_0 - \Pi_{\min})(\Omega_i - 1)^2$$

where $\Pi_0$ represents the energy at the starting point of the search direction (at the beginning of the step), $\Pi_{\min}$ is the local minimum along this direction, and $\Pi(\Omega_i)$ is the value of the quadratic function at some point. Based on the condition of monotone convergence of the method, $\Pi(\Omega_i) < \Pi_0$, the following can be written: $(\Omega_i - 1)^2 < 1$, or $0 < \Omega_i < 2$. At the interval boundaries the energy remains the same as the initial one, i.e. $\Pi(0) = \Pi(2)$, and the method does not converge. For values of $\Omega_i$ outside these boundaries, the energy increases and the procedure diverges.
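A small sketch (ours; the parabola above with arbitrary energy values) confirming these bounds numerically:

```python
import numpy as np

Pi_0, Pi_min = 10.0, 4.0   # arbitrary energies at the step start and at the line minimum

def Pi(omega):
    """Energy along the search direction as a quadratic in the relaxation factor."""
    return Pi_min + (Pi_0 - Pi_min) * (omega - 1.0) ** 2

print(Pi(0.0) == Pi_0, Pi(2.0) == Pi_0)                       # boundaries: no reduction
print(all(Pi(w) < Pi_0 for w in np.linspace(0.1, 1.9, 19)))   # inside: monotone descent
print(Pi(2.5) > Pi_0)                                         # outside: divergence
```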
Convergence is not guaranteed in the case of linearly dependent coordinate vectors (when the subspace degenerates), in the case of orthogonality of the residual to the current subspace, or under the effect of rounding errors at the end of the iterative process (if the stopping criterion is too strict). These problems are additionally discussed in Section 7.

Basic pseudocode
The idea behind this algorithm is highly natural, not very complex, and easily noticed. Despite such advantages, this approach and interpretation have not been widely accepted by researchers in the field of efficient iterative methods for solving large systems [3]. Theoretically, this algorithm, just like many other iterative algorithms, can be considered a Krylov approach [4, 5], and some similarities with the ideas given in this paper can be found in [6]. It is interesting to note that subspace iteration, which is in essence this procedure, has quite a wide application in solving eigenvalue problems. The main elements of the described method are briefly presented by a simple pseudocode.
The efficiency of the procedure is a matter of compromise. If a large number of well-chosen coordinate vectors is used (spanning a subspace in which a good approximate solution lies), the energy reduction per step will be greater and fewer steps will be needed to find the solution, but individual steps will last longer. A smaller number of vectors implies a shorter step time, but also a less efficient subspace and more steps to obtain the solution. An optimal approach would be to find a good-quality subspace of small size, so that even a strict stopping criterion can be reached after a small number of fast steps.

Iteration by means of a small system
The procedure starts by selecting the initial approximation $u_0$. Then, at each step, the set of coordinate vectors forming the matrix $\Phi_i$ is defined, and the vector $\Delta u_i$ is determined. This increment (which lies in the subspace spanned by the coordinate vectors and is determined by the vector $a_i$) provides the largest reduction of energy $\Delta\Pi_i$ within that subspace. This is why a system of linear algebraic equations (17) has to be solved, but with a small number of unknowns, equal to the number of coordinate vectors. In this way, the solution of the initial system containing $n$ equations is reduced to repeated solving of systems with $m$ equations. In some of our examples, $n$ was about $10^7$, while $m$ was no more than ten. The known schemes of matrix transformations that lead to the formation of the small system are shown in Figure 3. The solution of this system (the matrix $\bar{K}_i$ is usually full) can be obtained using any direct solution method; Cholesky decomposition was used in our case. If the iterative process is convergent, the sum of the small-system solutions approaches the large-system solution, and the accumulated small-system energy reduction increases monotonically and approaches the minimum energy of the large system.
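A minimal sketch of one possible realization of this loop (ours, in Python with dense NumPy arrays; the authors' actual implementation is in Fortran within FEAP and uses sparse storage, and the subspace generator make_subspace is a convention we introduce here). The small system (17) is solved by Cholesky factorization, as in the paper:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def iterated_ritz(K, f, make_subspace, u0=None, eps=1e-8, max_steps=1000):
    """Iterated Ritz method: repeatedly minimize the energy in a small subspace.

    make_subspace(r, du_prev) must return an n-by-m matrix of coordinate vectors.
    """
    n = K.shape[0]
    u = np.zeros(n) if u0 is None else u0.copy()
    r = f - K @ u                                # initial residual (12)
    r0_norm = np.linalg.norm(r)
    du = np.zeros(n)
    for _ in range(max_steps):                   # limit steps, cf. Section 2
        if np.linalg.norm(r) <= eps * r0_norm:   # stopping criterion (19)
            break
        Phi = make_subspace(r, du)               # coordinate vectors
        K_bar = Phi.T @ K @ Phi                  # small Ritz matrix (14)
        r_bar = Phi.T @ r                        # small load vector (15)
        a = cho_solve(cho_factor(K_bar), r_bar)  # small system (17), Cholesky
        du = Phi @ a                             # increment (7)
        u += du                                  # new displacement (8)
        r -= K @ du                              # recursive residual (20)
    return u
```

For example, passing make_subspace = lambda r, du: r.reshape(-1, 1) reproduces the steepest descent special case discussed below.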

Special cases of the method
The rectangular matrix $\Phi_i$ is called the subspace matrix. As already pointed out, its columns are the coordinate vectors $\phi_{1,i}$ to $\phi_{m,i}$ that form this subspace. Depending on the choice of these vectors, many known iterative methods can be recognized. They will not all be explained in this text [7-11]; only the well-known ones will be presented as special cases of this approach.
It is only necessary to use appropriate coordinate vectors. The brief descriptions given below are not intended for a faster realization of these methods compared to traditional strategies [12]; they only contribute to a proper understanding and emphasize the generality of this iterative algorithm. It should also be mentioned that the very popular single or multiple preconditioning [13, 14] can be interpreted using (one or several) coordinate vectors. For this procedure, preconditioning does not imply any significant change of the algorithm (cf. Section 7.2.3).

The Jacobi method
The Jacobi method is obtained if only one coordinate vector, $\phi_i = D^{-1} r_i$, is defined at each step. Here, $D$ is the diagonal matrix containing the diagonal elements of $K$. Thus the matrix $\Phi_i$ becomes a column matrix, i.e. $\Phi_i = [\phi_i]$, and $\bar{K}_i$ and $\bar{r}_i$ degenerate into scalars. Therefore, the small system (17) is reduced to a single equation, whose solution in every step defines the displacement increment.
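In the notation of the sketch above (our illustration), this special case is a one-column subspace. Note that the Ritz minimization also produces an optimal scalar step length from (17), so strictly speaking this is an exact line search along the Jacobi direction rather than the classical fixed-step Jacobi iteration:

```python
import numpy as np

def jacobi_subspace(K):
    """One coordinate vector per step: the Jacobi direction D^{-1} r."""
    d = np.diag(K)                       # diagonal elements of the stiffness matrix
    def make_subspace(r, du_prev):
        return (r / d).reshape(-1, 1)    # n-by-1 subspace matrix
    return make_subspace
```

With a single column $\phi$, equation (17) collapses to the scalar equation $(\phi^T K \phi)\, a = \phi^T r$.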

The methods of Gauß-Seidel and successive overrelaxation
The procedure can be described by a sequence of cycles, or "sweeps", through all degrees of freedom. Each cycle then consists of individual steps.
Only one degree of freedom is solved at each step (following the node numbering sequence). The process is often called relaxation. In every step the matrix $\Phi_i$ contains one coordinate vector equal to a unit vector, i.e. $\phi_i = e_i$: the component corresponding to the degree of freedom currently being relaxed is equal to one, while all others are equal to zero. The matrix $\bar{K}_i$ and the vector $\bar{r}_i$ degenerate into scalars, and therefore only one equation needs to be solved. In this way, equilibrium of only the corresponding degree of freedom is satisfied, while the residuals of the previously solved equations are disturbed at the same time.
During a convergent iterative process, the disturbance gradually decreases. The cycle ends when all equations (steps) are solved, and only the last one is in equilibrium. This is followed by the start of a new cycle, with the same coordinate vectors. As the steps progress, the component equal to one again moves from the first to the last vector component.
If the matrix approach is applied (with cycles that correspond to steps in other methods), and if the solution is sought in the node numbering order, each cycle has one coordinate vector $\phi_i = L^{-1} r_i$, where $L$ is the lower triangular matrix (containing the lower triangular part and the diagonal elements of $K$). In this interpretation, the method of successive overrelaxation is described by the same matrix, but with the diagonal elements multiplied by the local relaxation factor $\omega$, which should be distinguished from the global factor $\Omega$. Then even the equilibrium of the degree of freedom just solved is no longer valid (in the Gauß-Seidel method $\Omega = \omega = 1$).
If the solution is sought opposite to the node numbering order, the lower triangular matrix is replaced with the upper one, i.e. $\phi_i = U^{-1} r_i$. There are also other relaxation orderings. For instance, there is the known "chessboard scheme" on a rectangular mesh: in the first half of the cycle only the "black" nodes are solved, while the "white" ones are solved in the second half. Relaxation is also possible by numbering from right to left (instead of the traditional left to right). If diagonal connections between nodes are used, the number of possible relaxation paths increases. New possibilities are obtained if, in addition, the columns and rows interchange places. Further possibilities exist in three-dimensional rectangular meshes, especially with diagonal connections between nodes, both in the coordinate planes and along the space diagonals. Irregular meshes from the finite element method are even more complex; they have a huge number of meaningful connections between nodes, and hence numerous ways of load (residual) propagation. An interesting idea is to proceed from the largest (by absolute value) to the smallest residual of the current cycle. However, reordering is then needed after the solution of each equation (after every step), due to the change of the residual in the current node and its topological neighbours. Of course, continuous correction (sorting) of the residual requires additional time, which affects the numerical efficiency.
In principle, the algorithm redefined in this way needs a smaller number of steps to reach the solution. In fact, the known methods of Cross and Werner-Csonka converge better if started from the node with the largest established residual. The residual vector then has to be updated, and the node with the new largest residual equilibrated next. The procedure no longer depends on node numbering, but rather on the distribution of the residual (load). That is why the notion of a cycle loses its real meaning, as some nodes will already have been solved several times before other nodes are equilibrated for the first time.

Method of steepest descent
In this method, too, only one coordinate vector, equal to the residual vector, is used in each step; thus, $\phi_i = r_i$. It is also the gradient of the energy functional with a negative sign, usually used in the traditional realization of the procedure, i.e. $r_i = -\nabla\Pi(u_i)$. If the matrix $K$ is symmetric and positive definite, the method is convergent, but the convergence is usually slow, especially in the case of a poorly conditioned matrix. Although this method is not efficient, it is used as a pedagogical introduction and motivation for the improvement of other iterative methods.
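In the subspace interpretation (our sketch), the single-column choice $\Phi_i = [r_i]$ makes the small system (17) scalar, and its solution reproduces the exact line-search step length of steepest descent:

```python
import numpy as np

def steepest_descent_subspace(r, du_prev):
    """One coordinate vector: the residual itself (phi = r)."""
    return r.reshape(-1, 1)

# For phi = r, equation (17) gives the familiar step length
#   a = (r^T r) / (r^T K r),  so that  du = a * r
```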

Conjugate gradient method
In the first step there is only one coordinate vector, the residual, as in the method of steepest descent. After that, the vector of the previous solution increment is added, which accelerates the convergence; that is why this method is much faster than the previous one. It can be written: $\phi_{1,i} = r_i$ and $\phi_{2,i} = \Delta u_{i-1}$. Thus, the small matrix is of order two, i.e. $\Phi_i = [\phi_{1,i}\ \phi_{2,i}]$, and a system with two unknowns should be solved at each step. This description differs from the usual ones, where the K-orthogonalization of vectors is used. However, this orthogonalization is equivalent to solving the abovementioned system with two unknowns. The orthogonality property of these two vectors is gradually lost due to the accumulation of round-off errors. Nevertheless, even approximate orthogonality between adjacent steps greatly speeds up the convergence compared to the steepest descent method. This theory has been confirmed by numerical tests on relatively small systems: the residual norm suddenly drops to zero (or, more precisely, to the numerical approximation of zero) exactly in the $n$-th step. On the other hand, tests conducted on large systems show smooth behaviour in the $n$-th step, when the drop could theoretically be expected. That is why the method needs a stopping criterion; in other words, it behaves like a usual iterative method.
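A two-column subspace generator in the spirit of this interpretation (our sketch; in the first step, when no previous increment exists yet, only the residual column is used):

```python
import numpy as np

def cg_like_subspace(r, du_prev):
    """Two coordinate vectors: the residual and the previous solution increment."""
    if np.linalg.norm(du_prev) == 0.0:      # first step: no previous increment yet
        return r.reshape(-1, 1)
    return np.column_stack((r, du_prev))
```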
Although the method is relatively fast and has practical use in the case of a symmetric and positive definite matrix, the convergence is not always satisfactory, especially if the matrix is ill-conditioned. That is why preconditioning techniques are often used, and the preconditioned conjugate gradient method is frequently adopted. The procedure can then terminate (satisfy the stopping criterion) after a much smaller number of steps than the theoretically required $n$. Let us close this section with a somewhat optimistic citation [4]: "The choice of the [iterative] method is a delicate problem. If the [system] matrix A is symmetric and positive definite, then the choice is easy: Conjugate Gradients." In terms of our method, would that mean there is actually no better subspace of coordinate vectors than that plane? It seems that there should be enough room for improvement.

On the selection of an efficient subspace of coordinate vectors
We would like to define the matrix $\Phi_i$ so that the method converges much faster than traditional procedures. Although we wish to keep the number of coordinate vectors small, we believe that only one vector (as in the Jacobi, Gauß-Seidel, successive overrelaxation and steepest descent methods), or two (in the case of conjugate gradients), is less than optimal. Obviously, as the number of vectors is much smaller than $n$, the solution increment $\Delta u_i$ can only accidentally hit the solution $u$ in the early phase of the calculation. We can imagine that, in addition to the (two) coordinate vectors of the conjugate gradient method, a third vector is introduced. In such a case, the subspace is extended to three vectors. Compared to the subspace with only two vectors, a greater energy reduction per step can reasonably be expected; the contribution of this additional vector can, in the worst case, be equal to zero. The following conclusion is also possible: the minimum of the energy functional in a larger subspace cannot be greater (at a higher level) than the minimum in a smaller subspace. In that sense, it is possible to add a fourth, a fifth and further vectors, and an even greater reduction of energy per step can be expected. In this way, most of the energy of the static system should be exhausted in several steps, and the system can be "damped out" to the lowest point, the solution of the problem. The idea of expanding the subspace is obviously quite attractive, but only up to a certain point. On the one hand, the formation of the vectors must not be a time-consuming process. On the other hand, if the number of vectors is excessively increased, the small system (to be solved in each step) could become unacceptably large. Ultimately, if the subspace dimension is equal to the number of unknowns, the total energy can be minimized and the solution obtained in the first step (or maybe in the second step, due to rounding errors), but at a "price" equal to or greater than that needed for finding the direct solution of the problem.

Necessary conditions of the selection
A small matrix can be singular (or almost singular, i.e. ill-conditioned) if some coordinate vectors are exactly (or almost exactly) linearly dependent. It should be recalled that the vectors $\phi_i$ are linearly independent if the expression

$$\sum_{i=1}^{m} a_i \phi_i = 0 \qquad (22)$$

is valid only in the case of all $a_i = 0$. In numerical realization, this condition should be even stronger: the vectors should not be even "almost" linearly dependent. Then, the Euclidean norm of a linear combination of the vectors may be smaller than a small positive number $\delta$, i.e.
$$\Big\| \sum_{i=1}^{m} a_i \phi_i \Big\|_2 < \delta \qquad (23)$$

only if all coefficients satisfy $|a_i| < \delta$. The violation of condition (22), and especially of (23), is not automatically prevented by the various strategies used to generate coordinate vectors (as discussed below). Vectors that do not fulfil these conditions should be discarded. Although this reduces the subspace dimension, the small matrix becomes regular and better conditioned. If more than two vectors are linearly dependent, it is not clearly defined which of them should be rejected; then exact or approximate orthogonalization of such vectors should be considered. In any case, a situation in which vectors are generated and then rejected, orthogonalized, or maybe even replaced by others, makes a step more "expensive", and should therefore happen only occasionally. Obviously, a procedure in which it is not possible (or rarely possible) to generate dependent vectors should be preferred. For easier realization of such a procedure, the generation method can even be changed during the calculation. If dependence nevertheless occurs, there is a "last moment" remedy: during decomposition, some pivots of the matrix $\bar{K}_i$ become equal (or close) to zero. This can be recognized and used for discarding the corresponding equations from the calculation of the small system. This subspace reduction has proven to be a fast and simple solution of linear dependence problems.

However, the formation of the subspace is one thing, and its quality is something completely different. For instance, if the coordinate vectors are perpendicular to the residual vector, the subspace is not useful: the right hand side of the small system (the Ritz load vector) is then $0$, which results in $a_i = 0$ and $\Delta\Pi_i = 0$, so the energy is not reduced. In other words, the procedure does not converge. To avoid this problem, the ratio $\|\Phi_i^T r_i\|_2 / \|r_i\|_2$ should be greater than a small constant. To achieve this, it is sufficient to have one coordinate vector $\phi_i$ that is not orthogonal to the residual $r_i$. For example, it can be determined by multiplying a selected positive definite matrix $P$ (of order $n$) by the residual vector: $\phi_i = P r_i$. Then, by the definition of positive definiteness, $\phi_i^T r_i = r_i^T P r_i > 0$, unless $r_i$ is a null vector, but that would mean that the solution has been reached. To further explain this situation, something else should also be noted: if the old coordinate vectors $\Phi_i$ (from the previous step) were kept during the calculation with the new residual $r_{i+1}$, the unfavourable case mentioned above would be obtained, as the new residual vector is always orthogonal to the old subspace. Using symbols and relations introduced during the description of the method, a simple proof can be given as follows:

$$\Phi_i^T r_{i+1} = \Phi_i^T (r_i - K \Phi_i a_i) = \bar{r}_i - \bar{K}_i a_i = 0 \qquad (24)$$

The statement does not hold in the case of overrelaxation or underrelaxation of displacement, but only if $\Omega = 1$.

Compared to the Ritz coordinate functions on the continuum, compatibility conditions and geometric boundary conditions are not sought in this discrete alternative. If only the system of equations is known, rather than the static system from which it was generated, such properties are not even defined; it is only known that they are contained in the system matrix. However, a coordinate vector can intuitively be considered "smoother" if it contains a smaller relative contribution of "high modes", i.e. eigenvectors of the matrix $K$ with high eigenvalues. Such a vector forms more realistic coefficients of the Ritz matrix, because the corresponding residual also has a small portion of high modes. The diagonal element of the small matrix related to such a coordinate vector, the corresponding "generalized stiffness", is smaller. On the contrary, an excessively "rough" vector generates a large stiffness in the Ritz matrix and causes locking of the residual. This can easily be shown by expanding the coordinate vector $\phi_i$ in the basis of the matrix eigenvectors $v_j$:

$$\phi_i = \sum_{j=1}^{n} c_j v_j \qquad (25)$$

In the case of normalized vectors ($\|\phi_i\|_2 = 1$), the corresponding diagonal element of the small matrix (briefly called the system matrix) can be written as

$$\bar{k}_{i,i} = \phi_i^T K \phi_i = \sum_{j=1}^{n} c_j^2 \lambda_j \qquad (26)$$

It can be observed that the coefficients to which a greater eigenvalue is attributed contribute more to (the increase of) the stiffness $\bar{k}_{i,i}$. Let us also mention that very rough vectors have a large squared residual norm $\|r\|_2^2$. With the progress of the steps, smooth coordinate vectors effectively reduce the contribution of low eigenvectors; after that, the procedure behaves as if the influence of these vectors did not exist. Thus, the condition number of the system matrix effectively becomes smaller, and the speed of convergence increases. If additional similar vectors are added, the energy reduction should be greater. From the theoretical viewpoint, the "smoothness" property is not necessary, but it is included in this section because it is important for an efficient realization of the method; without it, the method is neither efficient nor competitive.

Although the above limitations reduce the possible choices, the set from which appropriate coordinate vectors can be selected still remains very large. Unfortunately, we are not (and, as far as we know, nobody is) familiar with good criteria for the selection of generally efficient vectors. In addition, the background theory that would make the selection easier is also insufficiently known. The problem is that many possibilities arise, and we can only be satisfied with the implementation and comparison of numerical tests on numerous examples. As a rule, a specific set of vectors works fine for some models, while it is quite bad for others. In such circumstances, we would be satisfied with the selection of appropriate vectors; finding the best set (several efficient vectors not dependent on the model) would be a great research result.
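A sketch of how these necessary conditions might be checked in practice (our illustration; the tolerances are arbitrary, and a modified Gram-Schmidt sweep is used, which orthonormalizes the accepted columns; the span, and hence the Ritz step, is unchanged). Columns that are nearly dependent on already accepted ones are discarded, and a subspace nearly orthogonal to the residual is detected before the small system is factorized:

```python
import numpy as np

def screen_columns(Phi, r, tol_dep=1e-8, tol_orth=1e-12):
    """Discard nearly linearly dependent columns and reject a subspace that is
    (nearly) orthogonal to the residual."""
    kept = []
    for col in Phi.T:
        col = col / np.linalg.norm(col)
        # Remove the projection of col onto the already accepted columns
        for q in kept:
            col = col - (q @ col) * q
        if np.linalg.norm(col) > tol_dep:        # near-dependence test, cf. (22)/(23)
            kept.append(col / np.linalg.norm(col))
    Q = np.column_stack(kept)
    if np.linalg.norm(Q.T @ r) <= tol_orth * np.linalg.norm(r):
        raise ValueError("subspace (nearly) orthogonal to the residual")
    return Q
```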

Several proposals for generation
There are two basic approaches to the generation of coordinate vectors: general and special. In the first approach, no additional data about the model are needed; the system matrix and the right hand side vector are sufficient. If the vectors are properly defined, the convergence of the method is satisfactory in most practical cases. In the second approach, special features valid only for the model currently considered are exploited. An excellent convergence can then be expected, but only for this specific model (or perhaps for a small group of similar models). Some strategies for coordinate vector generation are presented below, first according to the general and then according to the special approach.

Selection of constant vectors
The simplest approach is to select a group of linearly independent vectors in advance. Such is the case in the methods of Gauß-Seidel or successive overrelaxation: the number of vectors coincides with the number of unknowns, and their sequence is cyclically repeated until the convergence criterion is reached. This was considered in Section 6.

Selection based on current residual
An appropriate set of vectors can be defined using the current residual $r_i$. In fact, since the "expensive" calculation of the initial error $e_0 = K^{-1} r_0$ (we borrowed the symbol $e$, otherwise used for the unit vector) immediately leads to the correct solution, we can try to find a matrix $P$ that approximates $K^{-1}$ as well as possible with the smallest possible amount of calculation. A good choice is a positive definite matrix, but this is not absolutely necessary: $P$ can be nonsymmetric and ill-conditioned, and even singular, as none of this implies singularity of the matrix $\bar{K}_i$. Of course, we must not generate all coordinate vectors with matrices of the same singularity. As already pointed out, the coordinate vector is generated as $P r_i$. There is also an additional advantage in using the vector of the current residual, as it ensures non-orthogonality of the coordinate subspace to this vector, which is important for the convergence of the method. For instance, if $K^{-1}$ is approximated by the identity matrix ($P = I$), we obtain $\phi_i = r_i$, which is
really the steepest descent method. The Jacobi iteration is based on the slightly better approximation by the diagonal matrix ($P = D^{-1}$); as already pointed out, then $\phi_i = D^{-1} r_i$. These procedures are not efficient, as the ("cheap") replacement matrix $P$ contains insufficient data about the inverse of $K$. The following should be emphasized: if several coordinate vectors are used, it is not necessary that an individual vector approximates $K^{-1} r_i$ well, but rather that the subspace spanned by all the vectors contains the best possible approximation of this product.

More appropriate coordinate vectors can be generated using one (or several) cycles of the Gauß-Seidel method or the method of successive overrelaxation. In such cases it would be useful to try various ways of "visiting" the nodes. However, unlike in the classical realization, in order to save computer time an incomplete procedure should be used: a return to already relaxed nodes should be prohibited. The idea of sorting the nodes is good, but increases the "price" of the cycle. Perhaps the nodes could be sorted in the first cycle (according to the absolute values of the residual components), while keeping, or only rarely correcting, the order afterwards. Between cycles (and maybe between steps) it is desirable to introduce the local relaxation factor $\omega$, but in this case an optimal value should be researched. Let us mark with $L_\omega$ and $U_\omega$ the lower and upper triangular matrices of $K$, whose diagonal elements are multiplied by $\omega$. These matrices can be simply and quickly inverted and multiplied by a vector. The coordinate vector determined by one cycle of the successive overrelaxation procedure, using the order of unknown numbering (from first to last), can be presented as $\phi_i = L_\omega^{-1} r_i$. Using the same procedure, but in the opposite order, $\phi_i = U_\omega^{-1} r_i$ is obtained. In both cases, the cycle starts with the null vector. The coordinate vector can also be generated by multiple use of the sum or product of these two approaches. Thus we have:

$$\phi_i = (L_\omega^{-1} + U_\omega^{-1})\, r_i \qquad (27)$$
$$\phi_i = U_\omega^{-1} L_\omega^{-1} r_i \qquad (28)$$

In the first approach, the matrix $K$ can also be placed between the parentheses. The matrix $P$ can easily be recognized in the equations above. If such vectors are used independently, a smaller relaxation factor (close to one) makes the convergence slower, but guarantees a faster process of "smoothing"; it would therefore be efficient to use them as an addition to other coordinate vectors. An increase in the number of cycles (successive applications of the matrices $L_\omega^{-1}$ and $U_\omega^{-1}$) results in a greater smoothness of the vector: the contribution of higher eigenvectors decreases, and the contribution to convergence in the region of lower eigenvalues increases. In this way, it is also possible to generate coordinate vectors using other iterative methods. It is even possible to use algorithms that are neither convergent nor numerically stable; thus, the local relaxation factor does not need to lie within the theoretically determined limits of the global factor (which has to be between 0 and 2).

Interesting coordinate vectors can be generated by smoothing the residual vector; in this case we are talking about "filtering". A component of such a vector is equal to the sum of the residual components in the neighbouring nodes, multiplied by weighting factors. This approach was found to be efficient in some examples with large jumps of the residual function. These jumps occur due to bad prediction of displacement in some steps, which often appears in the region of supports and free boundaries of the model. Generally, if the residual is decomposed into eigenvectors of the matrix $K$, then two or three coordinate
vectors that smooth the lower part of the residual spectrum (low eigenvalues) should be formed, and an additional two, specialized for the upper part of the spectrum, should be added. The previous displacement increment $\Delta u_{i-1}$ should also be added, which is the origin of the success of the conjugate gradient method.

In this paper, coordinate vectors were generated using the symmetric successive overrelaxation procedure. Thus, the first vector was defined as

$$\phi_1 = U_\omega^{-1} D\, L_\omega^{-1} r_i \qquad (29)$$

The previous displacement increment was also added. Calculations were made using different numbers of coordinate vectors (Section 9). A somewhat greater local relaxation factor, $\omega = 1.65$, was selected, while the global one was kept at $\Omega = 1$. Let us now explain the product in the parentheses of the second vector. As one step in the direction of the vector $\phi_1$ gives the current displacement increment $\alpha \phi_1$, where $\alpha$ is a number, the residual $r_i - \alpha K \phi_1$ is obtained according to (20). If the symmetric overrelaxation is applied to this residual, because of (29) the second vector becomes $U_\omega^{-1} D\, L_\omega^{-1} (r_i - \alpha K \phi_1)$. The vector $\phi_1$ already participates in the formation of the subspace, and $\alpha$ influences only the length of the new vector (it does not change the subspace that this vector spans); therefore $\phi_1$ and $\alpha$ can be dropped. Thus, the form

$$\phi_2 = U_\omega^{-1} D\, L_\omega^{-1} K \phi_1 \qquad (30)$$

is obtained. An interesting, faster realization of this procedure could be obtained if $K$ is used instead of $D$. These considerations are also of general significance: coordinate vectors can be generated in this way using any iterative method, i.e. not only the symmetric relaxation, and different methods can be used for each vector as well. For instance, the incomplete Cholesky factorization can be used for the first vector; forward and backward successive overrelaxations for the second and third vectors; the symmetric version of these methods for the fourth and fifth vectors, etc. Similarly to the described effect of one vector, here the subspace is smoothed by all vectors obtained by relaxation. The residual is smoother with an increase in the number of vectors used (or of cycles in the formation of one vector), which contributes to faster convergence. A similar effect is obtained by the vector from the Jacobi method, with components $r_i / k_{i,i}$.
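A sketch of this subspace generator (ours; dense triangular solves via SciPy for clarity, whereas a real implementation would exploit sparsity; the matrices $L_\omega$ and $U_\omega$ follow the paper's definition, with the diagonal multiplied by $\omega$, and the vectors follow (29) and (30)):

```python
import numpy as np
from scipy.linalg import solve_triangular

def ssor_subspace(K, omega=1.65):
    """Coordinate vectors from symmetric successive overrelaxation sweeps,
    plus the previous displacement increment."""
    D = np.diag(np.diag(K))
    L_w = np.tril(K, -1) + omega * D          # lower triangle, diagonal times omega
    U_w = np.triu(K, 1) + omega * D           # upper triangle, diagonal times omega

    def ssor_apply(v):
        y = solve_triangular(L_w, v, lower=True)
        return solve_triangular(U_w, D @ y, lower=False)

    def make_subspace(r, du_prev):
        phi1 = ssor_apply(r)                  # first vector, cf. (29)
        phi2 = ssor_apply(K @ phi1)           # second vector, cf. (30)
        cols = [phi1, phi2]
        if np.linalg.norm(du_prev) > 0.0:
            cols.append(du_prev)              # previous displacement increment
        return np.column_stack(cols)
    return make_subspace
```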

Generation according to preconditioning
Various ideas used for traditional preconditioning of equation systems can be applied to the generation of coordinate vectors. In addition to the matrices from the iterative procedures given above, the incomplete Cholesky factorization or a matrix polynomial of $K$ is also used for the preconditioning matrix; let us mark it as $M$. It is used for the reduction of the condition number of the system matrix. Symbolically, instead of solving $K u = f$, we indirectly solve

$$M^{-1} K u = M^{-1} f$$

but the matrix $M$ should be quite rapidly invertible. From our standpoint, $M^{-1}$ is nothing else but $P$, and the coordinate vector is once again $P r_i$. Interestingly, some of our examples converged better with coordinate vectors generated using backward node numbering; in the literature, however, it is quite usual to use the preconditioning procedure with forward numbering only. It would therefore be worthwhile to implement the preconditioning technique using the backward procedure as well. In order to generate several coordinate vectors, two or more preconditioning methods (matrices $P$) can be used at the same time, which is analogous to multiple preconditioning; the efficiency of the step can then increase. Consequently, the matrix transformations used for preconditioning are not necessary here, i.e. in the sense of our method, preconditioning is just a way of forming coordinate vectors. During the generation process, vectors can become (either exactly or approximately) linearly dependent, and one or several of them must then be excluded.
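For instance (our sketch, using SciPy's sparse incomplete LU as a stand-in for an incomplete factorization; any cheap approximation of $K^{-1}$ would serve, and the tolerances shown are arbitrary):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu

def ilu_subspace(K_sparse):
    """Coordinate vector generated by an incomplete factorization: phi = P r,
    where P approximates the inverse of K."""
    ilu = spilu(sp.csc_matrix(K_sparse), drop_tol=1e-4, fill_factor=5)
    def make_subspace(r, du_prev):
        cols = [ilu.solve(r)]                 # phi = M^{-1} r
        if np.linalg.norm(du_prev) > 0.0:
            cols.append(du_prev)              # previous displacement increment
        return np.column_stack(cols)
    return make_subspace
```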

Selection based on previous displacement increment
An appropriate set of coordinate vectors is based on history recycling, i.e. on the use of the previous displacement increment $\Delta u_{i-1}$.
It is known that this vector significantly improves the convergence of conjugate gradients with respect to steepest descent, and it also enables the use of the relaxation factor. Therefore, it can increase the efficiency of the method. In our interpretation, one of the subspace vectors is $\phi_i = \Delta u_{i-1}$, and it has a similar effect, although (compared to the conjugate gradient method) the recursive orthogonality of the previous increments is lost. In our procedure (unlike many other procedures), there is only orthogonality between successive (not distant) solution increments, residuals or subspaces, as the orthogonalization would make the step "more expensive". If $\Omega \neq 1$, even the successive orthogonality is lost. That is why, during the calculation, the small system can become ill-conditioned and the convergence slower. The coordinate vectors could possibly be orthogonalized after a certain number of steps, but we are not inclined to do so. Additional vectors can be generated by multiplying the last displacement increment by some matrix, i.e. $\phi_i = S \Delta u_{i-1}$. The addition of earlier increments $\Delta u_{i-2}$, $\Delta u_{i-3}$ and so on was not sufficiently effective. Besides the last displacement increment, the current solution vector $u_i$ can similarly be used as one of the coordinate vectors. The "recycling" of this vector makes sense due to the loss of orthogonality, and it could be efficiently applied to nonlinear systems, which lack such a property. Finally, let us mention that this group of vectors is not used independently.

Selection based on data about the model
Another strategy for generating coordinate vectors is to use some specific data about the model being solved. The approximate geometry and simplified model properties are frequently used; often it is sufficient to use a problem that is merely similar in some way. For example, less important degrees of freedom can be excluded; the same geometry and element mesh can be used, but with a simpler distribution of stiffness; the hierarchical behaviour of a complex model can be exploited (as in manual calculation); a coarse finite element mesh can be applied, etc. Generally, the displacements of such models under the residual load can be used as appropriate coordinate vectors. These are most often crude vectors that are most effective at the beginning of the calculation. Later on, the solution needs to be smoothed, and it is better to use some of the more accurate approaches described earlier. Nevertheless, such vectors can ensure a very fast solution of many concrete problems, but the generation process has an additional difficulty: the lack of generality, as each type of equation requires a separate approach. For example, the substitute model for a thick beam can be a traditional thin beam, while a membrane could be used instead of a shell. If the solution of the substitute model, $u_z$, is known, then the coordinate vector is $\phi = N u_z$, where $N$ is the matrix of interpolation functions that connect the degrees of freedom of the default and substitute models. In the case of a beam, the substitute model is based on line elements (and can be simply supported), while the default model is defined by a mesh of planar finite elements (Figure 4). The nodes of both models that lie on the axis are connected by third-degree polynomials, and those of the default model lying outside the axis are tied to the substitute model using the cross-section hypothesis of the thin beam element. The columns of the matrix $N$ are formed via the polynomials and the hypothesis. The matrix defined in this way is singular, as the displacements between the models are linearly dependent, but the coordinate vector is correct and, with the progress of the procedure, leads toward the solution for a thin beam. Of course, this solution is not good enough for the original model. That is why a residual vector (or some other "correction") must be used as an additional coordinate vector, to correct the assumption of straight cross-sections and provide a good solution for the thick beam.
The iterative solution procedure based on the model stiffness hierarchy can also be used for the formation of a coordinate vector. The coordinate vector can be defined as the solution after only a few iterations between parts of the model with different stiffness. Obviously, this is only a very crude approximation of the final result (which would be obtained after a larger number of steps), but it is nevertheless quite acceptable for the fast definition of a coordinate vector.
Vectors can be generated on a crude finite element mesh, using several iterations of a multigrid method [15, 16], but also through direct use of analytically defined (continuous) coordinate functions over an approximate domain of the model, which satisfy the geometric boundary conditions. Notable examples of such functions are the Ritz functions and, as an interesting extension, R-functions [17]. Both can be multiplied by polynomials. Even more freely selected smooth functions that satisfy neither natural nor geometric boundary conditions (here polynomials can once again be mentioned) can be used as coordinate vectors.
Openings and similar irregularities do not need to be taken into account. In these cases, the components of the coordinate vectors are equal to the values of the functions at the nodes. Obviously, such vectors are "rough" and the corresponding elements of the Ritz matrix are "excessively stiff", but they can be used in the first steps of the procedure. However, if they are kept, the solution will be smoothed in the later stages of the procedure, but the convergence will not be impressive. It would be better to smooth these vectors somehow in advance; possibly, for each one, to use two cycles of overrelaxation with a small $\omega$, the first with an increasing and the second with a decreasing node numbering sequence.
Coordinate vectors can also be generated by means of analogy.
For instance, if we are solving a slab problem, we can use the solution of a problem of a magnetic or electric field, torsion, a membrane, a grid, etc. The solution for the same slab under a different load can also be applied, or even a roughly sketched displacement field; then, measuring from the picture, we can determine the displacements as vector components. All such solutions can be columns of the matrix $\Phi_i$. Coordinate vectors can also be used for an efficient treatment of kinematic constraints; the hypothesis of straight cross-sections of beam elements is one such constraint. If we write the constraints in the form $T u = f$, where $T$ is the constraint matrix, then the initial approximation of the solution must satisfy the nonhomogeneous constraints, i.e. $T u_0 = f$, while the coordinate vectors must satisfy the homogeneous constraints only, i.e. $T \Phi_i = 0$.
Let us finally mention the mixed approach to the generation of coordinate vectors: the displacement increment $\Delta u_i$ or the current approximation $u_i$ is multiplied by some functions of the coordinates. They can be Ritz functions or R-functions, or even polynomials, as mentioned above.

Coordinate vectors as input data
Finally, if someone finds a better set of (one or several) coordinate vectors, the matrix $\Phi_i$ can be formed without difficulty and the described procedure easily used. In this way, the proposed algorithm can be considered a general approach to iterative solution methods, where the coordinate vectors are given as input data (in addition to the data required by all iterative methods).

Briefly about realization
The proposed pseudocode was realized in Fortran, using the gfortran compiler [18]; 64-bit versions 5.3.1 (Ubuntu) and 6.1.0 (OS X) were used. Once the program was verified on small equation systems, stiffness matrices and loads were generated for a considerable number of planar and space trusses. The models were formed in two ways. On the one hand, elements were placed in a traditional way, so that the trusses formed good static systems. On the other hand, to make the condition number of the corresponding system much larger, we irregularly connected distant nodes by truss elements with large differences in stiffness values.
In this way, we formed illogical trusses that cannot be regarded as structures. Thus, from the numerical viewpoint, we tested the program on both good and bad models. The system matrix is stored sparsely, using the full bookkeeping scheme by columns; we also tried storage by rows [19]. A very similar storage strategy exists in FEAP, the open-source program for the finite element method [20], so it was possible to easily connect our code with this package. Version 8.4.1 was used [21]. Compilation and linking with FEAP were also realized with gfortran.

Results of practical models
After fundamental checks, we analysed several models from structural engineering practice on which we had worked in previous years (Figures 5 to 7). For clarity, the loads and supports are omitted from the figures. The models contain various types and shapes of finite elements. The unknowns are primarily displacements, although in places rotations are also present. Basic data about the models are given in Table 2. The fill rate of a matrix is defined as the ratio between the number of stored elements and the number of all elements of the matrix. Various checks of iterative algorithms are necessary, as their dependence on the type of problem is well known. In other words, they can be tuned for very fast calculation of typical examples, with data known in advance used to set the key parameters of the method; however, the efficiency decreases as soon as the problem deviates from the expected one.

In the figures, red and purple colours denote the areas of the smallest and the greatest displacement, respectively. Calculations were made with the iterated Ritz procedure (IRP) with two, four, six and ten coordinate vectors (the argument of IRP), and with the method of conjugate gradients without preconditioning (KG) and with diagonal preconditioning (KGD). The following can be observed in figures b) and c): the number of iterations decreases with an increase of the subspace, i.e. the reduction of the residual and energy norm per step is larger. Even with two coordinate vectors, the method converges faster than KG and KGD (which can also be interpreted as two-vector choices). However, it has to be mentioned that better preconditioning techniques exist for the conjugate gradient method, e.g. the incomplete Cholesky factorization, although the results show that there is an adequate reserve in a greater number of coordinate vectors (which has to be additionally checked). This is also confirmed by Table 3, with the number of steps needed for reducing the residual ratio to $10^{-8}$. It should also be emphasized that the results of all examples are in good agreement with the solutions obtained by the direct and iterative methods available in FEAP.

Conclusion
According to the described properties and our experience, the iterated Ritz method can be advantageous for solving large linear systems with sparse matrices. In some of our examples, as many as $10^7$ unknowns were used, with a fill rate of approximately $10^{-6}$. The explanation of the procedure is close to the engineering way of thinking, i.e. to the Ritz energy interpretation, unlike, for instance, the conjugate gradient method, which is normally explained geometrically, in an abstract $n$-dimensional space. This procedure should not be worse than the conjugate gradient method (with or without preconditioning); in fact, a good subspace extension cannot result in worse convergence. Additionally, several strategies for generating coordinate vectors can be used simultaneously (as if several iterative methods were applied at once). With a selection of appropriate vectors, the convergence is much faster than that based on any individual procedure. The preconditioning that transforms the original system is not necessary here, but the algorithms developed for preconditioning can successfully be used for the generation of vectors. The restart known from the traditional conjugate gradient method, by which the procedure is restarted due to the loss of orthogonality, has no justification here: only the displacement increment from the previous step is used, which improves the solution in the current step. As orthogonality between iterates is not required, additional advantages could be expected in nonlinear problems. In such problems orthogonality properties,

Figure 2. Dependence of energy on relaxation factor

Figure 3. Formation of a small system: a) matrix $\bar{K}_i$, b) vector $\bar{r}_i$

Figure 4. Model of a clamped beam

Figure 5. Cube model (left) and HPP Rama powerhouse model (right; only a half of the model is shown) [22]: a) distribution of vertical displacements; b) decrease of residual; c) decrease of energy

Figures 5 to 7 show the distribution of vertical displacements of the models [under a)], and the dependence of the logarithm of the residual ratio $\|r_i\|_2 / \|r_0\|_2$ and of the energy ratio $(\Pi_0 - \sum_i \Delta\Pi_i)/\Pi_0$ on the number of iterations [under b) and c)]

Pseudocode (steps 7 to 14 of the iteration loop):

7. $\bar{K}_i \leftarrow \Phi_i^T K \Phi_i$ (formation of a "small" system matrix)
8. $\bar{r}_i \leftarrow \Phi_i^T r_i$ (formation of a "small" right hand side vector)
9. $a_i \leftarrow \bar{K}_i^{-1} \bar{r}_i$ (solution of a "small" system)
10. $\Delta u_i \leftarrow \Phi_i a_i$ (determination of a solution increment)
11. $u_{i+1} \leftarrow u_i + \Delta u_i$ (calculation of a new displacement)
12. $r_{i+1} \leftarrow f - K u_{i+1}$ (new residual)
13. $i \leftarrow i + 1$ (increase of the step counter)
14. until $\|r_i\|_2 \le \varepsilon \|r_0\|_2$

Table 2. Basic data about numerical models