next up previous
Next: SIMULATION RESULTS Up: Adaptive Equalization of Non-linear Previous: COMPLEX RADIAL BASIS FUNCTION

Stochastic Gradient (SG) algorithm for CRBFN's

Let $f({\bf z}_n)$ 2 denote the CRBFN output corresponding to input ${\bf z}_n$ at time n as given by

\begin{displaymath}f({\bf z}_n)=\sum_{j=1}^{M}w_{j,n}\phi(\vert\vert{\bf z}_n - {\bf
c}_{j,n}\vert\vert;\sigma_{j,n}).
\end{displaymath} (3)

Here the subscript n denotes the time index, wj,n denotes the complex valued CRBFN weight at time n, $\phi(.;\sigma)$ is the basis function parameterized by $\sigma$. Also let dn and $e_n=f({\bf z}_n -d_n$) denote the complex valued desired response and error at time n respectively. The training set consists of a number of pairs of inputs and desired response $({\bf z}_n,d_n)$. The SG algorithm takes the instantaneous gradient of the squared error ||en||2 and moves the network parameters in the opposite direction of their respective gradients. Thus a network parameter $\rho$ of the network (which can be a weight, a center, or a spread parameter) is adapted at time n according to

\begin{displaymath}\rho_{n+1}=\rho_n-\mu_\rho \frac{\partial\vert\vert e_n\vert\vert^2}{\partial \rho_n}
\end{displaymath} (4)

where $\mu_\rho$ controls the speed of adaptation.

The SG algorithm does not guarantee convergence to globally optimal network parameters. However , it does appear to converge to a reasonable solutions in practice. The method can be used as a single-stage learning algorithm if training data are only sequentially available or as the second-stage method of a two-stage algorithm where centers and spread parameters are predetermined by a method such as OLS or a clustering technique. For the single pass case, centers can be initialized by, for example, forming them with the first few training samples.

The SG algorithm has certain advantages over existing methods :

1.
All free network parameters are adapted simultaneously, usually yielding improved overall solutions. The method can also provide greater robustness to poor initial choices of parameters, especially the centers.
2.
The algorithm is well suited for on-line adaptive signal processing unlike block processing algorithms such as the Moody-Darken or the OLS algorithms. It is also computationally quite feasible.

There are variety of basis functions used in RBF's but we have selected specifically the Gaussian basis functions as they are the most popular and widely used. Another argument in favour of them is that the noise which lies around the mean of the desired cluster is generally Gaussian. The network with M basis functions gives the output $f({\bf z}_n)$ at time n as

 \begin{displaymath}f({\bf z}_n)=\sum_{j=1}^Mw_{j,n}exp(-\vert\vert{\bf z}_n-{\bf
c}_{j,n}\vert\vert^2/\sigma_{j,n}^2).
\end{displaymath} (5)

Since there are always some outliers, which will lie far away from the signal constellation we had proposed to normalize (5) [11]. So the equation gets modified as

 \begin{displaymath}f({\bf z}_n)=\frac{\sum_{j=1}^Mw_{j,n}exp(-\vert\vert{\bf z}_...
...vert\vert{\bf
z}_n-{\bf c}_{j,n}\vert\vert^2/\sigma_{j,n}^2)}.
\end{displaymath} (6)

with $\phi_j({\bf z}_n)=exp(-\vert\vert{\bf z}_n-{\bf c}_{j,n}\vert\vert^2/\sigma_{j,n}^2)
$, the SG algorithm adapts the network parameters according to (7)-(9), as shown below.

 \begin{displaymath}w_{i,n+1}=w_{i,n}+\mu_we_n\phi_i({\bf z}_n)
\end{displaymath} (7)


 \begin{displaymath}\sigma_{i,n+1}=\sigma_{i,n}+\mu_\sigma\phi_i({\bf
z}_n)[w_{R_...
...\vert\vert{\bf z}_n-{\bf
c}_{i,n}\vert\vert^2}{\sigma_{i,n}^3}
\end{displaymath} (8)


 \begin{displaymath}{\bf c}_{i,n+1}={\bf c}_{i,n}+\mu_c\phi_i({\bf z}_n)\left[ \f...
...n}}Im\{e_n\}Im\{{\bf z_n-c}_{i,n}\}}
{\sigma_{i,n}^2} \right]
\end{displaymath} (9)

Since Gaussians are fast-decaying functions, it can be assumed that not all basis function units contribute significantly to the network output. Hence, instead of training all the hidden nodes, one could train only a selected number of basis function nodes with the largest output values. When the input dimension is high, it is the adaptation of the centers that requires the most computation. Computation can be significantly reduced if, for example, at each training iteration only one center is adaptively moved while the weights and the $\sigma$ parameters are adapted for all nodes.


next up previous
Next: SIMULATION RESULTS Up: Adaptive Equalization of Non-linear Previous: COMPLEX RADIAL BASIS FUNCTION
Temp DNS admin
1999-02-03
Hosted by www.Geocities.ws

1