Neuron Models

Activation Functions

Activation functions play a crucial role in characterising a neuron's behaviour. Whilst a step function may be used to construct universal networks at the theoretical level, in practice it leads to poor performance or complicates calculations. In this section we briefly look at different types of activation functions.

Signum function

Sometimes it is desirable that a neuron's output be $-1$ and $+1$ instead of $0$ and $1$. The two-way signum function is defined as

f(u) = \mathrm{sgn}(u) = \left\{ \begin{array}{ll} -1 & \quad u < \theta \\ +1 & \quad u \geq \theta \end{array} \right.

This function is helpful because it amplifies differences between inputs, so that classification, for example, may be attained more quickly in some situations. Mathematically, this is a step function rather than the signum function proper, which is a three-way function defined as

f(u) = \mathrm{sgn}(u) = \left\{ \begin{array}{ll} -1 & \quad u < \theta \\ 0 & \quad u = \theta \\ +1 & \quad u > \theta \end{array} \right.
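As a quick illustration, here is a minimal Python/NumPy sketch of both definitions (the function names are ours, not the book's), with the threshold $\theta$ as a parameter:

```python
import numpy as np

def step_signum(u, theta=0.0):
    # Two-way "signum" (really a +/-1 step): -1 below theta, +1 at or above it
    return np.where(u < theta, -1.0, 1.0)

def signum(u, theta=0.0):
    # Three-way signum: -1 below theta, 0 exactly at theta, +1 above it
    return np.sign(u - theta)

u = np.array([-2.0, 0.0, 2.0])
print(step_signum(u))  # [-1.  1.  1.]
print(signum(u))       # [-1.  0.  1.]
```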

Linear function

The third function we consider is the linear function

f(u) = u

Being the simplest possible function does not make it useless: linear regression can be implemented using a very simple network whose neurons have this activation function, as we will see later in this chapter.
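As a sketch of the idea (Python/NumPy; the data, learning rate, and iteration count are hypothetical), a single neuron with the identity activation, trained by gradient descent on squared error, recovers the coefficients of a linear regression:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data drawn from y = 3x - 1 plus a little noise.
x = rng.uniform(-1.0, 1.0, size=(100, 1))
y = 3.0 * x - 1.0 + 0.1 * rng.standard_normal((100, 1))

# A single neuron with the identity activation f(u) = u: y_hat = w*x + b.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    err = (w * x + b) - y        # f(u) = u, so the output is u itself
    w -= lr * np.mean(err * x)   # gradient of 0.5 * MSE w.r.t. w
    b -= lr * np.mean(err)       # gradient of 0.5 * MSE w.r.t. b

print(w, b)  # close to 3 and -1
```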

Piecewise linear function

A piecewise linear function may also serve as an activation function; it may be defined as follows

f(u) = \left\{ \begin{array}{ll} 0 & \quad u \leq -1/2 \\ u & \quad -1/2 < u < +1/2 \\ 1 & \quad u \geq +1/2 \end{array} \right. \qquad (1.8)

which can be regarded as an approximation to a nonlinear function. The limits $+1/2$ and $-1/2$ can be replaced with an appropriate $\theta$. Some interesting properties of this function are:

  1. A linear combiner arises if the linear region is maintained without running into saturation.
  2. It reduces to a threshold function if the amplification factor of the linear region is made infinitely large.
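A direct transcription of Eq. (1.8) as a Python/NumPy sketch, also illustrating property 2 above (the gain value is our own choice):

```python
import numpy as np

def piecewise_linear(u):
    # Eq. (1.8) as given: saturate at 0 and 1 outside the linear region,
    # pass u through unchanged inside it.
    return np.where(u <= -0.5, 0.0, np.where(u >= 0.5, 1.0, u))

u = np.array([-1.0, -0.25, 0.25, 1.0])
print(piecewise_linear(u))  # [ 0.   -0.25  0.25  1.  ]

# Property 2: amplifying the linear region pushes the function towards a step.
gain = 1000.0  # hypothetical amplification factor
print(piecewise_linear(gain * np.array([-0.01, 0.01])))  # [0. 1.]
```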

Non-binary signals

For biological neurons, the two binary values represent the action-potential voltage and the axon membrane's resting potential. When applied to artificial neurons, these values are often labelled as $1$ and $0$, respectively. In biological neurons, it is accepted that information is encoded in terms of the frequency of firing rather than merely the presence or absence of a pulse.

This situation may be modelled in two ways: 1) the input signal could range over the set of positive real numbers, or 2) signals could be encoded as a binary pulse stream, with the value represented by the frequency of occurrence of $1$'s.

Continuous signals

Continuous signals work fine as inputs, but they are suppressed by the step function. This can be overcome by using a squashing function. A very common choice is the family of sigmoid functions. The sigmoid has the effect of softening the step of the step function; in other words, its output can be regarded as a fuzzified version of the step function's crisp values.

One convenient form of this function is

y = \sigma(u) \equiv \frac{1}{1 + e^{-u/\rho}}

$\rho$ determines the shape of the sigmoid; a larger value makes the curve flatter. If $\rho$ is omitted, it is implicitly assigned the value $1$. The function is semi-linear, with slope $1/(4\rho)$ at the origin. As $\rho$ approaches zero, the function reduces to a threshold function. A neuron with such an activation function is called a semi-linear unit. The range of this function is $[0,1]$. If a non-zero threshold is needed, the function may be defined as

y = \sigma(u) \equiv \frac{1}{1 + e^{-(u - \theta)/\rho}}

These forms are called logistic functions.
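A small sketch of the logistic form (Python/NumPy; the defaults follow the convention above that an omitted $\rho$ means $\rho = 1$):

```python
import numpy as np

def logistic(u, rho=1.0, theta=0.0):
    # Logistic sigmoid with shape parameter rho and optional threshold theta
    return 1.0 / (1.0 + np.exp(-(u - theta) / rho))

u = np.linspace(-5.0, 5.0, 5)
print(logistic(u))            # rho = 1, the implicit default
print(logistic(u, rho=10.0))  # larger rho: a much flatter curve
print(logistic(u, rho=0.1))   # rho near 0: approaches a threshold function
```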

Another form of the sigmoid is the hyperbolic tangent function

y = \tanh(u) \equiv \frac{e^u - e^{-u}}{e^u + e^{-u}}

which can be re-written as

y = \tanh(u) \equiv 1 - \frac{2e^{-u}}{e^u + e^{-u}}

The hyperbolic tangent's range is $[-1,+1]$.
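A quick numerical check that the two forms agree and that outputs stay within $[-1,+1]$ (Python/NumPy sketch):

```python
import numpy as np

u = np.linspace(-3.0, 3.0, 7)
lhs = np.tanh(u)
rhs = 1.0 - 2.0 * np.exp(-u) / (np.exp(u) + np.exp(-u))
print(np.allclose(lhs, rhs))             # True: the rewritten form matches tanh
print(lhs.min() >= -1, lhs.max() <= 1)   # outputs stay within [-1, +1]
```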

Binary pulse stream

Time is divided into discrete time slots. If the required signal level is $p$, where $0 \leq p \leq 1$, then the probability of a pulse appearing in each time slot is $p$. If the required value lies in a different range, the signal should be normalised to the unit interval.

In contrast to all the previous functions, which are deterministic, the output here is interpreted as the probability of emitting a $1$ rather than as an analogue signal. Neurons with such functionality are known as stochastic semi-linear units.

If $p$ is unknown, it may be estimated by counting the $1$'s: $\hat{p} = \frac{N_1}{N}$, where $N$ is the number of time slots and $N_1$ is the number of pulses, i.e. the $1$'s.
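A sketch of encoding a level $p$ as a pulse stream and recovering it with $\hat{p} = N_1/N$ (Python/NumPy; the values of $p$ and $N$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

p = 0.3                      # hypothetical signal level to encode
N = 10_000                   # number of time slots
stream = rng.random(N) < p   # a pulse appears in each slot with probability p

N1 = stream.sum()            # number of pulses, i.e. the 1's
p_hat = N1 / N               # estimate of the encoded level
print(p_hat)                 # close to 0.3
```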

In the stochastic case, the sigmoid may be regarded as an approximation to the cumulative Gaussian (normal) distribution; if so, the model fits a noisy threshold, that is, the threshold at any time is a random variable with a Gaussian distribution. Thus, the probability of firing when the activation is $u$ is the probability that the threshold is less than $u$.
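To get a feel for how close the logistic sigmoid is to the cumulative Gaussian, here is a small check (plain Python; the slope constant 1.702 is a standard choice from the statistics literature, not from this text):

```python
import math

def gaussian_cdf(x):
    # Phi(x), the cumulative normal, via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic_approx(x):
    # Logistic approximation to Phi; 1.702 is the classic slope constant
    return 1.0 / (1.0 + math.exp(-1.702 * x))

xs = [i / 100.0 for i in range(-400, 401)]
worst = max(abs(gaussian_cdf(x) - logistic_approx(x)) for x in xs)
print(worst)  # just under 0.01 over [-4, 4]
```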

We can think of this pulse stream in a different way. The neuron has only two states, low and high, and fires probabilistically as follows. Let $X$ denote the state and $P(u)$ the probability of firing, where $u$ is the integrated input signal; then

X = \left\{ \begin{array}{ll} +1 & \quad \textrm{with probability} \quad P(u) \\ -1 & \quad \textrm{with probability} \quad 1 - P(u) \end{array} \right.

A common choice for $P(u)$ is the sigmoid function defined as follows

P(u) = \frac{1}{1 + e^{-u/\rho}}

where $\rho$ is a pseudo-temperature parameter used to control the noise level, i.e. the uncertainty of firing. As $\rho \to 0$, the function becomes noiseless, i.e. deterministic, and reduces to the McCulloch-Pitts threshold function.
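A simulation sketch of such a stochastic unit (Python/NumPy; the test input, pseudo-temperatures, and trial count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def firing_rate(u, rho, n_trials=10_000):
    # Empirical probability that the state X is +1 for input u
    p = 1.0 / (1.0 + np.exp(-u / rho))        # P(u), the sigmoid above
    states = np.where(rng.random(n_trials) < p, 1, -1)
    return np.mean(states == 1)

u = 0.5
for rho in (2.0, 0.5, 0.01):                  # "cooling" the pseudo-temperature
    print(rho, firing_rate(u, rho))
# As rho -> 0 the unit fires deterministically whenever u > 0.
```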