Activation Functions (Neural Networks)

Activation functions are essential for a neural network to learn and make sense of complicated, non-linear mappings between the inputs and the response variable. They introduce non-linear properties into our network. Their main purpose is to convert the input signal of a node in an artificial neural network (ANN) into an output signal, which is then used as an input to the next layer in the stack.
Specifically, in an ANN we take the sum of the products of the inputs (X) and their corresponding weights (W), and apply an activation function f(x) to it to get the output of that layer, which we feed as an input to the next layer.
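As a minimal sketch of that computation in plain NumPy (the layer sizes, the example values, and the choice of tanh as f are all made up for illustration):
import numpy as np

np.random.seed(0)
X = np.array([0.5, -1.2, 3.0])   # three inputs to the layer
W = np.random.randn(4, 3)        # weights: one row per neuron in the layer
b = np.zeros(4)                  # biases

z = W @ X + b                    # sum of products of inputs and weights, plus bias
output = np.tanh(z)              # apply the activation function f element-wise
# `output` is now fed as the input to the next layer in the stack.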
In Keras, we can use a different activation function for each layer. That means we have to decide which activation function to use in the hidden layers and in the output layer. A neuron's rule of thumb: “input times weights, add bias, and activate”.
Activations can either be used through an Activation layer, or through the activation argument supported by all forward layers:
from keras.layers import Activation, Dense

model.add(Dense(64))
model.add(Activation('tanh'))
This is equivalent to:
model.add(Dense(64, activation='tanh'))
You can also pass an element-wise TensorFlow/Theano/CNTK function as an activation:
from keras import backend as K

model.add(Dense(64, activation=K.tanh))

Step Function

Activation: A = “activated” (1) if Y > threshold, else not activated (0).
Pros
  • Simple to understand
Cons
  • It can't handle multiple classes.
  • It can't give graded outputs like 20% or 30%; the output is all-or-nothing.
Conclusion
Use other activation functions for the hidden layers; the step function can be used in the final layer. A quick sketch of the step activation follows below.
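Here is a minimal sketch in plain NumPy (the threshold of 0 is an arbitrary illustrative choice):
import numpy as np

def step(y, threshold=0.0):
    # "Activated" (1) if y is above the threshold, else not (0).
    return np.where(y > threshold, 1, 0)

print(step(np.array([-2.0, 0.5, 3.0])))   # [0 1 1]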

Linear Function

A straight-line function where the activation is proportional to the input (the weighted sum from the neuron).
Pros
  • It gives a range of activations, so it is not a binary activation.
  • We can connect a few neurons together, and if more than one fires, we can take the max (or softmax) and decide based on that.
Cons
  • The derivative of this function is a constant, meaning the gradient has no relationship with X.
  • The gradient is constant, so the descent proceeds on a constant gradient.
  • If there is an error in the prediction, the changes made by backpropagation are constant and do not depend on the change in input, delta(x)! (A numerical sketch follows this list.)
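The sketch below uses a made-up slope c = 4 to show that the derivative of a linear activation is the same constant everywhere, no matter what the input is:
import numpy as np

def linear(x, c=4.0):
    return c * x                    # activation proportional to input

def linear_grad(x, c=4.0):
    return np.full_like(x, c)       # df/dx = c, with no relationship to x

x = np.array([-10.0, 0.0, 10.0])
print(linear_grad(x))               # [4. 4. 4.] -- constant gradient everywhere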


Sigmoid Function

It is an activation function of the form f(x) = 1 / (1 + exp(-x)). Its range is between 0 and 1. It is an S-shaped curve.
Pros
  • It is nonlinear in nature. Combinations of this function are also nonlinear!
  • It gives an analog activation, unlike the step function.
  • It has a smooth gradient too.
  • It’s good for a classifier.
  • The output of the activation function is always going to be in the range (0, 1), compared to (-inf, inf) for the linear function, so we have our activations bound in a range. Nice, it won't blow up the activations then.
Cons
  • Towards either end of the sigmoid function, the Y values respond very little to changes in X.
  • This gives rise to the problem of “vanishing gradients” (see the sketch after this list).
  • Its output isn't zero-centered (0 < output < 1), which makes the gradient updates zigzag in different directions and makes optimization harder.
  • Sigmoids saturate and kill gradients.
  • The network refuses to learn further, or becomes drastically slow (depending on the use case, and until the gradient/computation hits floating-point limits).
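The sketch uses the identity f'(x) = f(x)(1 - f(x)) for the sigmoid's derivative to show how it collapses at both tails:
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # peaks at 0.25 when x = 0

for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"x = {x:+5.1f}   f(x) = {sigmoid(x):.5f}   f'(x) = {sigmoid_grad(x):.5f}")
# At x = +/-10 the gradient is about 0.00005: this is the vanishing gradient.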

Tanh (Hyperbolic Tangent function)

For many cases, a better version of the sigmoid because of its range.
Its mathematical formula is f(x) = (1 - exp(-2x)) / (1 + exp(-2x)). Its output is zero-centered because its range is between -1 and 1, i.e., -1 < output < 1. Optimization is therefore easier, so in practice tanh is usually preferred over the sigmoid function. But it still suffers from the vanishing gradient problem.
Deciding between the sigmoid and tanh will depend on your requirement for gradient strength.
Pros
  • The gradient is stronger for tanh than for sigmoid (the derivatives are steeper); see the comparison sketch after this list.
Cons
  • Tanh also has the vanishing gradient problem.
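The comparison below evaluates tanh'(x) = 1 - tanh(x)^2 against sigmoid'(x) = s(x)(1 - s(x)) at a few sample points:
import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)            # maximum 0.25, at x = 0

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2    # maximum 1.0, at x = 0

for x in [0.0, 1.0, 2.0]:
    print(f"x = {x}   tanh' = {tanh_grad(x):.4f}   sigmoid' = {sigmoid_grad(x):.4f}")
# At the origin tanh's gradient is 1.0 vs 0.25 for sigmoid: four times stronger.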

ReLu (Rectified Linear units)

It has become very popular in the past couple of years. It was shown to give roughly a 6x improvement in convergence speed over the tanh function. It is simply R(x) = max(0, x), i.e., if x < 0, R(x) = 0, and if x >= 0, R(x) = x.

Pros

  • It avoids and rectifies vanishing gradient problem.
  • ReLu is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations.
Cons
  • One of its limitations is that it should only be used within the hidden layers of a neural network model.
  • Some gradients can be fragile during training and can die: a large weight update can leave a neuron in a state where it never activates on any data point again. Simply put, ReLU can result in dead neurons.
  • In other words, for activations in the region x < 0, the gradient is 0, so the weights will not get adjusted during descent. Neurons that go into that state stop responding to variations in error/input (since the gradient is 0, nothing changes). This is called the dying ReLU problem.
  • The range of ReLU is [0, inf), which means it can blow up the activation.
There are variations of ReLU that mitigate the dying ReLU issue by simply turning the horizontal line into a non-horizontal component. For example, y = 0.01x for x < 0 makes it a slightly inclined line rather than a horizontal one. This is Leaky ReLU. There are other variations too. The main idea is to let the gradient be non-zero, so the neuron can eventually recover during training, as sketched below.
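A minimal sketch of ReLU next to the Leaky ReLU variant just described (the 0.01 slope is the conventional default, but it is a tunable parameter):
import numpy as np

def relu(x):
    return np.maximum(0.0, x)                  # R(x) = max(0, x)

def leaky_relu(x, alpha=0.01):
    # The slightly inclined line for x < 0 keeps the gradient non-zero,
    # so a "dead" neuron still gets weight updates and can recover.
    return np.where(x >= 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))         # [0. 0. 0. 2.]
print(leaky_relu(x))   # negative inputs are scaled by alpha instead of zeroed
In Keras, a LeakyReLU layer is also available (from keras.layers, or keras.layers.advanced_activations in older versions) and can be added after a Dense layer in the usual way.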
