Artificial neurons and neural networks

Artificial neurons

An artificial neuron is the most widely used component of artificial neural networks. You can think of it as the simplest unit of computation and the basic building block of such networks. The analogous component in a digital computer would be a logic gate (AND, NAND, OR, NOR, XOR, etc.). Just as you can build a computer mostly from logic gates, you can build a neural network mostly from neurons!

NAND logic gate

An artificial neuron does a very simple thing. It converts multiple real-number inputs into one real-number output. It does that by first computing a weighted sum of the inputs and then passing the result through a so-called “activation function”. Such a neuron is like an analog logic gate.
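The computation above fits in a few lines of Python. As a minimal sketch: the sigmoid activation and the particular weights below are my own choices for illustration (step, ReLU, or tanh activations would work just as well). With these specific weights and bias, the neuron happens to behave like a NAND gate, tying back to the logic-gate analogy:

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, plus a bias term...
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # ...passed through an activation function (sigmoid squashes z into (0, 1))
    return 1.0 / (1.0 + math.exp(-z))

# With weights (-2, -2) and bias 3 the output is above 0.5 exactly when
# the two inputs are NOT both 1 -- an "analog" NAND gate.
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(neuron((a, b), (-2, -2), 3)))
```

Rounding the continuous output to 0 or 1 recovers the NAND truth table, but unlike a digital gate the neuron can also produce every value in between.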

Artificial neuron

An artificial neuron is very loosely inspired by the real neurons that process information in a biological brain. Biological neurons also have inputs and an output, but they are much more complicated and their inner workings are still not fully understood. For this reason an artificial neuron is an extremely crude approximation of a real biological neuron. In spite of this, artificial neurons still have surprisingly interesting properties when grouped together and connected in the right ways. A network of artificial neurons, coupled with an appropriate learning algorithm, can to some extent learn and act like a part of a real brain!

Why does an artificial neuron work as described above, and why is that useful? In general you can think of every real-number input as a piece of information. That information can be anything: the brightness of a pixel (if the neuron is “looking” at an image), a sound sample (if the neuron is “listening” to audio), or the output of another neuron. The information is not limited to images and audio; it can be any kind of data that can be represented as an array of real numbers.

The output of a neuron can be thought of as its reaction to the provided input. Some inputs can make the neuron very “excited”, while other kinds of inputs can have the opposite effect. Since the output is a continuous number, all reactions in between are also possible.

The sum of the inputs is “weighted”, which allows the neuron to selectively treat some information as more important than other information. The bigger the weight associated with an input, the bigger the impact that input will have on the final output.

How does the neuron know which information is more important? The answer is that it doesn’t know initially; it has to learn it over time. The learning process uses an algorithm called “gradient descent”.
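To make the idea of learning the weights concrete, here is a toy sketch of gradient descent teaching a single sigmoid neuron the logical AND function. The squared-error loss, the learning rate, and the iteration count are arbitrary choices made up for this example, not a recipe from the article:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny dataset: inputs and the target output of logical AND.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w = [0.0, 0.0]  # the neuron starts out not knowing what is important...
b = 0.0
lr = 1.0        # learning rate: how big each correction step is

for _ in range(5000):
    for (x1, x2), target in data:
        y = sigmoid(w[0] * x1 + w[1] * x2 + b)
        # Gradient of the squared error 0.5*(y - target)^2 with respect
        # to the weighted sum, using sigmoid'(z) = y * (1 - y).
        delta = (y - target) * y * (1 - y)
        # Nudge each weight against the gradient.
        w[0] -= lr * delta * x1
        w[1] -= lr * delta * x2
        b -= lr * delta

# ...and after training it treats the inputs correctly:
print(round(sigmoid(w[0] * 1 + w[1] * 1 + b)))  # (1, 1)
print(round(sigmoid(b)))                        # (0, 0)
```

The key point is that the weights were never set by hand; repeated small corrections driven by the error did all the work.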

Artificial neural networks

An artificial neural network is a very general term which most often means a group of artificial neurons interconnected in a specific way. They are usually organized in layers. The input to the first layer is data from the outside world (again: an image, sound, or any other data). Subsequent layers take their input from the outputs of previous layers. Such a network can have multiple outputs (as opposed to a single artificial neuron) because the final layer can contain multiple neurons, each of which has just one output. Since those outputs are not connected to anything in the network itself, they are considered the output of the network.
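The layered structure above can be sketched by feeding the output of one layer of neurons into the next. The network shape and every weight below are invented purely to show the data flow; they have not been trained to do anything useful:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    # One layer: every neuron computes its own weighted sum of ALL inputs.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def network(inputs, layers):
    # Feed the data through each layer in turn; the last layer's
    # outputs are the outputs of the whole network.
    for weights, biases in layers:
        inputs = layer(inputs, weights, biases)
    return inputs

# A made-up network: 3 inputs -> hidden layer of 2 neurons -> 1 output neuron.
layers = [
    ([[0.2, -0.5, 0.1], [0.7, 0.3, -0.2]], [0.0, 0.1]),  # hidden layer
    ([[1.5, -1.0]], [0.0]),                              # output layer
]
print(network([1.0, 0.5, -1.0], layers))
```

Adding more neurons to the final layer would give the network more outputs; adding more layer entries would make it deeper.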

What is the purpose of building such networks? The general answer is that they are a good solution when you want a tool which can convert one kind of information into another kind of information, and the conversion can’t easily be achieved with normal programming methods (for example, by writing a lot of “if” statements).

Let’s assume you have a lot of pictures of dogs and cats and you want to sort them into two groups, one for each animal. If you are a software engineer, could you write an algorithm which can distinguish the two? With some effort you could probably create something which works better than a coin flip, but no matter how many “if” statements you write to handle various cases, there are always going to be more cases which are not handled well by your program. What if you could do this more easily and at the same time handle as many cases as possible, without writing an endless number of “if” statements? That is the kind of task you could use an artificial neural network for. If trained properly, it will learn the concept of a dog and a cat and will apply that knowledge to any picture you throw at it, even if it has never seen that picture before.

Artificial neural network with one input layer and two processing layers (only the hidden layer and the output layer contain actual neurons, while the input layer just represents the data that is fed to the network)

A deep neural network is just a neural network with “many” layers. Some networks used today have more than 100 layers! In general, the more layers a network has, the more complex the concepts and patterns it can learn.

Why are more layers better? You can think of every layer as working on a different level of abstraction. Again, imagine a neural network which looks at an image of a cat. How will it know it is a cat? A single layer is not powerful enough to distinguish a cat from anything else; remember how simple a single neuron in such a layer is. It just computes a weighted sum of its inputs and passes the result through an activation function. However, a single layer might already be enough to detect edges in the picture. Some neurons might be excited by vertical edges, others by horizontal edges, and still others by diagonal edges. If you then pass the information about the edges to the second layer, that layer no longer has to work with raw pixel data. The data is already partially processed, and we are now closer to the final output. The second layer might take the edge information and start noticing shapes formed by those edges. Subsequent layers can detect more and more complex shapes, building on the work of the previous layers, until the final layer starts noticing an entire cat or an entire dog in the picture. The more complex the task, the more layers you need, but also the more computational power is necessary.
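The first-layer “edge detector” intuition can be made concrete with a single hand-crafted neuron. The two-pixel setup and the (-1, +1) weights here are invented purely for illustration; a trained network would discover weights like these on its own:

```python
def edge_neuron(left_pixel, right_pixel):
    # Weights (-1, +1): the neuron gets "excited" when the right pixel
    # is brighter than the left one, i.e. at a vertical edge.
    z = -1.0 * left_pixel + 1.0 * right_pixel
    # ReLU activation: negative responses are clipped to zero.
    return max(0.0, z)

print(edge_neuron(0.1, 0.9))  # strong vertical edge -> large positive output
print(edge_neuron(0.5, 0.5))  # flat region -> no response
```

A real first layer would contain many such neurons scanning the whole image, each tuned to a different edge orientation.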

Even though the artificial neuron is the main building block of the most common neural networks, it is possible to build neural networks from any set of functions that are differentiable (I will explain what “differentiable” means in another article). A neural network built from neurons as described above is therefore only a specific case of a broader set of computational networks which can learn to convert some input data to some output data.