Have you ever wondered why AI became a mainstream technology in the past 10 years, even though it had been tried, and had failed, several times in history?
The answer is twofold: the vast availability of data after the internet era, and advanced hardware capable of processing that data.
Today let’s look at one such amazing technology which greatly helped AI in reaching its current shape — Nvidia’s CUDA.
CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on its own GPUs (graphics processing units). CUDA enables developers to speed up compute-intensive applications by harnessing the power of GPUs for the parallelizable part of the computation.
Dedicated graphics hardware is surprisingly old. In fact, it was invented around the same time mainstream personal computers themselves were getting started, back in the 1970s.
NVIDIA started trying to compete in the 3D accelerator market with weak products, but learned as it went, and in 1999 introduced the successful GeForce 256, the first graphics card to be called a GPU.
Although the first card to be marketed as a GPU launched in 1999, its only purpose was to improve gaming performance.
It was only in 2006 that GPUs started being used for other purposes such as engineering and simulation. But to do this, programmers needed a way to write programs that actually run on the GPU. That became possible when Nvidia released CUDA (Compute Unified Device Architecture), a platform exposed through the C programming language.
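To make that concrete, here is a minimal sketch of what a CUDA C program looks like; the kernel name, array sizes, and launch configuration below are illustrative assumptions, not taken from the original article. Each GPU thread adds one pair of elements, and the host (CPU) code copies data to and from the GPU:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Kernel: runs on the GPU. Each thread adds one pair of elements.
__global__ void vector_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

int main(void) {
    const int n = 1 << 20;                 // one million elements (illustrative)
    size_t bytes = n * sizeof(float);

    // Allocate and fill host (CPU) arrays.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device (GPU) arrays and copy the inputs over.
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough threads so every element gets its own thread.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back and check one value.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);         // expected: 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax is the CUDA extension to C that tells the GPU how many parallel threads to spin up.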
This was a huge paradigm shift for compute-intensive workloads. But there is a catch: CUDA only works on Nvidia GPUs. In an attempt to break this lock-in, AMD backed OpenCL, an open standard, as an alternative in 2009, but it never gained the same traction.
CUDA found adoption across every mainstream industry: finance, weather modelling, defence, manufacturing, media, medical imaging, and of course AI and deep learning.
Basically, anything compute-intensive could now be offloaded to the GPU.
But how can GPUs improve the performance of ordinary programs beyond graphics rendering?
A CPU has a small number of ALUs (Arithmetic Logic Units), the circuits that perform mathematical operations. The problem is that a CPU cannot parallelize well; it is designed to be a decision-making system, not a computation platform. This is where the GPU comes into the picture. A GPU has thousands of arithmetic units. Each one can only perform simple mathematical operations, but all of them work in parallel.
Thus, whenever parallelizable mathematical operations like matrix multiplication and addition are involved, GPUs can vastly outperform CPUs. That is why image processing and graphics rendering (an image is essentially an RGB matrix) are so fast on a GPU.
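Here is a rough sketch of why matrix multiplication maps so well onto a GPU; the kernel and dimension names below are illustrative, not production code. Every element of the output matrix gets its own thread, so thousands of dot products are computed at once instead of one after another in a CPU loop:

```cuda
// Naive matrix multiply: C = A * B, where A is MxK, B is KxN, C is MxN.
// One GPU thread computes one element of C, so work a CPU would do
// sequentially is spread across thousands of threads.
__global__ void matmul(const float *A, const float *B, float *C,
                       int M, int K, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < K; k++) {
            sum += A[row * K + k] * B[k * N + col];
        }
        C[row * N + col] = sum;
    }
}

// Launched with a 2D grid so the thread layout mirrors the output matrix:
//   dim3 threads(16, 16);
//   dim3 blocks((N + 15) / 16, (M + 15) / 16);
//   matmul<<<blocks, threads>>>(d_A, d_B, d_C, M, K, N);
```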
Deep learning builds on the same idea: every input is converted into a matrix (or a stack of matrices) and the heavy arithmetic is sent to the GPU for performance gains.
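For example, a single fully connected layer of a neural network, y = relu(W·x + b), is just another matrix operation. A minimal sketch (the layer sizes and names here are assumptions for illustration) could run one thread per output neuron:

```cuda
// One fully connected layer: y = relu(W * x + b).
// W is (out_dim x in_dim), x has in_dim entries, y and b have out_dim entries.
// Each thread computes one output neuron, so the whole layer is evaluated in parallel.
__global__ void dense_forward(const float *W, const float *x, const float *b,
                              float *y, int in_dim, int out_dim) {
    int o = blockIdx.x * blockDim.x + threadIdx.x;
    if (o < out_dim) {
        float acc = b[o];
        for (int i = 0; i < in_dim; i++) {
            acc += W[o * in_dim + i] * x[i];
        }
        y[o] = acc > 0.0f ? acc : 0.0f;  // ReLU activation
    }
}
```

Frameworks like TensorFlow and PyTorch ship far more optimized versions of kernels like these (typically through Nvidia libraries such as cuBLAS and cuDNN), but the underlying principle is the same.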