I was meeting with a non-technical business colleague/manager recently, and he asked me to explain some AI terms like machine learning and deep learning: “Is machine learning a type of deep learning, or is it the other way around?” So, I drew this picture on the whiteboard, and went on a somewhat rambling lecture on AI and its history. It then occurred to me that many business types are asking themselves (and their colleagues) similar questions, and I have seen more than one manager mangle these distinctions. So, this article is an attempt at a quick non-technical overview.
In the spirit of the hastily hand-drawn picture above, this article is not intended to be a thorough taxonomical categorization of the sub-fields of AI. Nor do I make any claims of accuracy, so go easy on me if you disagree, or think I am wrong on some detail. If you Google “What is AI?”, you will get tons of in-depth articles/blogs (including Wikipedia entries), books, and many images that are substantially more comprehensive. But many non-technical managers have neither the time nor the inclination to dive into details; this article is aimed at helping some of these folks be a little better informed on AI.
AI can be broadly classified as symbolic AI and Machine Learning. These days, the term AI is often used synonymously with Machine Learning, and more recently with Deep Learning. But the origins of AI were mostly symbolic, with hand-coded rules meant to capture expert knowledge. Typically, an AI software engineer, well-versed in an “AI language” like Lisp or Prolog, would be paired with a domain expert (say, a medical doctor) to represent the relevant knowledge in the form of IF-THEN rules. There are many other symbolic knowledge representation mechanisms besides rules, such as frames. To this day, you will find rules and frames in many products, including state-of-the-art chatbot frameworks that use AIML rules or intent frames to author dialogs as scripted conversations. These products, while somewhat successful, still suffer from the limitations noted below.
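For the curious, here is a minimal sketch of what such hand-coded IF-THEN rules might look like, written in Python rather than Lisp or Prolog. The rule names, symptoms and conclusions are all made up for illustration (and are certainly not medical advice); the point is only that every piece of "knowledge" is typed in by a person.

```python
# Hypothetical IF-THEN rules that a domain expert might dictate.
# Each rule: (rule name, conditions that must ALL be present, conclusion).
RULES = [
    ("R1", {"fever", "cough"}, "possible flu"),
    ("R2", {"fever", "rash"}, "possible measles"),
    ("R3", {"sneezing", "itchy eyes"}, "possible allergy"),
]


def diagnose(symptoms):
    """Return (rule name, conclusion) for every rule whose conditions are all observed."""
    observed = set(symptoms)
    fired = []
    for name, conditions, conclusion in RULES:
        if conditions <= observed:  # IF all conditions hold THEN conclude
            fired.append((name, conclusion))
    return fired


print(diagnose(["fever", "cough", "rash"]))
# [('R1', 'possible flu'), ('R2', 'possible measles')]
```

Notice that two rules fire for the same patient. With three rules that is easy to spot; with several hundred, working out which rules fired and why gets much harder.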
After a couple of AI hype cycles during the 1960s and the 1980s, the field of AI entered a long “AI Winter” that lasted till the mid-2000s. Why? Well, rules/frames are fragile from a software engineering standpoint, and it is difficult to manage and maintain a symbolic AI system once we get past a few hundred rules. Rules start conflicting with each other, and it becomes very hard to trace the sequence of rule triggers and to debug these systems. Rules have to be authored by hand with extensive input from expensive and busy domain experts. The “learning” in these systems is mostly “supervised” and off-line. Attempts were made to create rules automatically through “unsupervised” and online learning based on feedback from user interactions. However, most of these attempts remained academic efforts with few commercially successful implementations.
Machine Learning got started in the mid-1990s when computer scientists and statisticians started collaborating and learning from each other. Algorithms such as decision trees and support vector machines were used in the early 2000s to mine increasingly large databases for patterns that can be used for prediction/classification and other advanced analytical tasks. The emergence of faster computers and “big data” software tools such as Hadoop ignited interest in data-driven pattern recognition that enables computers to learn from historical data. The main difference from symbolic AI is that the new AI engineers, now called data scientists, do not engage in traditional software engineering. Rather, their job is to extract features from the raw data and use these features to create supervised learning models that enable the machine to learn to classify and predict based on historical data. The data scientist provides labelled data that identifies the combination of features that point to each distinct class/label. This “model engineering” is far more robust than “rules engineering” and benefits from a virtuous cycle of faster computers, more data, and online feedback from users. Unsupervised machine learning methods such as clustering are often used in combination with supervised methods.
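To make "learning from labelled data" a little more concrete, here is a rough sketch of the simplest decision tree imaginable: a tree that asks just one question (sometimes called a decision stump). The customer data, the two features (monthly spend and number of support calls) and the function names are all invented for illustration. Nobody writes the rule by hand; the code tries every possible question and keeps the one that best matches the labels.

```python
# Hypothetical labelled history: (monthly_spend, support_calls) -> did the customer leave?
FEATURES = [(20, 5), (25, 4), (30, 6), (80, 1), (90, 0), (85, 2)]
LABELS = [True, True, True, False, False, False]


def train_stump(rows, labels):
    """Learn a one-question decision tree of the form 'is feature f greater than t?'."""
    best = None  # (errors, feature index, threshold, label predicted when greater)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            for label_if_greater in (True, False):
                # Count how many historical rows this candidate question gets wrong.
                errors = sum(
                    ((row[f] > t) == label_if_greater) != label
                    for row, label in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, label_if_greater)
    return best[1:]  # drop the error count, keep the learned question


def predict(model, row):
    """Apply the learned question to a new, unlabelled row."""
    f, t, label_if_greater = model
    return (row[f] > t) == label_if_greater


model = train_stump(FEATURES, LABELS)
print(model)                    # (0, 30, False)
print(predict(model, (22, 7)))  # True
print(predict(model, (95, 0)))  # False
```

Here the machine "discovers" that customers spending more than 30 a month tend to stay. Real decision trees ask many such questions in sequence, but the idea is the same: the rule comes out of the data rather than out of an expert's head.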
Deep Learning has its origins in Artificial Neural Networks (ANN) which were part of “Connectionist AI”, also dating back to the 1960s. Many algorithmic advances such as backpropagation, multi-layer perceptrons, convolutional networks and recurrent networks were progressively discovered in the 1980s, 1990s, and the 2000s. But deep learning, which gets its name from the multitude of neural layers (ranging from 5 to 100 or more), only became commercially viable about 5 years ago with the emergence of GPUs as the computational workhorses. These faster GPU-based computers, along with the availability of massive amounts of unstructured data such as images, audio, video and text, are key to the current success of AI. In addition, the pace of innovation in deep learning algorithms and architectures over the last 5 years has been incredible. Today, deep learning systems can perform image recognition, speech recognition and natural language understanding tasks with astonishing accuracy.
Deep learning systems are also mostly supervised learning systems in that enormous amounts of labelled data have to be supplied by the data scientist to train these systems (that is, to set the weights of the interconnections between the neurons). But, unlike more traditional statistical machine learning algorithms (like random forests), deep learning systems can automatically perform feature extraction from raw data. So, the data scientists do not have to perform much of the feature engineering by hand. The significance of deep learning is that successive layers learn features at increasing levels of abstraction. So, while the first few layers might recognize edges and other lower-level image features, the next few layers recognize higher-level features such as nose/ear/mouth, while the next few layers recognize the entire face and so on.
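Here is a toy sketch of the "layers building on layers" idea. It is a two-layer network whose weights I picked by hand purely for illustration; a real deep learning system would learn these weights from labelled data and would have vastly more neurons. The first layer detects two simple features of its inputs ("a OR b" and "a AND b"), and the second layer combines them into a more abstract one ("exactly one of a and b").

```python
def step(x):
    """The simplest possible neuron activation: fire (1) if the input is positive."""
    return 1 if x > 0 else 0


def layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of all inputs, then fires or not."""
    return [
        step(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hand-picked weights (an assumption for illustration; real networks learn these).
HIDDEN_W, HIDDEN_B = [[1, 1], [1, 1]], [-0.5, -1.5]  # neuron 0: a OR b, neuron 1: a AND b
OUTPUT_W, OUTPUT_B = [[1, -1]], [-0.5]               # fires for "OR but not AND"


def network(a, b):
    hidden = layer([a, b], HIDDEN_W, HIDDEN_B)   # lower-level features
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]  # higher-level feature built from them


print([network(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

Swap "OR" and "AND" for edges and corners, stack a few dozen more layers, and you have the rough intuition behind how a network gets from pixels to faces.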
Generative Adversarial Networks (GANs) and Autoencoders are examples of unsupervised deep learning systems. Reinforcement Learning systems, which these days are often built on deep learning, may be thought of as online learning systems in that they learn directly from actions performed in a simulated environment and from feedback obtained when deployed in a real environment. Autonomous cars and game-playing systems such as AlphaGo utilize reinforcement learning; AlphaGo is a good example of simulation-based learning in that the system was trained largely by playing against itself a gazillion times. This resembles unsupervised learning in that the system gets better on its own by observing its mistakes and correcting them, although reinforcement learning is usually treated as a category of its own because it learns from rewards rather than from labelled examples.
There are many other related sub-fields of AI such as evolutionary (genetic) algorithms, game theory, multi-agent systems and so on. Also, note that AI benefits from other disciplines such as mathematical optimization, which has long been part of areas such as Operations Research (OR). In fact, the recent boom in AI has also rejuvenated interest in related fields such as control theory, since many of the algorithms behind autonomous cars, drones and robotics have mathematical roots in other disciplines. AI is therefore a truly inter-disciplinary field where scientists and engineers from a variety of backgrounds are able to apply their mathematical and software skills.
I have tried to keep this overview non-technical and brief. I hope this helps some business types get the hang of some of the buzzwords and jargon floating around the office.