Generative AI

What is generative AI?

As we discussed previously, generative AI is an artificial intelligence technology that generates new content of various types, including text, images, audio, video, and synthetic data. Unlike AI systems that mainly classify or make predictions about existing data, generative AI learns the patterns and relationships in its training data and uses them to produce new data.

Various tasks that the generative AI models can perform

Generative AI starts from a prompt, which can be text, an image, a video, musical notes, or any other input that the AI system can interpret. The model then produces new content in response to the prompt, such as an essay, an answer to a problem, or an image or video. Because this content can be complex and realistic enough to imitate human creativity, generative AI is a valuable tool for numerous industries, including gaming, entertainment, and product design.

Types of generative AI models

Various types of generative AI models perform specific tasks. The most popular types are as follows:

  • Variational autoencoders (VAEs): VAEs are a type of neural network that learns a compressed representation of the input data, called a latent space, and then generates new examples by sampling points from this latent space and decoding them.

  • Generative adversarial networks (GANs): GANs are a type of neural network that can generate new data similar to a given dataset. GANs are trained in an adversarial process in which a generator network produces data samples and a discriminator network evaluates each sample and decides whether it is real or generated. The generator is trained to produce increasingly realistic data by trying to trick the discriminator, while the discriminator is trained to correctly tell the actual data apart from the generated data. GANs have been used for various applications, such as generating realistic images, videos, and audio.

  • Transformers: Transformers are a neural network architecture used extensively for natural language processing tasks, such as language translation and text generation. They rely on self-attention mechanisms to learn contextual relationships between words in a text sequence. Because they process all positions of a sequence in parallel rather than one step at a time, they are easier to parallelize and typically faster to train than recurrent networks.

  • Autoregressive models: These are generative models that produce data one element at a time, with each new element conditioned on the elements that came before it, so that the output is similar in distribution to the training data. Autoregressive models are particularly well-suited for generating sequential data, such as time series or text, where each new value depends on previous values. These models have been used in various applications, including speech synthesis, natural language processing, and music generation.
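
To make the self-attention mechanism mentioned for transformers more concrete, here is a minimal sketch in plain Python. It is a simplification under stated assumptions: real transformers apply learned query, key, and value projections and use several attention heads, all of which are omitted here, and the three input vectors are made-up values chosen only for illustration.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections.

    Each output vector is a weighted average of all input vectors, where the
    weights reflect how similar the inputs are to one another.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # Mix the input vectors according to the attention weights.
        mixed = [sum(w * vec[i] for w, vec in zip(weights, vectors))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three made-up 2-dimensional "word" vectors (illustrative values only).
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row shows how much one position attends to every position. The first vector gives more weight to the second vector, which points in a similar direction, than to the third, which is the kind of contextual relationship the mechanism is meant to capture.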

Large language models

Large language models, or LLMs, are a type of machine learning model that can generate natural language text with impressive quality and fluency. They are trained on massive text datasets using deep neural network architectures, such as transformers, and learn to predict the probability distribution of the next word given the preceding words in a text sequence.
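
The idea of predicting a probability distribution over words can be illustrated with a toy model. The sketch below is not how an LLM works internally: it swaps in a simple bigram counter, which looks only at the single previous word, where a real LLM uses a neural network conditioned on a long context. The corpus and the function names `next_word_distribution` and `complete` are hypothetical and exist only for this illustration.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; real LLMs train on vastly more text.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def next_word_distribution(word):
    """Return {next_word: probability} estimated from the counts."""
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def complete(start, length):
    """Greedily extend `start` by repeatedly picking the most likely next word."""
    words = [start]
    for _ in range(length):
        counts = following[words[-1]]
        if not counts:
            break  # this word was never followed by anything in the corpus
        words.append(counts.most_common(1)[0][0])
    return " ".join(words)

print(next_word_distribution("the"))
print(complete("on", 2))
```

In this corpus, "the" is followed by "cat" twice and by "mat" and "fish" once each, so the model assigns "cat" half of the probability. Generating text is then a matter of repeatedly choosing a next word from such a distribution.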

LLMs are designed to be highly flexible. They can be fine-tuned to perform tasks such as language translation, text summarization, question answering, analysis, and inference by training the model further on task-specific data, which adjusts its parameters. This flexibility makes LLMs a versatile tool for various natural language processing applications.

Applications of LLMs

There are two main types of LLMs—base LLMs and fine-tuned LLMs.

Base LLMs

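Base LLMs are models that have only been pretrained on a large text corpus to predict the next word; they have not yet been tuned to follow instructions. They are not necessarily small, since a base model can have billions of parameters. They are naturally suited to text prediction and completion and, with suitable prompting or additional training, can also be applied to tasks like sentiment analysis and text classification.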

For example, a base LLM trained on a large corpus of news articles could be used to predict the likelihood of certain words or phrases in a news article on a particular topic based on the context of the surrounding text.

Here, we can see one example of how a base LLM may be used:

A base LLM predicting the complete sentence

Fine-tuned LLMs

Fine-tuned LLMs are large language models built on a base LLM and trained further on pairs of inputs and outputs, where the inputs are instructions and the outputs are preferred responses to those instructions. These models are often refined further using reinforcement learning from human feedback (RLHF), in which people rate or rank the model's responses and those judgments are used to steer the model toward more helpful and reliable answers.
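
As a rough illustration of what instruction and response pairs might look like before training, consider the sketch below. Everything in it is an assumption made for the example: the two sample pairs, the `TEMPLATE` string, and the `to_training_pair` helper are hypothetical, and real projects define their own data formats and templates.

```python
# Hypothetical instruction/response pairs; real datasets contain many thousands.
examples = [
    {"instruction": "Translate 'good morning' to French.",
     "response": "Bonjour."},
    {"instruction": "Summarize: The meeting moved from Monday to Tuesday.",
     "response": "The meeting is now on Tuesday."},
]

# Hypothetical prompt template; each project defines its own.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"

def to_training_pair(example):
    """Split one example into the text the model reads and the text it should produce."""
    prompt = TEMPLATE.format(instruction=example["instruction"])
    target = example["response"]
    return prompt, target

for prompt, target in map(to_training_pair, examples):
    print(prompt + target)
    print("-" * 20)
```

During fine-tuning, the model would be shown each prompt and trained to produce the matching target, which is how it learns to respond to instructions instead of merely continuing the text.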

Here's an example of how fine-tuned LLMs can respond to questions and provide answers:

A fine-tuned LLM providing a response based on the instruction

Examples of generative AI tools and models

Several AI models have emerged recently and are becoming increasingly popular. Let's look at a few examples:

GPT-3

GPT-3 (Generative Pre-trained Transformer 3) is a language model developed by OpenAI that is capable of generating very complex text. It can take a small amount of input and produce relevant and useful responses. GPT-3 has 175 billion parameters, which made it one of the largest and most powerful language models at the time of its release in 2020. It has a wide range of applications, including text completion, summarization, translation, question answering, and more.

ChatGPT

ChatGPT is a conversational AI system created by OpenAI. It is built on LLMs from the GPT family and can generate complex responses to a wide range of prompts, including text-based prompts, questions, and commands. ChatGPT is designed to engage in dialogue with users on a variety of topics and is commonly used to power chatbots, virtual assistants, and other natural language processing applications.

DALL·E

DALL·E is a generative AI model developed by OpenAI that creates images from textual descriptions. Its first version was based on a variant of the GPT-3 architecture. Based on textual prompts, DALL·E can generate a wide range of images, including objects, animals, scenes, and abstract concepts. The model has gained attention for its ability to generate highly detailed and imaginative images that can be used for many purposes, including creative projects, design, and marketing.

Midjourney AI

Midjourney is a generative AI model developed by an independent research lab of the same name. It generates images from text prompts, with the goal of converting imagination into art. The generated art style is often dream-like and appeals to users interested in fantasy, gothic, and sci-fi themes.

Stable Diffusion

Stable Diffusion, developed by Stability AI together with research collaborators, is a text-to-image diffusion model. It generates photorealistic images from text descriptions and can also modify existing images by removing details or adding new ones.