Overview: GPT-3—The Good, the Bad, and the Ugly

Lets take a look at what we'll be learning in this chapter.

Every technological revolution brings controversy. In this section, we focus on four of the most controversial aspects of GPT-3: AI bias being encoded into the model, low-quality content and the spread of misinformation, GPT-3’s environmental footprint, and data privacy issues. When you mix human biases with a powerful tool capable of producing huge quantities of seemingly coherent text, the results can be dangerous.

The fluency and coherence of much of GPT-3’s text output raise several risks because people are prepared to interpret it as meaningful. Many also view the human developers involved in creating GPT-3-based apps as authors of their output and demand that they are held accountable for their content.

The risks we consider in this section follow from the nature of GPT-3’s training data, which is to say, the English-speaking internet. Human language reflects our worldviews, including our biases—and people who have the time and access to publish their words online are often in positions of privilege with respect to racism, sexism, and other forms of oppression, which means they tend to be overrepresented in LLM training data. In short, society’s biases and dominant worldviews are already encoded in the training data. Without careful fine-tuning, GPT-3 absorbs these biases, problematic associations, and violent abuse and includes them in its output for the world to interpret.

Whatever biases appear in the initial training set or user input are repeated and can be amplified or even radicalized by the GPT-3-generated output. The risk is that people read and spread such texts, reinforcing and propagating problematic stereotypes and abusive language in the process. In this section, we’ll look at bias in GPT predecessors, the consequence of GPT-generated misinformation, and environmental hazards caused by the training of models.

Get hands-on with 1200+ tech skills courses.