# Binary Data and The Wells Dataset

Let’s get a brief overview of binary data and the wells dataset.

## R packages

We’ll use the following R packages in this chapter: `ggplot2`, `arm`, `ggfortify`, and `Sleuth3`.

## Binary data

One of the most important uses of GLMs is the analysis of binary data. Binary data are an extreme form of binomial count data in which the binomial denominator equals one, so every trial produces a value of either 1 or 0. Binary data can therefore be analyzed in a similar way to binomial counts: we use a GLM with a binomial distribution and the same choice of link functions, which keep predictions from falling below zero or rising above one. However, although the distribution and link functions are the same, the constrained nature of binary data means the analysis differs in some ways from that of binomial counts.
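To make the "binomial denominator of one" idea concrete, here is a minimal sketch on simulated data. The variable names, sample size, and coefficients are made up for illustration and have nothing to do with the wells dataset.

```r
# Simulated illustration only: names and coefficients are hypothetical.
set.seed(1)
n <- 200
x <- runif(n, min = -2, max = 2)

# One Bernoulli trial per observation (binomial denominator = 1),
# so every response is either 0 or 1.
p_true <- plogis(-0.5 + 1.2 * x)
y <- rbinom(n, size = 1, prob = p_true)

# Same family and link as we would use for binomial counts.
fit <- glm(y ~ x, family = binomial(link = "logit"))

# Predictions on the response scale are probabilities,
# so they stay between 0 and 1.
p_hat <- predict(fit, type = "response")
range(p_hat)
```

The only difference from a binomial-count model is the response: a plain 0/1 vector rather than counts of successes and failures.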

For one thing, the ratio of the residual deviance to the residual DF can’t be used to diagnose overdispersion or underdispersion. R’s default set of residual-checking plots is also of little (if any) use when applied to a binary GLM, which leaves us without any means of model checking in the base distribution of R. Luckily, the `arm` package helps fill this gap.
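One common graphical check for binary GLMs is a binned residual plot: observations are grouped by fitted probability and the average residual in each group is plotted against the average fitted value. The `arm` package offers a ready-made version in `binnedplot()`. The sketch below builds the bins by hand in base R so the mechanics are visible. It uses simulated data, and the number of bins and the rough ±2 standard error band are assumptions chosen for illustration, not necessarily what this chapter goes on to use.

```r
# Simulated illustration only: data and bin count are hypothetical.
set.seed(2)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.3 + 0.8 * x))
fit <- glm(y ~ x, family = binomial)

fitted_p <- fitted(fit)
raw_res <- y - fitted_p  # response-scale residuals

# Group observations into 20 bins of similar fitted probability.
n_bins <- 20
bin <- cut(rank(fitted_p, ties.method = "first"),
           breaks = n_bins, labels = FALSE)

binned <- data.frame(
  mean_fitted = as.numeric(tapply(fitted_p, bin, mean)),
  mean_resid  = as.numeric(tapply(raw_res, bin, mean)),
  n           = as.numeric(table(bin))
)

# Rough +/- 2 standard error band for each bin's average residual.
binned$se2 <- 2 * as.numeric(tapply(raw_res, bin, sd)) / sqrt(binned$n)

plot(binned$mean_fitted, binned$mean_resid,
     ylim = range(c(-binned$se2, binned$se2, binned$mean_resid)),
     xlab = "Average fitted probability", ylab = "Average residual")
lines(binned$mean_fitted, binned$se2, lty = 2)
lines(binned$mean_fitted, -binned$se2, lty = 2)
abline(h = 0)
```

If the model fits reasonably well, most of the binned averages should fall inside the dashed band with no obvious trend.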

## An example of the wells dataset

Our example dataset for a binary GLM is stored in the `Data_Binary_Wells` file:

```
# Read the wells data; header = TRUE takes column names from the first row
wells <- read.table("Data_Binary_Wells.txt", header = TRUE)
```

The example concerns an area of Bangladesh where many wells used for drinking water have been contaminated by naturally occurring arsenic.