# Sources of Self-reported Happiness: Logistic Regression

Analyze sources of self-reported happiness using logistic regression.


## Conceptual preparation

We’re interested in finding out whether gender, religious belief, and income influence self-reported happiness. The dependent variable `happy` is dichotomous, with `1` being happy and `0` being unhappy.

In this type of problem, OLS isn’t appropriate because it can generate predicted probabilities larger than one or smaller than zero. A widely used alternative is logistic regression.
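The out-of-range problem can be seen in a minimal sketch. The data below are invented purely for illustration (they are not from the survey), and the single-predictor OLS fit uses the textbook closed-form estimates rather than any particular library.

```python
# Toy data, invented for illustration only (not from the survey).
income = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
happy = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

n = len(income)
mean_x = sum(income) / n
mean_y = sum(happy) / n

# Closed-form OLS estimates for a single predictor.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(income, happy))
sxx = sum((x - mean_x) ** 2 for x in income)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Fitted values from the straight line are not constrained to [0, 1].
fitted = [intercept + slope * x for x in income]
print(min(fitted), max(fitted))
```

With these made-up values, the fitted "probability" falls below zero at the lowest income level and exceeds one at the highest, which is exactly what a probability cannot do.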

Conceptually, the probability of a respondent being happy can be expressed as a function of gender, religious belief, and income.

$\pi_i = p(happy_i = 1) = \text{probability of respondent } i \text{ being happy}$

$= \beta_0 + \beta_1 male + \beta_2 belief + \beta_3 income$

To keep the predicted probability bounded between 0 and 1, logistic regression fits an S-shaped relationship between `happy` and the covariates with the following model:

$\ln\bigg(\frac{\pi_i}{1-\pi_i}\bigg)=\beta_0 + \beta_1 male+\beta_2 belief+\beta_3 income$

Above, $\frac{\pi_i}{1-\pi_i}$ is the odds of a respondent being happy, that is, the probability of being happy ($\pi_i$) divided by the probability of being unhappy ($1-\pi_i$), and $\ln\bigg(\frac{\pi_i}{1-\pi_i}\bigg)$ is the log odds, also called the logit transformation of $\pi_i$.
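As a quick numerical illustration of these two quantities, here is a small sketch. The function names `odds` and `log_odds` are hypothetical helpers introduced only for this example, and the probabilities passed in are arbitrary.

```python
import math


def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1 - p)


def log_odds(p):
    """Natural log of the odds, i.e. the logit of p (requires 0 < p < 1)."""
    return math.log(odds(p))


# Arbitrary example probabilities, chosen only to show the pattern.
for p in (0.2, 0.5, 0.8):
    print(p, odds(p), log_odds(p))
```

A probability of 0.5 corresponds to odds of 1 and log odds of 0. Probabilities above 0.5 give positive log odds and probabilities below 0.5 give negative ones, so the log odds can take any real value even though the probability stays between 0 and 1.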

Two points are worth clarifying.

- First, the $\beta$s are regression parameters. As in previous chapters, we test the null hypothesis that each $\beta$ is zero, which would indicate that the variable has no statistical effect on the dependent variable in the population.
- Second, what is substantively most interesting is the value of $\pi_i$, the probability of a respondent being happy, under different values of the independent variables. To obtain that value, we can apply the following formula:

$\pi_i= \frac{e^{\beta_0 + \beta_1male+\beta_2 belief+\beta_3income}}{1+e^{\beta_0 + \beta_1male+\beta_2 belief+\beta_3income}}$
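The formula above can be sketched in a few lines of code. The coefficient values here are hypothetical placeholders, not estimates from the survey data, and `predicted_probability` is an illustrative helper name rather than part of any library.

```python
import math


def predicted_probability(beta, male, belief, income):
    """Apply pi = exp(eta) / (1 + exp(eta)) to the linear predictor eta."""
    b0, b1, b2, b3 = beta
    eta = b0 + b1 * male + b2 * belief + b3 * income
    return math.exp(eta) / (1 + math.exp(eta))


# Hypothetical coefficients, for illustration only.
beta = (-1.0, -0.1, 0.5, 0.2)

# Predicted probability for a male believer at three income levels.
for income in (1, 5, 10):
    p = predicted_probability(beta, male=1, belief=1, income=income)
    print(income, round(p, 3))
```

Whatever coefficient values are plugged in, the result always lies strictly between 0 and 1, and with a positive income coefficient it rises as income rises.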

## Data preparation

We already have all four variables in the above model prepared except for `income`. So, we now get the `income` variable ready for analysis. The codebook definition for the income variable is as follows:

“`V239`. On this card is an income scale on which 1 indicates the lowest income group and 10 the highest income group in your country. We would like to know in what group your household is. Please, specify the appropriate number, counting all wages, salaries, pensions and other incomes that come in. (Code one number):”

The tabulation shows that the variable has several negative values indicating missing values. They will be recoded as `NA`.
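The recoding step can be sketched as follows. This is a minimal illustration in Python rather than the lesson's own code: the raw values are made up, the helper name `recode_missing` is hypothetical, and `None` stands in for `NA`.

```python
def recode_missing(values):
    """Replace negative codes (treated as missing) with None."""
    return [None if v is not None and v < 0 else v for v in values]


# Made-up raw responses; the negative entries play the role of missing codes.
raw_income = [-5, 1, 10, -2, 7]
income = recode_missing(raw_income)
print(income)

# Valid answers should stay on the 1-10 scale from the codebook.
valid = [v for v in income if v is not None]
print(min(valid), max(valid))
```

Checking the range of the remaining values after recoding is a cheap way to confirm that only the 1 to 10 scale described in the codebook is left.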
