Sources of Self-reported Happiness: Logistic Regression

Analyze sources of self-reported happiness using logistic regression.

Conceptual preparation

We want to find out whether gender, religious belief, and income influence self-reported happiness. The dependent variable, happy, is dichotomous: 1 indicates happy and 0 indicates unhappy.

For this type of problem, OLS isn’t appropriate because it can generate predicted probabilities larger than one or smaller than zero. A widely used alternative for a dichotomous dependent variable is logistic regression.
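The problem with OLS can be seen in a minimal sketch. The data below are invented for illustration (they are not the survey data used in this lesson), Python is used only as a convenient notation, and `ols_fit` is a hypothetical helper that fits a straight line by least squares.

```python
# Toy data, invented for illustration only (not the lesson's survey data).
income = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
happy = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = ols_fit(income, happy)

# Fitted values are meant to be probabilities, but nothing bounds them.
fitted = [intercept + slope * xi for xi in income]
print("smallest fitted value:", round(min(fitted), 3))
print("largest fitted value:", round(max(fitted), 3))
```

With these toy data, the fitted value for the lowest income group falls below 0 and the one for the highest income group rises above 1, so neither can be read as a probability.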

Conceptually, the probability of a respondent being happy can be expressed as a function of gender, religious belief, and income.

$$\pi_i = p(happy_i = 1) = \text{probability of respondent } i \text{ being happy}$$

$$= \beta_0 + \beta_1 male + \beta_2 belief + \beta_3 income$$

To keep the predicted probability bounded between 0 and 1, logistic regression fits an S-shaped relationship between the probability of being happy and the covariates, using the following model:

$$\ln\bigg(\frac{\pi_i}{1-\pi_i}\bigg) = \beta_0 + \beta_1 male + \beta_2 belief + \beta_3 income$$

Above, $\frac{\pi_i}{1-\pi_i}$ is the odds of a respondent being happy, which is the probability of being happy ($\pi_i$) divided by the probability of being unhappy ($1-\pi_i$), and $\ln\big(\frac{\pi_i}{1-\pi_i}\big)$ is the log odds, that is, the logit transformation of the probability $\pi_i$.
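To make the odds and log odds concrete, here is a small sketch in Python. The function names `odds` and `log_odds` are illustrative, and the probabilities are arbitrary example values rather than results from the data.

```python
import math


def odds(p):
    """Odds of an event with probability p, where 0 < p < 1."""
    return p / (1 - p)


def log_odds(p):
    """Natural log of the odds, i.e. the logit of p."""
    return math.log(odds(p))


# Arbitrary example probabilities, chosen only for illustration.
for p in (0.2, 0.5, 0.8):
    print(p, round(odds(p), 3), round(log_odds(p), 3))
```

A probability of 0.8 corresponds to odds of 4 (four to one), and a probability of 0.5 to odds of 1 and log odds of 0. Probabilities are confined to the interval from 0 to 1, but log odds can take any real value, which is why the right-hand side of the model can be an unbounded linear function.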

Two issues are worth clarification.

  • First, the $\beta$s are regression parameters. Following previous chapters, we carry out hypothesis testing with respect to the null hypothesis that each $\beta$ is zero, indicating that a variable has no statistical effect on the dependent variable in the population.
  • Second, what is substantively most interesting is the value of $\pi_i$, the probability of a respondent being happy, under different values of the independent variables. To obtain that value, we can apply the following formula:

$$\pi_i = \frac{e^{\beta_0 + \beta_1 male + \beta_2 belief + \beta_3 income}}{1 + e^{\beta_0 + \beta_1 male + \beta_2 belief + \beta_3 income}}$$
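This formula follows from exponentiating both sides of the log-odds model and solving for the probability. The sketch below applies it in Python. The coefficient values are hypothetical placeholders rather than estimates from the data, the 0/1 coding of male and belief is an assumption, and `predicted_probability` is an illustrative name.

```python
import math


def predicted_probability(b0, b1, b2, b3, male, belief, income):
    """Return pi = e^eta / (1 + e^eta), where eta is the linear predictor."""
    eta = b0 + b1 * male + b2 * belief + b3 * income
    return math.exp(eta) / (1 + math.exp(eta))


# Hypothetical coefficients for illustration only; NOT estimates from the data.
coefs = dict(b0=-0.5, b1=-0.1, b2=0.4, b3=0.2)

# Same respondent profile at the lowest and highest income groups.
# male=1 and belief=1 assume a 0/1 coding, which the lesson does not show here.
low = predicted_probability(**coefs, male=1, belief=1, income=1)
high = predicted_probability(**coefs, male=1, belief=1, income=10)

print(round(low, 3), round(high, 3))
```

Whatever values the coefficients and covariates take, the result stays strictly between 0 and 1, which is the property the linear specification lacks.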

Data preparation

We already have all four variables in the above model prepared except for income. So, we now get the income variable ready for analysis. The codebook definition for the income variable is as follows:

V239. On this card is an income scale on which 1 indicates the lowest income group and 10 the highest income group in your country. We would like to know in what group your household is. Please, specify the appropriate number, counting all wages, salaries, pensions and other incomes that come in. (Code one number):

The tabulation shows that the variable has several negative values indicating missing values. They will be recoded as NA.
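The recoding step can be sketched as follows. This is an illustration in Python, not the lesson's own code: the raw values are made up, `recode_missing` is a hypothetical helper, and Python's `None` stands in for NA.

```python
# Hypothetical raw responses; the negative codes stand in for missing answers.
raw_income = [3, -1, 7, 10, -2, 5, 1, -5]


def recode_missing(values):
    """Replace negative codes with None (standing in for NA); keep valid codes."""
    return [v if v >= 0 else None for v in values]


income = recode_missing(raw_income)
print(income)  # [3, None, 7, 10, None, 5, 1, None]

# Later summaries should skip the missing entries.
valid = [v for v in income if v is not None]
print(len(valid), "valid responses out of", len(income))
```

After recoding, it is worth tabulating the variable again to confirm that only the valid codes 1 through 10 remain.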
