What is Logistic Regression?

by Justin Yap

Logistic regression — also known as logit regression, binary logit, or binary logistic regression — is a type of regression analysis used when the dependent variable is binary (i.e., has only two possible outcomes). It is used widely in many fields, particularly in medical and social science research.

Examples of situations where logistic regression can be applied are:

Predicting the risk of developing heart disease given characteristics such as age, gender, body mass index, smoking habits, diet, and exercise frequency.
Predicting whether a consumer will buy an SUV given their income, marital status, number of children, and how much time they spend outdoors.
Predicting whether a student will pass an exam given their past grades, homework completion, and class attendance.

Logistic regression is a special case of the generalized linear model (GLM), a family that also includes linear regression, Poisson regression, and multinomial logistic regression.

Theory

Linear regression is used to model a numeric variable as a linear combination of numeric independent variables $x_1, x_2, \cdots, x_m$ weighted by the coefficients $\beta_0, \beta_1, \cdots, \beta_m$:
\[
y_{\textrm{fitted}} = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m
\]
Suppose instead that $y$ is a binary variable. In the past, linear regression would also have been used for this, but the approach has several disadvantages. They all stem from using a linear combination of numeric variables, which can take any value, to model a variable that has only two values. For example, the fitted values can fall below 0 or above 1, so they cannot be interpreted as probabilities.
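
The problem can be seen in a few lines of Python. The sketch below fits an ordinary least-squares line to a binary outcome; the data (hours studied and a pass/fail flag) and the helper name fit_line are hypothetical and chosen only for illustration.

```python
# Hypothetical data: hours studied (x) and exam result coded 1 = pass, 0 = fail (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]

# The fitted values at the smallest and largest x lie outside the interval [0, 1].
print([round(f, 3) for f in fitted])
```

With this data the fitted value is negative at the smallest x and greater than 1 at the largest x, so neither can be read as a probability.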

The approach used by logistic regression is to model the log of the odds of one of the outcomes instead:
\[
\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m
\]
where $p$ is the probability of one of the two outcomes. The left-hand side is a function of $p$ known as the logit function, which has a range from $-\infty$ to $\infty$:
\[
\textrm{logit}(p) = \ln\left(\frac{p}{1-p}\right), \quad 0 < p < 1
\]
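
Because the logit maps probabilities onto the whole real line, it can be inverted to turn any value of the linear combination back into a probability. Writing $\eta$ as shorthand for $\beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m$ (a symbol introduced here, not in the original text), solving for $p$ gives $p = 1 / (1 + e^{-\eta})$. A minimal Python sketch, with the function names logit and inverse_logit chosen for illustration:

```python
import math


def logit(p):
    """Log of the odds p / (1 - p); defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Map a linear predictor eta (any real number) back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    eta = logit(p)
    print(f"p = {p:.2f}  logit(p) = {eta:+.3f}  back to p = {inverse_logit(eta):.2f}")
```

Probabilities below 0.5 give negative log-odds, 0.5 gives zero, and probabilities above 0.5 give positive log-odds. The sketch is not hardened against overflow for very large negative values of eta.
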
The closely related probit regression differs from logistic regression by replacing the logit function with the inverse of the standard normal cumulative distribution function.
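
To compare the two link functions side by side, the sketch below evaluates both at a few probabilities. It assumes Python 3.8 or later for statistics.NormalDist; the function names logit and probit are illustrative.

```python
import math
from statistics import NormalDist


def logit(p):
    """Log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def probit(p):
    """Inverse of the standard normal cumulative distribution function."""
    return NormalDist().inv_cdf(p)


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  logit = {logit(p):+.3f}  probit = {probit(p):+.3f}")
```

Both functions are zero at p = 0.5 and symmetric around it; away from 0.5 the logit is larger in absolute value than the probit.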

A logistic regression model is fit by estimating the coefficients $\beta_0, \beta_1, \cdots, \beta_m$ using maximum likelihood estimation. Unlike linear regression, no closed-form solution exists, so the estimates are computed numerically. In practice, logistic regression is carried out using statistical software. For example, in R, the glm function can be used (with the setting family = binomial(link = 'logit')).
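
To make the estimation step concrete, here is a minimal sketch of maximum likelihood fitting with Newton-Raphson iterations for a model with an intercept and a single predictor. It illustrates one standard numerical approach and is not a description of what any particular package does internally. The data and the name fit_logistic are hypothetical, and the sketch assumes the two outcomes overlap in x (with perfectly separated data the estimates do not converge).

```python
import math


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson maximum likelihood for logit(p) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 information matrix (negative Hessian).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: the outcomes overlap, so the estimates are finite.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

At the maximum likelihood estimates the gradient is zero, which means the fitted probabilities sum to the observed number of positive outcomes.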

Output

The output typically consists of estimates of the coefficients $\beta_0, \beta_1, \cdots, \beta_m$, as well as their corresponding standard errors and Wald z-statistics. Using the z-statistics, each coefficient is tested for a significant difference from zero with a z-test. A likelihood-ratio test may also be conducted to determine whether the predictors provide a significantly improved model fit over a null model with no predictors. In addition, pseudo-R² measures analogous to the R² from linear regression, such as the McFadden R², can be computed to assess the goodness of fit of a logistic regression model.
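
The arithmetic behind these quantities is short enough to sketch. All four input numbers below are hypothetical stand-ins for software output, not results from a real model, and the example assumes a single predictor so that the likelihood-ratio statistic is compared with a chi-square distribution on one degree of freedom.

```python
import math

# Hypothetical values standing in for software output (not from a real model).
estimate = 0.80        # coefficient estimate for the single predictor
std_error = 0.25       # its standard error
loglik_model = -42.0   # log-likelihood of the fitted model
loglik_null = -55.0    # log-likelihood of the null model with no predictors

# Wald z-statistic and its two-sided p-value under the standard normal.
z = estimate / std_error
p_wald = math.erfc(abs(z) / math.sqrt(2.0))

# Likelihood-ratio statistic and its chi-square(1) upper-tail p-value.
lr_stat = 2.0 * (loglik_model - loglik_null)
p_lr = math.erfc(math.sqrt(lr_stat / 2.0))

# McFadden pseudo-R-squared: 1 minus the ratio of the two log-likelihoods.
mcfadden_r2 = 1.0 - loglik_model / loglik_null

print(f"z = {z:.2f}, Wald p-value = {p_wald:.4f}")
print(f"LR statistic = {lr_stat:.1f}, LR p-value = {p_lr:.2e}")
print(f"McFadden R-squared = {mcfadden_r2:.3f}")
```

With more than one predictor, the likelihood-ratio statistic would instead be compared with a chi-square distribution whose degrees of freedom equal the number of predictors, which needs a general chi-square tail function rather than the one-degree-of-freedom shortcut used here.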
