## Example

The table below shows a categorical variable that takes on three unique values: A, B, and C. The three dummy variables that represent this variable are shown to the right, where each variable takes a value of 0 when its category is not present, and a value of 1 when its category is present.

| Value | A | B | C |
|-------|---|---|---|
| A     | 1 | 0 | 0 |
| A     | 1 | 0 | 0 |
| B     | 0 | 1 | 0 |
| A     | 1 | 0 | 0 |
| B     | 0 | 1 | 0 |
| C     | 0 | 0 | 1 |
| A     | 1 | 0 | 0 |

## The role of dummy variables in analysis

Dummy variables are the main way that categorical variables are included as predictors in statistical and machine learning models. For example, the output below is from a linear regression where the outcome variable is profit and the predictor is a categorical variable recording the number of employees. With statistical models such as linear regression, one of the dummy variables needs to be excluded (by convention, the first or the last); otherwise the predictor variables are perfectly collinear with the intercept. In the example below, the variable representing companies with a single employee (the owner) has been excluded. The fitted model is expressed as the following formula:

$\begin{array}{rl} \textrm{profit}\hspace{0.5em}=&1376+1079\times[2\textrm{--}5 \textrm{ employees}]+5238\times[6\textrm{--}20 \textrm{ employees}]+\\ &12503\times[21\textrm{--}50 \textrm{ employees}]+27711\times[51 \textrm{ or more employees}] \end{array}$

As the predictors are all dummy variables, each takes only the values 0 and 1. The predicted profit for a firm with one employee is then:
$1376=1376+1079\times0+5238\times0+12503\times0+27711\times0$

For a firm with three employees it is:

$2455=1376+1079\times1+5238\times0+12503\times0+27711\times0$

We multiply the coefficient by 1 because it belongs to a dummy variable, which can only take the values 0 and 1 (i.e., we do not multiply by the number of employees).
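The two worked predictions above can be reproduced directly from the fitted coefficients. A minimal sketch, using the coefficient values from the formula (the category labels are assumed names for illustration):

```python
# Predicted profit from the fitted dummy-variable regression.
intercept = 1376
coefs = {
    "2-5 employees": 1079,
    "6-20 employees": 5238,
    "21-50 employees": 12503,
    "51 or more employees": 27711,
}

def predict(category):
    # The excluded category ("1 employee") has no dummy variable,
    # so its prediction is just the intercept.
    return intercept + coefs.get(category, 0)

print(predict("1 employee"))     # 1376
print(predict("2-5 employees"))  # 2455
```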

### Alternatives to dummy variables

The main benefit of dummy variables is their simplicity. However, there are often better alternative basis functions, such as orthogonal polynomials, effects coding, and splines.