08 December 2017
Exporting LDA Functions from Displayr into Excel
In this post I show how discriminant functions can be extracted from a Linear Discriminant Analysis in Displayr. Such functions are often used in Excel (or elsewhere) to make new predictions based on the LDA. I show how a simple calculation can be used to make new predictions based on the discriminant functions. This post follows on from my earlier description of how to perform Linear Discriminant Analysis in Displayr.
Recap of performing Linear Discriminant Analysis (LDA)
To set up the Linear Discriminant Analysis:
- Import the example data from this URL: “http://wiki.q-researchsoftware.com/images/c/ce/Glass.csv”
- Add the LDA model from the Insert > More > Machine Learning menu
- Select the variables, then press Calculate
One of the possible outputs of LDA is the set of discriminant functions. There is one function for each category of the outcome variable. Each function can be used to calculate a score for any data point. The predicted category of a data point is the category whose function gives the highest score. The example used here predicts the category of a piece of glass based on its refractive index and chemical properties.
To see the coefficients of these functions, navigate to the Inputs of the Linear Discriminant Analysis (on the right of the screen). Change the Output from Means to Discriminant Functions and click Calculate. The resulting table of coefficients is given below.
This table tells us that the score of an observation for category 1 is -2115766 + 1725406 * “Refractive Index” + 14604 * Na + 13401 * Mg + 17215 * Al + 17122 * Si + 14500 * K + 11306 * Ca + 12896 * Ba + 7517 * Fe.
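As a minimal sketch of that arithmetic, the R snippet below evaluates the category 1 function for a single observation. The coefficients are the ones quoted above; the observation values are made up for illustration and are not taken from the data set. Because the coefficients are rounded, a score only means something when compared with the scores of the other categories.

```r
# Coefficients of the category 1 function, as quoted in the text above.
coefs <- c(Intercept = -2115766, RI = 1725406, Na = 14604, Mg = 13401,
           Al = 17215, Si = 17122, K = 14500, Ca = 11306, Ba = 12896,
           Fe = 7517)

# Hypothetical observation (illustrative values only, an assumption).
# The leading 1 is multiplied by the intercept.
obs <- c(Intercept = 1, RI = 1.52, Na = 13.6, Mg = 4.5, Al = 1.1,
         Si = 71.8, K = 0.1, Ca = 8.8, Ba = 0, Fe = 0)

# The score is the sum of coefficient * value over all terms.
score.category.1 <- sum(coefs * obs)
print(score.category.1)
```

Repeating this for each of the 6 functions and picking the largest score gives the predicted category.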
Evaluating all 6 functions by hand for every data point would be tedious, so the table can be exported to Excel via Export > Excel or used by another R calculation. In Excel, a matrix of data (with a leading column of ones for the intercepts) can be multiplied by the discriminant functions matrix to calculate scores, for example with the MMULT function. I am going to perform the same calculation in R to make predictions for the original data.
Manual Calculation of Predictions in R
First I create a table of the data with Insert > More > Tables > Raw Data. I select the 9 predictor variables in the same order as they appear in the table above. Then, in Insert > R Output, I type the following few lines of code to make the predictions.
raw.data = cbind(rep(1, nrow(raw.data)), raw.data)
raw.data = as.matrix(raw.data) # convert from data.frame to matrix
scores = raw.data %*% lda
predictions = colnames(scores)[apply(scores, 1, which.max)]
The first line prepends a column of ones to the data, which is multiplied by the intercepts in the matrix multiplication. The second line converts the data frame to a matrix. The third line computes the scores for each case in the data for each category of the outcome variable. The final line chooses the category with the highest score from each row. The first few predictions are shown below.
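To try the same calculation outside Displayr, here is a self-contained sketch on toy data. The names raw.data and lda mirror the ones used above, but the values, the predictor names x1 and x2, and the categories A and B are all made up. I am also assuming the coefficient matrix has one column per category with the intercepts in the first row.

```r
# Toy data: 3 cases, 2 predictors (made-up values, not the glass data).
raw.data <- data.frame(x1 = c(1, 4, 2), x2 = c(3, 1, 2))

# Toy discriminant functions (hypothetical numbers): one column per
# category, with the intercepts in the first row.
lda <- matrix(c(0.0, -1.0,
                1.0,  2.0,
                2.0,  0.5),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Intercept", "x1", "x2"), c("A", "B")))

# Prepend the column of ones and convert to a matrix.
design <- as.matrix(cbind(Intercept = 1, raw.data))

# Rows are cases, columns are categories.
scores <- design %*% lda

# For each case, pick the category with the highest score.
predictions <- colnames(scores)[apply(scores, 1, which.max)]
print(scores)
print(predictions)
```

The order of the columns in the data must match the order of the rows in the coefficient matrix, otherwise the multiplication pairs coefficients with the wrong variables.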
Checking Manual Predictions
Unless you specifically want to use the discriminant functions, there is actually no need to do so in order to make predictions for the training data. The predictions can be extracted from the LDA model directly.
To do this, make a copy by clicking on the original LDA model and selecting Home > Copy and then Paste. Move this copy to a new page for clarity, then change the Output to Means. The predictions can then be added to the data tree with Insert > More > Machine Learning > Save Variable(s) > Predicted Values. You can hover over the new variable to see that the first few cases are the same as in the table above. More thoroughly, you could compare the vectors with code.
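One way to do that comparison is sketched below. The two vectors are hypothetical stand-ins: in Displayr they would be the manually computed predictions and the saved predicted values variable, whose actual names depend on your document.

```r
# Hypothetical stand-ins for the two sets of predictions (made-up labels).
manual.predictions <- c("1", "1", "2", "3", "2")
model.predictions <- factor(c("1", "1", "2", "3", "2"))

# Compare as character so that factor versus character storage
# does not affect the result.
same <- as.character(manual.predictions) == as.character(model.predictions)

all.match <- all(same)       # TRUE when every case agrees
mismatches <- which(!same)   # indices of any cases that disagree

print(all.match)
# Cross-tabulation: any off-diagonal counts are disagreements.
print(table(manual = manual.predictions, model = model.predictions))
```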
My work is saved in this Displayr document. You can replicate the steps or use your own data by clicking the link (just sign in to Displayr first).
Author: Jake Hoare
After escaping from physics to a career in banking, then escaping from banking, I decided to go back to BASIC and study computing. This led me to rediscover artificial intelligence and data science. I now get to indulge myself at Displayr working in the Data Science team, sometimes on machine learning.