What is Sampling Error?
07 November 2018 | by Tim Bock

Sampling error is the difference between the sample values and the true population values, which results from the use of random sampling.

What is a Latent Variable?
26 October 2018 | by Tim Bock

A latent variable is a variable that is inferred using models from observed data. These can be inferred through a wide range of approaches.

What is Feature Engineering?
25 October 2018 | by Tim Bock

Feature engineering is the process of selecting and transforming variables when creating a predictive model using machine learning or statistical modeling.

What is Non-Sampling Error?
20 October 2018 | by Tim Bock

Non-sampling error refers to any deviation between the results of a survey and the truth that is not caused by the random selection of observations.

What is Survey Data Processing?
15 October 2018 | by Oliver Harrison

Survey data processing is the manipulation or transformation of raw survey data into meaningful results which can be analyzed to answer a research question.

What is a Conversion Rate?
15 October 2018 | by Tim Bock

A conversion rate is the percentage of people that move from one stage to the next stage in the process. It is often used to identify weak spots in a company's ...
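As a minimal sketch of the calculation described above, a conversion rate can be computed as the share of people at one stage who reach the next. The function name and the funnel counts below are hypothetical and exist only for illustration.

```python
def conversion_rate(entered: int, converted: int) -> float:
    """Percentage of people at one stage who move on to the next stage."""
    if entered <= 0:
        raise ValueError("entered must be a positive count")
    return 100.0 * converted / entered


# Hypothetical funnel counts, made up for illustration only.
funnel = [("visited site", 2000), ("added to cart", 500), ("purchased", 100)]

# Compare each stage with the one that follows it.
for (stage, count), (next_stage, next_count) in zip(funnel, funnel[1:]):
    rate = conversion_rate(count, next_count)
    print(f"{stage} -> {next_stage}: {rate:.1f}%")
```

A stage whose rate is much lower than the others is a candidate weak spot.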

What is a Model?
14 October 2018 | by Tim Bock

A model is a usable description of how a system is believed to work. It is a simplification of reality, with unnecessary detail excluded.

What is a Standard Error?
09 October 2018 | by Tim Bock

Standard error is the estimated standard deviation of the sampling distribution of a parameter. It quantifies the uncertainty around a parameter.
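For the simplest case, where the parameter is a mean, the standard error is commonly estimated as the sample standard deviation divided by the square root of the sample size. The sketch below assumes that case and a simple random sample; the function name and the data are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # statistics.stdev uses the n - 1 (sample) denominator.
    return statistics.stdev(sample) / math.sqrt(n)


# Made-up data for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(standard_error_of_mean(data), 3))
```

Other parameters (regression coefficients, proportions) have their own standard error formulas; this example covers only the mean.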

What is the Price Sensitivity Meter?
26 September 2018 | by Tim Bock

A price sensitivity meter is a set of survey questions that is used to work out how to set prices for products. It works out a framework for what people conside...

What is Multidimensional Scaling (MDS)?
25 September 2018 | by Tim Bock

Multidimensional scaling (MDS) is a technique used to visualize the distances between objects when the distances between pairs of objects are known.

What is the Chi-Square Test of Homogeneity?
24 September 2018 | by Tim Bock

The chi-square test of homogeneity tests to see whether different columns (or rows) of data in a table come from the same population or not.

What is a Column Chart?
23 September 2018 | by Tim Bock

A column chart is a data visualization where each category is represented by a rectangle with the height of the rectangle being proportional to the values.

What is Overplotting?
22 September 2018 | by Tim Bock

Overplotting is when the values or labels in a data visualization overlap, making the data visualization difficult to read. Find out more.

What is a Bubble Chart?
21 September 2018 | by Tim Bock

A bubble chart is a data visualization that displays multiple circles (bubbles) in a two-dimensional plot. They can be used to show three variables.

What is Effective Sample Size?
20 September 2018 | by Tim Bock

The effective sample size is an estimate of the sample size required to achieve the same level of precision if that sample were a simple random sample.

What is a Labeled Scatter Plot?
19 September 2018 | by Tim Bock

A labeled scatter plot is a data visualization that displays the values of two different variables, with text labels showing the meaning of each data point.

What is a Scatter Plot?
19 September 2018 | by Tim Bock

A scatter plot is a chart that displays the values of two variables as points. The data for each point is represented by its position on the chart.

What is A/B Testing?
18 September 2018 | by Tim Bock

A/B testing involves testing two different approaches to solving a problem, approach A and approach B, and working out which is better according to the data.

What is the Chi-Square Test of Independence?
17 September 2018 | by Tim Bock

The chi-square test of independence tests to see whether there is a relationship between two categorical variables in a dataset.

What is R-Squared?
09 September 2018 | by Tim Bock

The R-Squared statistic quantifies the predictive accuracy of a statistical model. It is also known as the coefficient of determination and R².
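One common way to compute R-Squared is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition; the function name and the numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Compute R-Squared as 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_observed = sum(observed) / len(observed)
    # Variation left unexplained by the model.
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total variation of the observed values around their mean.
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("observed values have no variation")
    return 1 - ss_residual / ss_total


# Made-up values for illustration only.
print(round(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]), 2))
```

Under this definition, predictions that match the observed values exactly give a value of 1.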

What is the Chi-Square Frequency Test?
08 September 2018 | by Tim Bock

The chi-square frequency test works out if two variable values are consistent with expectations and if a difference is statistically significant.

What are Survey Quotas?
07 September 2018 | by Tim Bock

Survey quotas are the number of observations to meet a specified requirement. Learn about interlocking and non-interlocking survey quotas.

What is Data Filtering?
04 September 2018 | by Chris Facer

Data filtering is the process of choosing a smaller part of your dataset and using that subset for viewing or analysis. It is usually temporary.

What are Data Measurement Scales?
04 September 2018 | by Tim Bock

Data measurement scales are classifications which indicate the types of mathematical operations that can be performed on the data.

What is a Crosstab?
30 August 2018 | by Tim Bock

A crosstab is a table showing the relationship between two or more variables. Crosstabs are useful for finding patterns and correlations in data.

What are Dummy Variables?
29 August 2018 | by Tim Bock

Dummy variables are variables that take values of 0 and 1, where the values indicate the presence or absence of something.
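As a small illustration, a categorical variable can be turned into one 0/1 variable per category. This is a plain-Python sketch; the function name and the example values are hypothetical.

```python
def dummy_code(values):
    """Return one 0/1 list per category, keyed by category name."""
    categories = sorted(set(values))
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Made-up responses for illustration only.
colors = ["red", "blue", "red", "green"]
for name, indicator in dummy_code(colors).items():
    print(name, indicator)
```

In a regression, one category is usually left out as the reference level to avoid redundant columns.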

What are Small Multiples?
26 August 2018 | by Tim Bock

A small multiple is a data visualization that consists of multiple charts arranged in a grid. This makes it easy to compare the entirety of the data.

What is Logistic Regression?
22 August 2018 | by Justin Yap

Logistic regression is a type of regression analysis used when the dependent variable is binary (i.e., has only two possible outcomes).

What is D-Error?
20 August 2018 | by Justin Yap

D-error is a measure that quantifies how good a design is at extracting information from respondents. I'll show you how to compute D-error, Bayesian D-error and...

What is a Correlation Matrix?
16 August 2018 | by Tim Bock

A correlation matrix is a handy way to visualize correlation coefficients between sets of variables. A correlation matrix is also used as an input for more adva...

What is Deep Learning?
14 August 2018 | by Jake Hoare

Deep learning is a subset of machine learning. Like other machine-learning techniques, deep learning creates a mapping from input data to a target outcome.

What are the Different Types of Missing Data?
14 August 2018 | by Tim Bock

Missing data can be structurally missing, missing completely at random, missing at random, or nonignorable (also known as missing not at random).

What are the Alternatives to Random Sampling?
13 August 2018 | by Tim Bock

Alternatives to a random sample include quota samples, convenience samples, volunteer samples, purposive samples, and snowball/referral samples.

Factor Analysis and Principal Component Analysis: A Simple Explanation
12 August 2018 | by Tim Bock

Factor analysis and principal component analysis identify patterns in the correlations between variables. They are used to identify underlying variables.

What is MaxDiff?
11 August 2018 | by Tim Bock

MaxDiff is a survey research technique for working out relative preferences for multiple items. It is also known as maximum difference or best-worst scaling.

What is Spurious Correlation?
10 August 2018 | by Tim Bock

Spurious correlation is when two variables falsely appear to be causally related, normally due to an unseen third factor.

What is Heteroscedasticity?
09 August 2018 | by Tim Bock

Heteroscedasticity is a specific type of pattern in the residuals of a model where the variability for a subset of the residuals is much larger.

What is the Replication Crisis?
08 August 2018 | by Tim Bock

The replication crisis is the growing belief that many scientific studies are unable to be reproduced. This could imply that significant theories are wrong.

Statistics vs Data Science: What’s the Difference?
07 August 2018 | by Tim Bock

Statistics is a mathematical field which deals with quantitative data. Data science is a multidisciplinary field which deals with data in a range of forms.

What is Data Stacking?
05 August 2018 | by Tim Bock

Data stacking is a way of organising data to find anomalies. Data stacking involves splitting a data set into smaller files and stacking the values for each var...

What are Residuals?
03 August 2018 | by Tim Bock

Residuals in statistics or machine learning are the difference between an observed data value and a predicted data value. They are also known as errors.
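A minimal sketch of the calculation, assuming the usual sign convention of observed minus predicted. The function name and the values are hypothetical.

```python
def residuals(observed, predicted):
    """Return observed minus predicted for each pair of values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [o - p for o, p in zip(observed, predicted)]


# Made-up values for illustration only.
print(residuals([3, 5, 7], [2, 5, 9]))
```

A positive residual means the model predicted too low for that observation; a negative one means it predicted too high.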

What is Metadata?
02 August 2018 | by Tim Bock

Metadata is data about data. This refers not to the data itself, but rather to any information that describes some aspect of the data.

What is a Decision Tree?
01 August 2018 | by Jake Hoare

A decision tree is a diagram that shows how to make a prediction based on a series of questions. The responses determine which branch is followed next.

What is Random Sampling?
11 July 2018 | by Tim Bock

In this post, I'll explain what random sampling is, the different forms it can take, and an alternative to it.

What is Functional Data Analysis?
10 July 2018 | by Mathew McLean

Functional data analysis is a collection of methods for analyzing data over a curve, surface or continuum. Find out when to use it here.

What is Reproducible Research?
04 July 2018 | by Tim Bock

Research is reproducible when the exact results of a study can be reproduced given the original code, data and software. Find out the benefits of reproducible r...

What is Replicable Research and Why Should You Care?
04 July 2018 | by Tim Bock

Find out what replicable research is and why it is important for any study.

What is Missing Data and How to Handle It
03 July 2018 | by Tim Bock

What's the deal with missing data? In this post we'll explain what missing data is, why it is a problem and how you can handle it!

What is Rebasing?
18 June 2018 | by Tim Bock

Rebasing involves modifying a calculation by changing the sample (base) used in the calculation. Rebasing is commonly performed to remove ambiguous responses fr...

What is String Splitting?
29 May 2018 | by Daren Jackson

String splitting is the process of breaking up a text string in a systematic way, so that the individual parts of the text can be processed. For example, a time...

What is Correlation?
29 May 2018 | by Tim Bock

Correlation is usually defined as a measure of the linear relationship between two quantitative variables (e.g., height and weight). Often a slightly looser def...

What is Data Sorting?
18 May 2018 | by Tim Ali

Data sorting is any process that involves arranging the data into some meaningful order to make it easier to understand, analyze or visualize.

What is a Random Forest?
07 May 2018 | by Jake Hoare

A random forest is an ensemble of decision trees. Like other machine-learning techniques, random forests use training data to learn to make predictions.

What is Selection Bias?
13 April 2018 | by Tim Bock

Selection bias is an error that results from failing to ensure random sampling. Learn more about the sources and examples of selection bias and how to avoid them.

What is a Distance Matrix?
12 April 2018 | by Tim Bock

A distance matrix is a table that shows the distance between pairs of objects. Learn more about Distance Matrices in this educational deep-dive.

What is Raw Data?
11 April 2018 | by Tim Bock

Raw data typically refers to data tables where rows contain observations and each column represents a variable that describes some property of each observation.

What is a P-Value?
11 April 2018 | by Tim Bock

A p-value is a quantitative summary of the evidence in favor of or against a hypothesis of interest. It is computed using a statistical test.

What are the Strengths and Weaknesses of Hierarchical Clustering?
10 April 2018 | by Tim Bock

What are the Strengths and Weaknesses of Hierarchical Clustering? Learn more about pros, cons and alternatives to Hierarchical Clustering.

What are Segmentation Variables?
10 April 2018 | by Tim Bock

Market segmentation typically involves forming groups of similar people. Segmentation variables are the characteristics used to determine whether people are similar.