What is.

What is Sampling Error?
07 November 2018 | by Tim Bock

Sampling error is the difference between the sample values and the true population values, which results from the use of random sampling.

Continue reading

What is a Latent Variable?
26 October 2018 | by Tim Bock

A latent variable is a variable that is inferred from observed data using models. Latent variables can be inferred through a wide range of approaches.

Continue reading

What is Feature Engineering?
25 October 2018 | by Tim Bock

Feature engineering is the process of selecting and transforming variables when creating a predictive model using machine learning or statistical modeling.

Continue reading

What is Non-Sampling Error?
20 October 2018 | by Tim Bock

Non-sampling error refers to any deviation between the results of a survey and the truth that is not caused by the random selection of observations.

Continue reading

How to Make a Geographic Map in Q
17 October 2018 | by Tim Ali

Geographic map visualizations are a great way to show comparative values across countries, states, or regions. I'll show you how to make one in Q.

Continue reading

What is Survey Data Processing?
15 October 2018 | by Oliver Harrison

Survey data processing is the manipulation or transformation of raw survey data into meaningful results which can be analyzed to answer a research question.

Continue reading

What is a Conversion Rate?
15 October 2018 | by Tim Bock

A conversion rate is the percentage of people that move from one stage to the next stage in the process. It is often used to identify weak spots in a company's ...

Continue reading
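
As a rough illustration of the definition above, the sketch below computes stage-to-stage conversion rates for a funnel. The stage names and counts are invented for the example, and `conversion_rate` is a hypothetical helper, not something taken from the post.

```python
def conversion_rate(entered: int, converted: int) -> float:
    """Percentage of people at one stage who move on to the next stage."""
    if entered <= 0:
        raise ValueError("entered must be positive")
    return 100.0 * converted / entered


# Hypothetical funnel counts, invented purely for illustration.
funnel = [("visited site", 2000), ("added to cart", 500), ("purchased", 100)]

# Pair each stage with the one that follows it and compute the rate.
rates = [
    (f"{stage} -> {next_stage}", conversion_rate(count, next_count))
    for (stage, count), (next_stage, next_count) in zip(funnel, funnel[1:])
]

for step, rate in rates:
    print(f"{step}: {rate:.1f}%")
```

Comparing the rates across steps is one way to spot the stage where most people drop out.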

What is a Model?
14 October 2018 | by Tim Bock

A model is a usable description of how a system is believed to work. It is a simplification of reality, with unnecessary detail excluded.

Continue reading

What is a Standard Error?
09 October 2018 | by Tim Bock

Standard error is the estimated standard deviation of the sampling distribution of a parameter estimate. It quantifies the uncertainty around that estimate.

Continue reading
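
A minimal sketch of one common case, the standard error of a sample mean, assuming independent observations. The data values are made up, and `standard_error_of_mean` is an illustrative name rather than anything defined in the post.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimated standard deviation of the sampling distribution of the mean."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    return statistics.stdev(sample) / math.sqrt(n)


# Made-up observations for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error_of_mean(sample)
print(f"mean = {statistics.mean(sample)}, standard error = {se:.3f}")
```

Other statistics (proportions, regression coefficients) have their own standard error formulas; this covers only the mean.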

What is the Price Sensitivity Meter?
26 September 2018 | by Tim Bock

A price sensitivity meter is a set of survey questions that is used to work out how to set prices for products. It works out a framework for what people conside...

Continue reading

What is Multidimensional Scaling (MDS)?
25 September 2018 | by Tim Bock

Multidimensional scaling (MDS) is a technique used to visualize the distances between objects when the distances between pairs of objects are known.

Continue reading

What is the Chi-Square Test of Homogeneity?
24 September 2018 | by Tim Bock

The chi-square test of homogeneity tests to see whether different columns (or rows) of data in a table come from the same population or not.

Continue reading

What is a Column Chart?
23 September 2018 | by Tim Bock

A column chart is a data visualization where each category is represented by a rectangle with the height of the rectangle being proportional to the values.

Continue reading

What is Overplotting?
22 September 2018 | by Tim Bock

Overplotting is when the values or labels in a data visualization overlap, making the data visualization difficult to read. Find out more.

Continue reading

What is a Bubble Chart?
21 September 2018 | by Tim Bock

A bubble chart is a data visualization that displays multiple circles (bubbles) in a two-dimensional plot. They can be used to show three variables.

Continue reading

What is Effective Sample Size?
20 September 2018 | by Tim Bock

The effective sample size is an estimate of the sample size required to achieve the same level of precision if the sample were a simple random sample.

Continue reading
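
One widely used way to estimate effective sample size for a weighted sample is Kish's approximation, (sum of weights)² / (sum of squared weights). The sketch below assumes that formula; the teaser above does not say which estimator the post uses, and the weights are invented.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    if not weights:
        raise ValueError("weights must not be empty")
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


# Equal weights: the effective sample size equals the actual sample size.
equal = kish_effective_sample_size([1, 1, 1, 1])

# Unequal weights (invented): the effective sample size falls below 4.
unequal = kish_effective_sample_size([1, 1, 2, 2])

print(equal, unequal)
```

The more the weights vary, the further the effective sample size falls below the number of respondents.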

What is a Labeled Scatter Plot?
19 September 2018 | by Tim Bock

A labeled scatter plot is a data visualization that displays the values of two different variables, with text labels showing the meaning of each data point.

Continue reading

What is a Scatter Plot?
19 September 2018 | by Tim Bock

A scatter plot is a chart that displays the values of two variables as points. The data for each point is represented by its position on the chart.

Continue reading

What is A/B Testing?
18 September 2018 | by Tim Bock

A/B testing involves testing two different approaches to solving a problem, approach A and approach B, and working out which is better according to the data.

Continue reading

What is the Chi-Square Test of Independence?
17 September 2018 | by Tim Bock

The chi-square test of independence tests to see whether there is a relationship between two categorical variables in a dataset.

Continue reading
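
To make the test concrete, here is a small sketch that computes the Pearson chi-square statistic for a crosstab of two categorical variables. The counts are invented, and the function name is illustrative. Turning the statistic into a p-value needs the chi-square distribution with the degrees of freedom shown, which this sketch leaves out.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Invented 2x2 crosstab: rows are one variable, columns the other.
table = [[10, 20], [30, 40]]
chi2 = chi_square_statistic(table)
dof = (len(table) - 1) * (len(table[0]) - 1)
print(f"chi-square = {chi2:.4f}, degrees of freedom = {dof}")
```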

What is R-Squared?
09 September 2018 | by Tim Bock

The R-Squared statistic quantifies the predictive accuracy of a statistical model. It is also known as the coefficient of determination and R².

Continue reading
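
A short sketch of the usual computation, R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations from the mean. The observed and predicted values are made up, and `r_squared` is an illustrative helper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have no variance")
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
print(f"R-squared = {r2:.2f}")
```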

What is the Chi-Square Frequency Test?
08 September 2018 | by Tim Bock

The chi-square frequency test works out if two variable values are consistent with expectations and if a difference is statistically significant.

Continue reading

What are Survey Quotas?
07 September 2018 | by Tim Bock

Survey quotas are the numbers of observations needed to meet specified requirements. Learn about interlocking and non-interlocking survey quotas.

Continue reading

What is Data Filtering?
04 September 2018 | by Chris Facer

Data filtering is the process of choosing a smaller part of your dataset and using that subset for viewing or analysis. It is usually temporary.

Continue reading

What are Data Measurement Scales?
04 September 2018 | by Tim Bock

Data measurement scales are classifications which indicate the types of mathematical operations that can be performed on the data.

Continue reading

What is a Crosstab?
30 August 2018 | by Tim Bock

A crosstab is a table showing the relationship between two or more variables. Crosstabs are useful for finding patterns and correlations in data.

Continue reading

What are Dummy Variables?
29 August 2018 | by Tim Bock

Dummy variables are variables that take values of 0 and 1, where the values indicate the presence or absence of something.

Continue reading
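
As an illustration, the sketch below turns one categorical variable into a set of 0/1 dummy variables, one per category. The colour values are invented, and `dummy_variables` is a hypothetical helper written for this example.

```python
def dummy_variables(values):
    """Map each category to a 0/1 list marking where that category occurs."""
    categories = sorted(set(values))
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Invented categorical data.
colours = ["red", "blue", "red", "green"]
dummies = dummy_variables(colours)

for category, indicator in dummies.items():
    print(category, indicator)
```

In a regression, one of the categories is typically dropped and treated as the reference level, so that the dummies are not perfectly collinear with the intercept.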

What are Small Multiples?
26 August 2018 | by Tim Bock

A small multiple is a data visualization that consists of multiple charts arranged in a grid. This makes it easy to compare the entirety of the data.

Continue reading

What is Logistic Regression?
22 August 2018 | by Justin Yap

Logistic regression is a type of regression analysis used when the dependent variable is binary (i.e., has only two possible outcomes).

Continue reading

What is D-Error?
20 August 2018 | by Justin Yap

D-error is a measure that quantifies how good a design is at extracting information from respondents. I'll show you how to compute D-error, Bayesian D-error and...

Continue reading

What is a Correlation Matrix?
16 August 2018 | by Tim Bock

A correlation matrix is a handy way to visualize correlation coefficients between sets of variables. A correlation matrix is also used as an input for more adva...

Continue reading

What is Deep Learning?
14 August 2018 | by Jake Hoare

Deep learning is a subset of machine learning. Like other machine-learning techniques, deep learning creates a mapping from input data to a target outcome.

Continue reading

What are the Different Types of Missing Data?
14 August 2018 | by Tim Bock

Missing data can be structurally missing, missing completely at random, missing at random, or nonignorable (also known as missing not at random).

Continue reading

What are the Alternatives to Random Sampling?
13 August 2018 | by Tim Bock

Alternatives to a random sample include quota samples, convenience samples, volunteer samples, purposive samples, and snowball/referral samples.

Continue reading

Factor Analysis and Principal Component Analysis: A Simple Explanation
12 August 2018 | by Tim Bock

Factor analysis and principal component analysis identify patterns in the correlations between variables. They are used to identify underlying variables.

Continue reading

What is MaxDiff?
11 August 2018 | by Tim Bock

MaxDiff is a survey research technique for working out relative preferences for multiple items. It is also known as maximum difference or best-worst scaling.

Continue reading

What is Spurious Correlation?
10 August 2018 | by Tim Bock

Spurious correlation is when two variables falsely appear to be causally related, normally due to an unseen third factor.

Continue reading

What is Heteroscedasticity?
09 August 2018 | by Tim Bock

Heteroscedasticity is a specific type of pattern in the residuals of a model where the variability for a subset of the residuals is much larger.

Continue reading

What is the Replication Crisis?
08 August 2018 | by Tim Bock

The replication crisis is the growing belief that many scientific studies are unable to be reproduced. This could imply that significant theories are wrong.

Continue reading

Statistics vs Data Science: What’s the Difference?
07 August 2018 | by Tim Bock

Statistics is a mathematical field which deals with quantitative data. Data science is a multidisciplinary field which deals with data in a range of forms.

Continue reading

What is Data Stacking?
05 August 2018 | by Tim Bock

Data stacking is a way of organising data to find anomalies. Data stacking involves splitting a data set into smaller files and stacking the values for each var...

Continue reading

What are Residuals?
03 August 2018 | by Tim Bock

Residuals in statistics or machine learning are the difference between an observed data value and a predicted data value. They are also known as errors.

Continue reading
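
A minimal sketch of the definition above: each residual is the observed value minus the predicted value. The numbers are invented, and the observed-minus-predicted sign convention is the usual one but is an assumption here.

```python
def residuals(observed, predicted):
    """Observed value minus predicted value, one residual per observation."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [o - p for o, p in zip(observed, predicted)]


# Invented observations and model predictions.
observed = [10, 12, 15]
predicted = [11, 12, 13]
errors = residuals(observed, predicted)
print(errors)
```

A negative residual means the model over-predicted that observation; a positive one means it under-predicted.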

What is Metadata?
02 August 2018 | by Tim Bock

Metadata is data about data. It refers not to the data itself, but to any information that describes some aspect of the data.

Continue reading

What is a Decision Tree?
01 August 2018 | by Jake Hoare

A decision tree is a diagram that shows how to make a prediction based on a series of questions. The responses determine which branch is followed next.

Continue reading

What is Random Sampling?
11 July 2018 | by Tim Bock

In this post, I'll explain what random sampling is, the different forms it can take, and an alternative to it.

Continue reading

What is Functional Data Analysis?
10 July 2018 | by Mathew McLean

Functional data analysis is a collection of methods for analyzing data over a curve, surface or continuum. Find out when to use it here.

Continue reading

What is Reproducible Research?
04 July 2018 | by Tim Bock

Research is reproducible when the exact results of a study can be reproduced given the original code, data and software. Find out the benefits of reproducible r...

Continue reading

What is Replicable Research and Why Should You Care?
04 July 2018 | by Tim Bock

Find out what replicable research is and why it is important for any study.

Continue reading

What is Missing Data and How to Handle It
03 July 2018 | by Tim Bock

What's the deal with missing data? In this post we'll explain what missing data is, why it is a problem and how you can handle it!

Continue reading

What is Rebasing?
18 June 2018 | by Tim Bock

Rebasing involves modifying a calculation by changing the sample (base) used in the calculation. Rebasing is commonly performed to remove ambiguous responses fr...

Continue reading

What is String Splitting?
29 May 2018 | by Daren Jackson

String splitting is the process of breaking up a text string in a systematic way, so that the individual parts of the text can be processed. For example, a time...

Continue reading

What is Correlation?
29 May 2018 | by Tim Bock

Correlation is usually defined as a measure of the linear relationship between two quantitative variables (e.g., height and weight). Often a slightly looser def...

Continue reading

What is Data Sorting?
18 May 2018 | by Tim Ali

Data sorting is any process that involves arranging the data into some meaningful order to make it easier to understand, analyze or visualize.

Continue reading

What is a Random Forest?
07 May 2018 | by Jake Hoare

A random forest is an ensemble of decision trees. Like other machine-learning techniques, random forests use training data to learn to make predictions.

Continue reading

What is Selection Bias?
13 April 2018 | by Tim Bock

Selection bias is an error that arises from a failure to ensure random sampling. Learn more about the sources and examples of selection bias and how to avoid them.

Continue reading

What is a Distance Matrix?
12 April 2018 | by Tim Bock

A distance matrix is a table that shows the distance between pairs of objects. Learn more about Distance Matrices in this educational deep-dive.

Continue reading
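
For illustration, the sketch below builds a distance matrix from a handful of labeled points. It assumes Euclidean distance, which is only one of many possible distance measures, and the point names and coordinates are invented. `math.dist` requires Python 3.8 or later.

```python
import math


def distance_matrix(points):
    """Table of pairwise Euclidean distances, in the order of the given labels."""
    labels = list(points)
    return [
        [math.dist(points[a], points[b]) for b in labels]
        for a in labels
    ]


# Invented objects with 2-D coordinates.
points = {"A": (0, 0), "B": (3, 4), "C": (0, 8)}
matrix = distance_matrix(points)

for label, row in zip(points, matrix):
    print(label, [round(d, 2) for d in row])
```

The result is symmetric with zeros on the diagonal, since each object is at distance zero from itself.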

What is Raw Data?
11 April 2018 | by Tim Bock

Raw data typically refers to data tables where rows contain observations and each column represents a variable that describes some property of each observation.

Continue reading

What is a P-Value?
11 April 2018 | by Tim Bock

A p-value is a quantitative summary of the evidence for or against a hypothesis of interest. It is computed using a statistical test.

Continue reading

What are the Strengths and Weaknesses of Hierarchical Clustering?
10 April 2018 | by Tim Bock

What are the Strengths and Weaknesses of Hierarchical Clustering? Learn more about pros, cons and alternatives to Hierarchical Clustering.

Continue reading

What are Segmentation Variables?
10 April 2018 | by Tim Bock

Market segmentation typically involves forming groups of similar people. Segmentation variables are characteristics used to determine if they are similar.

Continue reading