Now that you have collected quantitative data on different variables (the things you are measuring), you need to ask yourself: is there a link between these variables? Are they related? The link between variables is called ‘correlation’.
Definition
Correlation is a “statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate)” (JMP Statistical Discovery 2023). Note: Correlation does not show if one variable causes the other.
Let’s use an example to illustrate what we mean by correlation. Imagine you have two sets of data. One set shows how much time you spent on your phone each day for a week. The other set shows how anxious you felt each day. You want to know if there is a link between the two. When you spend more time on your phone, does your anxiety increase or decrease?
To find out what the correlation is between variables, you carry out ‘correlation analysis’.
Definition
Correlation analysis is a “statistical method that is used to discover if there is a relationship between two variables/datasets, and how strong that relationship may be” (James 2021). Note: In this method, you use mathematical tools to investigate what patterns exist between your variables.
Correlation analysis allows you to establish what is called a ‘correlation coefficient’. This is a number between –1 and 1 and is most commonly referred to as ‘r’. The value of ‘r’ tells you the nature of the link between your variables. Specifically, ‘r’ gives us two things:
- the direction of the link
- the strength of the link
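As a rough sketch of how ‘r’ can be obtained, the Python snippet below works out the Pearson correlation coefficient (the most common form of ‘r’) step by step. The daily figures for phone use and anxiety are invented for illustration, and the function name pearson_r is our own choice, not taken from any library.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How far each value sits from its own average
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Do the two variables move away from their averages together?
    co_movement = sum(a * b for a, b in zip(dx, dy))
    spread_x = sum(a * a for a in dx)
    spread_y = sum(b * b for b in dy)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when a variable never changes")
    return co_movement / sqrt(spread_x * spread_y)

# Hypothetical data for one week (invented for this example)
hours_on_phone = [2, 3, 1, 4, 5, 3, 6]
anxiety_level = [4, 5, 3, 6, 8, 4, 9]

r = pearson_r(hours_on_phone, anxiety_level)
print(f"r = {r:.2f}")
```

The sign of the printed value gives the direction of the link, and its distance from 0 gives the strength.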
Let’s discuss each in more detail.
1. Direction of the link
Direction shows how your variables are connected. If one variable increases, does the other increase or decrease? Or is there no effect? In other words, is the link between your variables positive (upwards), negative (downwards) or non-existent?
If the correlation coefficient (r) is above 0, it is a positive correlation. This means when one variable goes up, the other also tends to go up. The closer r is to 1, the more consistent this pattern is.
- Example: If you spend more time on your phone, you might feel more anxious. If you spend less time on your phone, you might feel less anxious. This correlation can be visualised like this:
If the correlation coefficient (r) is below 0, it is a negative correlation. This means when one variable goes up, the other tends to go down. The closer r is to -1, the more consistent this pattern is.
- Example: If you spend more time on your phone, you might feel less anxious. If you spend less time on your phone, you might feel more anxious. This correlation can be illustrated like this:
If the correlation coefficient (r) is 0, or very close to 0, there is no clear linear connection between the two variables. They do not seem to change together. This means when one variable goes up, the other does not consistently go up or down.
- Example: If you spend more time on your phone, your feelings of anxiety do not consistently rise or fall. This can be shown on a graph like this:
2. Strength of the link
Strength shows to what extent your variables are connected. If one variable increases, how responsive is the other variable to that increase? Another way to describe this is, “How good would a straight line be at describing your data?” (Benedict 2014).
The closer the correlation coefficient (r) is to -1 or 1, the stronger the connection between the two variables. If it is exactly -1 or 1, it is a strong, ‘perfect’ connection. This means “a change in one variable is accompanied by a perfectly consistent change in the other” (Frost 2023).
- Example: The correlation coefficient (r) between ‘hours spent on phone’ and ‘level of anxiety’ is 1. This means that as ‘hours spent on phone’ increases, ‘level of anxiety’ will always increase by the same amount (at a constant rate!) for each extra ‘hour spent on phone’.
So, if ‘hours spent on phone’ increases from 1 to 2, the ‘level of anxiety’ could increase from 5 to 7. This would mean that if ‘hours spent on phone’ increased from 2 to 3, the ‘level of anxiety’ must increase from 7 to 9.
If the correlation coefficient (r) is closer to 0, like 0.3 or -0.3, it is a weak connection. A change in one variable means a less consistent change in the other.
- Example: The correlation coefficient (r) between ‘hours spent on phone’ and ‘level of anxiety’ is -0.3. This means that as ‘hours spent on phone’ increases, the ‘level of anxiety’ will usually (but not always!) decrease for each extra ‘hour spent on phone’.
So, if ‘hours spent on phone’ increases from 1 to 2, the ‘level of anxiety’ could decrease from 7 to 5. Then, if ‘hours spent on phone’ increased from 2 to 3, the ‘level of anxiety’ could then change in a few different ways, including: decreasing from 5 to 4, decreasing from 5 to 3, increasing from 5 to 6, or even staying the same at 5. However, the ‘level of anxiety’ is still most likely to decrease.
The above points are reflected in the scatter plots below (Robertson 2023). They show that the closer the points are to the straight line, the stronger the correlation.
Summary
- The purpose of correlation analysis is “to increase our understanding of how different variables are related and to identify patterns in those relationships” (McLeod 2023).
- Through correlation analysis, we obtain a correlation coefficient ‘r’. This is a value that goes from –1 to 1. It tells us the direction and strength of the relationship between our two variables.
To learn more about correlation analysis, please see the resources below.
(Author: Julia Mathews)
What is it?
Videos:
Correlation – the basic idea explained by Benedict (2014)
This video is an introduction to correlation. It guides you on how to read scatter plots. Then, using these plots, it explains what the correlation coefficient is. It also talks about the limits of correlation analysis.
(Academic reference: Benedict. (2014, April 11). Correlation – the basic idea explained [Video]. YouTube. https://youtu.be/qC9_mohleao)
Articles:
Interpretation of correlations in clinical research by Man Hung, Jerry Bounsanga, & Maren Wright Voss (2017)
This is an advanced article. It explains what correlation is, how to use it, and how to interpret it. It is helpful because it shows why knowing more about correlation can lead to better research results.
(Academic reference: Hung, M., Bounsanga, J., & Voss, M. W. (2017). Interpretation of correlations in clinical research. Postgraduate Medicine, 129(8), 902–906. https://doi.org/10.1080/00325481.2017.1383820)
Books:
Correlational research by Paul C. Price (2017)
In section 6.2 of this book, the authors explain what correlational research is. Why carry out correlation analysis rather than other types of analysis? What are correlation coefficients? What is the difference between correlating and causing? These questions are answered in depth. There are examples, figures and links to further resources.
(Academic reference: Price, P. C., Jhangiani, R. S., Chiang, A. I., Leighton, D. C., & Cuttler, C. (2017). 6.2 Correlational research. Research methods in psychology (3rd ed.). https://opentext.wsu.edu/carriecuttler/chapter/correlational-research/)