Naked Science Forum

Non Life Sciences => Physics, Astronomy & Cosmology => Topic started by: jinjon on 24/05/2022 16:36:47

Title: Correlation vs association
Post by: jinjon on 24/05/2022 16:36:47
Hi,

I am wondering whether it is a mistake to report the results of a linear regression or logistic regression analysis by saying that the variables you analysed do or do not correlate with each other.
Is it wrong to say that they are correlated with each other, and should you instead say that there is an association (or no association) between the variables?

For example, a is the independent variable and b is the dependent variable. You apply linear regression or logistic regression and report the results as:

a is correlated with b, or a is not correlated with b...

Hope you understand what I mean.

Title: Re: Correlation vs association
Post by: evan_au on 24/05/2022 23:35:29
Quote from: OP
a is the independent variable, b is the dependent variable
I know that this is normal terminology, but it implies a causal direction.
- If there is some variable in your experiment that you can easily control, and another variable that you can easily measure, then it is fair to say that "when I changed variable x, variable y changed in a (linear/parabolic/exponential) manner"

However, when it comes to complex things like the impact of obesity in a human population on heart attacks:
- There is no easy way to control obesity in a whole population
- There is no easy way to control heart attacks in a whole population
- There are many factors which can cause heart attacks (eg genetics, congenital problems, education on exercise, stress)
- There are many factors which can cause obesity (eg genetics, income, education on healthy diet, stress)
- So the easiest thing to do is to make some sort of scatterplot of obesity vs age of first heart attack
- Then fit a regression line through it, to conclude that "with increased variable x, variable y changes in a (linear/parabolic/exponential) manner"
- You could hypothesize that obesity contributes to heart attacks (since the obesity was present before the first heart attack), but it's not guaranteed: Someone who has an underlying heart condition may be predisposed to a sedentary lifestyle, which may make them obese.
- You could make comments like "For patients with BMI > 30, a weight reduction of 1 kg is associated with a delay of z years in age of first heart attack."
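
The scatterplot-and-regression-line step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the BMI and age numbers are invented (not real patient data), and the helper name `centered_sums` is hypothetical. It shows that in simple linear regression the fitted slope is just r rescaled by the ratio of the standard deviations, so the slope and the correlation coefficient carry the same information about the linear trend, and neither says anything about which variable causes which.

```python
from math import sqrt

def centered_sums(xs, ys):
    """Return (Sxx, Syy, Sxy): sums of squares and cross-products about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy

# Invented numbers for illustration only, not real patient data.
bmi = [22.0, 25.0, 27.0, 30.0, 33.0, 36.0]
age_first_attack = [71.0, 69.0, 66.0, 62.0, 60.0, 55.0]

sxx, syy, sxy = centered_sums(bmi, age_first_attack)

r = sxy / sqrt(sxx * syy)    # Pearson correlation coefficient
slope = sxy / sxx            # least-squares slope of age on BMI
sd_ratio = sqrt(syy / sxx)   # sd(age) / sd(BMI)

print(f"r = {r:.3f}, slope = {slope:.3f} years per BMI unit")
print(f"r * sd_y / sd_x = {r * sd_ratio:.3f}")  # matches the slope
```

Swapping the roles of the two lists gives the same r but a different slope, which is one reason to report the result as an association rather than as an effect of one variable on the other.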
Title: Re: Correlation vs association
Post by: Eternal Student on 25/05/2022 00:56:18
Hi.

    Good general discussion from @evan_au above.

Quote from: OP
is it wrong to say that they are correlated to each other
    No, it's not "wrong"; it's just a bit dangerous and could be misunderstood.
    Essentially it depends on your target audience  -  the people who you expect to read your statements.

The phrase "X and Y are uncorrelated" has a precise meaning to a statistician or mathematician. It means precisely r(X,Y) = 0 (the correlation coefficient is 0) and nothing more. They won't jump to any other conclusions; in particular, they won't assume that X and Y are completely unrelated or independent variables. They know that X could still be entirely determined by Y even though the two are not linearly related.
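
A small Python sketch of that last point, assuming a deliberately artificial data set: Y is computed as exactly X squared, with X values placed symmetrically around zero. The helper name `pearson_r` is hypothetical. Y is completely determined by X here, yet the correlation coefficient comes out as zero because there is no linear trend.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Artificial example: Y is an exact function of X, but not a linear one.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r(X, Y) = {r:.3f}")  # zero, even though Y = X**2 exactly
```

So "uncorrelated" rules out a linear trend and nothing else.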

The phrase "X and Y are correlated" would just mean that r(X,Y) is anything other than 0. To be honest, that's a rare phrase for statisticians to use. It would be more common to take more lines and state that r(X,Y) cannot be zero but it's not clear that a linear relationship exists, or else just leave it written in symbols: r(X,Y) ≠ 0. If you did leave the phrase "X and Y are correlated" as if it were some sort of final conclusion, then they might reasonably assume you meant that X and Y are strongly correlated, or that |r(X,Y)| ≈ 1. To say that in plain English: they might assume that X is (or is almost entirely explained by) a linear function of Y.

    If your target audience is not a group of statisticians, then you "know" that when people hear the words "correlated" or "uncorrelated" they will jump to conclusions about whether X and Y are independent or unrelated. They might make even bigger jumps than that and assume one thing is actually the cause of the other. So if your target audience isn't a group of statisticians, then you really must do as @evan_au suggested and choose your phrases more carefully.

Best Wishes.
Title: Re: Correlation vs association
Post by: jinjon on 27/05/2022 16:01:27
Thank you guys for the help!