The Naked Scientists

What is the power of regression?

  • 26 Replies
  • 16647 Views
  • 1 Tags


scientizscht (OP)
What is the power of regression?
« on: 04/06/2021 19:36:08 »
Hello

Regression is used to elucidate the relationship between two factors.

If we simply graph those two factors we can see the line and identify their relationship.

What is so special about regression that it is so widely used, then?

Bored chemist
Re: What is the power of regression?
« Reply #1 on: 04/06/2021 20:08:08 »
Quote from: scientizscht on 04/06/2021 19:36:08
If we simply graph those two factors we can see the line and identify their relationship.
Really?
What line do you draw through these?

[Attached image: dots.JPG]

evan_au (Global Moderator)
Re: What is the power of regression?
« Reply #2 on: 05/06/2021 10:17:46 »
All measurements have sources of error.
- Least-squares regression was originally used to estimate the path of comets from astronomical measurements, even though those measurements had errors.
- The recorded measurements were inconsistent with each other, and yielded no valid answer if you took them "as-is".

Quote
If we simply graph those two factors we can see the line and identify their relationship.
But two people are likely to draw different lines.
- "Least-squares" regression software draws the line that minimises the sum of the squared errors (the squared vertical distances between each point and the line), according to the well-known "least-squares" criterion (which has limitations that are not so well known to most people who use it).
- Most regression packages can also calculate an R² measure, which gives an idea of how well the data fits the line.
- Sometimes the data is not a straight line, and most regression packages allow fitting parabolas, exponentials or other functions - but you had better have a good rationale for the fitted function.
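As a rough illustration of the least-squares criterion and the R² measure described above, here is a minimal Python sketch. The four data points are invented, and `fit_line` and `r_squared` are hypothetical helper names, not any regression package's API.

```python
# Minimal least-squares sketch; fit_line and r_squared are hypothetical helpers
# and the four data points are invented for illustration.


def fit_line(xs, ys):
    """Return (intercept, slope) minimising the sum of squared vertical errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys, intercept, slope):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
a, b = fit_line(xs, ys)
r2 = r_squared(xs, ys, a, b)
print(f"y = {a:.2f} + {b:.2f}x, R^2 = {r2:.2f}")  # y = 0.50 + 1.40x, R^2 = 0.98
```

An R² of 1 would mean every point lies exactly on the line; here the small residuals leave it slightly below 1.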

Warnings
- Beware of over-fitting the data: with a high-degree polynomial (degree n-1 for n points), you can make the curve pass exactly through every data point. This just means you are paying more attention to the errors than to the bulk of the data.
- Correlation does not prove causation. Just because two variables seem to have a relationship does not mean that one causes the other. You need more evidence than just a regression line (like experimentally varying one parameter and seeing how the other one changes).
- Don't keep picking sets of variables until you find a set that meets some arbitrary criterion like R² > 0.95. If you try 100 sets of variables (like cheese consumption and risk of premature death), you will eventually find sets that seem to have a high correlation purely by chance.
- The most important test is how well it predicts results that have not been observed.
      - This is often used in AI systems; present half the data as training data, then see how well it works on the half that wasn't included in the training data.
      - Extrapolate beyond the training data. How well does the regression trendline predict the results outside the original range of data?
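A small simulation can illustrate the over-fitting and extrapolation warnings above. This is a sketch under stated assumptions: the "true" relation y = 2x + 1, the noise level and the random seed are all invented, and the "exact" fit is a degree-9 Lagrange interpolating polynomial through the 10 training points.

```python
import random

# Invented data (hypothetical): true relation y = 2x + 1 plus Gaussian noise, SD 1.
rng = random.Random(42)
train_x = list(range(10))                       # x = 0..9 used for fitting
train_y = [2 * x + 1 + rng.gauss(0, 1) for x in train_x]
test_x = list(range(10, 15))                    # x = 10..14, outside the training range
test_y = [2 * x + 1 + rng.gauss(0, 1) for x in test_x]


def fit_line(xs, ys):
    """Ordinary least-squares straight line, returned as a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)


def fit_exact_polynomial(xs, ys):
    """Degree-(n-1) Lagrange polynomial that passes exactly through every point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared prediction error of model on the points (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(train_x, train_y)
poly = fit_exact_polynomial(train_x, train_y)
for name, model in (("straight line", line), ("degree-9 polynomial", poly)):
    print(f"{name}: training MSE = {mse(model, train_x, train_y):.3g}, "
          f"extrapolation MSE = {mse(model, test_x, test_y):.3g}")
```

The polynomial has zero training error, but its predictions outside the training range are far worse than the straight line's, because it has faithfully fitted the noise.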

See: https://en.wikipedia.org/wiki/Regression_analysis

Quote from: BC
What line do you draw through these?
Funnily enough, there are some measurements at my work that look quite like this.
- For me, it happens when you have a uniformly distributed deviation on top of a linear measurement.
- To my eye, this one looks something close to y=x/2 + 100*RAND(0,1)
- For which a regression package would give (approximately) y = x/2 + 50
- Ironically, this data sample would have a poor R², but would still be a decent line of best fit, simply because (over the measured range) the error is of similar magnitude to the trend. The R² value would improve if you extended the data out to x=1000 (or 1 million), assuming it was defined over this range...
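evan_au's reading of the dots can be checked by simulating data of the form y = x/2 + 100*RAND(0,1). This sketch assumes x runs from 0 to 200 (the real axes of dots.JPG are unknown) and then from 0 to 1000; the sample size, seed and helper names are arbitrary choices for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def r_squared(xs, ys, intercept, slope):
    """Fraction of the variance in ys explained by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def simulate(x_max, n=5000, seed=1):
    """Hypothetical data y = x/2 + 100*RAND(0,1), with x uniform on [0, x_max]."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, x_max) for _ in range(n)]
    ys = [x / 2 + 100 * rng.random() for x in xs]
    return xs, ys


results = {}
for x_max in (200, 1000):
    xs, ys = simulate(x_max)
    a, b = fit_line(xs, ys)
    results[x_max] = (a, b, r_squared(xs, ys, a, b))
    print(f"x up to {x_max}: y = {a:.1f} + {b:.3f}x, R^2 = {results[x_max][2]:.2f}")
```

Under these assumptions the fitted line comes out close to y = x/2 + 50 in both cases, while R² rises from roughly 0.5 to above 0.9 as the x range widens, because the trend grows relative to the fixed-size scatter.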

Bored chemist
Re: What is the power of regression?
« Reply #3 on: 05/06/2021 11:14:04 »
Quote from: evan_au on 05/06/2021 10:17:46
To my eye, this one looks something close to y=x/2 + 100*RAND(0,1)
Close, but it's got a product of 3 random numbers in it, to make it a bit more "random"- or, at least, a bit more like a normal distribution.
The Rand function gives a square (uniform) distribution.
Adding two of them gives a triangular distribution (like the sum of two dice).
If I remember a stats course I did 30 years ago, a distribution like that is better represented by a "least linear distance" (least absolute deviations) fit, rather than a "least squares" fit.

Regression analyses should be more than just "stuff it into Excel", though that's what most people seem to do.
It helps to have an idea of what the data "should" look like, but also you shouldn't constrain the analysis too much.
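The claim that adding two square (uniform) distributions gives a triangular one can be checked by brute force. This Python sketch simply enumerates all 36 equally likely outcomes of two fair dice.

```python
from collections import Counter
from itertools import product

# Count every equally likely outcome of two fair six-sided dice.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
for total in range(2, 13):
    print(f"{total:2d} {'#' * totals[total]}")  # a text histogram: 1, 2, ..., 6, ..., 2, 1
```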



alancalverd (Global Moderator)
Re: What is the power of regression?
« Reply #4 on: 05/06/2021 13:20:30 »
"If in doubt, plot log/log and draw a straight line"  Not my words, but I've heard it too often to ignore!
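The log/log rule of thumb works because a power law y = a·x^b becomes a straight line on log/log axes: log y = log a + b·log x. A minimal sketch with invented data generated from y = 3x² (`fit_line` is a hypothetical helper):

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


# Invented data following the power law y = 3 * x**2.
xs = [1, 2, 5, 10, 20, 50]
ys = [3 * x ** 2 for x in xs]

# On log/log axes, y = a * x**b becomes log(y) = log(a) + b * log(x),
# so an ordinary straight-line fit to the logs recovers a and b.
log_a, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
a = math.exp(log_a)
print(f"y ~ {a:.3f} * x^{b:.3f}")
```

One caveat: fitting on log/log axes minimises the squared errors in log y rather than in y, so it implicitly treats errors as proportional to the size of the value.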



Eternal Student
Re: What is the power of regression?
« Reply #5 on: 06/06/2021 00:14:49 »
Hi.

No one has mentioned that regression can do a bit more than just fit a line (or curve) to some data.
After fitting the line (or curve), the residuals have a distribution of their own. Linear regression is most powerful when the residuals have a Normal distribution, and if you're a statistician this is usually one of the most important things you want to obtain (not a line of best fit). Once you have residuals with a Normal distribution, the door is open to some powerful prediction and interpolation.
    Meanwhile, if you just draw a line of best fit all you have is an indication of some relation between two variables but very little quantitative prediction capability.
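To illustrate the point about residuals, here is a sketch that assumes (purely for illustration) a process y = 3 + 0.8x with Normally distributed noise of SD 2. It fits a line, estimates the residual standard deviation s, and checks how many new observations fall within ±1.96 s of the line.

```python
import random

rng = random.Random(7)


def make_data(n):
    """Hypothetical process: y = 3 + 0.8x plus Normally distributed noise with SD 2."""
    xs = [rng.uniform(0, 50) for _ in range(n)]
    return xs, [3 + 0.8 * x + rng.gauss(0, 2) for x in xs]


train_x, train_y = make_data(500)
n = len(train_x)
mx, my = sum(train_x) / n, sum(train_y) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
         / sum((x - mx) ** 2 for x in train_x))
intercept = my - slope * mx

residuals = [y - (intercept + slope * x) for x, y in zip(train_x, train_y)]
# Residual standard deviation, using n - 2 degrees of freedom (two fitted parameters).
s = (sum(r * r for r in residuals) / (n - 2)) ** 0.5

# If the residuals are roughly Normal, about 95% of new observations should land
# within +/- 1.96 s of the line (ignoring the small uncertainty in the line itself).
test_x, test_y = make_data(2000)
inside = sum(abs(y - (intercept + slope * x)) <= 1.96 * s for x, y in zip(test_x, test_y))
coverage = inside / len(test_x)
print(f"residual SD s = {s:.2f}; {coverage:.1%} of new points fall inside the band")
```

This simple band ignores the uncertainty in the fitted slope and intercept; a full prediction interval adds a term for that, which is small here because there are 500 training points.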

Bored chemist
Re: What is the power of regression?
« Reply #6 on: 06/06/2021 10:34:03 »
It's also important to recognise that you can do a multivariate regression to assess how some variable changes with a number of other factors.
You might be able to "just draw a line" through the data, but it's very hard to "just" draw a hypersurface through it.
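For more than one explanatory variable, the least-squares coefficients can be found from the normal equations (XᵀX)b = Xᵀy. A minimal pure-Python sketch, with invented data generated exactly from y = 1 + 2x1 - 3x2; `solve` and `fit_multiple` are hypothetical helpers:

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_multiple(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] from (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # leading column of ones gives the intercept b0
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)


# Invented data generated exactly from y = 1 + 2*x1 - 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (2, 1), (1, 3), (4, 2)]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
coefficients = fit_multiple(rows, ys)
print([round(c, 6) for c in coefficients])
```

Solving the normal equations directly is fine for a demonstration, but regression packages typically use more numerically stable methods such as a QR decomposition.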

evan_au (Global Moderator)
Re: What is the power of regression?
« Reply #7 on: 06/06/2021 10:48:14 »
Quote from: Eternal Student
Linear regression is most powerful when the residuals have a Normal distribution
The irony is that when you have a normal distribution, you have just proved to yourself that you know very little about what you are measuring.

As BC indicated, if you add together a large number of random variables with distinctly non-normal distributions, their sum will have something that looks like a Normal distribution.
- This includes Uniform, negative exponential, and even discrete distributions
- This is a result of the "Central Limit Theorem"*
- If you want to show that you really understand the process, you need to isolate those underlying distributions, and explain how they combined to form the Normal distribution.
See: https://en.wikipedia.org/wiki/Central_limit_theorem
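A quick way to see the Central Limit Theorem at work is to add up uniform RAND() values and check what fraction of the sums lies within one standard deviation of their mean: about 57.7% for a single uniform value, and about 68.3% for a Normal distribution. The sample size and seed in this sketch are arbitrary.

```python
import random

rng = random.Random(3)


def fraction_within_one_sd(samples):
    """Fraction of samples lying within one standard deviation of their mean."""
    n = len(samples)
    mean = sum(samples) / n
    sd = (sum((s - mean) ** 2 for s in samples) / n) ** 0.5
    return sum(abs(s - mean) <= sd for s in samples) / n


results = {}
for k in (1, 2, 12):
    # Each sample is the sum of k independent uniform RAND() values.
    sums = [sum(rng.random() for _ in range(k)) for _ in range(20000)]
    results[k] = fraction_within_one_sd(sums)
    print(f"sum of {k:2d} uniform values: {results[k]:.3f} within one SD")
```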

*When I was doing introductory statistics at university, we were introduced to the Central Limit Theorem as being fundamental to all of statistics
- The class clown asked "When doesn't it apply?"
- The lecturer eventually conceded that there were some theoretical distributions that didn't necessarily obey the Central Limit theorem - they had strange properties like a non-finite standard deviation; but don't worry, they can never occur in real life!
- Around 1995, I discovered that internet traffic has a non-finite standard deviation, and I realised that it was time for me to learn another branch of statistics (I work in telecommunications...).
- But in most cases, using a large (but finite) standard deviation gives results that are "close enough" after you apply a bit of safety margin
« Last Edit: 07/06/2021 10:45:23 by evan_au »

Bored chemist
Re: What is the power of regression?
« Reply #8 on: 06/06/2021 11:32:48 »
One relatively simple distribution which does not have a mean and standard deviation is the ratio of two independent normally distributed variables.
You can use this knowledge to upset statisticians.
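The ratio of two independent standard Normal variables follows the Cauchy distribution, which has no mean, so its running sample mean never settles down the way a Normal one does. A simulation sketch (seed and sample size are arbitrary):

```python
import random

rng = random.Random(11)
n = 100_000
normals = [rng.gauss(0, 1) for _ in range(n)]
# The ratio of two independent standard Normal variables is standard Cauchy distributed.
ratios = [rng.gauss(0, 1) / rng.gauss(0, 1) for _ in range(n)]

for label, data in (("Normal", normals), ("ratio", ratios)):
    running = [sum(data[:m]) / m for m in (100, 1_000, 10_000, 100_000)]
    print(f"{label:>6} running means: {[round(v, 3) for v in running]}, "
          f"largest |value| = {max(abs(v) for v in data):.3g}")
```

The running means of the Normal samples home in on 0, while those of the ratios keep being dragged around by occasional enormous values. The sample median of the ratios, by contrast, does settle near 0.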

People (notably statisticians) use the normal distribution because it has well-defined properties.
They then use the CLT to "justify" using it, on the basis that, if you look at lots of subsamples, everything looks like a normal distribution.

However, if you know that the distribution isn't normal- for example, rolling dice- you should actually use the analyses for the correct distribution.


A long time ago (before catalytic converters were common) a colleague of mine was doing some sampling for common air pollutants: benzene, toluene and xylene. They are strongly associated with vehicle emissions.

He got a bunch of us to hang samplers in our gardens- one near the road and the other at the bottom of the garden (and thus not near the road).
He ended up with ten pairs of data points.
And he analysed them on the basis that they were normally distributed.
So he got a mean and SD for "front" and "back". They overlapped considerably so he came to the conclusion that there was no statistically significant difference between the front and back garden samples.

And then I pointed out that, in 9 cases out of 10, the back garden concentration was lower than the front garden.
If he had been right about there being no difference then that would have been equivalent to tossing a coin 10 times and getting 9 heads. The odds for that are something like 100 to 1 against.
He rewrote that bit, but didn't give me the credit...
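The coin-tossing argument is the sign test. Treating each garden pair as a fair coin toss under "no difference", the exact probabilities can be computed as below (`math.comb` needs Python 3.8 or later).

```python
from fractions import Fraction
from math import comb

n, k = 10, 9  # 10 garden pairs; back garden lower in 9 of them
# Under "no difference", each pair is like a fair coin toss (the sign test).
p_exactly = Fraction(comb(n, k), 2 ** n)
p_at_least = Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2 ** n)
p_two_sided = 2 * p_at_least  # 9 or more "heads" or 9 or more "tails"

for label, p in (("exactly 9", p_exactly), ("9 or more", p_at_least), ("two-sided", p_two_sided)):
    print(f"P({label}) = {p} = {float(p):.4f}, about 1 in {round(1 / p)}")
```

This gives about 1 in 102 for exactly 9 out of 10 and about 1 in 93 for 9 or more, in line with the "100 to 1" estimate; even the two-sided version (9 or more in either direction, about 1 in 47) is below the conventional 5% significance threshold.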




Eternal Student
Re: What is the power of regression?
« Reply #9 on: 06/06/2021 21:24:27 »
Hi all.

Quote from: evan_au on 06/06/2021 10:48:14
The irony is that when you have a normal distribution, you have just proved to yourself that you know very little about what you are measuring.
I know what you are trying to say. This sentence on its own isn't particularly true or helpful to the OP, but it is interesting and ironic, as you say. The OP needs to know that we have a usable statistical model. The uncertainty or randomness has been contained and modelled.
Explaining the uncertainty or randomness in the residuals is a separate project or problem, which you (the scientist) may or may not want to do more work on. As you (evan_au) have stated, there is reason to think this could actually be very difficult if the residuals are Normally distributed.

B_C then made some comments about using other distributions. Yes, that's fine and (as I expect you know) it is done, but this produces a non-standard linear regression model. It's again interesting and I'm happy to discuss it, but it may not be useful to the OP.
While we're on the topic, B_C mentioned using non-parametric tests. This might be worth adding to: linear regression is closely related to obtaining correlation coefficients, and correlation is easily applied to non-parametric data, which gives us techniques like Spearman's rank correlation test.

Anyway, the main point is that by formalising the procedure of finding a line of best fit, we generate statistics (numerical quantities) which can be put to many uses and for which there are well-established and powerful techniques, whereas drawing a line of best fit (by eye) doesn't give us anything other than a rough visual guide.
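As a sketch of the link between correlation and ranks: Spearman's rank correlation is Pearson's correlation applied to the ranks of the data, so any strictly increasing relationship scores exactly 1. The data below (y = 2^x) are invented, and the hypothetical `ranks` helper assumes there are no tied values.

```python
def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Rank positions 1..n (this simple sketch assumes there are no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r


def spearman(xs, ys):
    # Spearman's rho is Pearson's correlation applied to the ranks.
    return pearson(ranks(xs), ranks(ys))


xs = list(range(1, 11))
ys = [2 ** x for x in xs]  # monotonic but strongly curved (invented data)
print(f"Pearson r = {pearson(xs, ys):.3f}, Spearman rho = {spearman(xs, ys):.3f}")
```

Here Pearson's r is well below 1 because the relationship is curved, while Spearman's rho is 1 because y always increases with x.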

Colin2B (Global Moderator)
Re: What is the power of regression?
« Reply #10 on: 06/06/2021 23:44:19 »
Quote from: Eternal Student on 06/06/2021 21:24:27
This sentence on its own isn't particularly true or helpful to the OP but it is interesting and ironic as you say. The OP needs to know that we have a usable statistical model.
It is very difficult to know what the OP needs. He appears here at intervals with seemingly disconnected questions, with no context, and often no response to requests for further information on his application. Who knows whether it’s helpful as we rarely get feedback.
This sort of question is typical: https://www.thenakedscientists.com/forum/index.php?topic=81458.msg625626#msg625626.


Eternal Student
Re: What is the power of regression?
« Reply #11 on: 07/06/2021 00:50:00 »
Hi Colin2B,

    Well, we've got to take the optimistic view.  The OP has become very busy and will check the responses when time allows;  or they have a limited internet connection each month;   or they have other health issues.
     So you've just helped by answering a question and engaging in conversation with a person who has poor health, limited funds and a stressful life.  You've done a good thing Colin2B, other moderators and regulars.

vhfpmr
Re: What is the power of regression?
« Reply #12 on: 08/06/2021 17:21:39 »
Anyone care to speculate what the law is here? Looks more pareidolia than parabola.  ;D ;D

[Attached image: Scatter.png]



evan_au (Global Moderator)
Re: What is the power of regression?
« Reply #13 on: 09/06/2021 11:12:52 »
Quote from: vhfpmr
Anyone care to speculate what the law is here?
It would help if you labelled the axes....

I can see that there is a minimum value for the Y-Axis, and a narrow band of very common values on the Y-Axis that are fairly independent of the X-Axis...

vhfpmr
Re: What is the power of regression?
« Reply #14 on: 09/06/2021 11:48:43 »
Quote from: evan_au on 09/06/2021 11:12:52
It would help if you labelled the axes....
Yes, I know, I posted it more out of humour than as a serious question, just because I happened to have a bemusing scattergram at the time when a regression thread was running.
Both axes are standard deviations, plotted because I was curious whether more variation in one parameter might be indicative of more variation in the other.

charles1948
Re: What is the power of regression?
« Reply #15 on: 09/06/2021 19:48:34 »
On the subject of "graphs", aren't they a kind of analogue device, something like the old "slide-rules" that we had before modern digital calculators and computers?

I suppose no-one in modern Science would try to use images of a "slide-rule" to present evidence in support of a scientific theory, so why are "graphs" still employed?

Couldn't the evidence in a graph be presented in "digital" format, i.e. as just tables of numbers, from which conclusions could be arrived at?

Is it because graphs make it easier to see the underlying mathematical processes? If so, why did we abandon slide-rules so quickly?

Bored chemist
Re: What is the power of regression?
« Reply #16 on: 09/06/2021 19:55:42 »
Quote from: charles1948 on 09/06/2021 19:48:34
why did we abandon slide-rules so quickly?
Because they are a bit rubbish.
Very tricky to get an accurate answer from one.



charles1948
Re: What is the power of regression?
« Reply #17 on: 09/06/2021 20:26:16 »
Yes BC, your remark reminds me of something Arthur C. Clarke wrote in one of his old books.

It went something like:

If you ask the average person: What's 8 divided by 2 - he'll instantly say: " It's 4."

If you ask a scientist, or engineer, the same question, he'll get out a slide-rule, fiddle about with it for a while, then say: "It's between 3.9 and 4.1"

BTW, apologies for not using gender-neutral pronouns, but it was an old book.

Bored chemist
Re: What is the power of regression?
« Reply #18 on: 09/06/2021 20:30:23 »
If you got enough engineers, asked them enough questions of the form "What's x divided by 2?", and plotted the answers they gave against x, then you could do a regression analysis on that data and model the case of x = 8.
You should get an answer that is closer to 4 than "3.9 to 4.1".
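Bored chemist's claim can be sketched with a simulation, under the loudly hypothetical assumption that each engineer's slide-rule answer is the true value of x/2 plus an independent, unbiased error uniformly distributed within ±0.1. The number of answers, the x range and the seed are arbitrary.

```python
import random

rng = random.Random(2021)

# Hypothetical assumption: each slide-rule answer to "x / 2" is the true value plus an
# independent, unbiased error of up to +/-0.1 (uniformly distributed).
xs = [rng.uniform(1, 20) for _ in range(1000)]
answers = [x / 2 + rng.uniform(-0.1, 0.1) for x in xs]

# Ordinary least-squares straight line through all the answers.
n = len(xs)
mx, my = sum(xs) / n, sum(answers) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, answers))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx
estimate = intercept + slope * 8
print(f"fitted line: y = {intercept:.4f} + {slope:.4f}x; prediction at x = 8: {estimate:.4f}")
```

Because the errors average out over many answers, the prediction at x = 8 lands much closer to 4 than ±0.1. A systematic bias shared by all the slide rules would not average out this way.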

Colin2B (Global Moderator)
Re: What is the power of regression?
« Reply #19 on: 09/06/2021 23:05:50 »
Quote from: charles1948 on 09/06/2021 19:48:34
On the subject of "graphs", aren't they a kind of analogue device - something like the old "slide-rules" that we had before modern digital calculators and computers
Your world is full of functional and very effective analogue devices.
Stop trolling.

©The Naked Scientists® 2000–2017 | The Naked Scientists® and Naked Science® are registered trademarks created by Dr Chris Smith. Information presented on this website is the opinion of the individual contributors and does not reflect the general views of the administrators, editors, moderators, sponsors, Cambridge University or the public at large.