Quote from: Wolfekeeper
But the best model of anything of course is the actual data.

That is true - but tables of data are hard to read.
One way of retaining most of the data, and yet making it (somewhat) easy to read is to plot the cumulative distribution of travel times for the two days, on the same graph.
- That would show you that one day had very consistent travel times, but the second day had a bunch of delays.
- When you have a table of 5 data points on each day, it's not too hard to compare the tables of raw numbers
- But if you have thousands of points in each series, the cumulative distribution can be read much more easily
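As a rough sketch of what a cumulative distribution looks like in practice, the snippet below builds one by hand for two days of travel times. The numbers in `day1` and `day2` are made up purely for illustration (they are not data from this thread), and `ecdf` is just a helper name chosen here. With a plotting library you would draw these pairs as a step curve, one curve per day on the same axes.

```python
def ecdf(values):
    """Return (value, fraction of observations <= value) pairs, sorted by value."""
    xs = sorted(values)
    n = len(xs)
    # No repeated values in this example; with ties, the last pair
    # for a given value is the one that gives the true cumulative fraction.
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


# Hypothetical travel times in minutes, invented for this example.
day1 = [180, 182, 179, 181, 183]   # a consistent day
day2 = [180, 185, 240, 300, 181]   # a day with a couple of long delays

for name, day in (("day 1", day1), ("day 2", day2)):
    print(name)
    for minutes, fraction in ecdf(day):
        print(f"  {minutes:4d} min: {fraction:.0%} of trips took this long or less")
```

On the consistent day the fractions climb to 100% within a few minutes of each other; on the delayed day the last steps are stretched far out to the right, which is what you would spot at a glance on a graph.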
Quote from: scientizscht
One way to analyse this, is to take the average of day one and day two and find the difference. Would that be accurate given that the average may not be a good overall approximation for every distribution?

One factor that makes it hard to compare is that you have a 3-hour "normal" flight time mixed in with a 2-hour "abnormal" flight delay.
- Different routes will have different "normal" flight times, making it harder to compare.
- As Alan says, if you are interested in delays, subtract out the scheduled flight time from the actual flight time.
- Then you can actually examine the delays.
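Subtracting out the schedule might look something like the sketch below. The `flights` list is hypothetical (two invented routes with different scheduled times), just to show that once the scheduled time is removed, delays on different routes become directly comparable.

```python
# Hypothetical flights as (scheduled minutes, actual minutes) pairs.
# Two made-up routes: one scheduled for 180 minutes, one for 95 minutes.
flights = [(180, 182), (180, 300), (95, 97), (95, 215)]

# Delay = actual flight time minus scheduled flight time.
delays = [actual - scheduled for scheduled, actual in flights]

for (scheduled, actual), delay in zip(flights, delays):
    print(f"scheduled {scheduled:3d} min, actual {actual:3d} min, delay {delay:3d} min")
```

The raw flight times on the two routes look nothing alike, but the delays come out the same, which is the quantity you actually care about.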
If you don't really know what you are doing, taking the mean (=average) and the standard deviation is the safest bet.
- The mean takes into account the whole data set, and provides a compact summary of the "center" of the distribution. It is an "unbiased estimator", so it does not systematically overshoot or undershoot the "correct" answer, and it converges on that answer quite quickly as you add data points.
- The standard deviation also takes into account the whole data set, and provides a compact summary of the "spread" of the distribution. It converges fairly quickly.
- At this point you may remember from high school that there were two equations for the standard deviation:
- One divides by the sample size (n)
- The other divides by (n-1)
- It doesn't matter much for large sample sizes (thousands of data points), but for small sample sizes (like 5 airline flights), dividing by n tends to underestimate the spread, and dividing by (n-1) corrects for most of that bias
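To see the size of the n versus (n-1) difference on a small sample, here is a minimal sketch using Python's standard `statistics` module, where `pstdev` divides by n and `stdev` divides by (n-1). The five delay values are invented for illustration.

```python
import statistics

# Hypothetical delays in minutes for 5 flights (invented for this example).
delays = [2, 5, 0, 120, 3]

mean = statistics.mean(delays)

# pstdev divides the summed squared deviations by n ("population" form).
sd_n = statistics.pstdev(delays)

# stdev divides by (n - 1) ("sample" form, with Bessel's correction).
sd_n_minus_1 = statistics.stdev(delays)

print(f"mean delay:        {mean:.1f} min")
print(f"std dev, / n:      {sd_n:.1f} min")
print(f"std dev, / (n-1):  {sd_n_minus_1:.1f} min")
```

With only 5 points the (n-1) version is noticeably larger (by a factor of sqrt(5/4), about 12%); with thousands of points the two are practically identical. Note also how a single 120-minute delay drags the mean well away from the typical flight, which is one of the traps with summarising a lopsided distribution by its average.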
Quote from: Sir Charles Dilke (maybe)
There are three kinds of lies: lies, damned lies, and statistics.

There are many traps in statistics, and it's better to understand a bit about the traps, so you can use statistics more effectively (or lie about them more effectively...)