The Naked Scientists

The Naked Scientists Forum

Author Topic: how to solve the following linear curve fitting problem in the minimization cont  (Read 5399 times)

Offline engrByDayPianstByNight

  • Sr. Member
  • ****
  • Posts: 105
    • View Profile
Hello,

    First off, I apologize for posting this math question because I know this is not the right forum for it. But I appreciate it if anybody could help me with it. Or if you know of another forum more appropriate for this question, please let me know.

    Let's first limit ourselves to only the first quadrant of a Cartesian coordinate system. I have a point G that's located on the x-axis, and it's position (x_G, 0) is known (x_G>0). Now, I have four points A, B, C, and D in this quadrant, and I can measure the four distances GA, GB, GC, and GD.

    The four distance measurements each contain some errors. But if error-free (as shown in the attached figure below; I accidentally posted it twice), they're supposed to be all on a straight line, i.e., a linear trajectory (i.e., the line can be described as y=ax+b). Moreover, these four distances are periodically sampled, meaning that (in the error-free case) AB=BC=CD. And, just for the sake of the argument, let's say GA>GB>GC>GD. However, I do not know the coordinates of these four points in the Cartesian system (denoted as (x_A, y_A), (x_B, y_B), (x_C, y_C), (x_D, y_D), respectively).

     I want to find the slope of the linear trajectory in the Cartesian system. The way I'm thinking about doing is to cast this as a minimization problem to first find the points' coordinates on the linear trajectory (denote them as (x'_A, y'_A), (x'_B, y'_B), (x'_C, y'_C), (x'_D, y'_D)). That way, the slope can be estimated to be = (y'_A - y'_B) / (x'_A - x'_B).

    However, I'm not familiar with formulating the minimization (what should be the objective function, and what should be the constraint function). Can anyone help me with this? Thanks in advance.


 

Offline DonBrown

  • Jr. Member
  • **
  • Posts: 18
    • View Profile
I don't know much about the question you ask, but looking at your problem I'd say it was insoluble.
If I have read you correctly, the only data you have is the lengths of the lines AG, BG, CG, DG. You know nothing about their direction (other than the constraint that A,B,C,D are colinear & equidistant.)
Assume your diagram represents the true or best estimate trajectory based on your data, by any method you like, even magic.
Then the trajectory can be rotated about point G, from the position where AG lies on the x axis to the position where GD lies on the x axis (since the points all lie in Q1) without altering any of the specified lengths.

The best you MIGHT do is to find the distance of G from the line ABCD and the distance AB=BC=CD. Perhaps this is all you wanted?
 

Offline RD

  • Neilep Level Member
  • ******
  • Posts: 8126
  • Thanked: 53 times
    • View Profile
If you're trying to fit a line to the data you could try the "least squares" method...
http://mathworld.wolfram.com/LeastSquaresFitting.html
« Last Edit: 24/08/2008 11:21:20 by RD »
 

Offline DonBrown

  • Jr. Member
  • **
  • Posts: 18
    • View Profile
I've now tried this using OOCalc and I can find the distance between the points A,B,C & D and the distance from G to the line.
I did this by drawing some diagrams like yours and using the measured distances for the 4 points.
The table I used calculated the distances AB, BC, CD from an assumed position of D on the line.
I assumed at first that D was the point of closest approach to G, then increased the assumed offset of D from this point.
At each position I calculated the average of AB, BC & CD and the RMS error.
I had the advantage of knowing roughly what values to expect, but within a fairly generous range, this error did minimise around the correct value for the AB=BC=CD distance and the distance of G from the line.

Although I could easily do this by manually adjusting the trial values, goal seeking obstinately failed to get anywhere near a sensible result.

With 4 data points the accuracy was not good: I measured with about 2% error, the results had about 4% error.
Data points nearer to G generally gave a more pronounced minimum and seemed slightly more accurate.
 

Offline lightarrow

  • Neilep Level Member
  • ******
  • Posts: 4586
  • Thanked: 7 times
    • View Profile
Hello,

    First off, I apologize for posting this math question because I know this is not the right forum for it. But I appreciate it if anybody could help me with it. Or if you know of another forum more appropriate for this question, please let me know.

    Let's first limit ourselves to only the first quadrant of a Cartesian coordinate system. I have a point G that's located on the x-axis, and it's position (x_G, 0) is known (x_G>0). Now, I have four points A, B, C, and D in this quadrant, and I can measure the four distances GA, GB, GC, and GD.

    The four distance measurements each contain some errors. But if error-free (as shown in the attached figure below; I accidentally posted it twice), they're supposed to be all on a straight line, i.e., a linear trajectory (i.e., the line can be described as y=ax+b). Moreover, these four distances are periodically sampled, meaning that (in the error-free case) AB=BC=CD. And, just for the sake of the argument, let's say GA>GB>GC>GD. However, I do not know the coordinates of these four points in the Cartesian system (denoted as (x_A, y_A), (x_B, y_B), (x_C, y_C), (x_D, y_D), respectively).

     I want to find the slope of the linear trajectory in the Cartesian system. The way I'm thinking about doing is to cast this as a minimization problem to first find the points' coordinates on the linear trajectory (denote them as (x'_A, y'_A), (x'_B, y'_B), (x'_C, y'_C), (x'_D, y'_D)). That way, the slope can be estimated to be = (y'_A - y'_B) / (x'_A - x'_B).

    However, I'm not familiar with formulating the minimization (what should be the objective function, and what should be the constraint function). Can anyone help me with this? Thanks in advance.


If:
1. you know the four distances d1, d2, d3, d4;
2. you know that d(A,B) = d(B,C) = d(C,D);
3. you know that the four points A, B, C, D are in a straight line

then, writing the generic equation for the straight line:  y = mx + q, you have six equations in the six unknowns m, q, xA, xB, xC, xD (the y coordinates are found through the equation of the straight line) and so you can solve the problem, but it's not easy, the equations are a bit complicated.

Since A is in the straight line, we have yA = mxA + q, so:
d12 = (xA - xG)2 + yA2 = (xA - xG)2 + (mxA + q)2
In the same way:
d22 = (xB - xG)2 + (mxB + q)2
d22 = (xC - xG)2 + (mxC + q)2
d22 = (xD - xG)2 + (mxD + q)2

You can write yourself the others 2 equations stating that:
d(A,B) = d(B,C)
d(B,C) = d(C,D)

and then solve the system (brrr! I'm lucky it's not me who have to do it!).

It's probably not the best way to solve the problem, I suspect, but anyway you have found one.
 

Offline engrByDayPianstByNight

  • Sr. Member
  • ****
  • Posts: 105
    • View Profile
A heartwarming Thanks to you folks who replied to my question!

DonBrown,

    In your first post, I think what you're describing (about rotating around G) is that this problem has an infinite number of solutions, right? I looked at the problem again afterwards and think you're right. And, even if we do have the knowledge of the distance from G to the line AD (let's call it GE, which is the shortest distance and perpendicular to AD), and the values for AB, BC, and CD, we still end up with infinitely many solutions for the slope. In your second post, if we establish that there are indeed infinite solutions, then the fact that you were able to compute these GE, AB, BC, and CD from (GA, GB, GC, and GD) must imply that somehow you have implicitly assumed a trajectory slope already before the program was run, right?

lightarrow,

    setting up the system of equations was one of the first things I tried, and you're right that solving this system with 6 equations and 6 unknowns was painful, and I couldn't carry it through to the very end (after making multiple substitutions of variables, etc., I also tried to cast the system linear-algebraically but that didn't work out well either). That's why I'm taking a different linear programming (minimization) approach and see if it can be solved that way.

RD,
     
    I've also thought about the least squared error approach (when doing minimization), but have come to realize that in order to do this, I must first have the knowledge of point A's coordinates. Once that's fixed, I think it's possible to to the Minimum LSE. However, in my original problem, I know nothing about any of these points' coordinates (except G on the x-axis, but that doesn't help much), only that these four points are supposed to be along a linear trajectory in a global Cartesian system (the origin of which I know), and that I can only measure their distances to a point G on the x-axis.


     My intention is to cast this problem into a minimization one (however we define the cost and objective functions) to first get the coordinates of the points (x'_A, y'_A), (x'_B, y'_B), (x'_C, y'_C), (x'_D, y'_D) that are aligned on the trajectory, and then compute the slope by the formula that slope = (y'_A - y'_B) / (x'_A - x'_B). But it seems to me now, that with the four periodically measured distances being the only information I have, it may generate infinite solutions.

     If I misunderstood anything above, do let me know. Thanks guys.
 

Offline DonBrown

  • Jr. Member
  • **
  • Posts: 18
    • View Profile
DonBrown,
    the distance from G to the line AD (let's call it GE, which is the shortest distance and perpendicular to AD), and the values for AB, BC, and CD,
...if there are indeed infinite solutions, then the fact that you were able to compute these GE, AB, BC, and CD from (GA, GB, GC, and GD) must imply that somehow you have implicitly assumed a trajectory slope already before the program was run...?
No, I just ignore the slope (direction) of the trajectory - the diagram & geometry/trig are exactly the same whatever the direction.
Call AB=BC=CD=d   and call DE=kd (some, not necessarily integer, multiple of d)
then old Pythagoras says GD^2=GE^2 + (kd)^2
                         GC^2=GE^2 + (d+kd)^2
                         GB^2=GE^2 + (2d+kd)^2 
                         GA^2=GE^2 + (3d+kd)^2
Then we can combine pairs
GC^2 - GD^2 = d^2 + 2kd^2 = d^2(1+2k)
GB^2 - GC^2 = 3d^2 + 2kd^2= d^2(3+2k)
GA^2 - GB^2 = 5d^2 + 2kd^2= d^2(5+2k)
Since the LHS is known, I can express d as a function of k
eg. d = sqrt( (GC^2- GD^2)/(1+2k) )
I asumed (because all the GA, GB, GC, GD were decreasing) that k>=0 ( E is on the opposite side of D to A)
then chose an arbitrary range of values for k from 0 to 10 and calculated the value of d from each equation.
When I look at the results, the three equations give different values of d (as you would expect if I used the wrong value for k), but I looked for the set of results which were closest together.
Then I chose a new range of values for k, say 3 to 5 using smaller increments, looked for the closest results here and narrowed my range and increments again, etc.
Since I'm using a spreadsheet, I need only specify a start and an increment.
To help spot the closest set of results, I also compute the average d and the sum of the squares of their differences from the mean. With perfect data I would expect to find (at the correct value for k) that all three equations gave exactly the correct value for d and my error function would reduce to 0.
Because the data were not exact (I drew a diagram and measured the values) I never achieved a zero error, but the error function did exhibit a single minimum.
I did this by manually adjusting the trial value of k, but I would have thought one could use the goal-seeking tool. Unfortunately, when I tried this, I got 'no solution found' and the best attempts it offered were way outside the possible range.  I don't understand why this happens.  When I asked to get a zero error function, ok, that is obviously not going to happen.  But after I found a near minimum error value and asked it to achieve something not quite as good, it still failed dismally.  I plotted graphs of the error results and they appear to be continuous with a single minimum. When I get a chance I'll try it out in MS Excel and see if it can do any better (than OOCalc.)
 

The Naked Scientists Forum


 

SMF 2.0.10 | SMF © 2015, Simple Machines
SMFAds for Free Forums