So much work to be done (or re-done). The idea of mutating the slope and y-intercept worked well for explaining the algorithm, but it was terrible in practice. When the slope approaches vertical, the number grows toward infinity, penalizing vertical lines over horizontal ones. A better approach was to save the (x, y) coordinates of where the line enters and leaves the boundary rectangle. By "walking" those points around the edges, I could move closer and closer to an optimal boundary line.
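One way to make that endpoint representation concrete is to parameterize each endpoint by a single number t that walks the rectangle's perimeter; this is a minimal sketch (the function name and rectangle layout are my own assumptions, not the post's actual code):

```python
def perimeter_point(rect, t):
    """Map a scalar t in [0, 1) to an (x, y) point on the rectangle's perimeter.

    rect = (xmin, ymin, xmax, ymax). A candidate line is then just a pair
    (t1, t2) of perimeter positions -- no infinite slopes to worry about,
    and "walking" an endpoint is just nudging its t value.
    """
    xmin, ymin, xmax, ymax = rect
    w, h = xmax - xmin, ymax - ymin
    total = 2 * (w + h)
    d = (t % 1.0) * total
    if d < w:                       # bottom edge, left to right
        return (xmin + d, ymin)
    d -= w
    if d < h:                       # right edge, bottom to top
        return (xmax, ymin + d)
    d -= h
    if d < w:                       # top edge, right to left
        return (xmax - d, ymax)
    d -= w
    return (xmin, ymax - d)         # left edge, top to bottom
```

Because t wraps around, vertical and horizontal lines are treated symmetrically, which is the whole point of abandoning slope/intercept.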

Other work to do includes automatically turning each optimal boundary into Python
code. For example, there could be a routine "*def isinheart(x, y):*" that would
return true if the given point is inside an optimal heart boundary. Of course,
while turning a collection of two-dimensional data points into a syntax tree of
one-dimensional lines is interesting, it would be great to generalize this
algorithm to turn any n-dimensional set of points into a collection of
(n-1)-dimensional syntax tree leaves.
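A generated *isinheart* might look something like the sketch below, assuming the syntax tree bottoms out in half-plane tests (which side of a line a point falls on). The two toy triangular "lobes" are placeholder data, not a real fitted heart:

```python
def side(p, a, b):
    """Signed area test: positive if point p is left of the directed line a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def isinconvex(x, y, edges):
    """True if (x, y) is on the inner side of every directed boundary line."""
    return all(side((x, y), a, b) >= 0 for a, b in edges)

# A hypothetical generated routine: a heart could be the union (OR) of convex
# pieces, each an AND of half-plane tests over its fitted boundary lines.
def isinheart(x, y):
    left_lobe  = [((0, 0), (2, 0)), ((2, 0), (1, 2)), ((1, 2), (0, 0))]  # toy data
    right_lobe = [((2, 0), (4, 0)), ((4, 0), (3, 2)), ((3, 2), (2, 0))]  # toy data
    return isinconvex(x, y, left_lobe) or isinconvex(x, y, right_lobe)
```

The AND-of-lines / OR-of-pieces structure is what would generalize to n dimensions: the leaves become (n-1)-dimensional hyperplane tests instead of line tests.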

Neural networks use the derivative of the error function to help calculate adjustments, and that would be great here: not only would it help prove convergence, it would speed up the process. This algorithm doesn't handle concave data well, and the solution may not scale. I would like to learn more about Kolmogorov complexity to help decide when a new syntax leaf should be added or discarded. Of course, there could be many saved examples of a "heart" dataset, from accurate to sloppy, but how to organize and search through multiple versions of the same shape is a confusing thought and a true "three-legged stool".
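Even without writing out the true error function, a derivative-driven update can be sketched with finite differences: estimate d(error)/dt for an endpoint's perimeter position t and step downhill. The `error` callable here is a stand-in assumption (a real one would count misclassified points near the line):

```python
def descend(error, t, lr=0.05, eps=1e-4, steps=200):
    """Gradient descent on one endpoint parameter t, using a finite-difference
    estimate of d(error)/dt in place of an analytic derivative."""
    for _ in range(steps):
        grad = (error(t + eps) - error(t - eps)) / (2 * eps)  # central difference
        t -= lr * grad                                        # step downhill
    return t

# Toy error surface: pretend the optimal endpoint placement is t = 0.3.
best = descend(lambda t: (t - 0.3) ** 2, t=0.9)
```

The catch, as with neural networks, is that the error needs to be smooth in t; a raw count of misclassified points is a staircase, so some smoothing (e.g. distance-weighted error) would be needed before derivatives help.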

Finally, it would be nice to find some way to save the density of the shape. For example, let's say the "important" part of a heart is the rounded top, and data points scatter sparsely once the two top curves are defined. It would be interesting to identify and save the idea of a "smear", a feature extraction of the salient parts of a shape.
