This article makes the mistake of assuming the naive error bound is the actual error on the specific problem. This problem has very smooth functions, infinitely differentiable, and has derivatives tabulated. The error bound he states doesn't use this information.
The correct answer is to check this problem specifically, not apply much looser bounds.
New readers of John Cook may not know this: his posts are pedagogical, often are illustrating the gist of the idea or message, not including all the details or the optimal analysis. This post mentioned the motivation too.
Also, he is a consultant, and this blog generates interests of that. He often works in clients’ data and perform EDA and provides solutions, such as statistical reasoning or methods. Ie while he write codes, it’s often not for production. He points people in the right direction so to speak.
I was at a startup from 93-99 until it got sold. One of the things the chip did was music synthesis from wave tables. The traditional way of doing this is to build a polyphase filter to interpolate between the sample values. Instead, we did the work up-front of converting the wave tables to polynomials. During playback, there was then no need for the polyphase filter -- just evaluate the polynomial piece at whatever fractional phase you care to.
As a result, it could synthesize 100+ voices and still have DSP left over.
EDIT: I forgot to say, the guy who did that work was Avery Wang, who later went on to invent the algorithm used by Shazam to identify songs.
On the topic of interpolation, I've recently been playing with approximating 2D curves (given as some implicit or parametric formula) by chains of Bézier segments of a given degree, while maintaining a maximum distance between the Bézier segments and the exact curve.
Oddly, while there are preexisting resources for interpolating a Bézier through a set of points, there seems to be nothing much for continuous curves. So my current approach is a brute-force binary search, where, from a set start point, I select an end point on the input curve, construct a Bézier through it so that the slopes, curvatures, etc. match up at those points (as much as possible according to the degree), calculate the maximum distance, and compare it to the error bound.
This works quite well when it does work: e.g., with cubic Béziers and tame input curves, the maximum error can be over 100k times smaller than the arc length for a given segment. But the maximum-distance calculation can get quite hairy, since basic ray-tracing doesn't always find all the common normals, and this manifests as numerical instability in the binary search. I often wish I knew the proper ways to do these things, apart from giving up and resorting to the traditional "just discretize the input curve and use B-splines" strategy.
Love this. Very interesting that same amount of compression (samples) can give ever more accuracy if you do a bit more work in the decompression — by taking higher order fits to more of the sample points.
This is pretty much the core principle underlying modern machine learning. More parameters means more faithful fit for the data, at the cost of over-fitting and generalizing poorly on unseen data from outside the range of data that was used to tune the parameters. In this particular application, we aren't that worried about overfitting because we know the actual function used to compress the data in the first place, so we know that our decompression function is "correct" and we know the range of the data. So we can keep adding parameters to reduce reconstruction error. Meanwhile in applied ML and stats, cubic and even quadratic models should be used and interpreted only with extreme caution and detailed knowledge of the data (how it was prepared, what the variables mean, what future data might look like, etc).
This also seems to a difference between interpolation and extrapolation. The table doesn't just fit a polynomial to theta between 0 and pi/8 and expect you to extrapolate for theta > pi/8. That would have catastrophic results. It has always seemed to me like one of the big problems with ML is knowing whether a given inference is an interpolation or an extrapolation.
>Fifty years ago scientists were concerned with a different application of compression: reducing the size of mathematical tables.
Guilty as charged.
>Books of tabulated functions are obsolete now, but the principles used in producing these tables are still very much relevant.
Even more relevant back then in reverse, after computers came within reach.
But you have to get there first.
Going back more than 50 years, to arrive at the tables to begin with, an algorithm was calculated across a series of values, and the output tabulated according to increments that may or may not be the same as that of the original data collection. This could be especially true with physical properties or natural phenomena. OTOH with pure math functions which are well-defined to begin with, somebody has to pick the range and increments. Either way this needs to be done so that the table serves its intended purpose.
Frequently one of the important features was the accuracy beyond what could be obtained from an engineering slide rule. In cases where a slide rule delivered adequate significant figures for the job, people were not motivated to generate a half-kilo book of numbers. But when more decimal places were needed it was tables, 4 to 6 decimal places or significant figures made it worth it lots of times.
Although sometimes a "table or two" had been calculated using a slide rule in a very definitive way, very close to the highly valid raw data. The most trusted way it could be at the time. As the leading author of the table, numerically characterizing the property for publication, the best guess estimate of significant figures would be made beyond that which the slide rule was reliably capable. Simply because more decimal places were needed when that's the purpose of the table to begin with.
Publish them and they become the best reference there is, regardless of their certainty.
Further if tables find their way into an institutional workflow, they can be made official, and therefore required. To the full number of significant figures supported by the recognized weakest link in the calculation chain, often the table(s) itself. Which brings any deviations in precision to the forefront every time for scrutiny. Even though uncertainty can be well within the accuracy of the original slide rule, because it was too sloppy physically for the original task. Too bad, the table's been official forever, you can't use a computer to replace the table unless it outputs the exact same values as the table, to the full number of decimal places. The accuracy of a table can even be recognized as questionable across-the-board, that's OK if everybody agrees to use it anyway, so there.
One of the things that helped to make progress was doing everything in integers that would have otherwise been done in floating point. Something like it was done with a slide rule where you had to keep good track of the decimal point your own self.
It is all too common for floating point numbers to give the illusion that there are unlimited decimal places in everything.
Getting back to the article, when building a table from an appropriate function using an adequate calculating device, ideally the repeatable number of significant figures to be published with complete certainty, combined with the size of the increment between entries, will allow interpolation between published values to be almost as trustworthy as the published values themselves.
Even if you are not "publishing" the values, the number of decimal places accurately retained during critical calculations will be the same limitation, and the increment comes into play whether or not it is the same granularity going into a "database" or withdrawing from it.
The correct answer is to check this problem specifically, not apply much looser bounds.