Approximating Hyperbolic Tangent
jtomschroeder.com
43 points by jtomschroeder 8 hours ago
A different approach, refining the square-root-based sigmoid with a polynomial, is in my blog post "a few of my favorite sigmoids" [1]. I'm not sure which is faster without benchmarking, but I'm pretty sure its worst-case error is better than any of the fast approximations.
[1]: https://raphlinus.github.io/audio/2018/09/05/sigmoid.html
Why would you want to approximate tanh for use in neural networks? Any smoothed step function will do, so if your concern is speed, why not design something for speed directly? Who cares whether it is an established mathematical function? Perhaps because you also need the derivative, and tanh has a nice one: 1 - tanh²(x), which is cheap to compute if you already have tanh(x).
A different floating-point hack makes exp() easier to compute in hardware (and consequently tanh). You cast the input to an int and take the first 2 bits of what would be the mantissa. LUT[Index] and LUT[Index+1] from your 5-entry table are used to either lerp or polynomially approximate the function, with the remaining mantissa bits to help.
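A software sketch of that interpolation step, sized as the comment describes (a 5-entry table indexed by the top two fraction bits), applied to 2^f on [0,1), which is the mantissa-side piece of a hardware exp. Plain arithmetic stands in for the bit extraction a hardware unit would do:

```c
// Top two bits of the fraction pick an interval; a 5-entry table holds
// 2^(i/4) at the interval endpoints; the remaining bits drive a lerp.
static double pow2_frac(double f) {            /* assumes 0 <= f < 1 */
    static const double lut[5] = {             /* 2^(i/4), i = 0..4 */
        1.0, 1.189207115002721, 1.4142135623730951,
        1.681792830507429, 2.0
    };
    int idx = (int)(f * 4.0);                  /* top two fraction bits */
    double frac = f * 4.0 - (double)idx;       /* remaining fraction bits */
    return lut[idx] + frac * (lut[idx + 1] - lut[idx]);
}
```

Linear interpolation over quarter intervals keeps the relative error under roughly half a percent; spending the remaining bits on a low-degree polynomial instead, as the comment suggests, tightens that further.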
There’s an analysis of the Schraudolph approximation of the exponential function (along with an improvement upon it) that someone might find interesting at https://typ.dev/attention#affine-cast
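For readers unfamiliar with the trick being analyzed: Schraudolph's approximation writes a linear function of x directly into the exponent field of a double. A minimal sketch, with the constants as commonly quoted from the 1999 paper:

```c
#include <stdint.h>
#include <string.h>

// Schraudolph-style fast exp(): set the high 32 bits of a double to
// a*x + b, where a = 2^20 / ln 2 scales x into the exponent field and
// b = 1023 * 2^20 - 60801 combines the exponent bias with a correction
// constant. Relative error is a few percent; usable for sigmoids,
// not for numerical analysis. Valid only for roughly |x| < 700.
static double schraudolph_exp(double x) {
    int64_t hi = (int64_t)(1512775.395195186 * x + 1072632447.0);
    int64_t bits = hi << 32;           /* high word set, low word zero */
    double d;
    memcpy(&d, &bits, sizeof d);       /* bit-cast back to a double */
    return d;
}
```

The "affine cast" in the linked analysis refers to exactly this structure: an affine map on x followed by an integer-to-float bit cast.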
Looks interesting, but it should start with a definition of the hyperbolic tangent. The definition only appears about two-thirds of the way through, in a discussion of computing exp(x).