Sorry, I haven't had time to read your papers in full yet. Have you considered that LUTs on many FPGAs aren't 2:1 but instead, say, 6:3 and also may contain flip-flops and muxes? FPGA synthesis may not be as easy as "just" translating the activation functions to LUTs.
This is a simplification in the blog post: an activation doesn't map one-to-one onto a physical FPGA LUT primitive. Instead, each one is represented as a "logical LUT" (L-LUT), which Vivado synthesizes into distributed RAM. Compared with multiply-accumulate operations, L-LUTs map naturally onto FPGA fabric, which makes them a useful implementation-level abstraction.
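To make the "logical LUT" idea concrete, here is a rough sketch of tabulating a quantized activation as a truth table. It is not the toolflow from the papers: the function name `build_llut`, the choice of `tanh`, the 6-bit input and 3-bit output widths, and the value ranges are all assumptions picked for illustration.

```python
import math

def build_llut(fn, in_bits, out_bits, in_range=(-4.0, 4.0), out_range=(-1.0, 1.0)):
    """Tabulate fn over every possible input code (hypothetical helper).

    Returns a list with 2**in_bits entries, each an integer output code
    that fits in out_bits bits.
    """
    n_codes = 1 << in_bits
    max_out = (1 << out_bits) - 1
    lo, hi = in_range
    out_lo, out_hi = out_range
    table = []
    for code in range(n_codes):
        # Map the integer input code to a real value in in_range.
        x = lo + (hi - lo) * code / (n_codes - 1)
        # Evaluate the activation and clamp it to the representable output range.
        y = min(max(fn(x), out_lo), out_hi)
        # Quantize the output to an integer code in [0, max_out].
        table.append(round((y - out_lo) / (out_hi - out_lo) * max_out))
    return table

# Assumed example: a 6-input, 3-output logical LUT for tanh.
table = build_llut(math.tanh, in_bits=6, out_bits=3)
```

At inference time the activation is then just an index into `table`. The table has 2^6 = 64 entries of 3 bits each; how that gets packed into physical LUT primitives or distributed RAM is left to the synthesis tool, which is the point of treating it as a logical rather than physical LUT.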