This is definitely true: one could imagine a model with a mix of the two layers ...

This is definitely true: one could imagine a model with a mix of the two layers or a simple linear / MLP-like kernel doing "preprocessing" before KAN layers. Other work that explores task performances for KANs and MLPs generally finds KANs are worse at non-symbolic tasks, but it would be interesting to see if hybrid architectures could improve on this failure mode.