libm: sqrt and sqrtf via ARM vsqrt instruction

Optimize sqrt and sqrtf on ARM by using the hardware
square-root instruction (vsqrt) instead of the slow,
generic portable code.

Change-Id: I84694159577aef6418710548085d8149c45e0e3f
(cherry picked from commit 434d98cd36cdd2514a7118e69624e5d205ca849a)
(cherry picked from commit 5fe41e6f146bcadd4904da26351c646cdc90d196)
(cherry picked from commit e314f75340c8e818b17373314ceb54039fcd76ad)
3 files changed