An AES hardware accelerator targeting energy efficient, low cost mobile and IoT applications is fabricated in 40nm CMOS. The proposed design eliminates the ShiftRow stage in conventional AES implementations and replaces flip-flops in data and key storage with latches using re-timing, saving 25% area and 69% power. Along with a 2-stage Sbox in native GF(24)2 composite-field computation and glitch reduction techniques, this results in a compact 2228 gate design achieving 446 Gbps/W and 46.2 Mbps throughput at 0.47V.