Starfox initiated the implementation for this support back in 2013. Unfortunately he didn't conclude the implementation as he didn't have more time to continue with it. I've still checked in all the work he did in the library though. You can enable it by compiling the library with the define WIP_16BYTE_ALIGN.
Do a search in the code for this define and you'll see where things have been modified so far for this support.
The TODO in as_memory.h shows what is left to implement to fully enable support for 16byte aligned objects.
// TODO: Add support for 16byte aligned application types (e.g. __m128). The following is a list of things that needs to be implemented:
// ok - The script context must make sure to always allocate the local stack memory buffer on 16byte aligned boundaries (asCContext::ReserveStackSpace)
// ok - The engine must make sure to always allocate the memory for the script objects on 16byte aligned boundaries (asCScriptEngine::CallAlloc)
// ok - The application needs to inform a new flag when registering types that require 16byte alignment, e.g. asOBJ_APP_ALIGN16 (asCScriptEngine::RegisterObjectType)
// ok - The script object type must make sure to align member properties of these types correctly (asCObjectType::AddPropertyToClass)
// ok - Script global properties must allocate memory on 16byte boundaries if holding these types (asCGlobalProperty::AllocateMemory)
// TODO - The script compiler must make sure to allocate the local variables on 16byte boundaries (asCCompiler::AllocateVariable)
// TODO - The script compiler must add pad bytes on the stack for all function calls to guarantee that the stack position is 16byte aligned on entry in the called function (asCCompiler)
// TODO - The bytecode serializer must be capable of adjusting these pad bytes to guarantee platform independent saved bytecode. Remember that the registered type may not be 16byte aligned on all platforms (asCWriter & asCReader)
// TODO - The bytecode serializer must also be prepared to adjust the position of the local variables according to the need fro 16byte alignment (asCWriter & asCReader)
// TODO - The code for the native calling conventions must be adjusted for all platforms that should support 16byte aligned types (as_callfunc...)
// ok - When the context needs to grow the local stack memory it must copy the function arguments so that the stack entry position is 16byte aligned (asCContext::CallScriptFunction)
// TODO - When the context is prepared for a new call, it must set the initial stack position so the stack entry position is 16byte aligned (asCContext::Prepare)
I suggest starting with the two first TODO's in this list, i.e. make sure local variables are allocated on a 16 byte alignment, and that the callstack is padded so that the functions' stack frame are always aligned to 16 bytes.