Moved rpmalloc to a C++ compilation so that it can use C++11 atomics even where C11 is not supported. Changed the way TLS is handled on some POSIX platforms and moved all atomics to standard C++. The current implementation uses seq_cst ordering for every atomic operation; it could be optimized by switching to more relaxed memory orderings where that is safe.
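
As a rough sketch of the optimization mentioned above, the wrappers below contrast the default seq_cst ordering with weaker orderings. All function names here are hypothetical and chosen for illustration; they are not rpmalloc's actual atomic API, and which call sites can safely be relaxed would need to be checked case by case.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical wrappers for illustration only (not rpmalloc's real functions).

// Current style: no ordering argument, so std::memory_order_seq_cst is used.
inline std::int32_t counter_add_seq_cst(std::atomic<std::int32_t>& v, std::int32_t add) {
    return v.fetch_add(add) + add;  // returns the new value
}

// Relaxed: still atomic, but imposes no ordering on surrounding memory
// accesses. Typically enough for a counter that guards no other data.
inline std::int32_t counter_add_relaxed(std::atomic<std::int32_t>& v, std::int32_t add) {
    return v.fetch_add(add, std::memory_order_relaxed) + add;
}

// Release store: writes made before this store become visible to a thread
// that later reads the pointer with an acquire load.
inline void ptr_store_release(std::atomic<void*>& p, void* value) {
    p.store(value, std::memory_order_release);
}

// Acquire load: pairs with the release store above.
inline void* ptr_load_acquire(const std::atomic<void*>& p) {
    return p.load(std::memory_order_acquire);
}

// Compare-and-swap: acq_rel on success, acquire on failure.
// 'expected' is taken by value, so the caller's copy is not updated.
inline bool ptr_cas(std::atomic<void*>& p, void* expected, void* desired) {
    return p.compare_exchange_strong(expected, desired,
                                     std::memory_order_acq_rel,
                                     std::memory_order_acquire);
}
```

On strongly ordered hardware such as x86 the difference mostly affects stores and compiler reordering, while on weakly ordered architectures such as ARM the weaker orderings can avoid full barriers. Keeping seq_cst as the starting point is the conservative choice, since it is always correct wherever a weaker ordering would be.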