A cursory Google search suggests alignment is architecture- and implementation-dependent, so I doubt the OS will provide any guarantees.
On x86-64, depending on the compiler and its packing settings, struct members usually end up either 4-byte-aligned or 8-byte-aligned.
So a 64-bit pointer + 16-bit field ([8|2], 10 bytes of actual data) would be padded to 12 bytes or 16 bytes if it's in a struct; with the default x86-64 ABI the pointer is 8-byte-aligned, so expect 16. If it's two separate malloc calls, then I wouldn't make any assumptions. Usually more memory is allocated than requested, but the exact behavior depends on the allocator in your C library as much as on the OS, CPU, and MMU. The result could have either of those alignments, or something entirely different on a legacy platform.
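To make the struct sizes concrete, here is a minimal sketch that compares the natural layout with a packed one. The struct and function names are made up for illustration, and `__attribute__((packed))` is a GCC/Clang extension rather than standard C, so treat this as an assumption about your toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler adds tail padding so that the pointer
 * stays aligned in every element of an array of these structs. */
struct tagged_natural {
    void    *ptr;   /* 8 bytes on x86-64 */
    uint16_t tag;   /* 2 bytes, then padding */
};

/* Packed layout (GCC/Clang extension): no padding at all, at the cost
 * of possibly unaligned pointer loads. */
struct __attribute__((packed)) tagged_packed {
    void    *ptr;
    uint16_t tag;
};

/* The packed struct holds exactly the payload and nothing else. */
static_assert(sizeof(struct tagged_packed) == sizeof(void *) + sizeof(uint16_t),
              "packed struct should have no padding");

size_t natural_size(void) { return sizeof(struct tagged_natural); }
size_t packed_size(void)  { return sizeof(struct tagged_packed); }
```

On a typical x86-64 build I would expect `natural_size()` to be 16 and `packed_size()` to be 10, but check on your own compiler rather than trusting that.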
The rules for malloc only require the pointer to be suitably aligned for objects that fit in the requested size: for a request smaller than a pointer it may be aligned to just that size, otherwise it is aligned to at least the alignment of a pointer, but that's it. So if you malloc(1) you could in principle get a pointer where the 1s bit is significant, although mainstream 64-bit allocators such as glibc's hand out 16-byte-aligned blocks even then. If you malloc(87), you are going to get a pointer with at least the three low bits set to zero.
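If you want to see what your own allocator does instead of reasoning from the rules, a rough probe like the following counts the guaranteed-zero low bits over a handful of live allocations. The helper names and the `MAX_SAMPLES` limit are invented for this sketch, and a small sample proves nothing about the general case; it only shows what happened this time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_SAMPLES 64

/* Number of trailing zero bits in an address; 2^n is the largest
 * power of two that divides it. */
unsigned low_zero_bits(const void *p) {
    uintptr_t a = (uintptr_t)p;
    unsigned n = 0;
    if (a == 0) return 0;              /* NULL carries no alignment info */
    while ((a & 1u) == 0) { a >>= 1; n++; }
    return n;
}

/* Smallest number of zero low bits seen across `samples` simultaneous
 * allocations of `size` bytes. Returns the pointer width in bits if
 * every allocation failed. */
unsigned min_malloc_zero_bits(size_t size, int samples) {
    void *blocks[MAX_SAMPLES];
    unsigned min_bits = (unsigned)(sizeof(uintptr_t) * 8);
    assert(samples > 0 && samples <= MAX_SAMPLES);
    for (int i = 0; i < samples; i++) {
        blocks[i] = malloc(size);      /* keep all blocks live at once */
        if (blocks[i] != NULL) {
            unsigned b = low_zero_bits(blocks[i]);
            if (b < min_bits) min_bits = b;
        }
    }
    for (int i = 0; i < samples; i++) free(blocks[i]);
    return min_bits;
}
```

Calling `min_malloc_zero_bits(1, 16)` and `min_malloc_zero_bits(87, 16)` and comparing the two results is a quick way to see whether small requests really get weaker alignment on your platform.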
C11 has aligned_alloc() (and POSIX has posix_memalign()), which lets you specify an alignment bigger than 8, but it has to be a power of 2; you can't ever ask for an alignment of, say, 12 bytes.
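For what it's worth, a thin wrapper around aligned_alloc() might look like this. It is only a sketch: the name `alloc_aligned_128` and the choice of 128 are illustrative, and it rounds the size up because C11 asked for the size to be a multiple of the alignment (later revisions relaxed this, but rounding is harmless).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT ((size_t)128)

/* aligned_alloc() only accepts power-of-two alignments. */
static_assert((ALIGNMENT & (ALIGNMENT - 1)) == 0,
              "alignment must be a power of two");

/* Allocate at least `size` bytes on a 128-byte boundary.
 * Returns NULL on failure; release the block with free(). */
void *alloc_aligned_128(size_t size) {
    if (size > SIZE_MAX - (ALIGNMENT - 1)) return NULL;   /* overflow guard */
    size_t rounded = (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    if (rounded == 0) rounded = ALIGNMENT;                /* avoid a zero-size request */
    return aligned_alloc(ALIGNMENT, rounded);
}
```

Every pointer that comes back non-NULL has its low 7 bits clear, which is the property that matters for packing. Note that rounding each block up to 128 bytes costs memory for small objects.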
I really am trying to not waste space; I don't want to use two 64-bit words to store a pointer plus my 16 bits of "useful" information; this means I'm storing 16 bytes for every 10 bytes of useful info, which is 60% overhead - this adds up in the large dataset I'm working with.
So if I can demonstrate that the difference between any two malloc() results in a single program instance is less than 2^48 (technically 2^55, since I can aligned_alloc() on 128 bytes and not have to worry about the low 7 bits of the pointers), then I can use an arbitrary malloc() result as a "base address" and treat every other pointer as an offset from that base address. That lets me use 6 bytes for the pointer, so each entry shrinks from 16 bytes to 8 and my memory requirements drop by half.
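A rough sketch of that packing scheme, assuming 128-byte-aligned blocks and a base that sits at or below every address you encode. All names here (`packed_ref`, `pack_ref`, and so on) are hypothetical. The arithmetic goes through `uintptr_t` because subtracting pointers into different allocations is undefined behavior in C, and the encoder refuses anything that does not fit instead of silently truncating.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALIGN_BITS  7u    /* blocks assumed 128-byte aligned */
#define OFFSET_BITS 48u   /* 6 bytes of offset */
#define OFFSET_MASK ((UINT64_C(1) << OFFSET_BITS) - 1)

static_assert(OFFSET_BITS + 16u == 64u, "offset and tag must fill one word");

/* One 64-bit word laid out as [ tag:16 | offset:48 ]. */
typedef uint64_t packed_ref;

/* Encode `p` relative to `base`. Fails if `p` is below `base`, is not
 * 128-byte aligned relative to it, or is 2^55 bytes or more away. */
bool pack_ref(uintptr_t base, const void *p, uint16_t tag, packed_ref *out) {
    uint64_t diff = (uint64_t)((uintptr_t)p - base);  /* wraps to a huge value if p < base */
    if (diff & ((UINT64_C(1) << ALIGN_BITS) - 1)) return false;
    uint64_t off = diff >> ALIGN_BITS;
    if (off > OFFSET_MASK) return false;
    *out = ((uint64_t)tag << OFFSET_BITS) | off;
    return true;
}

/* Recover the original pointer from the stored offset. */
void *unpack_ptr(uintptr_t base, packed_ref r) {
    return (void *)(base + (uintptr_t)((r & OFFSET_MASK) << ALIGN_BITS));
}

/* Recover the 16-bit payload. */
uint16_t unpack_tag(packed_ref r) {
    return (uint16_t)(r >> OFFSET_BITS);
}
```

Since allocations can land below whichever block you picked as the base, you would either have to choose a base lower than anything the allocator can return or switch to a signed offset; this sketch simply reports failure in that case.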
I could manually create a base address and build my own indirection table, but with data sets this large I'd rather not: too many indirections into likely different areas of memory, none of them cache-local, would hurt performance.