Windows’ Unicode Support
All internal functions in Windows use the Unicode format UTF16 natively. The book Windows Internals mentions that when a developer calls the ANSI version of a function, Windows converts every ANSI string parameter to Unicode and calls the Unicode version of the function. When the Unicode function finishes, Windows converts the resulting Unicode strings back to ANSI and returns. ReactOS, which is modeled on Windows, allocates a new string buffer for each conversion that needs to take place.
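To make the cost of such a layer concrete, here is a minimal sketch of the pattern in portable C. The names Text_UpperW and Text_UpperA are hypothetical and are not Windows or ReactOS functions. The widening step assumes 7-bit ASCII input, where a real layer would consult the active code page.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "native" function: upper-cases a UTF16 string in place.
   Only the ASCII letters a-z are handled to keep the sketch short. */
static void Text_UpperW(uint16_t *Text)
{
    for (; *Text != 0; Text++) {
        if (*Text >= 'a' && *Text <= 'z') {
            *Text = (uint16_t)(*Text - ('a' - 'A'));
        }
    }
}

/* Hypothetical ANSI entry point: widen, call the UTF16 version, narrow back.
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
static int Text_UpperA(char *Text)
{
    size_t Length = strlen(Text);
    size_t Index;
    uint16_t *Wide = malloc((Length + 1) * sizeof *Wide);

    if (Wide == NULL) {
        return -1;
    }

    /* Widen each byte, terminator included (assumes 7-bit ASCII input). */
    for (Index = 0; Index <= Length; Index++) {
        Wide[Index] = (uint16_t)(unsigned char)Text[Index];
    }

    Text_UpperW(Wide);

    /* Narrow the result back into the caller's buffer. */
    for (Index = 0; Index <= Length; Index++) {
        Text[Index] = (char)Wide[Index];
    }

    free(Wide);
    return 0;
}
```

Every call through Text_UpperA pays for one allocation and two full passes over the string on top of the real work, which is the overhead the rest of this post tries to avoid.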
String Library
Such string conversion layers work well for a higher-level API, but they would not do for a core API or a string library. It would be insane to perform an ANSI to Unicode conversion for every string function call: you would have to allocate a new string buffer each time you wanted to call one.
What ends up happening with string libraries is that a lot of duplicate code gets written. Each string function has to be implemented once per encoding format. This is done 1) so you don’t incur the overhead cost of conversion and 2) for side-by-side encoding support. Side-by-side encoding support is important because you’ll find you end up needing encoding formats other than the default one your project is built with.
Side-by-side support for ANSI/ASCII and Unicode/UTF16
#ifdef UNICODE
#define String_Copy StringUtf16_Copy
#else
#define String_Copy StringAscii_Copy
#endif
int32 StringUtf16_Copy(wchar *Target, wchar *Source);
int32 StringAscii_Copy(char *Target, char *Source);
In the code example above, you can call String_Copy to copy a string using the build encoding for your project, or you can explicitly call the UTF16 or ASCII version anywhere in your code when you need a specific encoding.
Windows API macros are set up exactly the same way. The only difference is that the ANSI functions call the Unicode functions, which is not reasonable for a string library because of the conversion overhead.
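Here is a sketch of what the two bodies behind the String_Copy macro might look like, to show the duplication in practice. The typedefs for int32 and wchar are assumptions, as is the return value (the number of characters copied, not counting the terminator). The target buffer is assumed to be large enough.

```c
#include <stdint.h>

/* Assumed typedefs: the declarations above do not show how these are defined. */
typedef int32_t  int32;
typedef uint16_t wchar;

/* Assumption: returns the number of characters copied, excluding the terminator. */
int32 StringUtf16_Copy(wchar *Target, wchar *Source)
{
    int32 Count = 0;

    while (Source[Count] != 0) {
        Target[Count] = Source[Count];
        Count++;
    }
    Target[Count] = 0;
    return Count;
}

/* Same logic again for single-byte characters: no conversion, no allocation. */
int32 StringAscii_Copy(char *Target, char *Source)
{
    int32 Count = 0;

    while (Source[Count] != 0) {
        Target[Count] = Source[Count];
        Count++;
    }
    Target[Count] = 0;
    return Count;
}

/* The build encoding picks the default; both versions stay callable by name. */
#ifdef UNICODE
#define String_Copy StringUtf16_Copy
#else
#define String_Copy StringAscii_Copy
#endif
```

The two functions are identical apart from the character type. That is the duplicate code mentioned earlier, and it is the price of avoiding a conversion on every call.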
TCHAR
Most functions won’t need side-by-side support, meaning separate ANSI and Unicode versions. For these functions we can use TCHAR, which is defined as the character type for the current project’s build encoding.
#ifdef UNICODE
#define tchar wchar
#else
#define tchar char
#endif
int32 File_Open(int32 FileHandle, tchar *Filename, int32 Flags);
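As a small illustration, a function written against tchar needs only one body, and that body compiles for either build encoding. File_NameLength is a hypothetical helper and the typedefs are assumptions. Neither comes from a real library.

```c
#include <stdint.h>

/* Assumed typedefs: the snippet above does not show how these are defined. */
typedef int32_t  int32;
typedef uint16_t wchar;

#ifdef UNICODE
#define tchar wchar
#else
#define tchar char
#endif

/* Hypothetical helper: counts the characters in Filename up to the terminator.
   The same source compiles for char or wchar depending on the build. */
int32 File_NameLength(tchar *Filename)
{
    int32 Length = 0;

    while (Filename[Length] != 0) {
        Length++;
    }
    return Length;
}
```

The trade-off is that only one of the two encodings exists in any given build, which is fine for functions like File_Open that never need both at once.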
Naming
Naming conventions are always important. The cleaner you name your functions and variables the better. Windows names the ANSI version of a function by appending an ‘A’ and the Unicode version by appending a ‘W’ for wide char. For instance, in Windows there is a LoadImageA and a LoadImageW, and the LoadImage macro selects which function to call based on the character encoding for your project. If I did the same thing I would end up with String_CopyA and String_CopyW. Or I could do the exact opposite and name it how the C runtime library does, prepending a ‘w’ only on Unicode functions, in which case I would end up with String_Copy and WString_Copy. To me neither really looks right. The character encoding is not recognizable from the letters ‘A’ and ‘W’, and what would I do if I had to add another character encoding to the string library? Add an ‘X’? So I decided to append the name of the format to the end of the class name, as in StringUtf16_Copy and StringAscii_Copy. Functions that use multiple character encodings will have variable names in the format Filename for ASCII, Filename8 for UTF8, and Filename16 for UTF16.
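As a sketch of how the variable naming reads in a function that touches two encodings, here is a hypothetical StringUtf16_FromAscii. The function name, the typedefs, and the return convention are all assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed typedefs: not shown in the snippets above. */
typedef int32_t  int32;
typedef uint16_t wchar;

/* Hypothetical: widens a 7-bit ASCII string into a UTF16 buffer.
   Returns the number of characters written, excluding the terminator,
   or -1 (with an empty target) if a non-ASCII byte is found.
   Filename16 must have room for the string plus the terminator. */
int32 StringUtf16_FromAscii(wchar *Filename16, char *Filename)
{
    int32 Count = 0;

    while (Filename[Count] != 0) {
        unsigned char Byte = (unsigned char)Filename[Count];

        if (Byte > 0x7F) {
            Filename16[0] = 0;
            return -1;
        }
        Filename16[Count] = (wchar)Byte;
        Count++;
    }
    Filename16[Count] = 0;
    return Count;
}
```

With the suffix on the variable name, the encoding of each parameter is visible at the call site without having to look up its type.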
Happy Coding. The End.