[C++] convert 'const unsigned char*' to 'const char*'

Started by
11 comments, last by dalleboy 16 years, 4 months ago
I have a function that returns a string as a const unsigned char*, and I want to assign it to a std::string, which accepts const char*. The only cast that worked was reinterpret_cast<>. I am not happy using reinterpret_cast<>, since its effects are non-portable. Can someone explain why this is the only cast that works, when it seems I should be able to use either static_cast<> or const_cast<>? Thanks.
I think reinterpret_cast is used for casting pointer types, and the others are for normal types. I'm not so sure.



DinGY
Yesterday is history. Tomorrow is a mystery. Today is a gift.
They are different types, and as you have already seen, you can't convert between them without an explicit cast.

Just a thought: are you sure that you're unable to use std::basic_string<unsigned char> to solve your problem?

In any case, IIRC, std::basic_string has a constructor and an assign function that take an iterator range, so you might be able to use that.
1) const_cast
This cast would allow you to cast (const unsigned char*) to (unsigned char*). As its name implies, it can *only* modify constness. This is most definitely not what you want here, or nearly anywhere else. Its main use is for working around broken code.

2) static_cast
This cast can do by-value conversion between (char), (unsigned char), and other POD types such as (float). When converting pointers, however, static_cast can only convert between related types (e.g. from (base*) to (derived_from_base*)), and as far as the compiler is concerned, (char) and (unsigned char) are not related (although they share many common properties).
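To make the two points above concrete, here is a minimal sketch. broken_function is a hypothetical stand-in for the API call in the question, and first_char / whole_string are names made up for illustration. The commented-out lines are the casts that should be rejected by the compiler.

```cpp
#include <string>

// Hypothetical stand-in for the API function from the original question.
const unsigned char* broken_function() {
    static const unsigned char text[] = {'h', 'i', '\0'};
    return text;
}

// By-value conversion between character types: static_cast is enough.
char first_char() {
    const unsigned char* p = broken_function();
    return static_cast<char>(p[0]);
}

// Pointer conversion: of the named casts, only reinterpret_cast compiles.
std::string whole_string() {
    const unsigned char* p = broken_function();
    // const char* bad1 = static_cast<const char*>(p); // error: unrelated pointee types
    // const char* bad2 = const_cast<const char*>(p);  // error: const_cast cannot change the pointee type
    const char* s = reinterpret_cast<const char*>(p);
    return std::string(s);
}
```

Reading the bytes through a (const char*) is allowed because character types may alias any object.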

Frankly, a function that returns a string as a (const unsigned char*) is horribly broken in the head, so much so that dealing with things the proper way might not even be worthwhile. That said, here's a workaround (which it looks like raz0r made mention of):

const unsigned char * begin = broken_function();
const unsigned char * end = begin;
while (*end != '\0') ++end; // find the end of the string
std::string mystring(begin, end); // Use C++'s iterator constructor, should be fine to convert from (unsigned char) to (char) itself.
myexistingstring.assign(begin, end); // Use C++'s iterator assign function if the string already exists

Quote:Original post by MaulingMonkey
Frankly, a function that returns a string as a (const unsigned char*) is horribly broken in the head, so much so that dealing with things the proper way might not even be worthwhile.


I'm curious why you'd go so far as to call the function 'horribly broken'. To me it seemed unconventional/annoying to return that type ... is there some greater reason that I am missing?

Either way I can't change it since it's part of a separate API.

Quote:Original post by fpsgamer
Quote:Original post by MaulingMonkey
Frankly, a function that returns a string as a (const unsigned char*) is horribly broken in the head, so much so that dealing with things the proper way might not even be worthwhile.


I'm curious why you'd go so far as to call the function 'horribly broken'. To me it seemed unconventional/annoying to return that type ... is there some greater reason that I am missing?


For "narrow" (char-sized character) strings:
String literals are (const char[]).
The C Standard Library deals exclusively with a char* based string.
The C++ Standard Library also deals exclusively with (char) for all its default string and stream typedefs.
99.9999% of 3rd party libraries that need to deal with strings -- parsing libraries, GUI libraries, hell, anything that just needs to open a file, use char based strings.

The C++ standard even goes so far as to be neutral on the issue of whether (char) is signed or not, on the basis that it should be whatever's most efficient for that platform for storing text. The one reason to break convention is to handle wide character sets with wchar_t or other larger types.
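If you want to see which way a given platform went, a small check like the following should do. plain_char_is_signed is a made-up helper name; the rest is standard library.

```cpp
#include <climits>
#include <limits>

// True on platforms where plain char has the same range as signed char.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// CHAR_MIN is negative exactly when plain char is signed.
static_assert(plain_char_is_signed() == (CHAR_MIN < 0),
              "numeric_limits and CHAR_MIN should agree");
```

The answer is implementation-defined, so portable code should not rely on either result.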

Unconventional implies there might be some sort of sane rationale behind the decision. There isn't anything of the sort for exposing (unsigned char*) "strings" -- it's just plain stupid. Conventionless might be a more appropriate term.

This is why I call it horribly broken in the head. It may "work" in its implementation, but it's a completely counterintuitive, counterproductive, countersanity thing to do in terms of its interface/design.

Quote:Either way I can't change it since it's part of a separate API.

Indeed :-/. I'd suggest wrapping the function in question so you only have to work around its stupidity in one place.
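A rough sketch of such a wrapper, under a few assumptions: broken_function is a hypothetical stand-in for the real API call, fixed_function is a made-up name, and the null check assumes the API might hand back a null pointer.

```cpp
#include <string>

// Hypothetical stand-in for the third-party function.
const unsigned char* broken_function() {
    static const unsigned char text[] = {'a', 'b', 'c', '\0'};
    return text;
}

// The only place in the code base that sees the unsigned char interface.
std::string fixed_function() {
    const unsigned char* raw = broken_function();
    if (raw == nullptr) {
        return std::string(); // assumption: treat a null result as empty text
    }
    return std::string(reinterpret_cast<const char*>(raw));
}
```

Everything else then calls fixed_function() and only ever deals with std::string.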
Quote:Original post by fpsgamer
Either way I can't change it since it's part of a separate API.

Just had to ask: Is it the Microsoft RPC API?

Anyway, use reinterpret_cast, it is safe according to the standard in 3.9.1.1: "Plain char, signed char, and unsigned char are three distinct types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements".
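The properties in that quote can be checked at compile time. This is only a sketch of the check, not proof that every use of the cast is fine, and char_types_share_layout is a made-up name.

```cpp
#include <type_traits>

// Three distinct types, as far as the type system is concerned.
static_assert(!std::is_same<char, signed char>::value, "distinct types");
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");
static_assert(!std::is_same<signed char, unsigned char>::value, "distinct types");

// Same amount of storage and same alignment requirements.
constexpr bool char_types_share_layout() {
    return sizeof(char) == sizeof(unsigned char)
        && sizeof(char) == sizeof(signed char)
        && alignof(char) == alignof(unsigned char)
        && alignof(char) == alignof(signed char);
}

static_assert(char_types_share_layout(), "same size and alignment");
```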
Arguing on the internet is like running in the Special Olympics: Even if you win, you're still retarded.[How To Ask Questions|STL Programmer's Guide|Bjarne FAQ|C++ FAQ Lite|C++ Reference|MSDN]
All that just to convert a uchar* from some broken library to a char* ? Speaking of complete over-engineering...

A little pragmatism is in order, if you ask me. Just use a C-style cast. Takes two seconds to type, and works perfectly fine in this case.
Quote:Original post by dalleboy
Just had to ask: Is it the Microsoft RPC API?


Nope. SQLite.

Quote:Original post by dalleboy
Anyway, use reinterpret_cast, it is safe according to the standard in 3.9.1.1: "Plain char, signed char, and unsigned char are three distinct types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements".


When you say it is 'safe', does that mean we can ignore all previous warnings about non-portability? Or do we ignore that warning only in this case?


Quote:Original post by Yann L
All that just to convert a uchar* from some broken library to a char* ? Speaking of complete over-engineering...

A little pragmatism is in order, if you ask me. Just use a C-style cast. Takes two seconds to type, and works perfectly fine in this case.


But if reinterpret_cast<> has non-portable effects, wouldn't the equivalent C-style cast also have non-portable effects?
Quote:Original post by fpsgamer
Quote:Original post by dalleboy
Just had to ask: Is it the Microsoft RPC API?


Nope. SQLite.


Huh. It's not even internally self consistent. Or rather, it is, in the most backward-ass way you could possibly imagine:
void sqlite3_result_text(sqlite3_context*, const char*, int, void(*)(void*));
const unsigned char *sqlite3_value_text(sqlite3_value*);
It seems their motto is "const char* in, unsigned const char* out" in more places than this.

Quote:When you say it is 'safe', does that mean we can ignore all previous warnings about non-portability? Or do we ignore that warning only in this case?

I wouldn't call it safe, but neither would I call using that library safe -- $5 says that on any platform where such a cast would blow up, this library would blow up far, far worse. They seem to like the idea of silent failures too: out of their 16 FAQ entries, 12.5% of them -- #3 and #9 -- say to hell with invariants, do it anyways, and let some other poor schmuck deal with the problem! (0% of them cover any sort of rationale as to their treatment of "text" as multiple, varying types. If they had any, it most definitely would've belonged there.)

Quote:
Quote:Original post by Yann L
All that just to convert a uchar* from some broken library to a char* ? Speaking of complete over-engineering...

A little pragmatism is in order, if you ask me. Just use a C-style cast. Takes two seconds to type, and works perfectly fine in this case.


But if reinterpret_cast<> has non-portable effects, wouldn't the equivalent C-style cast also have non-portable effects?


Yes, they'd be equivalent here (although I'd be more the kind to keep the screaming warning sign that is reinterpret_cast). However, that library probably does the exact same thing internally, so you're ****ed anyways, which is what I was sort of getting at with the "dealing with things the proper way might not even be worthwhile" bit. Either way, I'd wrap its handling in one place. Bury the issue from the rest of your code as much as you can. Like the radioactive waste of an interface that it is.
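For what it's worth, the equivalence is easy to see in a small sketch. broken_function is again a hypothetical stand-in and casts_agree is a made-up name. A C-style cast tries const_cast and static_cast first, and since neither applies to these pointer types, it ends up doing what reinterpret_cast does.

```cpp
// Hypothetical stand-in for the third-party function.
const unsigned char* broken_function() {
    static const unsigned char text[] = {'o', 'k', '\0'};
    return text;
}

// Both casts should yield the same pointer value.
bool casts_agree() {
    const unsigned char* raw = broken_function();
    const char* a = reinterpret_cast<const char*>(raw); // the loud version
    const char* b = (const char*)raw;                   // the quiet version
    return a == b;
}
```

The difference is in what the reader sees: the named cast is easy to grep for, while the C-style one would silently cast away const too if the types ever changed.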

This topic is closed to new replies.
