[C++] convert 'const unsigned char*' to 'const char*'

This topic is 3664 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

I have a function that returns a string as a const unsigned char*, and I want to assign it to a std::string, which accepts const char*. The only cast I found that compiles is reinterpret_cast<>. I am not happy using reinterpret_cast<>, since its effects are non-portable. Can someone explain why this is the only cast that works, when it seems I should be able to use either static_cast<> or const_cast<>? Thanks.

They are different types, and as you have already seen, you can't convert between them without an explicit cast.

Just a thought: are you sure that you're unable to use std::basic_string <unsigned char> to solve your problem?

In any case, IIRC, std::basic_string has a constructor and an assign function that take an iterator range, so you might be able to use that.

1) const_cast
This cast would let you convert (const unsigned char*) to (unsigned char*). As its name implies, it can *only* modify constness. This is most definitely not what you want here, or nearly anywhere else. Its main use is working around broken code.

2) static_cast
This cast can do by-value conversion between (char), (unsigned char), and other arithmetic types such as (float). When converting pointers, however, static_cast can only convert between related types (e.g. from (base*) to (derived_from_base*)). As far as the compiler is concerned, (char) and (unsigned char) are not related, although they share many common properties.
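
To make the two points above concrete, here is a minimal sketch of which casts compile. The function `api_text()` is a hypothetical stand-in for whatever API returns the unsigned text, and `cast_demo()` is only an illustration; neither name comes from a real library.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an API that hands back text as unsigned bytes.
inline const unsigned char* api_text() {
    static const unsigned char text[] = { 'h', 'i', '\0' };
    return text;
}

inline std::string cast_demo() {
    const unsigned char* p = api_text();

    // By-value conversion between character types is fine with static_cast.
    char first = static_cast<char>(p[0]);
    assert(first == 'h');

    // Neither of these compiles:
    // const char* bad = static_cast<const char*>(p);  // unrelated pointer types
    // const char* bad = const_cast<const char*>(p);   // const_cast only changes cv-qualifiers

    // static_cast does work if you go through void* explicitly.
    const char* via_void = static_cast<const char*>(static_cast<const void*>(p));

    // reinterpret_cast converts the pointer directly.
    const char* via_reinterpret = reinterpret_cast<const char*>(p);

    // Both routes point at the same bytes.
    assert(via_void == via_reinterpret);
    return std::string(via_reinterpret);
}
```

The two-step static_cast is rarely clearer than the reinterpret_cast; it is shown only to illustrate that the restriction is on converting directly between unrelated pointer types.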

Frankly, a function that returns a string as a (const unsigned char*) is horribly broken in the head, so much so that dealing with things the proper way might not even be worthwhile. That said, here's a workaround (which it looks like raz0r made mention of):

const unsigned char * begin = broken_function();
const unsigned char * end = begin;
while (*end != '\0') ++end; //find the end of the string

std::string mystring(begin,end); // Use C++'s iterator constructor, should be fine to convert from (unsigned char) to (char) itself.

myexistingstring.assign(begin,end); // Use C++'s iterator assign function if the string already exists

Quote:
Original post by MaulingMonkey
Frankly, a function that returns a string as a (const unsigned char*) is horribly broken in the head, so much so that dealing with things the proper way might not even be worthwhile.


I'm curious why you'd go so far as to call the function 'horribly broken'. To me it seemed unconventional/annoying to return that type ... is there some greater reason that I am missing?

Either way, I can't change it, since it's part of a separate API.

Quote:
Original post by fpsgamer
Quote:
Original post by MaulingMonkey
Frankly, a function that returns a string as a (const unsigned char*) is horribly broken in the head, so much so that dealing with things the proper way might not even be worthwhile.


I'm curious why you'd go so far as to call the function 'horribly broken'. To me it seemed unconventional/annoying to return that type ... is there some greater reason that I am missing?


For "narrow" (char-sized character) strings:
String literals are (const char[]).
The C Standard Library deals exclusively with a char* based string.
The C++ Standard Library also deals exclusively with (char) for all its default string and stream typedefs.
99.9999% of 3rd party libraries that need to deal with strings -- parsing libraries, GUI libraries, hell, anything that just needs to open a file -- use char based strings.

The C++ standard even goes so far as to be neutral on the issue of whether (char) is signed or not, on the basis that it should be whatever's most efficient for that platform for storing text. The one reason to break convention is to handle wide character sets with wchar_t or other larger types.
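
If you want to see what your own platform does, a small sketch like the following works. The helper names `plain_char_is_signed()` and `check_char_signedness()` are made up for illustration; the standard facilities they use (`std::numeric_limits`, `std::is_same`, `CHAR_MIN`) are real.

```cpp
#include <cassert>
#include <climits>
#include <limits>
#include <type_traits>

// char, signed char and unsigned char are three distinct types,
// even though plain char has the same range as one of the other two.
static_assert(!std::is_same<char, signed char>::value, "char is not signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char is not unsigned char");

// Whether plain char is signed is implementation-defined.
inline bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Illustrative self-check: the two standard ways of asking agree.
inline void check_char_signedness() {
    assert(plain_char_is_signed() == (CHAR_MIN < 0));
}
```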

Unconventional implies there might be some sort of sane rationale behind the decision. There isn't anything of the sort for exposing (unsigned char*) "strings" -- it's just plain stupid. Conventionless might be a more appropriate term.

This is why I call it horribly broken in the head. It may "work" in its implementation, but it's a completely counterintuitive, counterproductive, countersanity thing to do in terms of its interface/design.

Quote:
Either way, I can't change it, since it's part of a separate API.

Indeed :-/. I'd suggest wrapping the function in question so you only have to work around its stupidity in one place.
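
A minimal sketch of such a wrapper, assuming the API returns a null-terminated string. The name `to_std_string` is hypothetical, and the null check is an extra precaution that was not part of the earlier workaround.

```cpp
#include <cassert>
#include <string>

// Hypothetical adapter: the one place that knows the API returns unsigned text.
inline std::string to_std_string(const unsigned char* text) {
    if (text == nullptr) {
        return std::string();          // assumption: treat a null result as empty text
    }
    const unsigned char* end = text;
    while (*end != '\0') ++end;        // find the terminating null
    return std::string(text, end);     // iterator-range constructor converts each element to char
}

// Usage example with a stand-in for the value the API would return.
inline void example_usage() {
    static const unsigned char raw[] = { 'a', 'b', 'c', '\0' };
    std::string s = to_std_string(raw);
    assert(s == "abc");
}
```

The rest of the code then only ever sees std::string, so if the conversion strategy changes later, it changes in exactly one function.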

Quote:
Original post by fpsgamer
Either way, I can't change it, since it's part of a separate API.

Just had to ask: Is it the Microsoft RPC API?

Anyway, use reinterpret_cast, it is safe according to the standard in 3.9.1.1: "Plain char, signed char, and unsigned char are three distinct types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements".

All that just to convert a uchar* from some broken library to a char* ? Speaking of complete over-engineering...

A little pragmatism is in order, if you ask me. Just use a C-style cast. Takes two seconds to type, and works perfectly fine in this case.

Quote:
Original post by dalleboy
Just had to ask: Is it the Microsoft RPC API?


Nope. SQLite.

Quote:
Original post by dalleboy
Anyway, use reinterpret_cast, it is safe according to the standard in 3.9.1.1: "Plain char, signed char, and unsigned char are three distinct types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements".


When you say it is 'safe', does that mean we can ignore all previous warnings about non-portability? Or do we ignore that warning only in this case?


Quote:
Original post by Yann L
All that just to convert a uchar* from some broken library to a char* ? Speaking of complete over-engineering...

A little pragmatism is in order, if you ask me. Just use a C-style cast. Takes two seconds to type, and works perfectly fine in this case.


But if reinterpret_cast<> has non-portable effects, wouldn't the equivalent C-style cast also have non-portable effects?

Quote:
Original post by fpsgamer
Quote:
Original post by dalleboy
Just had to ask: Is it the Microsoft RPC API?


Nope. SQLite.


Huh. It's not even internally self-consistent. Or rather, it is, in the most backward-ass way you could possibly imagine:
void sqlite3_result_text(sqlite3_context*, const char*, int, void(*)(void*));
const unsigned char *sqlite3_value_text(sqlite3_value*);
It seems their motto is "const char* in, unsigned const char* out" in more places than this.

Quote:
When you say it is 'safe', does that mean we can ignore all previous warnings about non-portability? Or do we ignore that warning only in this case?

I wouldn't call it safe, but neither would I call using that library safe -- $5 says that on any platform where such a cast would blow up, this library would blow up far, far worse. They seem to like the idea of silent failures too: of their 16 FAQ entries, two (12.5%), #3 and #9, say to hell with invariants, do it anyway, and let some other poor schmuck deal with the problem! (None of them offers any rationale for treating "text" as multiple, varying types. If they had one, it most definitely would have belonged there.)

Quote:
Quote:
Original post by Yann L
All that just to convert a uchar* from some broken library to a char* ? Speaking of complete over-engineering...

A little pragmatism is in order, if you ask me. Just use a C-style cast. Takes two seconds to type, and works perfectly fine in this case.


But if reinterpret_cast<> has non-portable effects, wouldn't the equivalent C-style cast also have non-portable effects?


Yes, they'd be equivalent here (although I'd be more the kind to keep the screaming warning sign that is reinterpret_cast). However, that library probably does the exact same thing internally, so you're ****ed anyway, which is what I was getting at with the "dealing with things the proper way might not even be worthwhile" bit. Either way, I'd wrap its handling in one place. Bury the issue from the rest of your code as much as you can, like the radioactive waste of an interface that it is.

Quote:
Original post by fpsgamer
When you say it is 'safe', does that mean we can ignore all previous warnings about non-portability? Or do we ignore that warning only in this case?

Using reinterpret_cast<T>(x), where T and the type of x are each one of const char*, const signed char*, or const unsigned char*, is portable and has no non-portable effects on any conforming C++ compiler.

Quote:
Original post by dalleboy
Quote:
Original post by fpsgamer
When you say it is 'safe', does that mean we can ignore all previous warnings about non-portability? Or do we ignore that warning only in this case?

Using reinterpret_cast<T>(x), where T and the type of x are each one of const char*, const signed char*, or const unsigned char*, is portable and has no non-portable effects on any conforming C++ compiler.


I'm curious how you know that.

Is that explicitly stated in the standard as a special case? Or is there some deeper reasoning that led you to that?

Just wondering.

Quote:
Original post by fpsgamer
I'm curious how you know that.

Is that explicitly stated in the standard as a special case? Or is there some deeper reasoning that led you to that?

As I wrote earlier, the C++ standard states the following in 3.9.1.1:

Quote:
Plain char, signed char, and unsigned char are three distinct types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements

Since the three types occupy the same amount of storage and have the same alignment requirements, and sizeof(char) is guaranteed to equal 1, there isn't much room left for anything else.
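
Those guarantees can be written down as compile-time checks next to the cast itself. This is only a sketch assuming a C++11 compiler; the helper names `as_char_ptr` and `copy_text` are hypothetical. As far as I know, the standard's aliasing rules also explicitly permit reading any object through a char or unsigned char lvalue, which is the other half of why this particular cast is fine.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// The guarantees quoted above, stated as compile-time checks.
static_assert(sizeof(char) == 1 && sizeof(unsigned char) == 1 && sizeof(signed char) == 1,
              "all three narrow character types are one byte");
static_assert(alignof(char) == alignof(unsigned char) &&
              alignof(char) == alignof(signed char),
              "all three narrow character types share alignment");

// Hypothetical helper: view unsigned text as plain char text without copying.
inline const char* as_char_ptr(const unsigned char* p) {
    return reinterpret_cast<const char*>(p);
}

// Hypothetical helper: copy the null-terminated text into a std::string.
inline std::string copy_text(const unsigned char* p) {
    return std::string(as_char_ptr(p));
}
```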
