std::string versus char[255] ?

22 comments, last by cozzie 7 years, 3 months ago

Hi all,

I've noticed/ had a bug when I used a std::string to store a variable in a struct, which was gone when I used a char array [255].

Below are the two pieces of code; the first is OK/predictable, the second is not working.


struct DX11_VS_INPUT_ELEMENTDESC
{
	
	DX11_VS_INPUT_ELEMENTDESC(D3D11_INPUT_ELEMENT_DESC pInputDesc)
	{
		memset(SemanticName, 0, 255); 
		strcpy_s(SemanticName, pInputDesc.SemanticName);

		ElementDesc = pInputDesc;
		ElementDesc.SemanticName = nullptr;
	}

	D3D11_INPUT_ELEMENT_DESC	ElementDesc;
	char		 				SemanticName[255];
};

With the std::string:


struct DX11_VS_INPUT_ELEMENTDESC
{
	
	DX11_VS_INPUT_ELEMENTDESC(D3D11_INPUT_ELEMENT_DESC pInputDesc)
	{
		SemanticName = pInputDesc.SemanticName;

		ElementDesc = pInputDesc;
		ElementDesc.SemanticName = nullptr;
	}

	D3D11_INPUT_ELEMENT_DESC	ElementDesc;
	std::string					SemanticName;
};

At some point I create a checksum based on a std::vector of those struct's objects.

Results are unpredictable.

I think it has something to do with the fact that pInputDesc.SemanticName is a LPCSTR (no string by value).

Do you know how this can be explained and/or how this will work OK with a std::string?

Any input is appreciated.

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

At some point I create a checksum based on a std::vector of those struct's objects.
Results are unpredictable.


Post the checksumming code?

std::string is essentially just a pointer and a length, or two pointers. If your checksum code doesn't take that into account, you'll get different results every time as the pointers will be different every time.

The std::string is a container, basically a pointer to a buffer, and a few integers storing the size of the buffer and the amount of buffer in use, potentially a few other details. There are some implementations of std::string that store the data directly if it is smaller than a certain size, but generally it is stored somewhere else. This is true for all the containers: a std::vector doesn't contain the values directly, instead it points to a data buffer that contains the values.

In the code where you store the character array directly, the element is a data buffer. If you are taking the checksum of that then the contents of the string are included in your checksum because the buffer is part of the object. (This has a direct consequence that the buffer is a fixed length, always taking up 255 bytes even if you store less, and it cannot grow if you need to store more.)

If you are taking the checksum of the std::string you are not getting a checksum of the data buffer, you are getting a checksum of the pointer and sizes. Unless you want those to be in the checksum, you probably should compute it for the values in the string's data instead of the block that contains the string.
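A minimal sketch of the point above, assuming nothing about the actual checksum code (which is not shown in the thread). The helper names `StoresCharsElsewhere`, `SameObjectBytes` and `DemoStringLayout` are made up for illustration. It checks that two strings holding the same text still differ in their raw object bytes, because each object refers to its own buffer:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Returns true when the characters of `s` are stored outside the bytes of
// the std::string object itself (the object only refers to them).
inline bool StoresCharsElsewhere(const std::string& s)
{
    const std::uintptr_t objBegin = reinterpret_cast<std::uintptr_t>(&s);
    const std::uintptr_t objEnd = objBegin + sizeof(std::string);
    const std::uintptr_t chars = reinterpret_cast<std::uintptr_t>(s.data());
    return chars < objBegin || chars >= objEnd;
}

// Compares the raw object bytes, which is what a checksum taken over the
// whole struct effectively looks at.
inline bool SameObjectBytes(const std::string& a, const std::string& b)
{
    return std::memcmp(static_cast<const void*>(&a),
                       static_cast<const void*>(&b),
                       sizeof(std::string)) == 0;
}

// Hypothetical demo: identical text, chosen long enough that it cannot fit
// inside the object even with a small-string optimisation.
inline void DemoStringLayout()
{
    const std::string a(sizeof(std::string) + 1, 'x');
    const std::string b(a);

    assert(a == b);                   // same characters
    assert(StoresCharsElsewhere(a));  // characters are not part of the object
    assert(a.data() != b.data());     // each string owns a separate buffer
    assert(!SameObjectBytes(a, b));   // so the object bytes differ
}
```

For short strings an implementation may keep the characters inside the object, so the result of `StoresCharsElsewhere` is implementation-dependent there; either way the object bytes are not a reliable thing to checksum.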

/edit: ninja'd by Oberon.

Thanks guys.
I don't want to store the pointer in the struct (for which the checksum is calculated), so I want to store the characters/array in the struct. Would that simply mean char[] is the only option, or can I use a string which is constructed using the chars that the LPCSTR points to? Maybe like this:

LPCSTR bla = "sometext";
std::string myString = std::string(bla);

I don't want to store the pointer in the struct (for which the checksum is calculated), so I want to store the characters/array in the struct. Would that simply mean char[] is the only option, or can I use a string which is constructed using the chars that the LPCSTR points to? Maybe like this:

The std::string stores its own copy of the char-data. The point is, its storage is dynamic, and not static:


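// pseudo-implementation
#include <cstddef>
#include <cstring>

class string
{
public:

    string(const char* src)
    {
        size = strlen(src);
        data = new char[size + 1];    // +1 leaves room for the terminating '\0'
        memcpy(data, src, size + 1);  // copy the characters including the terminator
    }

    // (a real implementation also needs a destructor and copy handling)

public:
    char* data;  // pointer (4 or 8 bytes) to a heap-allocated char array VS your char[255] = 255 byte char array
    size_t size; // number of characters, not counting the terminator
};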

So instead of hashing the std::string-object, you can just call c_str() on it, and hash that like you would hash the static char-array (size() gives you the size, or you can treat that string as null-terminated).
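As a rough sketch of that suggestion: the thread never shows the real checksum routine, so 64-bit FNV-1a is used here purely as a stand-in, and `HashBytes` and `HashString` are hypothetical names. The idea is only that the bytes being hashed come from c_str() and size(), not from the std::string object:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in hash (64-bit FNV-1a); substitute whatever checksum you already use.
inline std::uint64_t HashBytes(const void* bytes, std::size_t count)
{
    assert(bytes != nullptr || count == 0);
    const unsigned char* p = static_cast<const unsigned char*>(bytes);
    std::uint64_t hash = 14695981039346656037ull;  // FNV offset basis
    for (std::size_t i = 0; i < count; ++i)
    {
        hash ^= p[i];
        hash *= 1099511628211ull;                  // FNV prime
    }
    return hash;
}

// Hash the characters the string owns, not the std::string object itself.
inline std::uint64_t HashString(const std::string& text)
{
    return HashBytes(text.c_str(), text.size());
}
```

Two strings with the same text then hash the same regardless of where their buffers live, and the result matches hashing the first strlen() bytes of an equivalent char array.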

Thanks. I understand now: basically .c_str() gives me a pointer to the actual characters, so I can hash those (instead of the std::string object itself).

I'll have to figure out if/how I'll do that, because the variable is part of a struct, and the struct is hashed as a whole (void*). In that case it might be a good idea to just store the char array instead of a std::string.

Why do you need to hash things via a void*?

Seems to me like you could have a generic hash function, which can be overloaded or specialised for DX11_VS_INPUT_ELEMENTDESC.
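One way that could look, as a sketch only. `InputElement` is a hypothetical, cut-down stand-in for DX11_VS_INPUT_ELEMENTDESC (so it compiles without the D3D11 headers), FNV-1a is an assumed placeholder for the real checksum, and the `Hash` overload set is illustrative rather than anything from the thread:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

const std::uint64_t kFnvOffset = 14695981039346656037ull;

// Stand-in byte hash (64-bit FNV-1a) that continues from `seed`.
inline std::uint64_t HashBytes(const void* bytes, std::size_t count, std::uint64_t seed)
{
    assert(bytes != nullptr || count == 0);
    const unsigned char* p = static_cast<const unsigned char*>(bytes);
    for (std::size_t i = 0; i < count; ++i)
    {
        seed ^= p[i];
        seed *= 1099511628211ull;
    }
    return seed;
}

// Generic case: plain numbers are hashed by their bytes.
template <typename T>
typename std::enable_if<std::is_arithmetic<T>::value, std::uint64_t>::type
Hash(const T& value, std::uint64_t seed = kFnvOffset)
{
    return HashBytes(&value, sizeof(value), seed);
}

// std::string: hash the length and the characters it owns, never the object.
inline std::uint64_t Hash(const std::string& text, std::uint64_t seed = kFnvOffset)
{
    seed = Hash(text.size(), seed);
    return HashBytes(text.data(), text.size(), seed);
}

// Hypothetical stand-in for the struct discussed in the thread.
struct InputElement
{
    std::uint32_t SemanticIndex;
    std::uint32_t Format;
    std::uint32_t AlignedByteOffset;
    std::string   SemanticName;
};

// Overload for the struct: hash field by field instead of as one void* block.
inline std::uint64_t Hash(const InputElement& element, std::uint64_t seed = kFnvOffset)
{
    seed = Hash(element.SemanticIndex, seed);
    seed = Hash(element.Format, seed);
    seed = Hash(element.AlignedByteOffset, seed);
    return Hash(element.SemanticName, seed);
}
```

Hashing field by field also sidesteps any padding bytes inside the struct, which a whole-object void* hash would pick up.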

Sorry, I'm actually hashing the DX11_VS_INPUT_ELEMENTDESC struct object.

Which contains the D3D11 inputElementDesc + a char array (or string :)) for the semanticName. In the D3D11 elementDesc I assign nullptr to the semanticname (LPCSTR) because of the exact reason above.

My point remains though - I don't see why you can't have a specific function that can take the existence of a std::string into account.

There's an interesting new design technique in which data and operations on that data are grouped together to minimize coupling between concepts. This makes reasoning about interactions between components much, much simpler and enables a finer-grained verification strategy, which in turn reduces the overall cost of development and especially maintenance, often by an order of magnitude.

In your case, instead of designing a system so every component knows about the internals of every other component at the lowest possible level, you would design it so a component exposes only a small interface and hides its internals. That would mean your DX11_VS_INPUT_ELEMENTDESC struct would have an associated function that returns its own hashsum, and no generic function would need in-depth, low-level knowledge of how data is stored in the aggregate.

A modern language like C++ actually provides built-in support for this technique. By using the class synonym for the struct keyword you're indicating to readers your intention to encapsulate functionality and data, and by making the hashsum calculation a member function, you can indeed encapsulate such low-level details as memory allocation strategies and not have them leak out and affect the internal design of completely unrelated modules in your software.
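A small sketch of what that might look like. `VsInputElement`, its members and `ChecksumOf` are hypothetical names, the two integer fields merely stand in for the contents of D3D11_INPUT_ELEMENT_DESC, and FNV-1a is an assumed placeholder for whatever checksum is really in use:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

class VsInputElement
{
public:
    VsInputElement(const char* semanticName, std::uint32_t semanticIndex, std::uint32_t format)
        : m_semanticName(semanticName ? semanticName : ""),  // owns a copy of the text
          m_semanticIndex(semanticIndex),
          m_format(format)
    {
    }

    const std::string& SemanticName() const { return m_semanticName; }

    // Continues a running checksum from `seed`; callers never see the layout.
    std::uint64_t Checksum(std::uint64_t seed) const
    {
        seed = Mix(seed, &m_semanticIndex, sizeof(m_semanticIndex));
        seed = Mix(seed, &m_format, sizeof(m_format));
        const std::size_t length = m_semanticName.size();
        seed = Mix(seed, &length, sizeof(length));
        return Mix(seed, m_semanticName.data(), length);
    }

private:
    // Stand-in mixing step (64-bit FNV-1a).
    static std::uint64_t Mix(std::uint64_t hash, const void* bytes, std::size_t count)
    {
        assert(bytes != nullptr || count == 0);
        const unsigned char* p = static_cast<const unsigned char*>(bytes);
        for (std::size_t i = 0; i < count; ++i)
        {
            hash ^= p[i];
            hash *= 1099511628211ull;
        }
        return hash;
    }

    std::string   m_semanticName;
    std::uint32_t m_semanticIndex;
    std::uint32_t m_format;
};

// Checksum over a whole vector of elements, built only on the public interface.
inline std::uint64_t ChecksumOf(const std::vector<VsInputElement>& elements)
{
    std::uint64_t hash = 14695981039346656037ull;  // FNV offset basis
    for (const VsInputElement& element : elements)
    {
        hash = element.Checksum(hash);
    }
    return hash;
}
```

With this shape, switching the member between a char array and a std::string only touches `Checksum`, not the code that hashes the vector.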

This technique was the result of decades of practical development in the field and academic research, and it had a great impact on productivity in industry in general. Of course, if you're fixated on avoiding such techniques, there are alternatives (such as separate hashsum functions for different structs) that still let you minimize development and maintenance costs, or alternatively you might be keen to maximize your costs going forward, in which case feel free to make things increase in complexity combinatorially as you continue development.

But I would suggest that trying to hand-craft data structures at a low level, and having to rely on knowledge of the internals of standard library class implementations in general, is going to bite you in the bum swiftly, frequently, and hard.

Stephen M. Webb
Professional Free Software Developer
