Generate Unique Ids
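The pigeonhole principle applies to this type of thing: a fixed-size ID can only take a finite number of values, so once enough distinct items are mapped onto it, two of them must share an ID.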
Generally people either go with a very large value and take steps to ensure it is probably going to be unique, such as a GUID, or they use an external system that tracks and ensures each ID is unique.
For text strings there are possibly other ways to handle it.
Notes on each:
A 'probably unique' value can mean many things. A GUID can be 'probably unique' at generation, and several GUID generators encode information like the current time in milliseconds, a counter unique to that machine's generator, and a hardware ID value, which reduces the likelihood of the numbers ever colliding. Other times it can mean a hash function, like running SHA over a string, which will probably not generate a collision but with enough values eventually will.
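To make the "time plus counter plus hardware ID" idea concrete, here is a rough sketch. It is not a real GUID/UUID layout; the names ProbablyUniqueId and IdGenerator are made up for illustration, and the machine ID is simply passed in by the caller rather than read from hardware.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical ID made of the three ingredients mentioned above.
struct ProbablyUniqueId {
    std::uint64_t timestampMs; // wall-clock time in milliseconds
    std::uint32_t counter;     // per-generator counter
    std::uint32_t machineId;   // assumed to be supplied by the caller
};

inline bool operator==(const ProbablyUniqueId& a, const ProbablyUniqueId& b) {
    return a.timestampMs == b.timestampMs && a.counter == b.counter &&
           a.machineId == b.machineId;
}

// Hypothetical generator: one instance per machine/process.
class IdGenerator {
public:
    explicit IdGenerator(std::uint32_t machineId) : machineId_(machineId) {}

    ProbablyUniqueId next() {
        using namespace std::chrono;
        const auto ms = duration_cast<milliseconds>(
                            system_clock::now().time_since_epoch()).count();
        // The counter separates IDs generated within the same millisecond.
        return {static_cast<std::uint64_t>(ms), counter_++, machineId_};
    }

private:
    std::uint32_t machineId_;
    std::uint32_t counter_ = 0;
};
```

Two IDs from the same generator always differ in the counter (until it wraps), and IDs from different machines differ in machineId as long as those values really are distinct, which is the part this sketch takes on faith.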
I've worked on several games with an external registration system for object IDs. When a new type of game object is created there is a script or tool that registers the use with a database and increases the count, or a file somewhere that has the last used number, or similar. If it needs to be unique but only for the duration of the program, a direct data dictionary (such as a map) works well enough.
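For the "unique only for the duration of the program" case, something along these lines is usually enough. This is a minimal sketch; IdRegistry and idFor are hypothetical names, and reserving 0 as "no ID" is just a convention assumed here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical registry: hands out IDs that are unique for one program run.
class IdRegistry {
public:
    // Returns the existing ID for a name, or assigns the next unused one.
    std::uint32_t idFor(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        // next_ wraps around to 0 only after every 32-bit value has been used.
        assert(next_ != 0 && "ID space exhausted");
        const std::uint32_t id = next_++;
        ids_.emplace(name, id);
        return id;
    }

private:
    std::map<std::string, std::uint32_t> ids_;
    std::uint32_t next_ = 1; // 0 is reserved to mean "no ID" (an assumption)
};
```

Asking for the same name twice returns the same ID, so callers do not have to remember whether something was already registered.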
Text strings for display are a slightly different beast and they are often handled differently.
For text strings it is often necessary to localize the messages. The best way I've seen to work with that is a direct data mapping and a localization database. The key is literally the string. The programmer might end up using a key "trapped_door_open_message" and it is the programmer's responsibility to ensure the key is entered in the localization database and flagged for translation. When a message is not translated but goes through the localization system anyway it results in a value like: "Untranslated string key: trapped_door_open_message".
Better localization systems allow for providing both the count and gender of nouns for the subject and object of the statement. They also allow numbered replacement so translators can reorder or drop arguments, rather than C-style replacement where arguments are consumed strictly in the order they were passed. The translators then have many options: "You have {1} of {2}, only {3} remain." or "Only {3} left, you have {1} of {2}." or even "{1} of {2} complete.", or "{3} items remaining.", etc.
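A bare-bones version of numbered replacement might look like the following. It only handles the {1}, {2}, ... placeholders from the examples above (no count or gender handling), and formatNumbered is a made-up name, not part of any particular localization library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replaces {1}, {2}, ... in a translated template with args[0], args[1], ...
// Placeholders that are malformed or out of range are copied through unchanged.
std::string formatNumbered(const std::string& tmpl,
                           const std::vector<std::string>& args) {
    std::string out;
    std::size_t i = 0;
    while (i < tmpl.size()) {
        if (tmpl[i] == '{') {
            const std::size_t close = tmpl.find('}', i + 1);
            if (close != std::string::npos && close > i + 1) {
                std::size_t index = 0;
                bool digitsOnly = true;
                for (std::size_t j = i + 1; j < close; ++j) {
                    if (tmpl[j] < '0' || tmpl[j] > '9') { digitsOnly = false; break; }
                    index = index * 10 + static_cast<std::size_t>(tmpl[j] - '0');
                }
                if (digitsOnly && index >= 1 && index <= args.size()) {
                    out += args[index - 1]; // placeholders are 1-based
                    i = close + 1;
                    continue;
                }
            }
        }
        out += tmpl[i++];
    }
    return out;
}
```

Because each placeholder names its argument, the same argument list works with every one of the templates above, whichever order the translator picks and whether or not every argument is used.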
Another improvement is to have a special type for localized strings. You can feed std::string objects into the localization system, and what comes out is a LocalizedString object. Functions like concatenation do not work on LocalizedString objects because they have already been processed for display; they become unchangeable values as far as the programmer is concerned.
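Here is one way such a type could be shaped, as a sketch only. The Localizer class and its in-memory map stand in for a real localization database, and everything apart from the LocalizedString name and the "Untranslated string key" message is invented for the example.

```cpp
#include <map>
#include <string>
#include <utility>

// Text that has already been through localization.
// It offers read access only: no operator+, no append.
class LocalizedString {
public:
    const std::string& text() const { return text_; }

private:
    friend class Localizer; // only the localizer may create one
    explicit LocalizedString(std::string text) : text_(std::move(text)) {}
    std::string text_;
};

// Hypothetical localization database backed by an in-memory map.
class Localizer {
public:
    void addTranslation(const std::string& key, const std::string& translated) {
        table_[key] = translated;
    }

    // Missing keys produce a visible marker instead of failing silently.
    LocalizedString localize(const std::string& key) const {
        const auto it = table_.find(key);
        if (it == table_.end())
            return LocalizedString("Untranslated string key: " + key);
        return LocalizedString(it->second);
    }

private:
    std::map<std::string, std::string> table_;
};
```

With the constructor private, the only way to get a LocalizedString is through localize(), and code that tries to glue two of them together with + simply does not compile.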
Just allocate IDs one at a time starting at 0 (or 1). You have 4 billion+ IDs to play with in a 32-bit unsigned integer.
As for tracking sub-IDs, why not just keep a std::map<unsigned, std::set<unsigned>> that holds the sub-IDs allocated to each "parent" ID? Then you can traverse that set and nuke/free the IDs as needed.
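Roughly what I have in mind, as a sketch (IdAllocator and its member names are just placeholders). Parent IDs and sub-IDs are drawn from the same counter, so a sub-ID can never coincide with any other ID, and the map remembers which sub-IDs belong to which parent.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <utility>

// Hypothetical allocator: every ID, parent or sub, comes from one counter.
class IdAllocator {
public:
    unsigned allocateParent() {
        const unsigned id = next_++;
        children_[id]; // create an empty sub-ID set for this parent
        return id;
    }

    unsigned allocateSub(unsigned parent) {
        const unsigned id = next_++;
        children_[parent].insert(id);
        return id;
    }

    // Forgets the parent and returns its sub-IDs so the caller can free
    // whatever is attached to them. Unknown parents give an empty set.
    std::set<unsigned> release(unsigned parent) {
        std::set<unsigned> freed;
        const auto it = children_.find(parent);
        if (it != children_.end()) {
            freed = std::move(it->second);
            children_.erase(it);
        }
        return freed;
    }

    std::size_t parentCount() const { return children_.size(); }

private:
    std::map<unsigned, std::set<unsigned>> children_;
    unsigned next_ = 0; // could just as well start at 1
};
```

Note that this sketch never reuses a released ID; with 32 bits to burn that is usually fine, but a free list could be added if it ever matters.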
Hi, I also thought of this solution, but it has a flaw.
For example, if my first bigText contains 15 sub-strings, their IDs would be
11, 12, 13, 14, 15, 16, ... 111, 112, 113, 114, 115
IDs 111, 112, 113, 114, ... will collide with the 11th bigText's sub-strings.
The 11th one will create sub-strings whose IDs would be
111, 112, 113, ...
You added a new requirement, that duplicate strings get merged.
There are systems where all the strings get placed into a common pool and duplicate strings are combined and all are treated as read-only. That process is called "string interning", and there is plenty of stuff you can read online about it.
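A very small interning pool can be sketched like this. StringPool is a made-up name; the sketch relies on the standard guarantee that pointers to elements of a std::unordered_set stay valid when the set rehashes, and it never removes strings from the pool.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical string pool: equal strings are stored once and shared.
class StringPool {
public:
    // Returns a pointer to the pooled, read-only copy of s.
    // Interning an equal string again returns the same pointer.
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};
```

Since duplicates collapse to one entry, the returned pointer can double as the string's ID, and comparing two interned strings becomes a pointer comparison.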