Type of Allocator to use for cstrings?

7 comments, last by Oberon_Command 7 years ago

I'm working on a documentation parser that reads XML for code generation. Previously I was just using C strings; they're extremely fast, but led to weird memory errors on occasion. To address this I wrote my own string struct and overloaded various operators. I took a slight performance hit, since instead of allocating the C strings on the stack I'm now allocating them on the heap with malloc. Some of the strings occasionally hold entire files, so I want to handle those as well. Ideally, I'll eventually have two allocators and two different string classes: one for whole files and one for processing lines to write out to files. (I need to insert into the middle of files on occasion.)

My question is: which type of allocator should I use for this use case? I'm guessing a stack allocator that can grow, starting at 16,384 bytes and bumping up to 32,768 and so on if it would otherwise overflow. I probably also need a marker so I can roll back after writing to files. Can anyone point me to resources on how to approach designing allocators? Right now I feel like I could make something that works on my current inputs, but might cause problems if a larger XML file were introduced.
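A rough sketch of the kind of growing stack allocator with a rollback marker I'm describing (the names are made up for illustration, and note that growing via realloc would invalidate pointers already handed out, which is why real stack allocators usually chain fixed-size blocks instead):

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical growing stack allocator: starts at 16 KiB, doubles on overflow.
struct StackAllocator {
    char*  buffer   = nullptr;
    size_t capacity = 0;
    size_t offset   = 0;   // top of the stack

    explicit StackAllocator(size_t initial = 16384) {
        buffer   = static_cast<char*>(std::malloc(initial));
        capacity = initial;
    }
    ~StackAllocator() { std::free(buffer); }

    void* allocate(size_t size) {
        while (offset + size > capacity) {          // grow: 16384 -> 32768 -> ...
            capacity *= 2;
            buffer = static_cast<char*>(std::realloc(buffer, capacity));
        }
        void* p = buffer + offset;
        offset += size;
        return p;
    }

    size_t marker() const { return offset; }        // remember a point...
    void rollback(size_t m) { offset = m; }         // ...and free everything allocated after it
};
```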


What are you doing with strings exactly?

I'd use plain std::string for pieces of text, say a single line.

XML processing is normally done with an XML library, and reading and writing files is normally done by streaming directly from or to disk.

Any particular reason this won't work for you?

Why not memory map the files and then directly read the characters as a C string? That will give the fastest read performance and avoid having to do many string allocations.
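A minimal POSIX sketch of that idea, assuming a read-only pass over the file (Windows would use CreateFileMapping/MapViewOfFile instead); note the mapping is not NUL-terminated, so you still need the file size as a bound:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char** argv) {
    if (argc < 2) return 1;

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) return 1;

    struct stat st;
    fstat(fd, &st);

    // Map the whole file read-only; the OS pages it in on demand.
    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { close(fd); return 1; }
    const char* data = static_cast<const char*>(p);

    // 'data' can now be scanned like a C string, except it has no
    // terminating NUL -- iterate using st.st_size as the bound.
    size_t lines = 0;
    for (off_t i = 0; i < st.st_size; ++i)
        if (data[i] == '\n') ++lines;
    std::printf("%zu lines\n", lines);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```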

Why not memory map the files and then directly read the characters as a C string? That will give the fastest read performance and avoid having to do many string allocations.

Yeah, that actually would work best, and I could see replacing any uses of fstream with that.

XML processing is normally done with an XML library, and reading and writing files is normally done by streaming directly from or to disk.

I'm using TinyXML2 for processing the actual XML file. I'm just building C strings from that data to produce valid C++ code from the documentation.

What are you doing with strings exactly?

I'm mainly doing this for practice with allocators, and to keep as much of the STL out of my code as possible so I don't have long compile times.

I'm going to go ahead and say the unhappy thing here.

Your reasons for avoiding the C++ Standard Library are unfounded.


Compile times with your own code that is as rich and correct as the standard library will not be appreciably different on a modern compiler.

Not using the standard library fosters the creation of inferior code that attempts to do the same job. I don't mean that to say you're a bad programmer, but rather that the standard implementations are far more battle-tested than anything you'll be able to replicate in a short time.

You're already using TinyXML2. That proves you can use other people's code just fine. So why not use the standard library, even just a small part of it like <string>?

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

I concur with ApochPiQ here: just use std::string. Parsing strings is not a fast process to begin with, and allocation is likely the least of your worries, honestly. TinyXML2 is not fast either; why aren't you rewriting that, when it's likely significantly slower than your string allocations? Use battle-tested systems until your profiler says a particular piece is something you need to look into.

"Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety." --Benjamin Franklin

What the OP is doing is just one of the several ways text editors deal with text insertion/deletion; whether it uses C strings or not really doesn't matter:

https://ecc-comp.blogspot.com.br/2015/05/a-brief-glance-at-how-5-text-editors.html

(I need to insert into the middle of files on occasion)

But why is nobody addressing that this isn't a good approach? Of the many approaches, this one can be a real bummer with large files, because you have to 1) reallocate the memory, 2) move all the content to the right by X bytes, and 3) place your new bytes in the middle. This gets slower and slower.

Parse the XML with your TinyXML2 library, convert the content into objects/structs, load the parameters into those objects, change the parameters IN the objects, then write back into a new XML file.

I'm not sure if the OP's intention is just to learn about allocators (if so, choose something less complicated than XML files and large files) or to actually use the parsed data for something (if so, then do the parse, convert to objects, and convert the objects back to text when needed).
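A rough sketch of that parse, modify, write-back flow with TinyXML2 (the file, element, and attribute names here are made up):

```cpp
#include <tinyxml2.h>

int main() {
    using namespace tinyxml2;

    XMLDocument doc;
    if (doc.LoadFile("docs.xml") != XML_SUCCESS) return 1;

    // Pull the parsed content into your own structs/objects...
    XMLElement* root = doc.FirstChildElement("classes");
    if (!root) return 1;

    for (XMLElement* cls = root->FirstChildElement("class");
         cls != nullptr;
         cls = cls->NextSiblingElement("class")) {
        // ...change the parameters on the element (or on your struct),
        // instead of splicing bytes into the middle of the file.
        cls->SetAttribute("generated", true);
    }

    // Write the whole tree back out as a new file.
    doc.SaveFile("docs_out.xml");
    return 0;
}
```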

But why is nobody addressing that this isn't a good approach? Of the many approaches, this one can be a real bummer with large files, because you have to 1) reallocate the memory, 2) move all the content to the right by X bytes, and 3) place your new bytes in the middle. This gets slower and slower.

Parse the XML with your TinyXML2 library, convert the content into objects/structs, load the parameters into those objects, change the parameters IN the objects, then write back into a new XML file.

How would you go about inserting into the middle of a large file? I was using vector<char> previously.

Currently I have my own implementation of vector and an iterator that returns each new line. I allocate 4096 bytes to the stack allocator and double it if I run out of memory. To make room for inserting in the middle, I use memmove. I could use this for a memory-mapped file too, with some slight changes at creation time.
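A simplified sketch of that memmove-based insert (illustrative names, not the exact code), which also shows why the cost grows with how much data sits to the right of the insertion point:

```cpp
#include <cstring>
#include <cstdlib>

struct CharBuffer {
    char*  data = nullptr;
    size_t size = 0;       // bytes in use
    size_t cap  = 0;       // bytes allocated

    void reserve(size_t need) {
        if (need <= cap) return;
        size_t newCap = cap ? cap : 4096;
        while (newCap < need) newCap *= 2;          // 4096 -> 8192 -> ...
        data = static_cast<char*>(std::realloc(data, newCap));
        cap = newCap;
    }

    // Insert 'len' bytes at 'pos'. Everything after 'pos' is shifted right,
    // so the cost is O(size - pos) per insert: cheap near the end,
    // increasingly expensive near the front of a large file.
    void insert(size_t pos, const char* src, size_t len) {
        reserve(size + len);
        std::memmove(data + pos + len, data + pos, size - pos);
        std::memcpy(data + pos, src, len);
        size += len;
    }
};
```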

I don't have anything against using TinyXML2; it's fairly fast as far as XML parsing goes compared to other frameworks. I'm just using my own string implementation for building strings to write out to disk as valid C++ for Unreal.

I could see switching to EASTL at some point. It wouldn't be that difficult to replace my implementation with it; I'd just need to do a few find-and-replaces with vim.

This whole issue will disappear once C++ gets module support in GCC, Clang, and MSVC, even if only through compiler extensions.

But why is nobody addressing that this isn't a good approach? Of the many approaches, this one can be a real bummer with large files, because you have to 1) reallocate the memory, 2) move all the content to the right by X bytes, and 3) place your new bytes in the middle. This gets slower and slower.


There are ways to optimize insertion in the middle of a buffer.

If you're just dealing with a file, I expect you shouldn't be allocating an entire buffer to hold the file and copying it around.
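For example, a gap buffer keeps the unused space at the cursor, so repeated inserts near the same position don't shift the whole tail every time. A rough sketch (names made up; assumes 'pos' is never past the current text length):

```cpp
#include <cstring>
#include <string>
#include <vector>

struct GapBuffer {
    std::vector<char> buf;
    size_t gapStart = 0, gapEnd = 0;   // [gapStart, gapEnd) is unused space

    explicit GapBuffer(size_t initialGap = 4096)
        : buf(initialGap), gapEnd(initialGap) {}

    // Move the gap so it begins at 'pos' (position in the visible text).
    void moveGapTo(size_t pos) {
        size_t gap = gapEnd - gapStart;
        if (pos < gapStart)
            std::memmove(&buf[pos + gap], &buf[pos], gapStart - pos);
        else if (pos > gapStart)
            std::memmove(&buf[gapStart], &buf[gapEnd], pos - gapStart);
        gapStart = pos;
        gapEnd   = pos + gap;
    }

    void insert(size_t pos, const char* src, size_t len) {
        if (gapEnd - gapStart < len) {              // grow the gap if needed
            size_t grow = buf.size() + len;
            buf.insert(buf.begin() + gapEnd, grow, '\0');
            gapEnd += grow;
        }
        moveGapTo(pos);
        std::memcpy(&buf[gapStart], src, len);
        gapStart += len;                            // text grows into the gap
    }

    std::string text() const {                      // stitch the two halves back together
        return std::string(buf.begin(), buf.begin() + gapStart) +
               std::string(buf.begin() + gapEnd, buf.end());
    }
};
```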

This topic is closed to new replies.
