Michael Lojkovic

Type of Allocator to use for cstrings?


I'm working on a documentation parser that parses XML for code generation. Previously I was just using C strings. They're extremely fast, but occasionally lead to weird memory errors. To address this I made my own string struct and overloaded various operators. I took a slight performance hit, since instead of allocating the C strings on the stack I'm now allocating them on the heap with malloc. Some of the strings occasionally hold entire files, so I want to handle those as well. Ideally, I'll eventually have two allocators and two different string classes: one for handling files and one for processing lines to write to files. (I need to insert into the middle of files on occasion.)

My question is: which type of allocator should I use for this use case? I'm guessing a stack allocator that can grow, starting at 16,384 bytes and bumping the size up to 32,768 and so on whenever an overflow would occur. I probably also need a marker so I can roll back after writing to files. Can anyone point me to resources on how to approach designing allocators? Right now I feel like I could make something that works on current inputs but might cause problems if a larger XML file were introduced.
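For reference, here is a minimal sketch of the kind of growable stack allocator with markers described above. Every name in it (`StackAllocator`, `Marker`, `mark`, `rollback`, `blockCount`) is made up for illustration, and it makes one design assumption the post does not state: instead of reallocating a single buffer (which would invalidate every pointer already handed out), it chains a new, larger block on overflow. It uses `std::vector` only for bookkeeping; a hand-rolled array would do the same job.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical growable stack allocator: a chain of blocks, each new block
// at least twice the size of the previous one (16384, 32768, ...).
class StackAllocator {
public:
    // A marker records how many blocks existed and how full the last one was.
    struct Marker { std::size_t blocks; std::size_t used; };

    explicit StackAllocator(std::size_t firstBlockSize = 16384)
        : nextBlockSize_(firstBlockSize ? firstBlockSize : 1) {}
    ~StackAllocator() { for (Block& b : blocks_) std::free(b.data); }
    StackAllocator(const StackAllocator&) = delete;
    StackAllocator& operator=(const StackAllocator&) = delete;

    void* allocate(std::size_t size) {
        // Round up so every returned pointer is suitably aligned for any type.
        const std::size_t align = alignof(std::max_align_t);
        size = (size + align - 1) / align * align;
        if (blocks_.empty() || blocks_.back().capacity - blocks_.back().used < size)
            addBlock(size);
        Block& b = blocks_.back();
        void* p = b.data + b.used;
        b.used += size;
        return p;
    }

    Marker mark() const {
        return Marker{blocks_.size(), blocks_.empty() ? 0 : blocks_.back().used};
    }

    // Releases everything allocated after the marker was taken.
    void rollback(Marker m) {
        assert(m.blocks <= blocks_.size());
        while (blocks_.size() > m.blocks) {
            std::free(blocks_.back().data);
            blocks_.pop_back();
        }
        if (!blocks_.empty()) blocks_.back().used = m.used;
    }

    std::size_t blockCount() const { return blocks_.size(); }

private:
    struct Block { char* data; std::size_t capacity; std::size_t used; };

    void addBlock(std::size_t minSize) {
        const std::size_t capacity = std::max(nextBlockSize_, minSize);
        blocks_.reserve(blocks_.size() + 1);  // so push_back below cannot throw
        char* data = static_cast<char*>(std::malloc(capacity));
        if (!data) throw std::bad_alloc();
        blocks_.push_back(Block{data, capacity, 0});
        nextBlockSize_ = capacity * 2;  // double for the next overflow
    }

    std::vector<Block> blocks_;
    std::size_t nextBlockSize_;
};
```

Typical use would be to take a marker before building the strings for one output file, write the file, then roll back to the marker. Pointers obtained before the marker stay valid because existing blocks are never moved.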


What are you doing with strings exactly?

I'd use simple std::string for pieces of text, say a single line.

XML processing is normally done with an XML library; reading and writing files is normally done by streaming directly from or to disk.

Any particular reason this won't work for you?


Why not memory map the files and then directly read the characters as a C string? That will give the fastest read performance and avoid having to do many string allocations.

 

Yeah, that actually would work the best, and I could see replacing any uses of fstream with that.
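A rough sketch of what the memory-mapping suggestion could look like on a POSIX system follows (Windows would need `CreateFileMapping`/`MapViewOfFile` instead). The `MappedFile` class and `countLines` helper are hypothetical names for illustration. One caveat: a mapped file is not guaranteed to be NUL-terminated, so it should be treated as a pointer plus a length rather than passed to functions like `strlen`.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical read-only view of a file's bytes via mmap (POSIX only).
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        const int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        // mmap rejects a zero length, so an empty file is left unmapped.
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            const std::size_t len = static_cast<std::size_t>(st.st_size);
            void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const char*>(p);
                size_ = len;
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_) ::munmap(const_cast<char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool valid() const { return data_ != nullptr; }
    const char* data() const { return data_; }
    std::size_t size() const { return size_; }
    char at(std::size_t i) const { assert(i < size_); return data_[i]; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};

// Walks the mapped bytes directly; no per-line string allocation is needed.
std::size_t countLines(const MappedFile& f) {
    std::size_t lines = 0;
    for (std::size_t i = 0; i < f.size(); ++i)
        if (f.data()[i] == '\n') ++lines;
    return lines;
}
```

This only covers reading. Inserting into the middle of a file still needs a separate buffer or a rewrite of the file, since a read-only mapping cannot grow.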
 

XML processing is normally done with an XML library; reading and writing files is normally done by streaming directly from or to disk.

 

I'm using TinyXML2 for processing the actual XML file. I'm just building cstrings from that data for producing actual valid C++ code from the documentation.

What are you doing with strings exactly?

 

I'm mainly doing this just for practice with allocators, and I'm trying to keep as much of the STL out of my code as possible so I don't have long compile times.


I concur with ApochPiQ here: just use std::string. Parsing strings is not a fast process to begin with, and allocation is likely to be the least of your worries, honestly. TinyXML2 is not fast, so why aren't you rewriting that, as it's likely significantly slower than string allocations? Use battle-tested systems until your profiler shows an issue that you need to look into.


What OP is doing is just one of several ways text editors deal with text insertion/deletion; whether it uses cstrings or not doesn't really matter:

https://ecc-comp.blogspot.com.br/2015/05/a-brief-glance-at-how-5-text-editors.html

(I need to insert into the middle of files on occasion)

But why is nobody addressing that this isn't a good approach? Of the many approaches, this one can be a real bummer with large files, because you have to 1) reallocate the memory, 2) move all the content to the right by X bytes, and 3) place your new bytes in the middle. This gets slower and slower as the file grows.

Parse the XML with your TinyXML2 library, convert the content into objects/structs, load the parameters into them, change the parameters in the objects, then write them back out to a new XML file.

 

I'm not sure if OP's intention is just to learn about allocators (if so, choose something less complicated than XML and large files) or to use the parsed data for something (if so, parse, convert to objects, and convert the objects back to text when needed).

Edited by felipefsdev


What OP is doing is just one of several ways text editors deal with text insertion/deletion; whether it uses cstrings or not doesn't really matter:

https://ecc-comp.blogspot.com.br/2015/05/a-brief-glance-at-how-5-text-editors.html

 

 

(I need to insert into the middle of files on occasion)

But why is nobody addressing that this isn't a good approach? Of the many approaches, this one can be a real bummer with large files, because you have to 1) reallocate the memory, 2) move all the content to the right by X bytes, and 3) place your new bytes in the middle. This gets slower and slower as the file grows.

Parse the XML with your TinyXML2 library, convert the content into objects/structs, load the parameters into them, change the parameters in the objects, then write them back out to a new XML file.

 

I'm not sure if OP's intention is just to learn about allocators (if so, choose something less complicated than XML and large files) or to use the parsed data for something (if so, parse, convert to objects, and convert the objects back to text when needed).

 

How would you go about inserting into the middle of a large file? I was using vector<char> previously.

Currently, I have my own implementation of vector and an iterator that returns each new line. I allocate 4096 bytes from the stack allocator and double that if I run out of memory. To make room for inserting in the middle I use memmove. With some slight changes at creation time I could use this for a memory-mapped file too.
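The approach just described (start at 4096 bytes, double on overflow, `memmove` to open a hole) can be sketched roughly as below. This is an illustration under assumptions, not the poster's actual code: `CharBuffer`, `insertAt`, and `freeBuffer` are hypothetical names, and plain `realloc` stands in for the custom stack allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical growable character buffer; not NUL-terminated, length is explicit.
struct CharBuffer {
    char* data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;
};

// Inserts len bytes of text at byte offset pos. Returns false if allocation fails.
// text must not point into the buffer itself, since realloc may move it.
bool insertAt(CharBuffer& b, std::size_t pos, const char* text, std::size_t len) {
    assert(pos <= b.size);
    if (len == 0) return true;
    if (b.size + len > b.capacity) {
        // Start at 4096 bytes and keep doubling until the new text fits.
        std::size_t newCapacity = b.capacity ? b.capacity : 4096;
        while (newCapacity < b.size + len) newCapacity *= 2;
        char* p = static_cast<char*>(std::realloc(b.data, newCapacity));
        if (!p) return false;  // the old buffer is still intact on failure
        b.data = p;
        b.capacity = newCapacity;
    }
    // Shift the tail right to open a hole (memmove handles the overlap),
    // then copy the new bytes into the hole.
    std::memmove(b.data + pos + len, b.data + pos, b.size - pos);
    std::memcpy(b.data + pos, text, len);
    b.size += len;
    return true;
}

void freeBuffer(CharBuffer& b) {
    std::free(b.data);
    b = CharBuffer{};
}
```

Each insert costs time proportional to the number of bytes after `pos`, which is the cost the quoted post objects to for large files.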

I don't have anything against using TinyXML2. It's fairly fast as far as XML parsing goes compared to other frameworks. I'm just using my own string implementation for building strings to write out to disk that are valid C++ for Unreal.

I could see switching to the EASTL at some point. It wouldn't be that difficult to replace my implementation with it, I'd just need to do a few find and replaces with vim.

This whole issue will disappear once C++ gets module support on gcc, clang, and msvc through compiler extensions.


But why is nobody addressing that this isn't a good approach? Of the many approaches, this one can be a real bummer with large files, because you have to 1) reallocate the memory, 2) move all the content to the right by X bytes, and 3) place your new bytes in the middle. This gets slower and slower as the file grows.


There are ways to optimize insertion in the middle of a buffer.
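One such way, covered in the article linked earlier in the thread, is a gap buffer. It may or may not be what the poster had in mind, so treat the following as one illustrative option. The buffer keeps a block of unused space (the gap) at the edit position, so repeated inserts near the same spot only copy the new bytes. The `GapBuffer` class and its members are hypothetical names, and `std::vector`/`std::string` are used only to keep the sketch short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical gap buffer: text lives in [0, gapStart_) and [gapEnd_, buf_.size()).
class GapBuffer {
public:
    explicit GapBuffer(std::size_t initialGap = 4096)
        : buf_(initialGap ? initialGap : 1), gapStart_(0), gapEnd_(buf_.size()) {}

    std::size_t size() const { return buf_.size() - (gapEnd_ - gapStart_); }

    void insert(std::size_t pos, const char* text, std::size_t len) {
        assert(pos <= size());
        moveGap(pos);
        if (gapEnd_ - gapStart_ < len) grow(len);
        std::memcpy(buf_.data() + gapStart_, text, len);
        gapStart_ += len;
    }

    // Flattens the two halves into one string, for example to write to disk.
    std::string str() const {
        std::string out(buf_.data(), gapStart_);
        out.append(buf_.data() + gapEnd_, buf_.size() - gapEnd_);
        return out;
    }

private:
    // Slides the gap so that it starts at pos; only the bytes between the
    // old and new gap positions are copied.
    void moveGap(std::size_t pos) {
        if (pos < gapStart_) {
            const std::size_t n = gapStart_ - pos;
            std::memmove(buf_.data() + gapEnd_ - n, buf_.data() + pos, n);
            gapStart_ -= n;
            gapEnd_ -= n;
        } else if (pos > gapStart_) {
            const std::size_t n = pos - gapStart_;
            std::memmove(buf_.data() + gapStart_, buf_.data() + gapEnd_, n);
            gapStart_ += n;
            gapEnd_ += n;
        }
    }

    // Enlarges the storage (at least doubling) and moves the tail to the new
    // end, which widens the gap by the amount of added storage.
    void grow(std::size_t needed) {
        const std::size_t tail = buf_.size() - gapEnd_;
        const std::size_t newSize = std::max(buf_.size() * 2, buf_.size() + needed);
        buf_.resize(newSize);
        std::memmove(buf_.data() + newSize - tail, buf_.data() + gapEnd_, tail);
        gapEnd_ = newSize - tail;
    }

    std::vector<char> buf_;
    std::size_t gapStart_;
    std::size_t gapEnd_;
};
```

The trade-off is that an insert far away from the previous one still has to move the bytes in between, so a gap buffer helps most when edits cluster together.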

If you're just dealing with a file, I expect you shouldn't be allocating an entire buffer to hold the file and copying it around.

Edited by Oberon_Command

