Type of Allocator to use for cstrings?

7 comments, last by Oberon_Command 7 years ago

I'm working on a documentation parser that reads XML for code generation. Previously I was just using C strings; they're extremely fast, but led to weird memory errors on occasion. To address this I wrote my own string struct and overloaded various operators. I took a slight performance hit, since instead of allocating the C strings on the stack I'm now allocating them on the heap with malloc. Some of the strings occasionally hold entire files, so I want to handle those as well. Ideally, I'll eventually have two allocators and two different string classes: one for whole files and one for processing lines to write out to files. (I need to insert into the middle of files on occasion.)

My question is: which type of allocator should I use for this use case? I'm guessing a stack allocator that can grow, starting at 16,384 bytes and bumping up to 32,768 and so on if it would otherwise overflow. I probably also need a marker so I can roll back after writing to files. Can anyone point me to resources on how to approach designing allocators? Right now I feel like I could make something that works on my current inputs, but might cause problems if a larger XML file were introduced.
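A rough sketch of the kind of growing stack allocator with a rollback marker I'm describing (the names are made up for illustration, and note that growing via realloc would invalidate pointers already handed out, which is why real stack allocators usually chain fixed-size blocks instead):

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical growing stack allocator: starts at 16 KiB, doubles on overflow.
struct StackAllocator {
    char*  buffer   = nullptr;
    size_t capacity = 0;
    size_t offset   = 0;   // top of the stack

    explicit StackAllocator(size_t initial = 16384) {
        buffer   = static_cast<char*>(std::malloc(initial));
        capacity = initial;
    }
    ~StackAllocator() { std::free(buffer); }

    void* allocate(size_t size) {
        while (offset + size > capacity) {          // grow: 16384 -> 32768 -> ...
            capacity *= 2;
            buffer = static_cast<char*>(std::realloc(buffer, capacity));
        }
        void* p = buffer + offset;
        offset += size;
        return p;
    }

    size_t marker() const { return offset; }        // remember a point...
    void rollback(size_t m) { offset = m; }         // ...and free everything allocated after it
};
```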


What are you doing with strings exactly?

I'd use plain std::string for pieces of text, say a single line.

XML processing is normally done with an XML library, and reading and writing files is normally done by streaming directly from or to disk.

Any particular reason this won't work for you?

Why not memory map the files and then directly read the characters as a C string? That will give the fastest read performance and avoid having to do many string allocations.
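A minimal POSIX sketch of that idea, assuming a read-only pass over the file (Windows would use CreateFileMapping/MapViewOfFile instead); note the mapping is not NUL-terminated, so you still need the file size as a bound:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char** argv) {
    if (argc < 2) return 1;

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) return 1;

    struct stat st;
    fstat(fd, &st);

    // Map the whole file read-only; the OS pages it in on demand.
    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { close(fd); return 1; }
    const char* data = static_cast<const char*>(p);

    // 'data' can now be scanned like a C string, except it has no
    // terminating NUL -- iterate using st.st_size as the bound.
    size_t lines = 0;
    for (off_t i = 0; i < st.st_size; ++i)
        if (data[i] == '\n') ++lines;
    std::printf("%zu lines\n", lines);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```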

Why not memory map the files and then directly read the characters as a C string? That will give the fastest read performance and avoid having to do many string allocations.

Yeah, that actually would work best, and I could see replacing any uses of fstream with that.

XML processing is normally done with an XML library, and reading and writing files is normally done by streaming directly from or to disk.

I'm using TinyXML2 for processing the actual XML file. I'm just building C strings from that data to produce valid C++ code from the documentation.

What are you doing with strings exactly?

I'm mainly doing this for practice with allocators, and to keep as much of the STL out of my code as possible so I don't have long compile times.

I'm going to go ahead and say the unhappy thing here.

Your reasons for avoiding the C++ Standard Library are unfounded.


Compile times with your own code that is as rich and correct as the standard library will not be appreciably different on a modern compiler.

Not using the standard library fosters the creation of inferior code that attempts to do the same job. I don't mean that to say you're a bad programmer, but rather that the standard implementations are far more battle-tested than anything you'll be able to replicate in a short time.

You're already using TinyXML2. That proves you can use other people's code just fine. So why not use the standard library, even just a small part of it like <string>?

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

I concur with ApochPiQ here: just use std::string. Parsing strings is not a fast process to begin with, and allocation is likely the least of your worries, honestly. TinyXML2 is not fast either; why aren't you rewriting that, when it's likely significantly slower than your string allocations? Use battle-tested systems until your profiler says a particular piece is something you need to look into.

"Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety." --Benjamin Franklin

What the OP is doing is just one of the several ways text editors deal with text insertion/deletion; whether it uses C strings or not really doesn't matter:

https://ecc-comp.blogspot.com.br/2015/05/a-brief-glance-at-how-5-text-editors.html

(I need to insert into the middle of files on occasion)

But why is nobody addressing that this isn't a good approach? Of the many approaches, this one can be a real bummer with large files, because you have to 1) reallocate the memory, 2) move all the content to the right by X bytes, and 3) place your new bytes in the middle. This gets slower and slower.

Parse the XML with your TinyXML2 library, convert the content into objects/structs, load the parameters into those objects, change the parameters IN the objects, then write back into a new XML file.

I'm not sure if the OP's intention is just to learn about allocators (if so, choose something less complicated than XML files and large files) or to actually use the parsed data for something (if so, then do the parse, convert to objects, and convert the objects back to text when needed).
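A rough sketch of that parse, modify, write-back flow with TinyXML2 (the file, element, and attribute names here are made up):

```cpp
#include <tinyxml2.h>

int main() {
    using namespace tinyxml2;

    XMLDocument doc;
    if (doc.LoadFile("docs.xml") != XML_SUCCESS) return 1;

    // Pull the parsed content into your own structs/objects...
    XMLElement* root = doc.FirstChildElement("classes");
    if (!root) return 1;

    for (XMLElement* cls = root->FirstChildElement("class");
         cls != nullptr;
         cls = cls->NextSiblingElement("class")) {
        // ...change the parameters on the element (or on your struct),
        // instead of splicing bytes into the middle of the file.
        cls->SetAttribute("generated", true);
    }

    // Write the whole tree back out as a new file.
    doc.SaveFile("docs_out.xml");
    return 0;
}
```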

But why is nobody addressing that this isn't a good approach? Of the many approaches, this one can be a real bummer with large files, because you have to 1) reallocate the memory, 2) move all the content to the right by X bytes, and 3) place your new bytes in the middle. This gets slower and slower.

Parse the XML with your TinyXML2 library, convert the content into objects/structs, load the parameters into those objects, change the parameters IN the objects, then write back into a new XML file.

How would you go about inserting into the middle of a large file? I was using vector<char> previously.

Currently I have my own implementation of vector and an iterator that returns each new line. I allocate 4096 bytes to the stack allocator and double it if I run out of memory. To make room for inserting in the middle, I use memmove. I could use this for a memory-mapped file too, with some slight changes at creation time.
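A simplified sketch of that memmove-based insert (illustrative names, not the exact code), which also shows why the cost grows with how much data sits to the right of the insertion point:

```cpp
#include <cstring>
#include <cstdlib>

struct CharBuffer {
    char*  data = nullptr;
    size_t size = 0;       // bytes in use
    size_t cap  = 0;       // bytes allocated

    void reserve(size_t need) {
        if (need <= cap) return;
        size_t newCap = cap ? cap : 4096;
        while (newCap < need) newCap *= 2;          // 4096 -> 8192 -> ...
        data = static_cast<char*>(std::realloc(data, newCap));
        cap = newCap;
    }

    // Insert 'len' bytes at 'pos'. Everything after 'pos' is shifted right,
    // so the cost is O(size - pos) per insert: cheap near the end,
    // increasingly expensive near the front of a large file.
    void insert(size_t pos, const char* src, size_t len) {
        reserve(size + len);
        std::memmove(data + pos + len, data + pos, size - pos);
        std::memcpy(data + pos, src, len);
        size += len;
    }
};
```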

I don't have anything against using TinyXML2; it's fairly fast as far as XML parsing goes compared to other frameworks. I'm just using my own string implementation for building strings to write out to disk as valid C++ for Unreal.

I could see switching to EASTL at some point. It wouldn't be that difficult to replace my implementation with it; I'd just need to do a few find-and-replaces with vim.

This whole issue will disappear once C++ gets module support in GCC, Clang, and MSVC, even if only through compiler extensions.

But why is nobody addressing that this isn't a good approach? Of the many approaches, this one can be a real bummer with large files, because you have to 1) reallocate the memory, 2) move all the content to the right by X bytes, and 3) place your new bytes in the middle. This gets slower and slower.


There are ways to optimize insertion in the middle of a buffer.

If you're just dealing with a file, I expect you shouldn't be allocating an entire buffer to hold the file and copying it around.
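For example, a gap buffer keeps the unused space at the cursor, so repeated inserts near the same position don't shift the whole tail every time. A rough sketch (names made up; assumes 'pos' is never past the current text length):

```cpp
#include <cstring>
#include <string>
#include <vector>

struct GapBuffer {
    std::vector<char> buf;
    size_t gapStart = 0, gapEnd = 0;   // [gapStart, gapEnd) is unused space

    explicit GapBuffer(size_t initialGap = 4096)
        : buf(initialGap), gapEnd(initialGap) {}

    // Move the gap so it begins at 'pos' (position in the visible text).
    void moveGapTo(size_t pos) {
        size_t gap = gapEnd - gapStart;
        if (pos < gapStart)
            std::memmove(&buf[pos + gap], &buf[pos], gapStart - pos);
        else if (pos > gapStart)
            std::memmove(&buf[gapStart], &buf[gapEnd], pos - gapStart);
        gapStart = pos;
        gapEnd   = pos + gap;
    }

    void insert(size_t pos, const char* src, size_t len) {
        if (gapEnd - gapStart < len) {              // grow the gap if needed
            size_t grow = buf.size() + len;
            buf.insert(buf.begin() + gapEnd, grow, '\0');
            gapEnd += grow;
        }
        moveGapTo(pos);
        std::memcpy(&buf[gapStart], src, len);
        gapStart += len;                            // text grows into the gap
    }

    std::string text() const {                      // stitch the two halves back together
        return std::string(buf.begin(), buf.begin() + gapStart) +
               std::string(buf.begin() + gapEnd, buf.end());
    }
};
```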

This topic is closed to new replies.
