skulldrudgery

Word-wrapping/line breaking algorithm


Hey fellas. I hope this is the right place. I want to make a function that outputs a string of text (just in the console for now) and wraps it to some arbitrary boundaries that I set, and I am looking for some hints on an algorithm.

So far, here is what I have come up with: tokenize the string with strtok (or the std::string equivalent) against a string of "breaking characters" (space, comma, dash/hyphen, etc.), then add the tokens together one by one. After each token, check whether the line goes beyond the boundary. If it does, insert a line break before that token; otherwise, keep adding tokens.

Is this (very incomplete) algorithm good or horrible, simple or convoluted? How should I preserve the whitespace (or, really, any of the breaking characters)? Can someone help fill in the holes? Are there any other snafus or pitfalls I should be looking out for? Thanks in advance.
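For reference, here is a minimal C++ sketch of the tokenize-and-accumulate idea, assuming whitespace is the only breaking character (wrapTokens is a made-up name). It reads tokens with std::istringstream rather than strtok, and it shows the whitespace problem directly: runs of spaces collapse to a single space.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Greedy word wrap: read whitespace-separated tokens one at a time and start
// a new line whenever appending the next token would exceed `width`.
// Runs of whitespace are collapsed to single spaces, so the original spacing
// is NOT preserved.
std::string wrapTokens(const std::string& text, std::size_t width) {
    std::istringstream in(text);
    std::string word;
    std::string result;
    std::size_t lineLen = 0;  // characters already on the current line
    while (in >> word) {
        if (lineLen > 0 && lineLen + 1 + word.size() > width) {
            result += '\n';   // token doesn't fit: break before it
            lineLen = 0;
        } else if (lineLen > 0) {
            result += ' ';    // separate from the previous token
            ++lineLen;
        }
        result += word;       // an over-long token ends up on a line by itself
        lineLen += word.size();
    }
    return result;
}
```

A token longer than the width simply overflows onto its own line here; that is one possible policy, not the only one.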

If you want to wrap the string at N characters, why not just start by looking at position N, and working backwards until you encounter a breaking character? That would be faster than tokenizing the entire string, and whitespace would be preserved.
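A sketch of that backwards search in C++, assuming plain spaces are the breaking characters and 0-based indexing, so index `width` is the first column that would overflow (findBreakBackwards is a hypothetical helper name):

```cpp
#include <cstddef>
#include <string>

// Returns:
//   s.size()          if the whole string already fits in `width` columns,
//   index of a space  to break at (the first line is then s[0, index)),
//   std::string::npos if s[0..width] contains no space (one over-long word).
std::size_t findBreakBackwards(const std::string& s, std::size_t width) {
    if (s.size() <= width) return s.size();  // no break needed
    // Start at the first overflowing column and walk backwards to index 0.
    for (std::size_t i = width + 1; i-- > 0;) {
        if (s[i] == ' ') return i;
    }
    return std::string::npos;
}
```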

I didn't even think of working backwards :D

Ok, so I'll check N+1. If it's a breaking character, I'll replace it with a line break, because I don't want leading spaces on new lines. Otherwise, I'll check N; if that is not a breaking character either, I'll go backward looking for the first breaking character and replace it with a line break.
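The steps above might look roughly like this as an in-place C++ function. It is a sketch assuming single spaces between words; the name wrapInPlace is made up, and letting an over-long word overflow is an assumed policy. In 0-based terms, "N+1" becomes index lineStart + width.

```cpp
#include <cstddef>
#include <string>

// In-place wrap (spaces only): look at the first column that would overflow;
// if it is a space, turn it into '\n', otherwise walk backwards to the nearest
// space and turn that into '\n'. Replacing the space (instead of inserting a
// break before it) avoids a leading space on the next line. A word longer
// than `width` is left to overflow: we jump forward to the next space.
void wrapInPlace(std::string& s, std::size_t width) {
    std::size_t lineStart = 0;
    while (s.size() - lineStart > width) {
        std::size_t i = lineStart + width;  // first overflowing column
        while (i > lineStart && s[i] != ' ') --i;
        if (s[i] != ' ') {                  // no space on this line
            i = s.find(' ', lineStart + width);
            if (i == std::string::npos) break;  // the rest is one long word
        }
        s[i] = '\n';
        lineStart = i + 1;
    }
}
```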

Now that I think of it, I should probably consider commas and hyphens to be a part of the word. Are spaces sufficient to distinguish between words?

Generalize your function to accept a set (specified as a std::string of unique characters) of characters that will be recognized as delimiters. You can pass that directly to std::string searching functions such as 'find_last_of'. :)
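A small sketch of what that could look like. findBreak is a hypothetical helper, but std::string::find_last_of(chars, pos) is the standard call: it returns the last index at or before pos whose character appears in chars. Whether the delimiter itself stays on the upper line is left to the caller.

```cpp
#include <cstddef>
#include <string>

// Find where the line beginning at `lineStart` should break, treating every
// character in `delims` as a possible break point. Intended to be called only
// when the remaining text is longer than `width`.
// Returns std::string::npos if no delimiter lies in [lineStart, lineStart + width].
std::size_t findBreak(const std::string& s, std::size_t lineStart,
                      std::size_t width, const std::string& delims) {
    std::size_t pos = s.find_last_of(delims, lineStart + width);
    if (pos == std::string::npos || pos < lineStart) return std::string::npos;
    return pos;
}
```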

One pathological case you may want to think about is when a single word will not fit on a line by itself. A naive backwards-searching algorithm is going to get stuck in an infinite loop if it hits one of these. How to fix the problem depends on what you want to happen for these words.
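One way to avoid the infinite loop, sketched in C++ under the assumption that cutting an over-long word at exactly `width` columns is acceptable (wrapWithHardBreaks is a made-up name, and width must be at least 1). It also happens to return a mutated copy rather than changing the input.

```cpp
#include <cstddef>
#include <string>

// Wrap `text` to `width` columns (width >= 1), breaking at spaces. If a word
// is longer than a whole line, the backwards search finds no space; instead
// of retrying forever, cut the word at exactly `width` characters. Every
// iteration advances `start`, so the loop always terminates.
std::string wrapWithHardBreaks(const std::string& text, std::size_t width) {
    std::string out;
    std::size_t start = 0;
    while (text.size() - start > width) {
        std::size_t brk = text.find_last_of(' ', start + width);
        if (brk == std::string::npos || brk < start) {
            out.append(text, start, width);  // no usable space: hard break
            out += '\n';
            start += width;
        } else {
            out.append(text, start, brk - start);
            out += '\n';
            start = brk + 1;                 // skip the space we broke at
        }
    }
    out.append(text, start, std::string::npos);
    return out;
}
```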

Also, instead of mutating the passed-in string, you might instead consider any of:

a) returning a mutated copy of the string
b) returning a vector of "lines"
c) returning some abstract indication of how the string needs to be cut up, and then using that to do substringing work later.
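A sketch of options (c) and (b) together, with hypothetical names: lineSpans only records (start, length) pairs without copying any text, and splitLines turns those into a vector of lines with substr. Spaces are assumed to be the only break characters, and an over-long word overflows onto its own line.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Option (c): compute where each line starts and how long it is.
std::vector<std::pair<std::size_t, std::size_t>>
lineSpans(const std::string& text, std::size_t width) {
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    std::size_t start = 0;
    while (text.size() - start > width) {
        std::size_t brk = text.rfind(' ', start + width);
        if (brk == std::string::npos || brk < start) {
            brk = text.find(' ', start + width);  // end of the over-long word
            if (brk == std::string::npos) break;
        }
        spans.emplace_back(start, brk - start);
        start = brk + 1;                          // skip the space
    }
    spans.emplace_back(start, text.size() - start);
    return spans;
}

// Option (b) then falls out of (c) with a few substr() calls.
std::vector<std::string> splitLines(const std::string& text, std::size_t width) {
    std::vector<std::string> lines;
    for (const auto& span : lineSpans(text, width))
        lines.push_back(text.substr(span.first, span.second));
    return lines;
}
```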

Guest Anonymous Poster
Quote:
Original post by skulldrudgery
Are spaces sufficient to distinguish between words?
Do you only want to add line breaks at word boundaries? What about words like "line-break"? One could split it after the hyphen, of course. You could also add a hyphenator to your program that automatically hyphenates words.
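Splitting after the hyphen could be sketched like this (wrapWithHyphens is a hypothetical name, and width >= 1 is assumed). It does not hyphenate words itself; it only uses hyphens already present in the text, and it hard-cuts a word when a line has no break point at all.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Wrap to `width` columns, allowing breaks at spaces and right after hyphens.
// A space at the break is dropped; a hyphen stays at the end of the upper
// line, so "line-break" can become "line-" / "break".
std::vector<std::string> wrapWithHyphens(const std::string& text, std::size_t width) {
    std::vector<std::string> lines;
    std::size_t start = 0;
    while (text.size() - start > width) {
        std::size_t limit = start + width;  // first column that would overflow
        // A space may sit at `limit` (it is dropped), but a hyphen there would
        // make the line width + 1 long, so in that case search from limit - 1.
        std::size_t brk = text.find_last_of(" -", limit);
        if (brk == limit && text[brk] == '-')
            brk = text.find_last_of(" -", limit - 1);

        if (brk == std::string::npos || brk < start) {
            lines.push_back(text.substr(start, width));             // hard cut
            start += width;
        } else if (text[brk] == '-') {
            lines.push_back(text.substr(start, brk + 1 - start));   // keep '-'
            start = brk + 1;
        } else {
            lines.push_back(text.substr(start, brk - start));       // drop ' '
            start = brk + 1;
        }
    }
    lines.push_back(text.substr(start));
    return lines;
}
```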

You could go a step further and add a cost function for how good the word wrapping is. For example, it's probably worse to have 10 trailing spaces on one line and 0 on another than to have 6 trailing spaces on each, even though the latter alternative has 12 extra spaces and the former only 10. Words broken by hyphens aren't as easy to read, so you could add a penalty for hyphenating as well. If you know dynamic programming, computing the line breaks for a paragraph is straightforward, and because a line can only hold a bounded number of words, it runs in time linear in the number of words for a fixed line width. It certainly gets more convoluted than what you initially suggested, but also better :)
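A sketch of the dynamic-programming approach, usually called minimum-raggedness line breaking (a much simplified relative of the Knuth-Plass algorithm used by TeX). The cost model is an assumption: every line except the last costs the square of its unused columns, which is why 6 and 6 (36 + 36 = 72) beats 10 and 0 (100 + 0 = 100). There is no hyphenation penalty here, and wrapMinRaggedness is a made-up name.

```cpp
#include <cstddef>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// best[i] = cheapest cost of setting words i..n-1. The inner loop stops as
// soon as a line overflows, so the work is O(n * width) for n words.
std::string wrapMinRaggedness(const std::string& text, std::size_t width) {
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);
    const std::size_t n = words.size();

    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> best(n + 1, INF);
    std::vector<std::size_t> next(n + 1, n);  // first word of the following line
    best[n] = 0.0;
    for (std::size_t i = n; i-- > 0;) {
        std::size_t len = 0;
        for (std::size_t j = i; j < n; ++j) {
            len += words[j].size() + (j > i ? 1 : 0);  // words i..j plus spaces
            if (len > width && j > i) break;           // a lone long word is allowed
            double slack = len > width ? 0.0 : double(width - len);
            double cost = (j + 1 == n) ? 0.0 : slack * slack;  // last line is free
            if (cost + best[j + 1] < best[i]) {
                best[i] = cost + best[j + 1];
                next[i] = j + 1;
            }
        }
    }
    std::string out;  // rebuild the text from the chosen break points
    for (std::size_t i = 0; i < n; i = next[i]) {
        for (std::size_t k = i; k < next[i]; ++k) {
            if (k > i) out += ' ';
            out += words[k];
        }
        if (next[i] < n) out += '\n';
    }
    return out;
}
```

For "aaa bb cc ddddd" at width 6, the greedy approach produces "aaa bb" / "cc" / "ddddd" (cost 0 + 16), while this version picks "aaa" / "bb cc" / "ddddd" (cost 9 + 1), which is cheaper under the squared cost.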

I don't know if it's part of the standard library, but I've seen code where entire strings can be searched and referenced by word, which allows "stripping" certain words and is also useful for chopping the string at certain points. If you only have C-style strings (i.e. arrays of chars), then I suppose you could check position N for a letter, whitespace, or break point. If it is whitespace or a break point, you simply chop off the rest of the line and put it on a new one before printing the buffer. If it is a letter, you walk backwards until you reach whitespace, with something like: for (i = 100; i > 0 && !isspace(buf[i]); i--) {} for a 100-column line, where buf is your character array. That should solve your problem nicely: since most words aren't anywhere near 100 characters long, you will almost always find whitespace. You could also cap how far back you search and hyphenate the word if no whitespace turns up within a certain distance of N.

I imagine that, if you wanted to, you could achieve "justified" text display by hyphenating when your formula encounters a letter and using a simple break for whitespace or break-point characters. Anyway, the entire process should be fairly simple to code once you figure out exactly how you want the mechanics to work.
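For comparison, the more common way to justify text is not hyphenation but padding: after wrapping, spread each line's spare columns across its word gaps. A minimal sketch of that swapped-in technique (justifyLine is a made-up name; the last line of a paragraph would normally be left unjustified):

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Justify one already-wrapped line to exactly `width` columns by spreading
// the spare columns across the gaps between words, leftmost gaps first.
// Single-word lines and lines already at or over `width` are returned as-is.
std::string justifyLine(const std::string& line, std::size_t width) {
    std::istringstream in(line);
    std::vector<std::string> words;
    std::size_t letters = 0;
    for (std::string w; in >> w;) {
        letters += w.size();
        words.push_back(w);
    }
    if (words.size() < 2 || letters + words.size() - 1 >= width) return line;

    const std::size_t gaps = words.size() - 1;
    const std::size_t spaces = width - letters;  // total spaces to distribute
    std::string out = words[0];
    for (std::size_t g = 0; g < gaps; ++g) {
        std::size_t pad = spaces / gaps + (g < spaces % gaps ? 1 : 0);
        out.append(pad, ' ');
        out += words[g + 1];
    }
    return out;
}
```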

Vopisk

Edit: For the record, you can treat hyphens as break-point characters, which allows already-hyphenated words to be wrapped properly.
