Creating A BBCode System

Colin Jeanne


I haven't been coding too much lately. Unfortunately, school burns away every desire I have to work on my own projects. This piece of code is actually one of the last things I worked on; I finished it last month.

Creating a BBCode System In PHP


Earlier posts were about the creation of a CMS and blogging system using PHP. One of the original goals of the CMS was to output well-formed XHTML, and one of the requirements for the blog was to allow user comments. Unfortunately these two goals can conflict: on the one hand I want to give my visitors a way of expressing themselves in their posts by creating links, defining emphasis, etc., but on the other I need to make sure that the code they input does not break my page's well-formedness. To solve this problem I've decided to disable XHTML input entirely and replace it with a BBCode-like system which I have called Blog Code.

Blog Code Syntax


For the sake of familiarity I've decided that Blog Code syntax would be similar to HTML syntax. However, instead of delimiting tags with < and > I will use [ and ] to match other BBCode systems.

Blog Code will consist of different tags, each of which has zero or more attributes and some type of content. A sample tag to display an image might look like


[img alt="This is my alt text"]http://www.example.com/img.png[/img]


where 'alt' is an attribute with a value of "This is my alt text" and 'http://www.example.com/img.png' is the content of the tag.

Further, because I would like to automatically wrap my paragraphs in XHTML's p tag, I need to define which tags can exist inside a paragraph. This gives me two types of tags: inline tags, which can exist in a paragraph, and block tags, which cannot.

Implementation


In order to help me manage the different types of tags that exist I've created a class, BlogCode, which will allow me to register new tags, find paragraphs, and apply all or a subset of the registered tags.

Tag Registration


When a tag is registered I simply save the name of the tag and a callback function which will handle the tag to an array within the BlogCode class.


class BlogCode {
    var $taginfo = array();

    // Registers a tag name with a callback to handle this tag
    function RegisterCode($tagname, $callback, $display = 'inline') {
        // The tag name may contain only letters, digits, and underscores
        if (!preg_match('/^\w+$/', $tagname))
            return false;

        $this->taginfo[$tagname] = array($callback, $display);
        return true;
    }
}



I've restricted my tag names to letters, digits, and underscores (the characters matched by \w). Before a tag can be registered its name is validated to ensure no wacky tags make it through.
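As a rough illustration of the validation step, here is a minimal sketch of how registration behaves for a valid and an invalid name. The class name TagRegistry and the method name register() are made up for this example, and strtoupper is used only as a placeholder callback.

```php
<?php
// Hypothetical stand-in for the registration part of BlogCode
class TagRegistry {
    public $taginfo = array();

    function register($tagname, $callback, $display = 'inline') {
        // \w matches letters, digits, and the underscore
        if (preg_match('/^\w+$/', $tagname) !== 1) {
            return false;
        }

        $this->taginfo[$tagname] = array($callback, $display);
        return true;
    }
}

$registry = new TagRegistry;

// A plain name is accepted and stored with the default display type
$ok = $registry->register('em', 'strtoupper');

// A name containing '/' is rejected and nothing is stored
$bad = $registry->register('em/strong', 'strtoupper');
```

Rejecting odd names up front also matters later, because the registered names end up inside a regular expression.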

Tag Application


To actually apply the tags we must be able to search for them. I've come up with this regex to hunt down any tag


/\[(\w+)( [^\]]*)?\](.*)\[\/\1\]/sU


This describes a tag which begins with a '[' followed immediately by one or more alphanumeric characters or underscores (the tag's name). Optionally there may then be a space followed by anything other than a ']' (the tag's attributes). Next comes a ']' followed by anything, and finally '[/' followed by the tag's name and ']'. The search is not greedy, so it ends a tag at the first instance of its ending tag rather than the last. That is, this


[em]Test test test[/em] bla bla bla test test test[/em]


should be rendered as

<em>Test test test</em> bla bla bla test test test[/em]

and not as

<em>Test test test[/em] bla bla bla test test test</em>
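The non-greedy behaviour can be checked with a short script. This is only a sketch: the inline closure that turns each match into an XHTML element is an assumption for the demonstration (it needs PHP 5.3 or later) and is not part of the BlogCode class.

```php
<?php
// The tag regex, written in single quotes so \1 is the backreference
$pattern = '/\[(\w+)( [^\]]*)?\](.*)\[\/\1\]/sU';
$input   = '[em]Test test test[/em] bla bla bla test test test[/em]';

// Replace each matched tag with an element of the same name.
// $m[1] is the tag name and $m[3] is the tag's content.
$output = preg_replace_callback($pattern, function ($m) {
    return "<{$m[1]}>{$m[3]}</{$m[1]}>";
}, $input);

echo $output, "\n";
// prints: <em>Test test test</em> bla bla bla test test test[/em]
```

The stray closing tag at the end is left untouched because no opening tag remains for it to pair with.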

In order to make a regex which will find only select tags we simply need to restrict the name of the tags. To do this I replace (\w+) with (tagname1|tagname2|...).
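One way to build that restricted pattern is sketched below. The helper name buildTagPattern() is hypothetical, and the use of preg_quote() is an extra precaution added for this example.

```php
<?php
// Hypothetical helper: builds the tag regex for a fixed list of names
function buildTagPattern(array $tagnames) {
    // preg_quote() escapes regex metacharacters and the '/' delimiter
    $quoted = array_map(function ($name) {
        return preg_quote($name, '/');
    }, $tagnames);

    // Join the names into an alternation such as (strong|em)
    return '/\[(' . implode('|', $quoted) . ')( [^\]]*)?\](.*)\[\/\1\]/sU';
}

$pattern = buildTagPattern(array('strong', 'em'));

// Only [em] is in the list, so [img] is not matched
$text = '[em]kept[/em] and [img]ignored[/img]';
preg_match_all($pattern, $text, $matches);

print_r($matches[1]);
```

Since tag names are already limited to \w characters at registration, quoting changes nothing for valid names. It only guards against names that bypass that check.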

Handling The Attributes


Notice from the above that we don't really care what is in the attributes section of each tag. However, it would be nice if we could take these attributes and convert them into name/value pairs for easy handling. To do this we'll define a new syntax for the attributes.

Each attribute will have a name composed of only alphanumerics. The values of the attributes must all be quoted to make parsing simpler and must not contain either '"' or ']', again to make parsing simpler. This is the regex I've come up with


/(\w+)="([^"\]]+)"/


So, a valid attribute would look like

name="value"
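To see what the attribute regex extracts, here is a small sketch run against a sample attribute string. The sample string and variable names are made up for the illustration.

```php
<?php
$attrpattern = '/(\w+)="([^"\]]+)"/';

// The attribute part of a tag, as captured by the tag regex
$attrs = ' alt="This is my alt text" title="A picture"';

preg_match_all($attrpattern, $attrs, $matches);

// $matches[1] holds the names and $matches[2] the values, in order
$parsed = array();
foreach ($matches[1] as $i => $name) {
    $parsed[$name] = $matches[2][$i];
}

// The '+' requires at least one character, so an empty value is not
// a valid attribute under this syntax
$emptycount = preg_match($attrpattern, 'alt=""');
```

Here $parsed maps 'alt' and 'title' to their values, while $emptycount is 0 because alt="" does not match.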

Coding It In PHP


This is the implementation I have for tag application and attribute parsing


global $blog_code;

class BlogCode {
    var $taginfo = array();

    // Registers a tag name with a callback to handle this tag
    function RegisterCode($tagname, $callback, $display = 'inline') {
        // The tag name may contain only letters, digits, and underscores
        if (!preg_match('/^\w+$/', $tagname))
            return false;

        $this->taginfo[$tagname] = array($callback, $display);
        return true;
    }

    // Apply these tags on a body of text
    function ApplyCodes($str) {
        return preg_replace_callback("/\[(\w+)( [^\]]*)?\](.*)\[\/\\1\]/sU",
                                     'BlogCodeCallback', $str);
    }

    // Applies only a select subset of the tags
    function ApplySelectCodes($str, $tags) {
        // Join the tag names into a regex alternation: name1|name2|...
        $namelist = implode('|', $tags);

        return preg_replace_callback("/\[($namelist)( [^\]]*)?\](.*)\[\/\\1\]/sU",
                                     'BlogCodeCallback', $str);
    }

    // Breaks up the attribute list into an associative array where the
    // attribute names are the keys and the associated values are the values
    function ConvertToArray($attrs) {
        $attrarray = array();

        preg_match_all('/(\w+)="([^"\]]+)"/', $attrs, $matches);

        $keys = $matches[1];
        $vals = $matches[2];

        for ($i = 0; $i < count($keys); $i++)
            $attrarray[$keys[$i]] = $vals[$i];

        return $attrarray;
    }
}

$blog_code = new BlogCode;

// Routes each match to the callback associated with it
function BlogCodeCallback($matches) {
    global $blog_code;

    $tagname = $matches[1];
    $attrs = $blog_code->ConvertToArray($matches[2]);
    $content = $matches[3];

    if (isset($blog_code->taginfo[$tagname])) {
        $str = call_user_func($blog_code->taginfo[$tagname][0], $tagname,
                              $attrs, $content);

        // A callback may return false to leave the tag untouched
        if ($str !== false)
            return $str;
    }

    // Unknown or rejected tags are output exactly as they were typed
    return $matches[0];
}



The new bits of code are the application functions, the conversion of the attributes to an associative array, and the main tag callback.

The application functions, ApplyCodes() and ApplySelectCodes(), work exactly as I described above. They search for the tags or a subset of the tags within a string. When a tag is found preg_replace_callback() calls BlogCodeCallback(), which grabs the tag name, attributes, and content of the tag before sending those values to the callback associated with that tag.

ConvertToArray() takes in an attribute string and searches for attribute/value pairs and turns them into an associative array.

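Finally, since BlogCode is the only entity that should manage the tags I've made a global, $blog_code, which will act as the one instance of this class. I did this so that BlogCodeCallback() could be a plain non-member function passed by name to preg_replace_callback(). That function also accepts a callback of the form array($object, 'method'), which would avoid the need for the global.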
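For comparison, here is a minimal sketch of the same dispatch idea with the routing done by a method instead of a global function. It assumes a PHP version in which preg_replace_callback() accepts an array($object, 'method') callback. MiniBlogCode, Dispatch(), and emCallback() are hypothetical names, and attributes are left out to keep it short.

```php
<?php
// Hypothetical cut-down variant of BlogCode without the global instance
class MiniBlogCode {
    public $taginfo = array();

    function RegisterCode($tagname, $callback) {
        $this->taginfo[$tagname] = $callback;
    }

    function ApplyCodes($str) {
        // The callback is this object's own Dispatch() method
        return preg_replace_callback('/\[(\w+)( [^\]]*)?\](.*)\[\/\1\]/sU',
                                     array($this, 'Dispatch'), $str);
    }

    // Routes each match to the callback registered for its tag name
    function Dispatch($matches) {
        $tagname = $matches[1];

        if (isset($this->taginfo[$tagname])) {
            return call_user_func($this->taginfo[$tagname], $tagname,
                                  $matches[3]);
        }

        // Unregistered tags are left exactly as they were typed
        return $matches[0];
    }
}

function emCallback($tagname, $content) {
    return "<em>$content</em>";
}

$bc = new MiniBlogCode;
$bc->RegisterCode('em', 'emCallback');

$result = $bc->ApplyCodes('[em]hi[/em] [foo]bar[/foo]');
// $result is '<em>hi</em> [foo]bar[/foo]'
```

The trade-off is that tag callbacks needing to apply nested tags would have to be handed the instance some other way, which the global sidesteps.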

Finding Paragraphs


By far the most difficult part of this project was finding out where paragraphs begin and end. After many different tries I came up with this:

It's easy to see paragraphs being broken up by two newlines but that cannot account for block level tags. My solution was to first split the text along block level tags and then apply the simplistic view of what separates paragraphs. Finally, I paste the different sections of text back together using the block level tags that were originally between them.

Add this member to BlogCode


// Finds each paragraph and wraps it in <p> tags
function BlockParagraphs($str) {
    $blocktags = array();

    foreach ($this->taginfo as $tagname => $value) {
        if ($value[1] != 'inline')
            array_push($blocktags, $tagname);
    }

    // Join the block-level tag names into a regex alternation
    $namelist = implode('|', $blocktags);

    // Break up the text into array elements separated by block-level tags
    $pararray = preg_split("/\[($namelist)(?: [^\]]*)?\].*\[\/\\1\]/sU",
                           $str);

    // Capture all of the block-level tags (in order)
    preg_match_all("/\[($namelist)(?: [^\]]*)?\].*\[\/\\1\]/sU", $str,
                   $delims);

    // Keep only the full matches and ignore the captured tag names
    $delims = $delims[0];

    // Wrap each paragraph in p tags
    $paras = preg_replace("/(.+)(?:\n\n|\r\r|\r\n\r\n|$)/sU", "<p>\\1</p>",
                          $pararray);

    // Remove empty p tags
    $paras = preg_replace("/<p>\s*<\/p>/U", '', $paras);

    $output = '';

    // Interleave the block-level elements into the text
    foreach ($paras as $para)
        $output .= $para . array_shift($delims);

    // Put the rest of the block-level elements into the text
    $output .= implode('', $delims);

    return $output;
}



Using the above code it becomes necessary to call this member on a piece of text before applying any of the tags.
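The split, wrap, and interleave steps can be tried in isolation with the simplified sketch below. It is not the member above: wrapParagraphs() is a hypothetical standalone function, 'code' is assumed to be the only block-level tag, and paragraphs are found with preg_split() on blank lines rather than with the preg_replace() call used in BlockParagraphs().

```php
<?php
// Hypothetical standalone sketch of the three steps
function wrapParagraphs($str) {
    $block = '/\[(code)(?: [^\]]*)?\].*\[\/\1\]/sU';

    // Step 1: split the text along block-level tags, remembering the tags
    $chunks = preg_split($block, $str);
    preg_match_all($block, $str, $found);
    $delims = $found[0];

    $output = '';

    foreach ($chunks as $chunk) {
        // Step 2: within a chunk, paragraphs are separated by blank lines
        foreach (preg_split('/(?:\r\n|\r|\n){2,}/', $chunk) as $para) {
            if (trim($para) !== '')
                $output .= '<p>' . trim($para) . '</p>';
        }

        // Step 3: put back the block tag that followed this chunk, if any
        $output .= (string) array_shift($delims);
    }

    return $output;
}

$html = wrapParagraphs("First.\n\nSecond.[code]x = 1[/code]Third.");
// $html is '<p>First.</p><p>Second.</p>[code]x = 1[/code]<p>Third.</p>'
```

The block tag is still in Blog Code form at this point, which is why paragraph detection has to run before the tags are applied.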

The Final Code


This is my final PHP file, with some sample tag definitions at the bottom


<?php
/************************************************************************
 *
 * Title: BlogCode Class
 * Author: Colin Jeanne (http://colinjeanne.net)
 * Date: August 23, 2005
 *
 * Description:
 * A class that represents a BBCode-like formatter
 *
 ************************************************************************/

global $blog_code;

class BlogCode {
    var $taginfo = array();

    // Registers a tag name with a callback to handle this tag
    function RegisterCode($tagname, $callback, $display = 'inline') {
        // The tag name may contain only letters, digits, and underscores
        if (!preg_match('/^\w+$/', $tagname))
            return false;

        $this->taginfo[$tagname] = array($callback, $display);
        return true;
    }

    // Finds each paragraph and wraps it in <p> tags
    function BlockParagraphs($str) {
        $blocktags = array();

        foreach ($this->taginfo as $tagname => $value) {
            if ($value[1] != 'inline')
                array_push($blocktags, $tagname);
        }

        // Join the block-level tag names into a regex alternation
        $namelist = implode('|', $blocktags);

        // Break up the text into array elements separated by block-level tags
        $pararray = preg_split("/\[($namelist)(?: [^\]]*)?\].*\[\/\\1\]/sU",
                               $str);

        // Capture all of the block-level tags (in order)
        preg_match_all("/\[($namelist)(?: [^\]]*)?\].*\[\/\\1\]/sU", $str,
                       $delims);

        // Keep only the full matches and ignore the captured tag names
        $delims = $delims[0];

        // Wrap each paragraph in p tags
        $paras = preg_replace("/(.+)(?:\n\n|\r\r|\r\n\r\n|$)/sU", "<p>\\1</p>",
                              $pararray);

        // Remove empty p tags
        $paras = preg_replace("/<p>\s*<\/p>/U", '', $paras);

        $output = '';

        // Interleave the block-level elements into the text
        foreach ($paras as $para)
            $output .= $para . array_shift($delims);

        // Put the rest of the block-level elements into the text
        $output .= implode('', $delims);

        return $output;
    }

    // Apply these tags on a body of text
    function ApplyCodes($str) {
        return preg_replace_callback("/\[(\w+)( [^\]]*)?\](.*)\[\/\\1\]/sU",
                                     'BlogCodeCallback', $str);
    }

    // Applies only a select subset of the tags
    function ApplySelectCodes($str, $tags) {
        // Join the tag names into a regex alternation: name1|name2|...
        $namelist = implode('|', $tags);

        return preg_replace_callback("/\[($namelist)( [^\]]*)?\](.*)\[\/\\1\]/sU",
                                     'BlogCodeCallback', $str);
    }

    // Breaks up the attribute list into an associative array where the
    // attribute names are the keys and the associated values are the values
    function ConvertToArray($attrs) {
        $attrarray = array();

        preg_match_all('/(\w+)="([^"\]]+)"/', $attrs, $matches);

        $keys = $matches[1];
        $vals = $matches[2];

        for ($i = 0; $i < count($keys); $i++)
            $attrarray[$keys[$i]] = $vals[$i];

        return $attrarray;
    }
}

$blog_code = new BlogCode;

// Routes each match to the callback associated with it
function BlogCodeCallback($matches) {
    global $blog_code;

    $tagname = $matches[1];
    $attrs = $blog_code->ConvertToArray($matches[2]);
    $content = $matches[3];

    if (isset($blog_code->taginfo[$tagname])) {
        $str = call_user_func($blog_code->taginfo[$tagname][0], $tagname,
                              $attrs, $content);

        if ($str !== false)
            return $str;
    }

    return $matches[0];
}

function inlineCallback($tagname, $attrs, $content) {
    global $blog_code;

    return "<$tagname>" .
           $blog_code->ApplySelectCodes($content,
                                        array('strong', 'em', 'link')) .
           "</$tagname>";
}

$blog_code->RegisterCode('strong', 'inlineCallback');
$blog_code->RegisterCode('em', 'inlineCallback');

function imgCallback($tagname, $attrs, $content) {
    if (isset($attrs['alt']))
        return '<img alt="' . $attrs['alt'] . '" src="' . $content . '" />';
    else
        return "<img src=\"$content\" />";
}

$blog_code->RegisterCode('img', 'imgCallback', 'block');

function linkCallback($tagname, $attrs, $content) {
    global $blog_code;

    if (isset($attrs['href'])) {
        return '<a href="' . $attrs['href'] . '">' .
               $blog_code->ApplySelectCodes($content,
                                            array('strong', 'em', 'img')) .
               '</a>';
    } else {
        return "<a href=\"$content\">$content</a>";
    }
}

$blog_code->RegisterCode('link', 'linkCallback');

function codeCallback($tagname, $attrs, $content) {
    // The wrapping element's name did not survive in this listing;
    // <pre> is assumed here
    return '<pre class="blog_code">' . htmlentities($content) . '</pre>';
}

$blog_code->RegisterCode('code', 'codeCallback', 'block');
?>
