All references to "thread" in this post mean the co-operatively multitasked artificial threads managed within the state machine, which execute "concurrently".
I grew frustrated with porting my NFA regex engine from C and ditched the port in favor of rewriting it entirely. The state machine has two types of states: "active" states, which match characters and determine whether the thread advances to the next state or dies, and "passive" states (metastates), which don't affect whether a thread passes but have side effects on the thread's execution.
An example of a passive state is the "split" state: it marks an unconditional fork in the execution path, where the current thread takes one path and spawns another thread that takes the other. Another example is the pair of passive states that mark the start and end of a looping construct; both hold a pointer to a looping context, which points to the loop's beginning and end and records the bounds for repetition.
When a thread hits the loop start state, it reads the looping context, initializes its loop counter for that instance, and automatically proceeds to the next state. If the loop allows zero iterations, it also spawns a thread that skips the loop. When a thread hits the loop finish state, it may, depending on the loop, spawn a thread that loops again, starting just after the loop start state. If its iteration count is within the bounds for repetition, the thread itself is permitted to pass on to the states after the loop while the other thread tries looping again; otherwise it dies.
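To make the terminology concrete, here is a minimal Python sketch of the state kinds and of the decision made at the loop finish state. The names `Kind`, `LoopContext`, `is_passive` and `loop_end_outcome` are hypothetical, invented for illustration; they are not taken from the actual engine, and the exact bounds check is an assumption about how the repetition limits are meant to work.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Kind(Enum):
    CHAR = auto()        # active: matches one input character
    SPLIT = auto()       # passive: unconditional fork
    LOOP_START = auto()  # passive: sets up the loop counter
    LOOP_END = auto()    # passive: repeat and/or leave the loop
    MATCH = auto()       # terminal: the thread has matched


# Passive states never decide pass/fail; they only have side effects.
PASSIVE_KINDS = frozenset({Kind.SPLIT, Kind.LOOP_START, Kind.LOOP_END})


@dataclass(frozen=True)
class LoopContext:
    """Shared by a loop's start and finish states (hypothetical layout)."""
    start: int           # index of the loop start state
    end: int             # index of the loop finish state
    lo: int              # minimum number of iterations
    hi: Optional[int]    # maximum number of iterations, None = unbounded


def is_passive(kind: Kind) -> bool:
    return kind in PASSIVE_KINDS


def loop_end_outcome(ctx: LoopContext, completed: int) -> tuple:
    """Return (may_repeat, may_exit) after `completed` full iterations.

    may_repeat: a thread may be spawned just after the loop start state.
    may_exit:   the current thread may pass on to the states after the loop;
                if this is False the current thread dies.
    """
    may_repeat = ctx.hi is None or completed < ctx.hi
    may_exit = completed >= ctx.lo
    return may_repeat, may_exit
```

With a context bounded to between one and two iterations, `loop_end_outcome(ctx, 1)` allows both repeating and exiting, while `loop_end_outcome(ctx, 2)` only allows exiting.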
The main problem is that passive states can occur anywhere and in any quantity, including as the first or last state in the machine's execution path, and more than one passive state may appear consecutively. The idea is to process all of the passive states in between each pair of active states. Execution is driven by the input string, which is evaluated one character at a time with all threads moving in lock-step. So, for every character of the input, each thread processes one active state, along with any number of passive states in between, to determine whether it passes or fails at that step.
I'm hoping to ditch my old, hackish method of checking for some state types at the beginning of the step, some after a thread passes its active state, and some after adding the thread back to the thread list. I currently have three "actions": the thread passes, and execution moves to the next thread; the thread fails and is removed from the thread pool; or the thread is "retried" and evaluated again during the same step. For the most part, "retrying" is used when a thread hits a passive state and advances to the next state but hasn't hit an active state yet, and thus needs to continue executing.
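The three actions can be sketched roughly as follows. This is a simplified Python illustration, not the real engine: it handles only "char", "split" and "match" states, represents a thread as a bare state index, and the names `Action`, `evaluate`, `step` and `A_OR_B` are all hypothetical.

```python
from enum import Enum, auto


class Action(Enum):
    PASS = auto()   # thread survives this step; move on to the next thread
    FAIL = auto()   # thread is removed from the pool
    RETRY = auto()  # thread is evaluated again during the same step


# States are tuples (an assumed encoding):
#   ("char", c, out)      active: match character c, then go to state `out`
#   ("split", out, alt)   passive: continue at `out`, spawn a thread at `alt`
#   ("match",)            terminal
A_OR_B = [("split", 1, 2), ("char", "a", 3), ("char", "b", 3), ("match",)]


def evaluate(prog, pc, ch, spawned):
    """Evaluate one state for one thread; return (action, new state index)."""
    state = prog[pc]
    if state[0] == "split":
        spawned.append(state[2])        # the forked thread runs in this step too
        return Action.RETRY, state[1]   # passive: keep going until an active state
    if state[0] == "char" and state[1] == ch:
        return Action.PASS, state[2]
    return Action.FAIL, pc              # mismatch, or "match" with input left over


def step(prog, threads, ch):
    """Advance every thread in lock-step over one input character."""
    pending = list(threads)
    survivors = []
    while pending:
        pc = pending.pop()
        action, pc = evaluate(prog, pc, ch, pending)
        if action is Action.RETRY:
            pending.append(pc)
        elif action is Action.PASS:
            survivors.append(pc)
        # Action.FAIL: the thread is simply dropped
    return survivors
```

Here `step(A_OR_B, [0], "b")` leaves a single thread at state 3, the match state. Note that this version only handles passive states that come before the active state within a step.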
I'd like to handle all state types in one place, but they seem to come up at different times in execution, and not all threads have the same number of passive states between active states because their execution paths differ. I can't come up with a clean solution that would handle all three of these cases:
1. The first state is a passive "split" state, forking threads to try to match 'a' and 'b' at the same time. This requires checking for passive states at the same time as active states, since either one may be encountered as the first state.
2. The first state is an active state matching 'a', the second is a passive loop start state, the third is an active state matching 'b', and the fourth is a loop finish state. Here, there are passive states that need to be handled after each active state.
3. The first and last states are passive loop states, with an active state matching 'b' between them. At the end of the string, the matcher part of the engine looks for threads that are at the "match" state, indicating that they successfully matched the pattern. However, if "b" is the input, there is only one step. To reach the active state on that first and only step, and still end at the match state, one needs to check for passive states before _and_ after the active state. Perhaps a final flushing step with no character input? It smells hackish, and will likely lead to code duplication.
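For what it's worth, here is one way the flushing idea might look in Python, written so that the flush reuses the ordinary step function rather than duplicating it. This is only a sketch under several assumptions: a thread is a `(state index, loop counter)` pair with a single counter (so no nested loops), the loop bounds are stored directly on the loop states instead of in a shared context, every loop body consumes at least one character, and the pattern must match the whole input. The names `State`, `run_step` and `matches`, and the loop bounds chosen for the three cases, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    kind: str                 # "char", "split", "loop_start", "loop_end", "match"
    out: int = -1             # default successor
    alt: int = -1             # split: other branch; loop_start: state after the
                              # loop; loop_end: first state of the loop body
    char: str = ""
    lo: int = 0               # minimum iterations (loop states only)
    hi: Optional[int] = None  # maximum iterations, None = unbounded


def run_step(prog, threads, ch):
    """One lock-step pass. Passive states are followed until each thread
    reaches an active state; ch=None is the final flush with no input."""
    work, survivors, seen = list(threads), [], set()
    while work:
        thread = work.pop()
        if thread in seen:
            continue
        seen.add(thread)
        pc, count = thread
        st = prog[pc]
        if st.kind == "split":
            work += [(st.out, count), (st.alt, count)]
        elif st.kind == "loop_start":
            work.append((st.out, 0))          # enter the loop, counter reset
            if st.lo == 0:
                work.append((st.alt, 0))      # zero iterations allowed: skip it
        elif st.kind == "loop_end":
            done = count + 1
            if st.hi is None or done < st.hi:
                work.append((st.alt, done))   # loop again
            if done >= st.lo:
                work.append((st.out, 0))      # within bounds: leave the loop
        elif st.kind == "match":
            if ch is None:                    # only counts once input is used up
                survivors.append(thread)
        elif ch is not None and st.char == ch:
            survivors.append((st.out, count)) # active state passed
    return survivors


def matches(prog, text):
    threads = [(0, 0)]
    for ch in text:
        threads = run_step(prog, threads, ch)
    return bool(run_step(prog, threads, None))  # the flush step


# The three cases from the list above (bounds are illustrative assumptions).
CASE1 = [State("split", out=1, alt=2), State("char", out=3, char="a"),
         State("char", out=3, char="b"), State("match")]
CASE2 = [State("char", out=1, char="a"),
         State("loop_start", out=2, alt=4, lo=1, hi=2),
         State("char", out=3, char="b"),
         State("loop_end", out=4, alt=2, lo=1, hi=2), State("match")]
CASE3 = [State("loop_start", out=1, alt=3, lo=1), State("char", out=2, char="b"),
         State("loop_end", out=3, alt=1, lo=1), State("match")]
```

In this sketch every state type is handled in the one `run_step` function, passive states are only ever processed before an active state, and the passive states trailing the last active state are picked up by calling the same function once more with `None`. Whether that is cleaner than the "retry" approach is a judgment call.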
Does anyone know of a good way to handle this situation?