Functional programming and state

This is a question that's been bothering me for some time. I've played with Haskell mainly, and read what I could find on the internet, but I've never found a complete answer, so I'm hoping some functional-programming gurus here can help me out.

Basically, I keep reading that functional programming eliminates the need for state, and thus solves problems without introducing side effects. I can understand how that solves some problems, but I'm completely baffled by how it works for others. Does functional programming really solve all problems without using the concept of 'state', or does it just solve the ones that don't need that concept and simply not deal with the ones that do?

For instance, let's take the simplest example: we want a function that updates a database. I'm completely lost on how this can work together with 'no state' and 'no side effects'. Isn't a database a 'state'? And isn't updating/writing/deleting records of a database a side effect? How would a (pure) functional language handle a simple program of 'allow the user to update records'? If we use monads, like in Haskell, we essentially re-introduce the concepts of 'state' and 'side effect'. So how is it done without all that?

Thanks in advance. If you could also provide an example, in a functional language, of how what I'm asking about (updating a database) would be achieved, I would be forever grateful :)
Monads do reintroduce state into a functional programming language...but in a very controlled, careful way. It isn't like there's The State Of Things and you'd better be careful if you wanna do state stuff. It's more like, you can do operations on the state before and without having any particular The State to do them to.

Of course, from a pragmatic point of view it does make sense to have a single The State to mess with, because that The State is fifty gigs and is stored on a fault-tolerant server cluster rather than in a few cons cells. That doesn't stop you from getting the benefits of composable state-based operations, though... particularly in conjunction with STM. Clicky
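To make "composable state-based operations" a bit more concrete, here is a minimal Haskell sketch using STM (Control.Concurrent.STM). The two accounts and the transfer function are invented purely for illustration, not taken from the linked paper:

import Control.Concurrent.STM

-- A transfer is described as an STM action; nothing happens until it
-- is handed to 'atomically', which runs it as one isolated transaction.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  fromBalance <- readTVar from
  writeTVar from (fromBalance - amount)
  toBalance <- readTVar to
  writeTVar to (toBalance + amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  -- Two transfers composed into a single atomic transaction.
  atomically (transfer a b 30 >> transfer b a 5)
  readTVarIO a >>= print   -- 75
  readTVarIO b >>= print   -- 25

The point is that 'transfer' describes an operation on state without naming any particular The State; composition happens before anything is executed.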

More to say, but I've got a meeting to go to so it'll have to wait. Chew on the paper, though. It's really interesting.
I wouldn't dare to say that "functional programming eliminates the need for state"; I'd rather say "functional programming is programming without state" (and, in finer print: because you have no state but you do need it, you have to smuggle it back in). (But then again, some people see functional programming as working with functions as first-class values... but I prefer the stateless definition.)

When working functionally (i.e. statelessly), calling a function returns a value that depends only on the arguments you passed along. Same arguments, same result. Throw in state, and your function can hide some extra information behind the scenes and let the result vary (e.g. an RNG). In this case, you can see state as an implicit argument, which is passed along automatically at every function invocation. In a purely functional language, you can make this state explicit by manually passing it along as an extra argument. This is what Haskell's monads do for you.
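To make that concrete, here is a small Haskell sketch (the counter and the function names are invented for illustration) that first threads the state by hand, and then lets the standard State monad from Control.Monad.State do the same plumbing:

import Control.Monad.State

-- State passed around explicitly: same arguments, same result.
tick :: Int -> (Int, Int)   -- returns (value, new state)
tick n = (n, n + 1)

twoTicks :: Int -> ((Int, Int), Int)
twoTicks s0 =
  let (a, s1) = tick s0
      (b, s2) = tick s1
  in ((a, b), s2)

-- The State monad does exactly this plumbing for you.
tickM :: State Int Int
tickM = do
  n <- get
  put (n + 1)
  return n

twoTicksM :: State Int (Int, Int)
twoTicksM = do
  a <- tickM
  b <- tickM
  return (a, b)

main :: IO ()
main = do
  print (twoTicks 0)             -- ((0,1),2)
  print (runState twoTicksM 0)   -- ((0,1),2)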

So, in a functional setting, you can emulate state where necessary using this extra argument. If I'm not mistaken, Mercury (a purely functional language) follows this approach: the "external world" (e.g. a console window where you output text) is inherently stateful. So, every function that interacts with this state receives two extra arguments: the "old state" and the "new state", the latter being a C#-like out parameter. If you want to print stuff to the console, it would look something like this in C#-like syntax:
void main(IO init)
{
    IO state2, state3, state4;
    print("Hello", init, out state2);
    print("world", state2, out state3);
    print("!", state3, out state4);
}

Mercury luckily offers a special syntax for this, something like
void main(IO io)
{
    print("Hello", !!io);
    print("world", !!io);
    print("!", !!io);
}


This same technique is used for data structures: you pass along the old state with some extra arguments, and you receive a new state as return value:
List<int> oldLst = ...;
List<int> newLst = List<int>.Add(oldLst, 0); // oldLst remains unchanged

In theory, you could do the same for your database.

In the end, whether a language offers state or not is not important: you can emulate state in any functional language, as shown above. I could write an interpreter for a stateful mini-language in Haskell (which must be possible, by Turing completeness), and I'd have state too. Vice versa, you can abstract state away: if a function always returns the same values when given the same arguments, you could call it stateless, even if it uses state inside (e.g. memoization).
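As a small illustration of that last point, here is the classic Haskell memoized Fibonacci: the lazily built list acts as a cache (state hidden inside the implementation), but the function always returns the same value for the same argument, so from the outside it is stateless:

-- The lazily-built list caches all previously computed results;
-- from the caller's point of view, 'fib' is just a pure function.
fibs :: [Integer]
fibs = map fib [0 ..]

fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fibs !! (n - 1) + fibs !! (n - 2)

main :: IO ()
main = print (fib 50)   -- 12586269025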
An even more intriguing way to introduce state is through nondeterminism, using multiple threads: you can emulate a simple writable variable in Erlang (a functional language) with a "process", which is approximately an object with its own thread:
class Variable
{
    public Variable(int init)
    {
        var thread = new Thread( () => MessageHandler(init) );
        thread.Start();
    }

    public void SendMessage(Message message) { EnqueueMessage(message); }

    // Assuming tail call optimisation...
    private void MessageHandler(int state)
    {
        var msg = WaitForNextMessage();
        if (msg is Increment) MessageHandler(state + 1);
        else if (msg is Decrement) MessageHandler(state - 1);
        else if (msg is Query) ...;
    }
}

Admittedly, this example is not perfect, as it uses a message queue, which is stateful. However, Erlang hides this queue, so the state is abstracted away: you just have to ask for the next message and handle it, as shown in MessageHandler, which is purely functional and is the only part you'd have to write in Erlang.


As I said before, whether a programming language is functional doesn't really matter; it's whether your code is written in a functional style that makes the difference. Java and C#, both imperative (stateful) languages, offer a string data type which is immutable (with good reason) and hence written "in functional style". Ironically, OCaml's strings are mutable (don't ask me why). Personally, I prefer using a functional style, as it is often simpler and so much more concurrency-friendly. But as always, one can summarize: just use the right tool/style for the right job.
A function that changes the state of a database is, in pure functional style, a function that takes the contents of that database as an argument, and returns the new contents of that database.

It is then up to whatever optimization process you choose to turn that into the correct number of transactions and queries, without having to haul around the full contents of the database.
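In Haskell-ish terms, and with entirely made-up types (a toy in-memory map standing in for the database), the shape of that interface might look like this sketch:

import qualified Data.Map as Map

-- A toy 'database': a map from user id to name. Purely illustrative.
type Database = Map.Map Int String

-- 'Updating' the database is just a pure function from old contents
-- to new contents; the old value is still there if anyone needs it.
renameUser :: Int -> String -> Database -> Database
renameUser uid newName db = Map.insert uid newName db

main :: IO ()
main = do
  let db0 = Map.fromList [(1, "alice"), (2, "bob")]
      db1 = renameUser 2 "robert" db0
  print (Map.lookup 2 db0)   -- Just "bob"
  print (Map.lookup 2 db1)   -- Just "robert"

A real system obviously wouldn't haul the whole database through the function; the optimization step mentioned above would turn such a change into, say, a single UPDATE statement.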
Quote:Original post by ToohrVyk
A function that changes the state of a database is, in pure functional style, a function that takes the contents of that database as an argument, and returns the new contents of that database.

It is then up to whatever optimization process you choose to turn that into the correct number of transactions and queries, without having to haul around the full contents of the database.


I see that now. But how does that work with persistent storage and multiple programs accessing it? Databases as we know them (like SQL Server, Oracle, MySQL and so on) are supposed to be accessed by different programs, at the same time or at different times. If a function F of a program P changes some rows in a database table, I suppose it can propagate the changes to the program by returning the "new" database. But those changes must also be visible to another function F' of a program P' run at a later time, possibly even on another machine. There is a 'hidden' state, which is simply the database data residing on the storage media, isn't there? Or am I wrong? Basically, when we deal with I/O, even simple operations, we have to reintroduce the notion of 'state' one way or the other, don't we?
Alrighty. More thoughts.

First of all, it is possible to build nondestructive databases--that is, databases where, given a time-specific handle, a function can be guaranteed an unchanging view of the data. There are SQL extensions that cover this, and Chris Okasaki's thesis goes into the creation of purely functional data structures.

That's sort of a weaselly answer, though, because you do want the database to change, and you want different computational "threads'" views of the database to change with it. In order to maximize the benefits of functional programming, though, it's best to put "All Of This Is Serialized In Time" in one box, and the things that are actually done to the database in a different box. That's where monads come in.

The key thing about monads, in this setting, is that a monadic function does not merely represent an operation performed on a monad. Rather, it represents a full block of computation performed in concert with a monad. For instance, you could stick an NPC's per-frame AI actions, which depend on the state of the world and affect the state of the world, into a monadic function, without having to refer to the world state. Later on, that monadic function can be given to the monad and carry out its actions based on whatever state the world happens to be in then.
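A rough Haskell sketch of that NPC example, using the ordinary State monad and an invented World type (nothing here comes from a real engine):

import Control.Monad.State

-- Invented types, purely for illustration.
data World = World { npcHealth :: Int, threatLevel :: Int }
  deriving Show

-- A per-frame AI step, described without reference to any particular world.
npcStep :: State World ()
npcStep = do
  w <- get
  if threatLevel w > 5
    then put (w { npcHealth = npcHealth w - 1 })      -- take damage while fleeing
    else put (w { threatLevel = threatLevel w + 1 })  -- tension rises

-- Several frames composed into one computation, still with no world in sight.
threeFrames :: State World ()
threeFrames = npcStep >> npcStep >> npcStep

main :: IO ()
main = print (execState threeFrames (World { npcHealth = 10, threatLevel = 4 }))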

In a design such as this, the serially-executing "core" of the application is surprisingly small. All it really needs to do is accept things (or, equivalently, ask for things) and do them in order. The meat of the application knows that state exists, but nevertheless only does things in a stateful manner when that actually makes a difference, and the stateful computations delegate to stateless functions for most of their functionality.
Incidentally, recently filmed: Brian Beckman: The Zen of Stateless State - The State Monad - Part 1
Quote:Original post by mikeman
I see that now. But how does that work with persistent storage and multiple programs accessing it? Databases as we know them (like SQL Server, Oracle, MySQL and so on) are supposed to be accessed by different programs, at the same time or at different times. If a function F of a program P changes some rows in a database table, I suppose it can propagate the changes to the program by returning the "new" database. But those changes must also be visible to another function F' of a program P' run at a later time, possibly even on another machine. There is a 'hidden' state, which is simply the database data residing on the storage media, isn't there? Or am I wrong? Basically, when we deal with I/O, even simple operations, we have to reintroduce the notion of 'state' one way or the other, don't we?
A short while ago, I described on my web site an OCaml implementation of Baker's algorithm: an array-like data structure that allows O(1) access and modification while appearing immutable (that is, modification creates a modified copy but leaves the original available). So, in essence, it implements an immutable data structure by means of a mutable data structure.

The problem is that the very nature of functional programming requires that every execution of the program be atomic: the values manipulated by the program cannot change during the execution, because that would lead to undefined behavior (the program is, for instance, free to cache anything it queries at any time, or recompute it at will, or evaluate things in any order). So, the functional program can, in theory, never "see" modifications applied to the database after the moment it first accesses the database.

Of course, there are also theoretical workarounds for this. The problem here comes from the fact that you represent your database as a single state, because this is how the database is represented in imperative programs. However, from the point of view of a pure functional program, a database is a sequence of states, where each state is associated with a unique time. In practice, I would represent a database as an initial immutable object which represents the state of the database when the program starts, and a commit function that takes timed database contents as an argument ("this is what the database should look like at time T") and returns timed database contents ("this is what the database looks like at time T+1"). Since the only stateful element of that function (time) is turned into an argument, that function is purely applicative.

Behind the scenes, the compiler turns every call to "commit" into (1) sending a request to the database to terminate the current transaction and (2) beginning a new transaction. The database system takes care of isolation and locks. The only difficulty here is that a functional program can keep a reference to an older version of the database, which can cause issues if handled improperly, because most databases do not support a good modification history.
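Here is a minimal Haskell sketch of that "sequence of timed states" idea, with invented types and no real database behind it; the comment on commit marks where the transaction boundary described above would go:

import qualified Data.Map as Map

-- Invented types: the contents at one point in time, tagged with that time.
type Contents = Map.Map Int String

data Timed = Timed { time :: Int, contents :: Contents }
  deriving Show

-- 'commit' takes "how the database should look at time T" and returns
-- "how the database looks at time T+1".  In a real system this is where
-- one transaction would end and the next begin; here it is a pure function.
commit :: Timed -> Timed
commit (Timed t c) = Timed (t + 1) c

main :: IO ()
main = do
  let db0 = Timed 0 (Map.fromList [(1, "alice")])
      db1 = commit (Timed (time db0) (Map.insert 2 "bob" (contents db0)))
  print db1   -- Timed {time = 1, contents = fromList [(1,"alice"),(2,"bob")]}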

