Ofcourse I did not make this post to point out a reply I made as a comment (it would be nice to have a recent comments section though, so days dont pass before you become aware someone made a comment to an old post). snk_kid sent as a PM a detailed writing that really further clarified some stuff for me. I am going to repost it here because I feel it would be greedy to keep all that well written information to myself.
----------------------------------------
Nonstrict Semantics and Time/Space Complexity
by user snk_kid
About the efficiency issues, generally it's not as bad as some may make it out to be. Some may not realize the number of optimizations a compiler can make with non-strict languages. For instances one such technique implemented in haskell compilers is memoization (not memorization).
When a result is actually evaluated it's *cached* (using the term loosely here) so if you try to obtain the same results again later on it's never re-generated it's just *looked-up* and given. It's probably implemented more intelligently than i'm describing. If you had lazy evaluation without memoization the same results would have to be recomputed everytime which can be quite expensive in some contexts.
Most strict languages with non-strict extensions or manually writing lazy evaluation do not apply memoization you have to write it explicitly yourself and it can be very ugly inelegant to do. In haskell this is all implicit and done by the compiler for you.
Another optimization is deforestation (fusion) which apply to tree or list like structures where intermediate list tree or list results are eliminated. Again haskell compilers typically implements this.
Another related (but not particular to lazy evaluation) is stream fusion related to unfolds basically imagine stringing to together a number of high-order functions like map,zip, etc, etc (the recursive operators) to work on a list. Normally you'd end up with a number of *loops* but with stream fusion none of those operators *loop* they just return immediately until the last operator or at a point which requires evaluation (in the case of haskell).
You basically end up with just one *loop* doing it all the operations, you could think of the *loop* as being refactored out and made implicit in the *stream*, kinda of like a bunch of connected lazy filters.
Again GHC haskell implements stream fusion for those recursive operators internally in the library (it's library based solution with some compiler transformation help if i understood it correctly). Lists get converted to/from Streams, you never know about it when using the standard recursive operators (map,zip, etc, etc).
This can also apply to arrays as well and it is being done as part of GHC haskell's optimizations for nested data-parallel programming extension.
There probably lots of other optimizations applicable which i've now forgot since the last time i read about the subject.
People seem to forget that very high level languages or declarative constructs are generally much more amenable to aggressive optimizations compared to manually emulating those constructs and therefore cannot generally compete with automated methods. So in some sense it really does not make much sense to call non-strict semantics slow because:
A) To achieve the same functionality manually you would end up with something ugly and most definitely even more slower. Think of it like the debate of emulating OO in C as opposed to just using C++ (when you can use it).
B) There is almost always support to make places strict and eagerly evaluated explicitly just as much as the reverse in strict languages. People do it all the time maybe without even realizing it, .NET properties & IEnumerable + yield return are a good example where it's applied often.
Even without optimizations applied there are certain situations or problems where pure lazy solutions are faster, preferred, and some case the only option, once the overhead cost of lazy evaluation is out weighted by what is actually being done, lazy evaluation is faster than eager evaluation. The same thing applies with parallelizing sequential code (in particular data parallelization).
One thing to note is non-strict languages are only feasible in a referentially transparent (i.e purely declarative/functional) setting or those languages where side effects are explicit, localized and controlled (ala haskell monads or clean's uniqueness types) because one of the reasons as i've said earlier is these languages are amenable to aggressive optimizations. So you will mostly likely never see a non-strict version of say C# without making it a referentially transparent subset first and providing/adding explicit & controlled side-effects.
As for analyzing time/space complexity of lazy programs yes it is difficult using traditional methods of analysis because most of the time it's hard to predict when an expression will actually be evaluated.
However there are other methods to analyzing time/space complexity of lazy programs and if i remember correct it's based on amortized analysis. You can read this up from Chris Okasaki's widely known thesis/book Purely Functional Data Structures.