1) A NURBS curve can easily be decomposed into a set of (rational) Bezier curves, with a specified continuity between consecutive curves. Likewise, a NURBS surface patch can be decomposed into a set of (rational) Bezier patches. The conversion works in both directions, via knot insertion and knot deletion. (Aside: knot deletion can't always be performed without changing the curve/surface, so you'll need to know when it's allowed.)
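Here is a minimal sketch of the decomposition direction, assuming a non-rational B-spline with a clamped knot vector (end knots repeated degree+1 times). It uses Boehm's single-knot insertion and raises every interior knot to multiplicity equal to the degree, at which point the control polygon splits into Bezier segments. For a NURBS you would run the same code on homogeneous points (w*x, w*y, w) to get rational Bezier segments. The function names `insert_knot` and `decompose_to_bezier` are my own, not from any library.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def insert_knot(p: int, knots: List[float], ctrl: List[Point], u: float):
    """Insert knot value u once (Boehm's algorithm); the curve shape does not change."""
    # Span index k with knots[k] <= u < knots[k + 1].
    k = max(i for i in range(len(knots) - 1) if knots[i] <= u < knots[i + 1])
    new_ctrl: List[Point] = []
    for i in range(len(ctrl) + 1):
        if i <= k - p:
            new_ctrl.append(ctrl[i])          # untouched, before the affected region
        elif i > k:
            new_ctrl.append(ctrl[i - 1])      # untouched, shifted by one index
        else:
            a = (u - knots[i]) / (knots[i + p] - knots[i])
            new_ctrl.append(tuple((1 - a) * c0 + a * c1
                                  for c0, c1 in zip(ctrl[i - 1], ctrl[i])))
    return knots[:k + 1] + [u] + knots[k + 1:], new_ctrl


def decompose_to_bezier(p: int, knots: List[float], ctrl: List[Point]) -> List[List[Point]]:
    """Split a clamped B-spline into Bezier segments by raising every interior knot to multiplicity p."""
    interior = sorted(set(knots[p + 1:len(knots) - p - 1]))
    for u in interior:
        while knots.count(u) < p:
            knots, ctrl = insert_knot(p, knots, ctrl, u)
    # Each segment has p + 1 control points; neighbouring segments share an end point.
    return [ctrl[s * p:s * p + p + 1] for s in range(len(interior) + 1)]
```

For example, the quadratic curve with knots [0, 0, 0, 1, 2, 2, 2] and control points (0,0), (1,2), (3,2), (4,0) splits into two quadratic Bezier segments that meet at (2, 2), the midpoint of the 2nd and 3rd control points.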

2) The knot vector provides properties that you don't get with the Bezier formulation. The biggest problems with Beziers are the lack of local control and the degree increase. For example, adding more control points to a Bezier curve gives you more control over its shape, but it also raises the degree of the polynomial, which increases computation time. Also, if you move a single Bezier control point, the whole curve changes. The knot vector lets a NURBS behave like a chain of Bezier curves, all of the same degree as the NURBS, with local control: each NURBS control point influences only a limited part of the whole NURBS (at most degree + 1 consecutive knot spans), not all of it.
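To make the local-control claim concrete, here is a small sketch of the Cox-de Boor recursion for the B-spline basis functions. Basis function N_{i,p} is zero outside [knots[i], knots[i+p+1]), so control point i has no effect on the curve there. The helper names `basis` and `influence_interval` and the example knot vector are hypothetical, chosen only for illustration.

```python
def basis(i: int, p: int, knots: list, u: float) -> float:
    """Cox-de Boor recursion for the B-spline basis function N_{i,p}(u).

    Uses half-open spans [knots[i], knots[i+1]), so u equal to the very last knot
    returns 0 for every i and would need special-casing in real code.
    """
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + p] != knots[i]:                 # 0/0 terms are treated as 0
        left = (u - knots[i]) / (knots[i + p] - knots[i]) * basis(i, p - 1, knots, u)
    if knots[i + p + 1] != knots[i + 1]:
        right = ((knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1])
                 * basis(i + 1, p - 1, knots, u))
    return left + right


def influence_interval(i: int, p: int, knots: list) -> tuple:
    """Parameter range outside which control point i has no effect on the curve."""
    return knots[i], knots[i + p + 1]


knots = [0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4]   # cubic, 7 control points
print(influence_interval(0, 3, knots))      # (0, 1): P0 only shapes the first span
print(basis(0, 3, knots, 2.5))              # 0.0, so moving P0 leaves C(2.5) unchanged
```

Contrast this with a single Bezier curve of the same 7 control points, where every Bernstein basis function is nonzero on the whole open interval, so every control point moves every interior point of the curve.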

Mathematically, the knot vector specifies the parameter intervals over which the recursively defined basis functions act. A more intuitive way to picture it is as the list of start and end parameter values of the constituent Bezier curves. For example, the knot vector [0 0 0 0 1 2 2 2 2] for a cubic (degree 3, i.e. order 4) NURBS says that the NURBS consists of 2 Bezier curves, one on the interval (0,1) and one on the interval (1,2). The knot vector also tells us what the continuity between the curves is. In this case, the curves are joined with C2 continuity at the parameter value 1. If the knot vector were [0 0 0 0 1 1 2 2 2 2], we would still have 2 Bezier curves on the intervals (0,1) and (1,2), but the continuity between them would only be guaranteed to be C1 (although if this knot vector came from inserting a second knot at 1 into the first one, the curve would be unchanged and therefore still C2). Middle knots (i.e. not the start or end knot values) can have a multiplicity of at most n in the knot vector, and the minimum continuity of the Bezier curves at those parameter values is C^(n-k), where n is the degree of the NURBS and k is the multiplicity of the knot. These properties of the knot vector also explain your question (4). Simply put, the knot vector adds a measure of control you don't have when working with Bezier curves.
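The reading of a knot vector described above can be written as a few lines of code: find the valid parameter range [t_p, t_(m-p)] (m + 1 being the number of knots), list the non-empty spans inside it (one Bezier piece each), and report the guaranteed continuity C^(p-k) at each interior knot of multiplicity k. This is a sketch with a made-up function name, `spans_and_continuity`; it only reports the minimum continuity the knots guarantee, not the actual smoothness of a particular curve.

```python
from collections import Counter


def spans_and_continuity(p, knots):
    """Return the Bezier spans of a degree-p B-spline/NURBS and the guaranteed
    continuity C^(p - k) at each interior knot of multiplicity k."""
    m = len(knots) - 1
    lo, hi = knots[p], knots[m - p]              # valid parameter range [t_p, t_(m-p)]
    values = sorted({t for t in knots if lo <= t <= hi})
    spans = list(zip(values[:-1], values[1:]))   # one Bezier piece per non-empty span
    mult = Counter(knots)
    # A result of -1 (multiplicity p + 1) means the curve can be discontinuous there.
    continuity = {t: p - mult[t] for t in values[1:-1]}
    return spans, continuity


print(spans_and_continuity(3, [0, 0, 0, 0, 1, 2, 2, 2, 2]))     # ([(0, 1), (1, 2)], {1: 2})
print(spans_and_continuity(3, [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]))  # ([(0, 1), (1, 2)], {1: 1})
```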

3) The calculations for finding a point and normal on a NURBS surface are certainly more complicated than those for Bezier surface patches. There are techniques for evaluating them more efficiently (de Boor's algorithm, for example), but you can always decompose the NURBS into Bezier surface patches and compute points and normals that way. It's probably about the same amount of work either way.
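As a rough idea of what direct evaluation involves, here is a sketch of de Boor's algorithm for a point on a NURBS curve, done in homogeneous coordinates so the weights are handled by a single divide at the end. It covers curves only; a surface would apply the same idea along each parameter direction, and normals additionally need the partial derivatives, which this sketch does not compute. The names `find_span` and `nurbs_point` are illustrative, not a library API.

```python
from typing import List, Sequence, Tuple


def find_span(p: int, knots: Sequence[float], u: float) -> int:
    """Index k with knots[k] <= u < knots[k + 1]; the end of the range uses the last span."""
    n = len(knots) - p - 1                      # number of control points
    if not knots[p] <= u <= knots[n]:
        raise ValueError("u is outside the valid parameter range")
    if u == knots[n]:
        return n - 1
    k = p
    while not knots[k] <= u < knots[k + 1]:
        k += 1
    return k


def nurbs_point(p: int, knots: Sequence[float], ctrl: List[Tuple[float, ...]],
                weights: Sequence[float], u: float) -> Tuple[float, ...]:
    """Evaluate a NURBS curve point with de Boor's algorithm in homogeneous coordinates."""
    k = find_span(p, knots, u)
    # Lift the p + 1 control points that affect this span to (w*x, w*y, ..., w).
    d = [tuple(weights[j] * c for c in ctrl[j]) + (weights[j],) for j in range(k - p, k + 1)]
    for r in range(1, p + 1):
        for j in range(p, r - 1, -1):
            i = j + k - p
            alpha = (u - knots[i]) / (knots[i + p + 1 - r] - knots[i])
            d[j] = tuple((1 - alpha) * a + alpha * b for a, b in zip(d[j - 1], d[j]))
    *xyz, w = d[p]
    return tuple(c / w for c in xyz)            # project back from homogeneous space
```

As a check, the standard rational quadratic quarter circle (p = 2, knots [0, 0, 0, 1, 1, 1], control points (1,0), (1,1), (0,1), weights 1, sqrt(2)/2, 1) should give points on the unit circle for every u in [0, 1].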

4) In addition to the info on the knot vector: since the starting parameter value has multiplicity 4 (the NURBS in this example is cubic, so that's n+1 with n = 3 the degree), we know the curve passes through the starting control point. The end parameter value also has multiplicity 4, so the curve passes through the end control point. The knot vector doesn't have to have these "end conditions" (a knot vector that does is usually called clamped). For example, the knot vector for a cubic NURBS with 5 control points can be [0 1 2 3 4 5 6 7 8], which means that the curve doesn't pass through the end control points and consists of 2 Bezier curves on the intervals (3,4) and (4,5), with C2 continuity at t=4.
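A quick way to see the difference between the two knot vectors is to evaluate both curves with the same 5 control points. The sketch below uses plain (non-rational) de Boor evaluation; `is_clamped` and `de_boor` are hypothetical helper names, and the control points are arbitrary sample values. The clamped curve starts and ends exactly on the first and last control points, while the uniform one starts at a weighted average of the first three.

```python
def is_clamped(p, knots):
    """True if the first and last knot values are each repeated p + 1 times."""
    return len(set(knots[:p + 1])) == 1 and len(set(knots[-(p + 1):])) == 1


def de_boor(p, knots, ctrl, u):
    """Evaluate a non-rational B-spline curve at u in [knots[p], knots[n]]."""
    n = len(ctrl)
    if not knots[p] <= u <= knots[n]:
        raise ValueError("u is outside the valid parameter range")
    # Span index k; at the very end of the range, use the last non-empty span.
    k = n - 1 if u == knots[n] else max(i for i in range(p, n) if knots[i] <= u)
    d = [ctrl[j] for j in range(k - p, k + 1)]
    for r in range(1, p + 1):
        for j in range(p, r - 1, -1):
            i = j + k - p
            a = (u - knots[i]) / (knots[i + p + 1 - r] - knots[i])
            d[j] = tuple((1 - a) * x0 + a * x1 for x0, x1 in zip(d[j - 1], d[j]))
    return d[p]


ctrl = [(0, 0), (1, 2), (2, 0), (3, 2), (4, 0)]
clamped = [0, 0, 0, 0, 1, 2, 2, 2, 2]
uniform = [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(is_clamped(3, clamped), is_clamped(3, uniform))               # True False
print(de_boor(3, clamped, ctrl, 0), de_boor(3, clamped, ctrl, 2))   # (0.0, 0.0) (4.0, 0.0)
print(de_boor(3, uniform, ctrl, 3))   # (P0 + 4*P1 + P2) / 6 = (1, 4/3) up to rounding, not P0
```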

In my opinion, Bezier surface patches are nicer to work with, but unless you know exactly how to "stitch" the Bezier patches together via control point placement, NURBS patches are probably what you want.
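For a sense of what "stitching via control point placement" means, here is one common rule for cubic Bezier curves, sketched under the assumption that both segments use parameter intervals of the same length: the second segment must start at the first one's end point (C0), and its second control point must be the first segment's third control point mirrored through that shared point (C1). For bicubic patches you would apply the same rule to each row of control points across the shared edge. The function name `c1_continuation` is made up for this example, and C2 stitching needs further constraints not shown here.

```python
def c1_continuation(prev, q2, q3):
    """Build the next cubic Bezier so it joins `prev` (P0..P3) with C1 continuity.

    Assumes both segments use parameter intervals of equal length, so matching
    end tangents 3*(P3 - P2) and 3*(Q1 - Q0) means Q0 = P3 and Q1 = 2*P3 - P2.
    """
    p2, p3 = prev[2], prev[3]
    q0 = p3                                            # shared end point (C0)
    q1 = tuple(2 * b - a for a, b in zip(p2, p3))      # mirror P2 through P3 (C1)
    return [q0, q1, q2, q3]


first = [(0, 0), (1, 2), (2, 2), (3, 0)]
print(c1_continuation(first, (5, -1), (6, 0)))   # [(3, 0), (4, -2), (5, -1), (6, 0)]
```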