OOP and DOD


Thanks so much everyone. From what I gathered it seems that DOD, or functional programming,


Not quite. Again, you have your terms mixed up. "Functional programming" is something else entirely.

I would just call it DoD. As far as I know, it doesn't have any synonym. Its more vocal proponents might just call it "engineering," but not everyone would agree with that claim.

Thanks so much everyone. From what I gathered it seems that DOD, or functional programming, is faster than OOP but OOP is better for more complex behaviors. I often hear that using inheritance, abstract classes, polymorphism and such is slower due to v-table searches, but does that cause significant reduction of performance (to the point where say a player would notice)?

I like the idea of using abstract data and objects in C++ because it seems like a good way to organize code. On the other hand, I don't want to make a code base that is terribly inefficient.

We can't tell you which would be better or faster because we don't know what your code will do or what the data will look like. My recommendation is to arrange your code in whatever way makes the most sense to you. While you're doing that you will probably learn a lot, and as a result when you're done you will probably think "if I were to do it again I'd do it all differently"... which is a normal thing. But you can't really jump to that without first learning what you learned the first time around.

Of course that's not to say that you can't learn more about how the hardware works before you start coding. DOD is all about arranging data so that your code will operate on it optimally given specific ways that the hardware works. So, I'd recommend learning more about how memory caches and the prefetcher work. But then again, I kind of think that you're worrying too much about optimizations. Just because your code follows DOD instead of OOD doesn't mean it will be any faster, any easier to write or maintain, or any more robust. In fact, if you blindly try to go that route you might end up with just the opposite.

From what I gathered it seems that DOD, or functional programming, is faster than OOP but OOP is better for more complex behaviors.

I'm not sure how you gathered that.

Let's try again.

Functional programming is something entirely different. That is about the design of the language, and it is a spectrum. On one side is 'functional programming'; on the other side is 'imperative programming' or 'procedural programming.' The family of languages including C++, Java, C#, etc., are all imperative languages. Contrast this with declarative languages like SQL, where you tell the system what you want (select whatever from tables where this=that) and the system determines which algorithms and internal execution patterns to use.

Data Oriented programming and Object Oriented programming are not at odds and are not necessarily faster or slower. Data Oriented is a concept, meaning your implementation takes consideration of the underlying data and its flow through the machine. Object oriented is also a concept, meaning you have clusters of functionality around a single responsibility operating on a cluster of data.

OOP is clusters of behaviors around a blob of data. General guidance is to have a single responsibility for a cluster. If you are only considering the behaviors and give no thought to the underlying machine, it is possible to write code that does not perform well, but that is true of any situation where you give no thought to the underlying machine.

DOD means designing around smooth flow of data. General guidance is to have long, continuous strands of data that flow through the cache and are processed with a predictable stride. If you are only considering the flow of data and give no thought to the interplay of behaviors, it is possible to write code that cannot easily handle complex behavior, but that is true of any situation where you do not think in terms of systemic complexity.

Both of them are fully independent of each other. Making your tasks into tighter clusters or looser clusters (object oriented) is completely independent of making your data more or less memory/cache/cpu friendly (data oriented). You can create code that follows both object oriented concepts and data oriented concepts. You can write code that follows only one or the other. You can write code that follows neither.
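As a minimal sketch of that point (the class name and fields are invented purely for illustration): an interface that is object oriented in the "single responsibility behind an interface" sense, while its internal storage is contiguous, fixed-stride, and data oriented.

#include <cstddef>
#include <vector>

// Hypothetical example: object-oriented on the outside (one responsibility,
// invariants kept behind an interface), data-oriented on the inside
// (parallel contiguous arrays traversed in one predictable loop).
class HealthSystem {
public:
    std::size_t add(float hitPoints)          // returns a handle/index
    {
        hp_.push_back(hitPoints);
        regenPerSecond_.push_back(1.0f);
        return hp_.size() - 1;
    }

    void update(float dt)                     // one tight, sequential pass
    {
        for (std::size_t i = 0; i < hp_.size(); ++i)
            hp_[i] += regenPerSecond_[i] * dt;
    }

    float healthOf(std::size_t handle) const { return hp_[handle]; }

private:
    std::vector<float> hp_;                   // contiguous, cache-friendly
    std::vector<float> regenPerSecond_;
};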

One of the reasons that DOD is still relatively unknown outside of certain areas (game programming, high-performance computing) is because it solves a very specific problem: memory access bound performance.

But for the vast majority (IMHO) of software written today, CPU/memory bound performance is an order of magnitude less important than the much lower bandwidth issues (data access, network access, etc).

Most "typical" business software spends its time waiting for database queries or REST APIs to complete. If DOD improves your algorithm even by 1000%, that's not going to help much if the total time spent waiting for process x to complete is 90% dependent on a high latency process.

I'm not arguing against DOD; it's very good for its intended purpose. I'm simply attempting to explain why it's not more widely known.

But IO-bound software struggling between external systems and memory requires the same optimization techniques and principles as software struggling between memory and CPU: read data sequentially or at least in large pages; read data only once (if possible, never); organize data so you don't waste bandwidth reading unneeded data because it's interspersed with data you need; don't waste time and/or space storing redundant information that can be recomputed inexpensively from a small working set of data; and so on.

Apparently different techniques, like normalizing the structure of a relational database and organizing in-memory data structures as "structures of arrays", actually serve the same purpose in the same way.
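A rough, hypothetical sketch of the same principle at the IO level (the function and the assumption of a plain binary file are mine, for illustration only): rather than issuing many small seek-and-read calls, pull the data in with one large sequential read and process it in memory.

#include <cstddef>
#include <cstdio>
#include <vector>

// Slow pattern (not shown): thousands of tiny fseek/fread pairs, one record at a time.
// Faster pattern: a single large sequential read, then process the records in memory.
std::vector<char> read_whole_file(const char* path)
{
    std::vector<char> bytes;
    if (std::FILE* f = std::fopen(path, "rb")) {
        std::fseek(f, 0, SEEK_END);
        const long size = std::ftell(f);
        std::fseek(f, 0, SEEK_SET);
        if (size > 0) {
            bytes.resize(static_cast<std::size_t>(size));
            std::fread(bytes.data(), 1, bytes.size(), f);
        }
        std::fclose(f);
    }
    return bytes;
}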

Omae Wa Mou Shindeiru

1) Is your problem mainly about optimization, about running a simulation as fast as possible? If so, DOD might be the best way for you to go.

2) Do you have smaller numbers of objects that interact in complex ways with each other during an update? OOP is probably better for you. But if you have large numbers of objects that don't interact with each other, DOD is probably better and faster.

...a couple of problems with OOP:

1. Inheritance abuse (including CPU costs of virtual function calls although generally that is an optimization).

2. Cache wastage through composition abuse and inheritance.

3. Destructors, constructors, member functions, member operator overloading, etc., leading to more functional-style code writing instead of OOP.

... Let me expand a bit on this -- DOD is really the art of wringing the utmost performance from a set of hardware that has specific, real-world characteristics -- machines have a word size, a cache-line size, multiple levels of cache all with different characteristics and sizes, main memory, disk drives, and network interfaces, all of which have specific bandwidths and latencies measurable in real, wall-clock time. ...

If performance is critical, DOD is the only reasonable starting point today. Period. End of Story ...

From what I gathered it seems that DOD, or functional programming, is faster than OOP but OOP is better for more complex behaviors.

I'm not sure how you gathered that ....

Data Oriented programming and Object Oriented programming are not at odds and are not necessarily faster or slower. ...

OOP is clusters of behaviors around a blob of data. ...

DOD means designing around smooth flow of data. General guidance is to have long, continuous strands of data that flow through the cache and processing around a predictable stride.

I think I'm even more confused now than I was before. People, and even you, are saying that DOD is more efficient, but at the same time you're saying that it's not? It is but it isn't? :(

In my studies, it was said that computationally speaking OOP has a larger footprint because data is spread throughout memory rather than being stored in one place. It would be more resource intensive to perform searches on that data than a DOD approach would lend itself to. Was that wrong? You mention that DOD is an approach that prefers continuous strands of data, which would be easier and quicker to handle than v-table searches. I don't understand. Is it like using a std::vector (continuous memory, faster) versus a std::list (not continuous, slower)?

If it won't cause a noticeable bog in a 2d game, I'd like to use OOP. At the same time, I want to develop for mobile, where performance and resource usage appear to be quite important. :) It might be better to share examples of specific subsystems in a typical game that may work better with OOP or DOD. For example, I heard that physics simulation might work better with DOD.

I think I'm even more confused now than I was before. People, and even you, are saying that DOD is more efficient, but at the same time you're saying that it's not? It is but it isn't? :(


It seems like you're looking for one of us to state that one of these is always faster. That is not the case - we would be spouting dogma if we said that. It really depends on what you're trying to do, what your data looks like (same thing), and where your latencies are. DoD can be and usually is more efficient, but it doesn't always result in more efficient code, since not all algorithms can be arranged to operate on streams of contiguous data.

You should also stop thinking of DoD and OOD as opposites. For one thing, sometimes the data layout in a DoD solution is equivalent to the one in the OOP solution, meaning that the DoD and OOP solutions are the same thing. Remember, OOP is fundamentally a modelling tool that focuses on maintaining state invariants. You can use OOP to implement a data-oriented design, if not on the usual level of granularity of typical OOD.

Is it like using a std::vector (continuous memory, faster) versus a std::list (not continuous, slower)?


That's a good example, yes. But I am reluctant to claim that every data structure can be represented efficiently with a vector (or several vectors) or something like it. Many or even most can be, but that doesn't mean there aren't cases where you need to do something different. Same thing applies to DoD.
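A minimal sketch of that contrast (the function names are just for illustration): both loops do the same O(n) work, but the vector's elements sit in one contiguous block that the prefetcher can stream through the cache, while each list node is a separate allocation and every step may be a cache miss.

#include <list>
#include <numeric>
#include <vector>

// Contiguous traversal: elements are packed together, so the hardware
// prefetcher can keep the cache fed.
long long sum_vector(const std::vector<int>& v)
{
    return std::accumulate(v.begin(), v.end(), 0LL);
}

// Pointer-chasing traversal: each node lives in its own allocation, so the
// next element's address isn't known until the current node has been loaded.
long long sum_list(const std::list<int>& l)
{
    return std::accumulate(l.begin(), l.end(), 0LL);
}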
I think I'm even more confused now than I was before. People, and even you, are saying that DOD is more efficient, but at the same time you're saying that it's not? It is but it isn't? :(

Efficiency in code is a strange thing. Typically the inefficiencies are not what you expect. Generally the first time a program is analyzed for performance all kinds of weird stuff shows up. You might discover billions of calls to string comparison, or billions of iterations through some loop deep-down in the system.

It is rare for performance problems to be exactly what you expect them to be, unless you've been doing performance analysis for years, in which case occasionally you can guess right.

Data oriented design is more efficient because you wrote and tested for that specific thing. You spent time and effort ensuring that the system will efficiently travel across the data. Perhaps you have half a million particles and you build a particle system; your data oriented design will ensure that the CPU streams quickly and efficiently over those half million particles, taking advantage of cache effects and whatever prefetch commands are available on the system.

It takes a great deal of testing to ensure something that claims to be data oriented really is.

In my studies, it was said that computationally speaking OOP has a larger footprint because data is spread throughout memory rather than being stored in one place. It would be more resource intensive to perform searches on that data than a DOD approach would lend itself to. Was that wrong?

Yes, that is a poor generalization.

Object oriented programming is based around clusters of objects and operations. There is typically no significant difference in memory footprint. Data can potentially be scattered across memory if that is how the programmer wrote it. This is not a problem by itself if the operations are naturally scattered. However, if operations are sequential, and also if memory cannot be linearly traversed, then your code might not take advantage of certain benefits from locality of data. Note there are several conditions involved there.

Data oriented development means actively seeking out those conditions and intentionally taking action to ensure when you perform sequential operations the memory is traversed in a hardware-friendly way.

Programmers who follow object oriented rules can completely violate the rules of data oriented development. They can also closely follow the rules of data oriented development. The two are unrelated.

You mention that DOD is an approach that prefers continuous strands of data, which would be easier and quicker to handle than v-table searches. I don't understand.

It has nothing to do with vtables.

Going back to a particle system example:

A naive programmer might make a particle object. They'll give the particle a position, mass, velocity, and several other variables. Then they will create an array of 500,000 particle objects. When they run the code, they will iterate over each object, calling 500,000 functions. Each function will update the object, then return. They'll provide many different functions to manipulate the particles, and whenever they work with the system, call each function 500,000 times.

A more savvy programmer might make a class of particles. They will create an array of positions, an array of masses, an array of velocities, and a way to set the number of particles, taking care to ensure the processing steps through memory one CPU cache line at a time. Generally, when working within the cache you only pay the cost of the initial load, so if four values fit in a cache line you pay for one and the other three are effectively free. The programmer can then set the number to 500,000 particles. Then they will make a single function call which will traverse all 500,000 particles at once. They'll provide exactly the same functionality to manipulate the particles as above, but each operation will be completed with a single call that processes all particles, rather than an enormous number of calls that process one particle each.

An even more savvy programmer might take advantage of SIMD calls on the system to process 4, 8, or more particles at a time rather than processing them individually in a tight loop. Then instead of just having four in the cache and paying for a single cache load, they'll also only pay for a single operation rather than four, giving even faster results.

All three of them have the same effect of updating 500,000 particles. Two of them just took better advantage of their hardware's capabilities.
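A rough sketch of the first two approaches (and a taste of the third), boiled down to a single "move by velocity" step; the names, fields, and SSE snippet are illustrative assumptions rather than code from anyone's actual project.

#include <cstddef>
#include <vector>
#include <immintrin.h>   // SSE intrinsics (x86 only); used in the third variant below

// 1) Naive: an array of particle objects, one function call per particle.
struct Particle {
    float x, y, z;
    float vx, vy, vz;
    float mass;          // cold data rides along in every cache line
    void update(float dt) { x += vx * dt; y += vy * dt; z += vz * dt; }
};

void update_naive(std::vector<Particle>& particles, float dt)
{
    for (Particle& p : particles)
        p.update(dt);    // 500,000 calls, hot and cold data interleaved
}

// 2) Structure of arrays: one object owns all particles; one call updates them all.
struct Particles {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<float> mass;   // cold data stays out of the hot loop entirely

    void update(float dt)
    {
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
            z[i] += vz[i] * dt;
        }
    }

    // 3) SIMD variant (x axis only, for brevity): four particles per instruction.
    void update_x_sse(float dt)
    {
        const __m128 vdt = _mm_set1_ps(dt);
        std::size_t i = 0;
        for (; i + 4 <= x.size(); i += 4) {
            const __m128 px = _mm_loadu_ps(&x[i]);
            const __m128 pv = _mm_loadu_ps(&vx[i]);
            _mm_storeu_ps(&x[i], _mm_add_ps(px, _mm_mul_ps(pv, vdt)));
        }
        for (; i < x.size(); ++i)   // scalar tail for leftover particles
            x[i] += vx[i] * dt;
    }
};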

Also note that this type of design only works in limited situations. There need to be LOTS of things that can be accessed sequentially. Lots of particles being shifted. Lots of pixels in an image filter. If there is a small number of items, perhaps only a few hundred or a few thousand, the benefit is probably too small to be worth it. Also, if the items cannot be arranged sequentially, the design change does not work.

I want to develop for mobile, where performance and resource usage appear to be quite important.

Your concern is admirable but misplaced. This is something you should not worry about roughly 97% of the time. In the rare 3% of cases where you do need to do something about it, it will be very obvious; you will not miss it by accident.

The rules for optimization are:

1. Don't worry about it.

2. Advanced: Don't worry about it yet.

3. (Experts Only): Don't worry about it until after you've carefully measured, then only worry about those specific pieces indicated by measurement.

I think I'm even more confused now than I was before. People, and even you, are saying that DOD is more efficient, but at the same time you're saying that it's not? It is but it isn't? :(

You're confused partly because this is a complex topic. If you have little knowledge about a complex topic and you ask general questions, confusion is usually the result.

If you give us more specific examples of what problems you're trying to solve, then we can be more specific about whether a DOD or OOD approach would work better for you.

Other than that I can only repeat what I said above. You should arrange your code in the way that makes the most sense to you. Most people typically think of scenegraph objects in an OOD way, and that's a simple and general way to break things up.

One way to help you grok OOP, functional, and procedural programming is to think of them as spoken languages, say English, German, and Japanese. Each is a method of communication, but each has different words, sentence structures, and intonations.

Within a language you have many dialects, which on the surface are similar, but dig deep enough and there are many subtle differences; these are the various computer languages.

Traditionally the languages fall in the paradigms as follows:

OO (Objects) - C#, C++, Java

Procedural (Data and functions) - C, Fortran

Functional (Everything is data, including functions) - SQL, ML-based, Haskell, Lisp-based.

You can apply DOD in all of these paradigms.

In recent years the lines between paradigms have blurred as many languages move to a multi-paradigm state, such as Scala, F#, and even C#. Most are a mix of the OO and functional worlds, but their roots and core use remain true to their origins.

To reiterate again, OOP is not at odds with DOD. It's entirely possible, likely even, that your DOD code might employ classes, inheritance, even virtual functions in some capacity, even if not in the exact same capacity that a DOD-ignorant OOP program would. Separately, some parts of your code -- the most computationally-intensive parts, usually -- will benefit most from DOD (and often lend themselves to it fairly naturally), and other parts of your code will not benefit and perhaps fit a DOD-ignorant, OOP style more naturally.

In programming, we don't choose one approach and apply it to the entire program. It's natural and common that some parts of a program will appear more like OOP, functional, or procedural -- we as programmers are left to choose the best approach taking into account the requirements, what language features are available to us, and how we intend to weld these parts together. DOD exists on a separate axis, and can be freely mixed into different parts of the program as needed, regardless of the programming paradigm we'll leverage. Choosing to approach some part of the program from a DOD mindset does have an impact on how you utilize those paradigms, but you tend to think of it as just another requirement that has to be balanced -- it doesn't come crashing through the wall demanding that you can no longer use such-and-such language feature, or that you have to use such-and-such other feature.

By way of example, take the typical OOP approach of having a particle class -- position, mass, velocity, color, lifetime, etc. -- and having a collection of those particles -- basically as Frob described earlier. If you were mistaken about DOD and assumed it merely meant "looks like procedural", you could separate the data into C-style structs and have free functions to operate on them, but that won't be DOD, because it didn't rearrange the data; it just rearranged the source code.

A more DOD approach would be to transmute the multitude of particle objects, represented by the Particle class, into a Particles class that owns all the particles -- now you have arrays (vectors, more likely, but contiguous and homogeneous in any case) of data -- positions[], masses[], velocities[], colors[], lifetimes[], etc[]. Now you've rearranged the data, but you'll notice that this Particles thing still lends itself very well to being a class -- there's not a C-style struct in sight, you're using std::vector, you might inherit ExtraCoolParticles from Particles, and you might use a virtual function to dispatch the Update method (it's true that DOD prefers to avoid virtual dispatch, particularly in tight loops, but it's still sometimes the right tool at higher levels of control).

Moreover, you might notice that mass and velocity are almost always accessed near to one another, and the same for color and lifetime; it could be the case that a better arrangement of the data still would be positions[], masses_and_velocities[], colors_and_lifetimes[], etcs[]. Only profiling will tell you whether this is *really* the better arrangement, but it's possible. One element of DOD is separating hot data from cold (that is, frequently-accessed from infrequently-accessed), which is essentially always a win because it leverages caches and prefetching better; another element is to consider grouping disparate elements that are frequently accessed together, which is sometimes a win and sometimes not -- but neither of these says anything about what programming paradigm is employed; it's a distinct consideration.
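A small, hypothetical sketch of that hot/cold split (the particular field groupings here are assumptions for illustration; only profiling would justify any specific arrangement):

#include <cstddef>
#include <cstdint>
#include <vector>

// Fields that are read and written together are stored together; fields that
// are rarely touched during integration live in their own (cold) array.
struct Position      { float x, y, z; };
struct MassVelocity  { float mass; float vx, vy, vz; };
struct ColorLifetime { std::uint32_t rgba; float lifetime; };

class Particles {
public:
    void integrate(float dt)   // hot path: never touches the cold array
    {
        for (std::size_t i = 0; i < positions_.size(); ++i) {
            positions_[i].x += massesAndVelocities_[i].vx * dt;
            positions_[i].y += massesAndVelocities_[i].vy * dt;
            positions_[i].z += massesAndVelocities_[i].vz * dt;
        }
    }

private:
    std::vector<Position>      positions_;            // hot
    std::vector<MassVelocity>  massesAndVelocities_;  // hot, accessed together
    std::vector<ColorLifetime> colorsAndLifetimes_;   // cold during integration
};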


