OOP and DOD

Started by
24 comments, last by Norman Barrows 7 years, 11 months ago

Recently, I have been studying various design patterns. While they all seem very interesting and useful in different situations, I think I've gotten a little confused about structuring my code. Code is never clean enough. So I spend much time refactoring and thinking of better ways to simplify code to make it easier to read later.

I would like to hear your opinions of Object-Oriented Programming and Data-Oriented Design. I'm not really asking whether I should avoid one or the other, but rather when and how I should use OOP and DOD effectively.

OOP as I mean it here refers to classes with an interface, attributes, and methods.

DOD as I mean it here refers to structures with external functions that use them.


DOD as I mean it here refers to structures with external functions that use them.


That isn't data-oriented design; that's procedural programming. That's not even necessarily "not object-oriented programming": you can write "object-oriented" code in that manner.
Data-oriented design is structuring your code around your data and how your data is going to be processed, rather than focusing on code as an end in and of itself.

You may find this reading useful. And this. And this.

I think I've gotten a little confused about structuring my code. Code is never clean enough. So I spend much time refactoring and thinking of better ways to simplify code to make it easier to read later.

I fell into this trap when I first started writing code. Things would look messy, sloppy, or seem like they could be refactored, redone, renamed, restructured, just to make the code better. Do you know what I learned from all this? You can spend way too much time on this, when the code is never used again or read by anyone. There are so many projects that I spent a lot of time on designing, testing, refactoring, and nurturing only to find that the project was canceled, requirements changed so the code wasn't needed, or I designed a generic, all-purpose solution and ended up only needing 10% of the features.

Many times I made a one-off program to do a little thing, and didn't worry about the structure and all, and those turned into multi-year projects. That messy code at the beginning never really hurt me at all. I just refactored or rewrote the ugly stuff once I knew what I really needed.

Bottom line: don't worry about the "right" way. Right this very minute, people are arguing that OOP is terrible and the reason all software is crap. We don't know the "right" way because it's different for every project.

I think, therefore I am. I think? - "George Carlin"
My Website: Indie Game Programming

My Twitter: https://twitter.com/indieprogram

My Book: http://amzn.com/1305076532

OOP and DOD are two very useful paradigms among hundreds, possibly thousands, that developers can pull out as appropriate.

Although religious wars have been fought over the nuances, object-oriented effectively means that you have objects, and the objects have a series of functions/methods to perform a cluster of tasks.

Similarly subject to religious wars, data-oriented effectively means that you are organizing data in a way that is friendly to the cache or other underlying hardware.
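To make "friendly to the cache" a bit more concrete, here is a minimal sketch assuming a simple particle update; the names `ParticleAoS`, `ParticlesSoA` and `integrate` are hypothetical. In the array-of-structures version every particle drags its unused color fields through the cache along with position and velocity; the structure-of-arrays version streams only the arrays the loop actually reads and writes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures (AoS): each particle's fields sit next to each other.
struct ParticleAoS {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
    float r, g, b, a;  // color: never read by the position update below
};

// Structure of arrays (SoA): each field lives in its own contiguous array.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<float> r, g, b, a;
};

// AoS update: every cache line fetched also carries the unused color data.
void integrate(std::vector<ParticleAoS>& ps, float dt) {
    for (ParticleAoS& p : ps) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}

// SoA update: only the six arrays the loop needs are touched.
void integrate(ParticlesSoA& ps, float dt) {
    const std::size_t n = ps.x.size();
    assert(ps.y.size() == n && ps.z.size() == n);
    assert(ps.vx.size() == n && ps.vy.size() == n && ps.vz.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        ps.x[i] += ps.vx[i] * dt;
        ps.y[i] += ps.vy[i] * dt;
        ps.z[i] += ps.vz[i] * dt;
    }
}
```

Both versions compute the same result; the difference is only in which bytes have to travel from memory to the CPU, which is what data-oriented layout is about.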

As the programmer you can use one, the other, or both, as you see fit in your code. You can also use flow based paradigms, event driven paradigms, parallel paradigms, and more, in whatever ways you see fit in your code.

I would like to hear your opinions of Object-Oriented Programming and Data-Oriented Design. I'm not really asking whether I should avoid one or the other, but rather when and how I should use OOP and DOD effectively.

OK, I'm trying to think of a way to answer this without writing several paragraphs, starting a religious war, or pointlessly confusing you.

The thing to understand is that OOP and DOD are both about organizing data within your game/app and how that data is transformed from one state to another. At the end of the day, both will get you where you're going: getting your game level from one frame to the next. If you're trying to figure out when to use OOP versus DOD, I'd consider these three main things:

1) Is your problem mainly about optimization, about running a simulation as fast as possible? If so, DOD might be the best way for you to go.

2) Do you have smaller numbers of objects that interact in complex ways with each other during an update? OOP is probably better for you. But,

Do you have large numbers of objects that don't interact with each other? DOD is probably better and faster.

and

3) Is multi-threading a consideration? In this case DOD might be better for you.
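Points 2 and 3 go together: when the data is a flat array of independent elements, the work splits naturally into disjoint index ranges, one per thread, with no locks. A minimal sketch under that assumption (the function `integrate_parallel` and its parameters are hypothetical, and a real engine would more likely reuse a job system than spawn threads every frame):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Advance every position by velocity * dt. Each worker owns a disjoint
// [begin, end) range, so no two threads ever write the same element.
void integrate_parallel(std::vector<float>& pos, const std::vector<float>& vel,
                        float dt, unsigned thread_count) {
    assert(pos.size() == vel.size());
    const std::size_t n = pos.size();
    if (thread_count == 0) thread_count = 1;
    const std::size_t chunk = (n + thread_count - 1) / thread_count;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // fewer elements than threads
        workers.emplace_back([&pos, &vel, dt, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                pos[i] += vel[i] * dt;
            }
        });
    }
    for (std::thread& w : workers) w.join();  // wait for every range
}
```

The same split is much harder when objects interact during the update, because then ranges stop being independent and you need synchronization.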

As others have pointed out, what you're calling DOD is more akin to procedural-style programming, as is typical of C code. You can even do OOP in C; you just don't have convenient tools built into the language for doing so. Likewise, you can do actual DOD using OOP techniques or procedural techniques, or functional or other techniques as well.
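As a rough illustration of "OOP in C", here is a sketch written in the C-compatible subset of C++: a struct plus a hand-written table of function pointers standing in for a class with a virtual function. All names (`Shape`, `ShapeVTable`, `shape_area`) are hypothetical; this is one common way to do it, not the only one.

```cpp
#include <cassert>

struct Shape;

// Hand-written "vtable": one function pointer per virtual operation.
struct ShapeVTable {
    double (*area)(const struct Shape* self);
};

// The "object": data plus a pointer to its type's dispatch table.
struct Shape {
    const struct ShapeVTable* vtable;
    double w, h;
};

static double rect_area(const struct Shape* s) { return s->w * s->h; }
static double tri_area(const struct Shape* s) { return 0.5 * s->w * s->h; }

// One table per "class", shared by every instance of that class.
static const struct ShapeVTable RECT_VTABLE = { rect_area };
static const struct ShapeVTable TRI_VTABLE = { tri_area };

// The "virtual call": dispatch through whatever table the object points to.
double shape_area(const struct Shape* s) {
    assert(s && s->vtable && s->vtable->area);
    return s->vtable->area(s);
}
```

This is essentially what a C++ compiler generates for you behind `virtual`; C just makes you spell it out.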

When we talk about Object-Oriented, Procedural, Functional, Declarative (and more) styles of programming, we typically call those programming paradigms -- a language that is designed to fit one (or maybe blend a few) of those paradigms typically has language-level features and makes language and library design decisions that support and encourage programmers in leveraging a certain mindset when expressing their solutions at the level of source code.

As of yet, I'm not aware of any language that adopts Data-Oriented Design in the way that, say, C++ adopts Object-Oriented Design, and I (and most people, I would assume) tend to think of DOD as existing on a separate plane that's mostly orthogonal to the plane where OOP, Procedural, and other programming paradigms exist. This is because actual DOD isn't really about how a programmer maps their solutions to source code; it's really about how their solution maps to the realities of hardware, with an emphasis on what data belong physically together and how it flows through the program logic. DOD says that this mapping from solution to real hardware is more important than the mapping from a programmer's solution to source code -- thus, in DOD, hardware realities drive the solution, and the solution drives the source code. This is the reverse of the usual approach, where programmers do not typically consider the realities of hardware deeply (indeed, some schools of programming actively discourage such considerations) or, if they do, attempt to retrofit hardware considerations as optimizations after the program structure, according to whichever programming paradigm, is already crystallized and difficult to fundamentally change. DOD has to be considered from the start, since it dictates how your data will be organized and how it will flow, at least for the processing-intensive parts of your program that will benefit from it; DOD can't be an afterthought.
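One small example of deciding "what data belong physically together" up front is hot/cold splitting. The sketch below assumes a hypothetical enemy type whose per-frame fields (hot) are stored apart from rarely used ones such as names (cold); `EnemyHot`, `EnemyCold`, `Enemies` and `count_alive` are illustrative names, not from any particular engine.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hot data: read every frame, kept small so many fit in one cache line.
struct EnemyHot {
    float x, y;
    float health;
};

// Cold data: touched rarely (UI, saving, debugging), stored separately.
struct EnemyCold {
    std::string name;
    std::string description;
};

// Both arrays are indexed by the same enemy id.
struct Enemies {
    std::vector<EnemyHot> hot;
    std::vector<EnemyCold> cold;
};

// A per-frame pass that only ever streams through the hot array.
std::size_t count_alive(const Enemies& e) {
    assert(e.hot.size() == e.cold.size());
    std::size_t alive = 0;
    for (const EnemyHot& h : e.hot) {
        if (h.health > 0.0f) ++alive;
    }
    return alive;
}
```

Retrofitting this split late means touching every piece of code that assumed a single `Enemy` object, which is why the layout decision has to come first.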

On OOP, one of the troubles is that what's taught as "OOP" in books and in college classrooms tends to be a very shallow and dogmatic view of it. Most colleges today teach OOP using Java, which as a language is particularly dogmatic (there are many reasonable choices that the language simply disallows a programmer from making, because the language designers deemed their one true way automatically superior), not to mention needlessly verbose because of it. Thus, Java is all of OOP that many people know when they leave college, and they go on to program in C# or C++ or other "OOP" languages as if they were Java.

Java has no free-standing functions, Java has no operator overloading, Java is garbage-collected, Java is intrinsically independent of any real hardware by creating a fictional homogeneous virtual hardware platform.

Java made choices largely opposite to those of C++, even though the two look superficially similar. C++ has free-standing functions, supports operator overloading, is not garbage collected (or even reference counted by default), and does not make itself independent of real hardware, but explicitly defines where those differences may appear (simultaneously discouraging, but allowing, reliance on such platform-specific behavior). These are just a few examples, and both languages have their place, but it should come as no surprise that programming either one as if it's the other, where that's even possible, does a disservice to the program. It's like trying to speak Spanish by mixing Spanish words with English rules for grammar -- you might be able to communicate your ideas in the end, but you sound like an idiot and everyone wonders why you seem so overconfident of your ability to speak Spanish.

In C++, for example, it's good practice for a class to be as small as possible, containing only the member variables necessary and only the member functions that must be able to manipulate those member variables directly. What's more, in C++, free-standing functions inside the same namespace as a class, if they operate on that class, are every bit as much a part of that class's interface as member functions are, because of how C++ name lookup and overload resolution work (see: Koenig lookup, also called argument-dependent lookup). In Java-style OOP this isn't possible, because the language says that every function must be part of a class -- and as a result every function can manipulate member variables directly even if it doesn't need to (Java's approach is worse for encapsulation, and makes testing more difficult in the same kind of way that global state does). This one difference makes good, idiomatic program design fundamentally different between the two languages. All of this is kind of a long way of saying that even within "OOP", there are different, competing flavors that dominate in one language or the other.

Finally, while I have no love for Java, I do not mean to leave you with the impression that C++-style OOP is the best style of OOP. C++ happens to be a particularly popular and mostly-good blending of OOP with control over low-level hardware concerns which, combined with its mostly-C-compatible roots, has made it very attractive for game development and other computationally intensive domains where efficient hardware utilization pays dividends. C++ is not even a "pure" form of OOP, and many computer scientists argue that languages like Simula (the first OOP language) and Smalltalk (another very early OOP language influenced by Simula) have never been surpassed as examples of the OOP programming paradigm.
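The point about free functions being part of a class's interface can be seen in a small sketch. `geo::Vec2`, `dot` and `length_squared` are hypothetical names: the class keeps only its data and accessors, while `dot` and `operator+` live as non-member functions in the same namespace. Argument-dependent (Koenig) lookup finds them from a `geo::Vec2` argument without any `geo::` prefix, yet they can only use the public interface.

```cpp
namespace geo {

// Minimal class: only the data plus the members that need direct access.
class Vec2 {
public:
    Vec2(double x, double y) : x_(x), y_(y) {}
    double x() const { return x_; }
    double y() const { return y_; }

private:
    double x_, y_;
};

// Non-member, non-friend functions: part of Vec2's interface via ADL,
// but unable to touch x_ and y_ directly.
double dot(const Vec2& a, const Vec2& b) { return a.x() * b.x() + a.y() * b.y(); }

Vec2 operator+(const Vec2& a, const Vec2& b) { return Vec2(a.x() + b.x(), a.y() + b.y()); }

}  // namespace geo

// Outside the namespace, the unqualified call still finds geo::dot,
// because one of its arguments has a type declared in namespace geo.
double length_squared(const geo::Vec2& v) { return dot(v, v); }
```

For example, with `geo::Vec2 v(3.0, 4.0)`, `length_squared(v)` is 25 and `v + geo::Vec2(1.0, 1.0)` picks up `geo::operator+` the same way.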

In the end, the best programs tend to balance pragmatism with just enough looking forward. Programs that see the light of day tend to do only what they need, without caring overmuch about how pretty or fast or ideologically pure they are. At the same time, they avoid painting themselves into a corner -- too much specialization too soon, in the wrong places, or without good reason often ends up as wasted effort when it proves inflexible in the face of necessary changes later on. There isn't a formula for this balance; it's something you gain a feel for through experience and, to a lesser extent, by learning from others who are experienced. It's the art of knowing when "better" has become "good enough", and accepting that after this point, "better still" is rarely a justification unto itself. It's accepting and even embracing that we will never know more about a problem now than we will know in the future, and not making big bets on unknowns, for or against (as a side note, this is not at odds with DOD, since hardware details are known and immutable).

throw table_exception("(╯ °□°)╯ ︵ ┻━┻");

In my opinion, and I'm assuming we're talking about high performance software development and C++ (since you've tagged the thread with this language), use DOD whenever possible and OOP when forced to because (even though I'm not sure if DOD has been formally and completely defined) what comes to mind technically when thinking of it is that it helps us tackle a couple of problems with OOP:

1. Inheritance abuse (including the CPU cost of virtual function calls, although that is generally an optimization concern).

2. Cache wastage through composition abuse and inheritance.

3. Destructors, constructors, member functions, member operator overloading, etc., leading to writing more functional code instead of OOP.

Technically, as has been stated before, the main result you get from this is more POD (plain old data) and fewer objects, sometimes automagically achieving better memory usage. Ultimately, you want to balance these things so that your only reason to use the (few) advantages of OOP is convenience.
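A minimal sketch of the "more POD and fewer objects" idea, assuming a single hypothetical entity type that only moves. The OOP version stores heap-allocated objects behind base-class pointers and makes one virtual call per element; the DOD version keeps plain structs of one type in a contiguous vector, so the loop body is known at compile time and memory is read sequentially. `Entity`, `Mover`, `MoverData` and the update functions are illustrative names only.

```cpp
#include <memory>
#include <utility>
#include <vector>

// OOP version: heterogeneous list, pointer chase plus virtual call per object.
struct Entity {
    virtual ~Entity() = default;
    virtual void update(float dt) = 0;
};

struct Mover : Entity {
    float pos = 0.0f, vel = 0.0f;
    void update(float dt) override { pos += vel * dt; }
};

void update_all(std::vector<std::unique_ptr<Entity>>& entities, float dt) {
    for (auto& e : entities) e->update(dt);
}

// DOD version: plain data grouped by type; no dispatch, contiguous storage.
struct MoverData {
    float pos;
    float vel;
};

void update_movers(std::vector<MoverData>& movers, float dt) {
    for (MoverData& m : movers) m.pos += m.vel * dt;
}
```

With several entity types, the DOD approach keeps one such array per type and runs one tight loop per array, instead of mixing all types in one list.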

In my opinion, and I'm assuming we're talking about high performance software development and C++ (since you've tagged the thread with this language), use DOD whenever possible ...

Let me expand a bit on this -- DOD is really the art of wringing the utmost performance from hardware that has specific, real-world characteristics. A machine has a word size, a cache-line size, and multiple levels of cache, all with different characteristics and sizes; it has main memory, disk drives, and network interfaces, all of which have specific bandwidths and latencies measurable in real, wall-clock time. Furthermore, it has an MMU and DMA engines, and it has peripheral devices that require or prefer that the memory objects used to communicate with them appear in a certain format (e.g. compressed textures, encoded audio). Because of the already large -- and still growing -- disparity between memory access speed and CPU instruction throughput, it has been a lesser-known truth for some time that memory-access patterns, not CPU throughput or algorithmic complexity, are the first-order consideration for writing performant programs. No fast CPU or clever algorithm can make up for poor memory access patterns on today's machines (this was not the case earlier in computing history, when memory access speeds and CPU throughput were not so mismatched; I would estimate it has been the case since around the time of the original Pentium CPU, but it hadn't become visible to more mainstream programmers until probably 10 years ago, or less).
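A classic illustration of "access pattern beats algorithm" is summing the same grid in two orders. Both functions below (hypothetical names) perform exactly the same rows x cols additions over a grid stored row-major in one vector; only the order in which memory is visited differs. No timings are claimed here: on grids much larger than the cache the column-order walk is typically far slower, but measure on your own hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Visits memory sequentially: every byte of each fetched cache line is used.
double sum_row_major(const std::vector<double>& grid, std::size_t rows, std::size_t cols) {
    assert(grid.size() == rows * cols);
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += grid[r * cols + c];
    return total;
}

// Same work, but jumps `cols` elements between consecutive reads, so on
// large grids a cache line is often evicted before its neighbors are used.
double sum_column_major(const std::vector<double>& grid, std::size_t rows, std::size_t cols) {
    assert(grid.size() == rows * cols);
    double total = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += grid[r * cols + c];
    return total;
}
```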

If performance is critical, DOD is the only reasonable starting point today. Period. End of Story.

But one must have a reasonable grasp of where performance is critical -- it would be unwise to program every part of your program at every level as if DOD were necessary or desirable, in the same way that writing the entirety of your program in Assembly language would be -- in theory, you might end up with the most efficient program possible, but in practice you'll have put an order of magnitude more effort into a lot of code that never needed that level of attention to do an adequate job, and you'll have obfuscated solutions to problems where other methods lend themselves naturally. For instance, UI components would gain nothing by adopting DOD, yet a DOD solution would likely give up OOP approaches that fit the problem so naturally that UI widgets are among the canonical examples used when teaching OOP.
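For contrast, a sketch of the kind of code where plain OOP fits naturally: a small widget tree with virtual `draw`. There are only a handful of heterogeneous objects, the interactions are structural rather than bulk numeric work, and nothing here is performance critical. `Widget`, `Label`, `Button` and `Panel` are hypothetical; "drawing" just appends text so the example stays self-contained.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Widget {
public:
    virtual ~Widget() = default;
    virtual void draw(std::string& out) const = 0;  // append a description
};

class Label : public Widget {
public:
    explicit Label(std::string text) : text_(std::move(text)) {}
    void draw(std::string& out) const override { out += "[Label:" + text_ + "]"; }
private:
    std::string text_;
};

class Button : public Widget {
public:
    explicit Button(std::string caption) : caption_(std::move(caption)) {}
    void draw(std::string& out) const override { out += "[Button:" + caption_ + "]"; }
private:
    std::string caption_;
};

// A container widget: draws itself, then its children, without knowing their types.
class Panel : public Widget {
public:
    void add(std::unique_ptr<Widget> child) { children_.push_back(std::move(child)); }
    void draw(std::string& out) const override {
        out += "[Panel";
        for (const auto& child : children_) child->draw(out);
        out += "]";
    }
private:
    std::vector<std::unique_ptr<Widget>> children_;
};
```

A panel holding a label "Name" and a button "OK" draws as `[Panel[Label:Name][Button:OK]]`.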

... and OOP when forced to because (even though I'm not sure if DOD has been formally and completely defined) what comes to mind technically when thinking of it is that it helps us tackle a couple of problems with OOP:

1. Inheritance abuse (including the CPU cost of virtual function calls, although that is generally an optimization concern).

2. Cache wastage through composition abuse and inheritance.

3. Destructors, constructors, member functions, member operator overloading, etc., leading to writing more functional code instead of OOP.

Technically, as has been stated before, the main result you get from this is more POD (plain old data) and fewer objects, sometimes automagically achieving better memory usage. Ultimately, you want to balance these things so that your only reason to use the (few) advantages of OOP is convenience.

Yet, it's important to maintain awareness that OOP and DOD are not necessarily at odds. You can't, for example, answer the question "what's DOD?" with "Not OOP." Whatever programming paradigm(s) you choose to adopt, it's prudent to select and leverage what features it can offer in service of DOD, for the parts of your program that adopt DOD. It might not be possible to write a DOD solution that looks exactly like a typical OOP solution, but it's very possible to write a DOD solution that looks *more like* a typical OOP solution than like a typical Procedural solution. Again, DOD is (and must be) prime where you have deemed performance to be critical, but there are no language features or programming paradigms that it forbids; like all things in engineering, there must always be a considered balance of competing needs.
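A sketch of what "a DOD solution that looks more like an OOP solution" can mean in C++, assuming a hypothetical one-dimensional particle system: callers see an ordinary class with member functions and private data, but the storage is structure-of-arrays and `update` processes the whole batch in one loop instead of one object at a time. `ParticleSystem`, `spawn` and `position` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class ParticleSystem {
public:
    // Returns an index that acts as a lightweight handle to the particle.
    std::size_t spawn(float x, float vx) {
        x_.push_back(x);
        vx_.push_back(vx);
        return x_.size() - 1;
    }

    // Batch update over contiguous arrays rather than per-object calls.
    void update(float dt) {
        assert(x_.size() == vx_.size());
        for (std::size_t i = 0; i < x_.size(); ++i) x_[i] += vx_[i] * dt;
    }

    float position(std::size_t id) const { return x_[id]; }
    std::size_t size() const { return x_.size(); }

private:
    std::vector<float> x_;   // positions, contiguous
    std::vector<float> vx_;  // velocities, contiguous
};
```

Encapsulation, member functions and a clean interface are all still there; what changed is that the unit of work is the batch, not the individual particle.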


But one must have a reasonable grasp of where performance is critical -- it would be unwise to program every part of your program at every level as if DOD were necessary or desirable, in the same way that writing the entirety of your program in Assembly language would be -- in theory, you might end up with the most efficient program possible, but in practice you'll have put an order of magnitude more effort into a lot of code that never needed that level of attention to do an adequate job, and you'll have obfuscated solutions to problems where other methods lend themselves naturally. For instance, UI components would gain nothing by adopting DOD, yet a DOD solution would likely give up OOP approaches that fit the problem so naturally that UI widgets are among the canonical examples used when teaching OOP.

QFT. One of the reasons that DOD is still relatively unknown outside of certain areas (game programming, high-performance computing) is because it solves a very specific problem: memory access bound performance.

But for the vast majority (IMHO) of software written today, CPU/memory-bound performance is an order of magnitude less important than the much lower-bandwidth issues (data access, network access, etc.).

Most "typical" business software spends its time waiting for database queries or REST APIs to complete. If DOD improves your algorithm even by 1000%, that's not going to help much if 90% of the total time spent waiting for process x to complete goes to a high-latency process.
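This can be made concrete with Amdahl's law (a standard result, not something cited in the thread). Assume, for illustration, that a fraction p = 0.1 of the wall-clock time is CPU-bound work that DOD speeds up by a factor s = 10, while the remaining 0.9 is spent waiting on the high-latency process. The overall speedup S is then only about 10%, and even an infinitely fast algorithm could not push it past roughly 11%:

```latex
S = \frac{1}{(1 - p) + \frac{p}{s}}
  = \frac{1}{0.9 + \frac{0.1}{10}}
  = \frac{1}{0.91} \approx 1.10,
\qquad
\lim_{s \to \infty} S = \frac{1}{1 - p} = \frac{1}{0.9} \approx 1.11
```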

I'm not arguing against DOD; it's very good for its intended purpose. I'm simply attempting to explain why it's not more widely known.

if you think programming is like sex, you probably haven't done much of either.-------------- - capn_midnight

Thanks so much, everyone. From what I gathered, it seems that DOD, or functional programming, is faster than OOP, but OOP is better for more complex behaviors. I often hear that using inheritance, abstract classes, polymorphism and such is slower due to v-table lookups, but does that cause a significant reduction in performance (to the point where, say, a player would notice)?

I like the idea of using abstract data and objects in C++ because it seems like a good way to organize code. On the other hand, I don't want to make a code base that is terribly inefficient.
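The only reliable answer to "would a player notice?" is to measure your own code. A minimal micro-benchmark sketch, with hypothetical names: the same sum computed once through one virtual call per heap-allocated object and once through a plain loop over a contiguous `std::vector<float>`. Keep in mind it measures dispatch and pointer chasing together, and an optimizing compiler may vectorize the plain loop or even devirtualize calls when it can see only one derived type, so treat any numbers it prints as indicative only.

```cpp
#include <chrono>
#include <memory>
#include <vector>

struct Base {
    virtual ~Base() = default;
    virtual float value() const = 0;
};

struct Constant : Base {
    float v;
    explicit Constant(float v_) : v(v_) {}
    float value() const override { return v; }
};

struct Timed {
    float sum;
    double seconds;
};

// One indirect call (and one pointer dereference) per element.
Timed sum_virtual(const std::vector<std::unique_ptr<Base>>& objs) {
    auto start = std::chrono::steady_clock::now();
    float sum = 0.0f;
    for (const auto& o : objs) sum += o->value();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return {sum, elapsed.count()};
}

// No dispatch, contiguous data.
Timed sum_plain(const std::vector<float>& values) {
    auto start = std::chrono::steady_clock::now();
    float sum = 0.0f;
    for (float v : values) sum += v;
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return {sum, elapsed.count()};
}
```

For a handful of objects per frame the difference is usually lost in the noise; it tends to matter only in hot loops over many thousands of elements, which is exactly where the posts above suggest considering DOD.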
