Code organization without the limitations of files

24 comments, last by SmkViper 9 years, 7 months ago
I've been contemplating an interesting phenomenon recently, some of which has spilled out of my brain into my journal here on GDNet, but mostly it's been a lot of quiet brain-churning.

First, a bit of background. Suppose we have a large project, like a decent-sized game, and we need to explore the code. Consider the first time you dropped into a large, unfamiliar codebase, and what kind of spelunking was necessary to learn how all the pieces fit together.

Usually, code is organized into conceptual units, which I'll call modules. Modules consist of code units such as data structures, functions, type definitions, global variables/constants, and so on. These may or may not be further grouped into namespaces, which are typically orthogonal to modules in terms of language semantics, although some languages treat modules and namespaces as the same concept.

So how do we lay out all this stuff on disk? Traditionally, we write each piece of code in exactly one place, in exactly one file. This is great for some things, like diffs and version control, but sucks for other things - like discoverability.

Going back to the scenario from above, think about what happens when you get a project that contains a lot of code. Most of that code does not fall into exactly one category. You may have UI, graphics, physics, AI, audio, game-specific logic, and so on.


Now suppose I want to view all code related to casting a fireball spell in this particular game - head to toe, from the UI button that triggers the spell to the game logic that handles it to the graphics code that renders it. Throw in audio effects for good measure.

How do I see all this stuff in the traditional structure? I open thirty different files, each of which might have two or three code units I actually care about.

Now I need to add some logic about fireballs, but only fireballs, and it has to do with graphics and audio. I need to distribute my code changes across at least two files. And what if I have something that is related to both fireballs and lightning, but not much else? What file does it go into?


This all strikes me as extraordinarily wasteful (in terms of programmer time and energy). I'm curious if there is a better way to allow for organizing code, so that this kind of scenario goes away entirely. There are other benefits I can imagine to getting away from the 1:1 file religion, but they're mostly side effects.

The real question is: can we invent a way to think of and view code that elegantly solves this kind of problem?

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

I don't personally have a problem with the standard arrangement of code living in files, but let's consider other possibilities. Would you be more satisfied if, instead of a directory structure (where each file must live in a single directory), we had tags, similar to labels on Gmail messages? That way you could tag a piece of code as both "fireball" and "lightning", and you would see it whether you are browsing for "fireball" things or for "lightning" things.
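The essential difference from a directory tree is that the relation between code units and tags is many-to-many. A minimal in-memory sketch of such an index, assuming code units are identified by name (all function names below are hypothetical):

```python
from collections import defaultdict

class TagIndex:
    """Many-to-many index: a code unit can carry any number of tags."""

    def __init__(self):
        self._by_tag = defaultdict(set)   # tag -> names of code units
        self._by_unit = defaultdict(set)  # code unit -> its tags

    def tag(self, unit, *tags):
        for t in tags:
            self._by_tag[t].add(unit)
            self._by_unit[unit].add(t)

    def browse(self, tag):
        """Everything carrying `tag`, in a stable order."""
        return sorted(self._by_tag.get(tag, ()))

    def tags_of(self, unit):
        return sorted(self._by_unit.get(unit, ()))


index = TagIndex()
index.tag("apply_burn_damage", "fireball")
index.tag("spawn_spell_particles", "fireball", "lightning")
index.tag("chain_to_nearby_targets", "lightning")

print(index.browse("fireball"))   # ['apply_burn_damage', 'spawn_spell_particles']
print(index.browse("lightning"))  # ['chain_to_nearby_targets', 'spawn_spell_particles']
```

The shared particle code shows up under both tags without being duplicated, which is exactly what a single-directory layout cannot express.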

If you are in a Unix-style file system, you can already achieve some of this by making directories for the different tags and using hard links to place the file in any number of directories.
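A minimal sketch of the hard-link trick using Python's `os.link`; the file and tag directory names are hypothetical. One caveat: some editors save by writing a new file and renaming it over the old one, which quietly breaks the link.

```python
import os
import tempfile

def link_into_tag_dirs(root, source_file, tags):
    """Hard-link `source_file` into root/<tag>/ for each tag (no copies made)."""
    name = os.path.basename(source_file)
    for tag in tags:
        tag_dir = os.path.join(root, tag)
        os.makedirs(tag_dir, exist_ok=True)
        target = os.path.join(tag_dir, name)
        if not os.path.exists(target):
            os.link(source_file, target)  # new directory entry, same inode

root = tempfile.mkdtemp()
src = os.path.join(root, "spell_particles.cpp")
with open(src, "w") as f:
    f.write("// shared by fireball and lightning\n")

link_into_tag_dirs(root, src, ["fireball", "lightning"])

# An edit made through one path is visible through every other path.
with open(os.path.join(root, "fireball", "spell_particles.cpp"), "a") as f:
    f.write("// edited via the fireball view\n")
with open(os.path.join(root, "lightning", "spell_particles.cpp")) as f:
    print(f.read())
print(os.stat(src).st_nlink)  # original entry plus two links
```
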
I've pondered the tagging idea a bit, and I think it has potential - but it runs into a tricky issue: how do you go from code you can text-edit to data on disk? Ideally that process preserves the ability to do things like diffs and version control.


Now suppose I want to view all code related to casting a fireball spell in this particular game - head to toe, from the UI button that triggers the spell to the game logic that handles it to the graphics code that renders it. Throw in audio effects for good measure.

This just doesn't make sense in practice much of the time. First, many bits of code and assets are going to be shared common structures without attachment to one specific effect. Second, things like fireballs are data files and definitions, not code, so the organization imposed by source files is quite irrelevant. Third, content creators are especially bad at organizing files or using such tools, which is why you so often see projects with fireball.png + fireball2.png + fireball_final.png + fireball_final_final.png + fireball_test_final_shipit.png + fireball_test_final_shipit2.png and so on.

Some engines do completely roll content management into their game editors, though, and some of these then offer tagging facilities. These tags are kept separate from the on-disk file structure. For example, data/textures/vfx/fireball.png might have a corresponding fireball.json file that includes a UUID, a "friendly" name, a description, a list of tags, and other metadata used by the editor for content browsing and linking. This data isn't available to other programs, but you could conceivably write an Explorer plugin or the like that makes viewing and editing that metadata through stock OS UIs a bit easier.
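A minimal sketch of the sidecar approach, assuming one JSON file per asset with the same base name. The field names (`uuid`, `name`, `description`, `tags`) are illustrative, not any particular engine's format:

```python
import json
import os
import tempfile
import uuid

def write_sidecar(asset_path, friendly_name, description, tags):
    """Write <asset>.json next to the asset; the asset file itself is untouched."""
    meta = {
        "uuid": str(uuid.uuid4()),
        "name": friendly_name,
        "description": description,
        "tags": sorted(tags),
    }
    sidecar = os.path.splitext(asset_path)[0] + ".json"
    with open(sidecar, "w") as f:
        # Sorted keys and indentation keep the file stable and diff-friendly.
        json.dump(meta, f, indent=2, sort_keys=True)
    return sidecar

def find_by_tag(content_root, tag):
    """Return sidecar paths (relative to content_root) whose tags include `tag`."""
    hits = []
    for dirpath, _, files in os.walk(content_root):
        for fn in files:
            if fn.endswith(".json"):
                path = os.path.join(dirpath, fn)
                with open(path) as f:
                    if tag in json.load(f).get("tags", []):
                        hits.append(os.path.relpath(path, content_root))
    return sorted(hits)

root = tempfile.mkdtemp()
vfx = os.path.join(root, "data", "textures", "vfx")
os.makedirs(vfx)
png = os.path.join(vfx, "fireball.png")
open(png, "wb").close()  # stand-in for the real texture
write_sidecar(png, "Fireball", "Core fireball sprite", ["fire", "spell", "vfx"])
print(find_by_tag(root, "fire"))
```

Because the metadata lives in plain text next to the asset, it still diffs and versions like any other file, while the editor can index it however it likes.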

This is all still about content. You really, really shouldn't ever need a source file for a fireball or the like. Your source should be a collection of reusable modules/components, and higher-level game primitives can be composed out of those. A fireball would be a particle effect, a DamageOverTime component, a physics collision model, some sound files, materials for burn marks and a component that applies them, etc. No fireball-specific source code would ever be required. Content designers are then free to create lightning balls and so on without ever bugging a highly paid engineer or having to wait for new builds to be released to the team.
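A minimal sketch of that composition idea: the component classes, their fields, and the dictionary format are hypothetical, and a real engine would load the definition from a data file rather than a literal.

```python
from dataclasses import dataclass, field

# Hypothetical reusable components: none of them knows what a "fireball" is.
@dataclass
class ParticleEffect:
    preset: str

@dataclass
class DamageOverTime:
    damage_per_second: float
    duration: float

    def total(self):
        return self.damage_per_second * self.duration

@dataclass
class CollisionSphere:
    radius: float

@dataclass
class Sound:
    clip: str

# Registry so data files can refer to components by name.
COMPONENT_TYPES = {cls.__name__: cls for cls in
                   (ParticleEffect, DamageOverTime, CollisionSphere, Sound)}

@dataclass
class Entity:
    name: str
    components: dict = field(default_factory=dict)

    def get(self, cls):
        return self.components.get(cls.__name__)

def build_entity(definition):
    """Assemble an entity from designer-authored data (e.g. parsed from JSON)."""
    entity = Entity(definition["name"])
    for type_name, params in definition["components"].items():
        entity.components[type_name] = COMPONENT_TYPES[type_name](**params)
    return entity

fireball = build_entity({
    "name": "Fireball",
    "components": {
        "ParticleEffect": {"preset": "fire_burst"},
        "DamageOverTime": {"damage_per_second": 12.5, "duration": 4.0},
        "CollisionSphere": {"radius": 0.5},
        "Sound": {"clip": "whoosh_fire"},
    },
})
print(fireball.get(DamageOverTime).total())  # 50.0
```

A "lightning ball" would just be another definition over the same component types, with no new classes and no rebuild.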

Sean Middleditch – Game Systems Engineer – Join my team!

Every function goes into its own file. BAM: the super IDE shows a Minority Report-style view of those billion fragment files, and diff still works.

Just because you don't want to view things as files doesn't mean you actually have to change the storage mechanism, though. Quite often when exploring existing code, I'll use the IDE's "find all references to" feature, which produces a list of places where a given class/function/variable is touched. When I click on an item in that list, the IDE opens the file and scrolls to the line... Your super IDE could instead open all of the files, but display only the range of lines we're interested in. These file views don't need to be full screen, so you can fit a lot of them in a nice draggable UI to get all the info in front of the user at once.
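A minimal sketch of such "file views": given (path, line) pairs, assumed here to come from something like a find-all-references result but hand-written below, show only a small window of lines around each hit instead of whole files. The helper names are hypothetical.

```python
import os
import tempfile

def snippet(path, line, context=2):
    """Return (first_line_no, lines) for a window around 1-based `line`."""
    with open(path) as f:
        lines = f.read().splitlines()
    start = max(1, line - context)
    end = min(len(lines), line + context)
    return start, lines[start - 1:end]

def render_views(references, context=2):
    """One small view per reference, with the referenced line marked by '>'."""
    out = []
    for path, line in references:
        start, lines = snippet(path, line, context)
        out.append(f"== {path}:{line}")
        for offset, text in enumerate(lines):
            n = start + offset
            marker = ">" if n == line else " "
            out.append(f"{marker}{n:4d} | {text}")
    return "\n".join(out)

path = os.path.join(tempfile.mkdtemp(), "spells.cpp")
with open(path, "w") as f:
    f.write("\n".join(f"line {i}" for i in range(1, 11)))

print(render_views([(path, 5)], context=1))
```

The storage stays one-file-per-whatever; only the presentation is sliced, which is the point of the post above.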

<snip>



Uh... that was just an example, dude. 99% of what you said is based on an irrelevant throwaway detail in an illustrative sentence, and nothing you said has anything to do with the subject of the thread.


I'm using C# most of the time lately on large projects (about a thousand files, so maybe not THAT large of a project), and I don't really have file navigation problems.

If:
- I have no idea what the thing is called: I ask a teammate (or if nobody remembers, I open up the Object Browser).
- I've seen the code: I exclusively use the IDE's navigation functions (go-to-def, find-all-refs, navigate-by-symbol/file-substring, navigate forward/back). This completely removes the need for me to remember which folder a file is in.

As far as where I actually put new stuff:

- Files: Wherever makes sense. For Unity, I just have to make sure to put it in the correct combination of plugins-or-not, editor-or-not, and potentially a platform-specific folder if necessary. Folder placement matters more for repository management purposes (submodules, etc.) than for source code organization.

- Classes: Typically one per file. I like short files. Sometimes I use partial classes to split up extremely large classes across multiple files (I usually only feel like doing this for machine code disassemblers since they are essentially just massive nested switch statements).

What I've seen is that projects sometimes get split into multiple projects and eventually become a bigger mess than the original code base.

I've tried different structures, both for C# and C++, and I usually keep one file per class and map the directory structure to namespaces. For C++, I keep headers and source files together (handy if you just want to browse some files with a text editor instead of working with the full IDE). If I have related structures/functions/etc. that are relatively small, I keep them in the same code unit for brevity. Also, I've come to rely heavily on IDEs' code navigation features, and I can't imagine coding efficiently without them.

I've also noticed that no matter how I start a project, once it grows into something used by customers, it always degrades into exceptions to the rule. Different formatting here, different file/directory structure there, folders with lots of files in them and then folders with just a single file.

I wouldn't worry about it; this is what Ctrl+F and Ctrl+Shift+F are for. When I learn a new code base, I don't always rely on the IDE's navigation features, as some names may be used in strings, etc., so searching in all files works better. But it's always a pain.
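A minimal sketch of a plain-text "find in all files", which, unlike symbol-based find-all-references, also catches names that only appear inside string literals. The file contents and the `spells.Register` call are made up for illustration:

```python
import os
import tempfile

def find_in_files(root, needle, extensions=(".cs", ".cpp", ".h")):
    """Return sorted (relative_path, line_no, text) for every line containing `needle`."""
    hits = []
    for dirpath, _, files in os.walk(root):
        for fn in files:
            if not fn.endswith(extensions):  # str.endswith accepts a tuple
                continue
            path = os.path.join(dirpath, fn)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, text in enumerate(f, start=1):
                    if needle in text:
                        hits.append((os.path.relpath(path, root), lineno, text.rstrip()))
    return sorted(hits)

root = tempfile.mkdtemp()
with open(os.path.join(root, "Spells.cs"), "w") as f:
    f.write("class Fireball : Spell { }\n")
    f.write('spells.Register("Fireball", typeof(Fireball));\n')
with open(os.path.join(root, "notes.txt"), "w") as f:
    f.write("Fireball needs tuning\n")  # skipped: not a source extension

for hit in find_in_files(root, "Fireball"):
    print(hit)
```
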

Interesting line of thought, Apoch, but is the file structure on disk really the problem? What if we could afford to write our own IDE to handle all the file-related stuff for us?

With a custom IDE, the IDE itself could decide how to map the code into different files on disk, ideally without intervention from the developer. The IDE could then display a custom "view" of the code in whichever way you prefer.

To solve the issue of navigating around and tracing paths through code, maybe it would be possible to generate some sort of "code-flow graph" that maps the connections between methods or code fragments in the code base. This could be partially automated (like a recursive call to "Go to definition"), but there would still have to be a lot of manual intervention for more obscure connections, starting with virtual methods (MethodX takes an ISomething and calls a virtual method on it; where does that go?) and harder-to-trace things like event queues in multithreaded applications or network serialization (cast fireball? Well, we are playing an MMORPG, so first we have to ask the server whether we can do this, and then a frame or so later the control path of "casting a fireball" resumes).

This would require some sort of meta-information about the code, I guess similar to the tags that Álvaro mentioned. This is potentially error-prone, just like writing comments, because the meta-information has to be kept in sync with the actual code. But if you were to provide that information when creating a code segment, maybe this could be simplified a little. Just as regular code is grouped into modules, each new code element (a class, a global function, or whatever) could be assigned one or more "domains" when it is created, like "memory allocation", "unit movement" or "casting fireball". A domain would be similar to a namespace or package, but it maps more directly to use cases in the application and is independent of how the underlying code fragments are grouped.

Then it would be possible to map a code-path for domain "casting fireball" starting from a given point. This could then also be displayed in a single document or in multiple connected documents or whatever is convenient.

The domains would also have the positive side-effect of making dependencies between code-pieces easier to track (though I'm sure this is somehow already possible with some tools).

Thinking about it, if you got a new code base from someone who had worked with this IDE and followed its rules, you could open the project and display a high-level "domain overview" showing all the different elements of the whole code base.
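A minimal sketch of the domain idea, assuming the IDE keeps two hand-maintained tables: the domains assigned to each code element, and the call edges between elements (including edges no "go to definition" could discover, such as the server round-trip). Every element name here is hypothetical. Tracing a use case is then a breadth-first walk that only follows edges into the requested domain, and the overview is a simple grouping:

```python
from collections import deque

DOMAINS = {
    "OnSpellButtonClicked": {"ui", "casting fireball"},
    "RequestCast":          {"network", "casting fireball"},
    "ServerValidateCast":   {"network", "casting fireball"},
    "SpawnFireball":        {"gameplay", "casting fireball"},
    "PlayFireballWhoosh":   {"audio", "casting fireball"},
    "PlaySound":            {"audio"},
    "AllocateParticles":    {"memory allocation"},
}
CALLS = {
    "OnSpellButtonClicked": ["RequestCast"],
    "RequestCast":          ["ServerValidateCast"],  # declared by hand: crosses the network
    "ServerValidateCast":   ["SpawnFireball"],
    "SpawnFireball":        ["PlayFireballWhoosh", "AllocateParticles"],
    "PlayFireballWhoosh":   ["PlaySound"],
}

def trace(start, domain):
    """Breadth-first walk from `start`, following only calls into `domain`."""
    order, seen = [], {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for callee in CALLS.get(node, []):
            if callee not in seen and domain in DOMAINS.get(callee, set()):
                seen.add(callee)
                queue.append(callee)
    return order

def overview():
    """Domain -> sorted list of elements: the high-level 'domain overview'."""
    by_domain = {}
    for element, domains in DOMAINS.items():
        for d in domains:
            by_domain.setdefault(d, []).append(element)
    return {d: sorted(v) for d, v in sorted(by_domain.items())}

print(trace("OnSpellButtonClicked", "casting fireball"))
print(overview()["audio"])
```

Generic helpers like `PlaySound` and `AllocateParticles` are reachable but drop out of the fireball trace because they were not assigned that domain, which is both the benefit and the maintenance burden described above.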

The downside would be that version control has to be somehow integrated into the IDE and has to work with the meta-information too.

It could be fun to try out whether that would even work or whether it's a stupid idea :D All in all, I personally would prefer an approach that lets me work with code without having to deal with how it is stored on disk, rather than creating a new storage scheme :)

Code perceivability is probably what you might call something like this: the ability to easily intuit how bits and pieces of code relate to and connect with each other.

Tagging is certainly one way to do it, and one you mentioned in your journal post. I can definitely see the ability to tag functions, code blocks, classes, and “things” in general as a very powerful tool in a file-less environment. However, tagging has several problems inherent to it: tag maintenance, tag scope, and documentation. The first is really a problem we already encounter with any kind of code or commenting that doesn't affect functionality: things like attributes get out of date, functionality gets moved but the documentation is not changed, etc. Tag scope is also an issue: at what level should tags be applied, and how detailed should your tags be? Taking up the fireball example, one method might be tagged <Fireball> <Audio> <Sound Effect>, while another might be <Fireball> <AOE Burst> <Damage Over Time>. Which brings me to the third problem, documentation of tags: you will need some form of guideline as to when a tag should be used and when it should not. Highly specific tags (such as the ones above) are easy to understand, but more generic tags might be desired as well (such as “Spell”), and deciding where to place that kind of tag will require some kind of uniform agreement or standard.
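One way to attack the documentation and scope problems is a controlled vocabulary: every tag must be documented, and a specific tag may name a broader parent, so tagging something <Fireball> implicitly files it under <Spell>. A minimal sketch; the guideline table and its hierarchy are hypothetical:

```python
# Hypothetical tag guideline: tag -> (when to use it, broader parent tag or None).
TAG_GUIDE = {
    "Spell":            ("Player- or AI-castable ability", None),
    "Fireball":         ("Code specific to the fireball spell", "Spell"),
    "AOE Burst":        ("Instant area-of-effect damage", "Spell"),
    "Audio":            ("Anything that plays or mixes sound", None),
    "Sound Effect":     ("One-shot sounds tied to gameplay events", "Audio"),
    "Damage Over Time": ("Damage applied in ticks over a duration", None),
}

def check_tags(tags):
    """Return the tags that are not in the documented vocabulary."""
    return sorted(t for t in tags if t not in TAG_GUIDE)

def expand(tags):
    """Add implied broader tags by walking each tag's parent chain."""
    result = set()
    for tag in tags:
        while tag is not None:
            result.add(tag)
            tag = TAG_GUIDE[tag][1]
    return sorted(result)

print(check_tags(["Fireball", "Audio", "Sfx"]))  # ['Sfx']
print(expand(["Fireball", "Sound Effect"]))      # ['Audio', 'Fireball', 'Sound Effect', 'Spell']
```

This does nothing for the first problem (tags going stale as code changes); it only makes the vocabulary explicit and checkable.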

Working out how to store this data on the disk depends on your goals. If you desire to interact with existing tools (such as git, p4, etc.) then you will need to use files. Probably the best way to break it up, in that case, is to do what we already do: Break it up by class, into nested directories per namespace. This is not the “best” way to do it, but it is a trivially automatable way to do it, and one that is frequently used on projects.
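The "one class per file, nested directories per namespace" layout is trivially automatable in both directions. A minimal sketch; the default `.cs` extension and the separator parameter are assumptions chosen to cover both the C# and C++ conventions mentioned in the thread:

```python
import posixpath

def path_for(qualified_name, extension=".cs", separator="."):
    """Map 'Game.Spells.Fireball' to 'Game/Spells/Fireball.cs'."""
    *namespaces, type_name = qualified_name.split(separator)
    return posixpath.join(*namespaces, type_name + extension)

def name_for(path, separator="."):
    """Inverse mapping: 'Game/Spells/Fireball.cs' back to 'Game.Spells.Fireball'."""
    stem, _ = posixpath.splitext(path)
    return separator.join(stem.split("/"))

print(path_for("Game.Spells.Fireball"))              # Game/Spells/Fireball.cs
print(path_for("Engine::Audio::Mixer", ".h", "::"))  # Engine/Audio/Mixer.h
```

The mapping only stays one-to-one while each file holds exactly one type; partial classes, nested types, or small helpers grouped into one file (all mentioned earlier in the thread) are where it breaks down.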

If, on the other hand, we’re looking at developing an entirely new methodology for handling source code, then we can assume that we will be providing… “wrappers” over those third-party tools (or even providing our own tools). In that case we can start working on a much more database-style storage of source code. Methods, types, snippets, algorithms, and the various libraries and third-party interfaces could be broken down and stored in a much more relational model… and that would allow you to perform tricks such as joins and relationships between tags.
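A minimal sketch of that relational model using Python's built-in `sqlite3` with an in-memory database. The two-table schema and the unit names are hypothetical; the point is that "units carrying all of these tags" and "which tags co-occur" become ordinary SQL queries:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE code_unit (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL UNIQUE,
        kind  TEXT NOT NULL,      -- 'function', 'type', 'snippet', ...
        body  TEXT NOT NULL
    );
    CREATE TABLE tag (
        unit_id INTEGER NOT NULL REFERENCES code_unit(id),
        tag     TEXT NOT NULL,
        PRIMARY KEY (unit_id, tag)
    );
""")

def add_unit(name, kind, body, tags):
    cur = db.execute("INSERT INTO code_unit (name, kind, body) VALUES (?, ?, ?)",
                     (name, kind, body))
    db.executemany("INSERT INTO tag (unit_id, tag) VALUES (?, ?)",
                   [(cur.lastrowid, t) for t in tags])

def units_with_all(*tags):
    """Units carrying every one of `tags` (join, then GROUP BY/HAVING)."""
    placeholders = ",".join("?" for _ in tags)
    rows = db.execute(f"""
        SELECT u.name FROM code_unit u
        JOIN tag t ON t.unit_id = u.id
        WHERE t.tag IN ({placeholders})
        GROUP BY u.id
        HAVING COUNT(DISTINCT t.tag) = ?
        ORDER BY u.name
    """, (*tags, len(tags))).fetchall()
    return [name for (name,) in rows]

def related_tags(tag):
    """Tags that co-occur with `tag`, with how many units share both."""
    return db.execute("""
        SELECT b.tag, COUNT(*) FROM tag a
        JOIN tag b ON b.unit_id = a.unit_id AND b.tag <> a.tag
        WHERE a.tag = ?
        GROUP BY b.tag
        ORDER BY COUNT(*) DESC, b.tag
    """, (tag,)).fetchall()

add_unit("SpawnSpellParticles", "function", "// ...", ["Fireball", "Lightning", "Graphics"])
add_unit("PlayFireballWhoosh", "function", "// ...", ["Fireball", "Audio"])
add_unit("ChainToNearbyTargets", "function", "// ...", ["Lightning"])

print(units_with_all("Fireball"))               # ['PlayFireballWhoosh', 'SpawnSpellParticles']
print(units_with_all("Fireball", "Lightning"))  # ['SpawnSpellParticles']
print(related_tags("Fireball"))                 # [('Audio', 1), ('Graphics', 1), ('Lightning', 1)]
```
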

One benefit of a non-source-oriented system is that you could flag pieces of systems for notification on modification. Let us take an example: assume you have an ability called Fiery Rush, a skill available to Elementalists when they use the Fiery Great Sword elite skill. Fiery Rush charges an opponent, leaving a series of fire fields behind that burn for damage over time, and at the end of the charge the character executes a final attack. Now, you’ve just begun a round of balance changes to bring this (and other abilities) in line with the damage targets you expect, and someone has supposedly modified it to do 140% more damage in the final attack and 70% less damage per fire field. Currently you would have to go back and check the various source files involved in calculating those numbers, and also look at the database fields associated with the skill, etc., to find out whether those changes have actually happened. With a tagging or similar system, though, you could put a notification alert into source control that sends you a message any time a function with a specific tag (say <Fiery Rush>) is modified. That way you would know any time someone changed it and could go back and review those changes. This would be especially important if your title were something like Lead Programmer Of Not Breaking Fiery Rush By Failing To Bump The Final Attack Damage By 2.4x.
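A minimal sketch of tag-based change notification, assuming the system can produce a snapshot of tagged code units for two revisions (in a real setup this might run from a source-control hook). The snapshot format, unit names, and subscriber name are hypothetical. Note that "140% more" is a ×2.4 multiplier and "70% less" is ×0.3:

```python
def changed_watched_units(before, after, watches):
    """
    before/after: {unit_name: (set_of_tags, source_text)} for two revisions.
    watches: {subscriber: set_of_tags}.
    Returns {subscriber: [units added, removed or modified that carry a watched tag]}.
    """
    notices = {}
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name), after.get(name)
        old_src = old[1] if old else None
        new_src = new[1] if new else None
        if old_src == new_src:
            continue
        # Use tags from either revision, so re-tagging can't hide a change.
        tags = (old[0] if old else set()) | (new[0] if new else set())
        for who, watched in watches.items():
            if tags & watched:
                notices.setdefault(who, []).append(name)
    return notices

before = {
    "FieryRushFinalAttack": ({"Fiery Rush"}, "damage *= 1.0"),
    "FireFieldTick":        ({"Fiery Rush", "Fire Field"}, "damage *= 1.0"),
    "LightningWhip":        ({"Lightning"}, "damage *= 1.0"),
}
after = {
    "FieryRushFinalAttack": ({"Fiery Rush"}, "damage *= 2.4"),   # 140% more
    "FireFieldTick":        ({"Fiery Rush", "Fire Field"}, "damage *= 0.3"),  # 70% less
    "LightningWhip":        ({"Lightning"}, "damage *= 1.1"),
}
watches = {"lead_programmer": {"Fiery Rush"}}

print(changed_watched_units(before, after, watches))
# {'lead_programmer': ['FieryRushFinalAttack', 'FireFieldTick']}
```

The LightningWhip change is ignored because nobody watches that tag; the same diff would also flag a watched unit that was deleted outright.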

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

