Code organization without the limitations of files

24 comments, last by SmkViper 9 years, 7 months ago

I think this is one of those cases where an implementation detail forces you to use a tool in a specific way, i.e., storing code in a hierarchical file system forces you to structure your program in a particular way.

The problem with hierarchies is that they are strict and rigid: they force you to sort your code into a single tree, which is very ill suited to a multidimensional problem like a computer program.

What you want is to abstract the file system away from the programmer and treat file layout as an optimization problem that the IDE solves when it feeds the code to the compiler.

Ultimately, the source code would be stored in a database closely modeled on the language specification, as a normalized, canonical representation of the program.

All the features that you wrote about would then be views on top of that database. That way you can choose a representation that fits the task you are currently trying to solve.
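Purely to illustrate the "database plus views" idea, here is a minimal C# sketch. The `Declaration` record and `CodeStore` class are hypothetical; a real store would model the full language specification rather than just names and bodies. The point is that declarations are keyed by a stable id instead of a file path, and a view is nothing more than a query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical canonical unit: one declaration, identified by a stable id rather than a file path.
public sealed record Declaration(string Id, string Namespace, string Kind, string Name, string Body);

// Hypothetical store: the single source of truth, with no notion of files at all.
public sealed class CodeStore
{
    private readonly Dictionary<string, Declaration> _byId = new();

    // Insert or replace by id, so a rename or move is a data update, not a file operation.
    public void Upsert(Declaration declaration) => _byId[declaration.Id] = declaration;

    // A "view" is just a query; different tasks can use different predicates over the same data.
    public IReadOnlyList<Declaration> View(Func<Declaration, bool> predicate) =>
        _byId.Values.Where(predicate)
              .OrderBy(d => d.Namespace, StringComparer.Ordinal)
              .ThenBy(d => d.Name, StringComparer.Ordinal)
              .ToList();
}
```

For example, `store.View(d => d.Namespace == "Game.Spells")` and `store.View(d => d.Kind == "method")` are two different views over the same stored program.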

The big problem I see with this, as others have said, is the tight coupling to the IDE.

But it could be interesting nevertheless; I have actually thought about this exact thing before.

Analysis of call graphs ranges from easy to ugly:

- Fixed calls (easy)
- Virtual calls (easy?): Do you show just the statically-known class or include all derived classes?
- Callbacks (easy? hard?): What's the ideal representation? Sequence diagrams? Something else?
- Functors/delegates/lambdas/etc: No idea how you'd analyze or represent these nicely.
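For the virtual-call case, one way to answer "include all derived classes?" is to enumerate every override reachable from the statically known class. The sketch below does that with .NET reflection over a compiled assembly; an IDE would more likely use the compiler's semantic model, and interface dispatch is deliberately not handled here. The `Spell` hierarchy is a hypothetical example.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class VirtualTargets
{
    // Every non-abstract method a virtual call through `staticMethod` could dispatch to,
    // looking only at types defined in `assembly`: the method itself plus all overrides
    // declared on classes derived from its declaring class.
    public static List<MethodInfo> PossibleTargets(MethodInfo staticMethod, Assembly assembly)
    {
        var staticType = staticMethod.DeclaringType!;
        var root = staticMethod.GetBaseDefinition();
        var targets = new List<MethodInfo>();
        if (!staticMethod.IsAbstract) targets.Add(staticMethod);

        foreach (var type in assembly.GetTypes())
        {
            if (type == staticType || !staticType.IsAssignableFrom(type)) continue;
            // DeclaredOnly: only methods written on this type, so inherited copies are not double-counted.
            foreach (var m in type.GetMethods(BindingFlags.Instance | BindingFlags.Public |
                                              BindingFlags.NonPublic | BindingFlags.DeclaredOnly))
            {
                var b = m.GetBaseDefinition();
                // An override shares the root's base definition; a method hidden with `new` does not.
                if (!m.IsAbstract && b.MetadataToken == root.MetadataToken && b.Module == root.Module)
                    targets.Add(m);
            }
        }
        return targets;
    }
}

// Hypothetical hierarchy used to exercise the sketch.
public class Spell { public virtual string Cast() => "spell"; }
public class Fireball : Spell { public override string Cast() => "fireball"; }
public class IceBolt : Spell { public override string Cast() => "ice bolt"; }
public class Frostbolt : IceBolt { } // inherits IceBolt.Cast, so it adds no new target
```

Asking from `Spell.Cast` yields three candidates; asking from `IceBolt.Cast` (i.e. "I know X is going to be an IceBolt") narrows it to one.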

How do you render the call graph? I wouldn't think that anything less than a Graphviz-style layout would be sufficient, but I've done call graphs with Graphviz, and real-world graphs tend to become completely unreadable very quickly. The density of calls is usually just too high to visualize cleanly.
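One common way to fight the density problem is to render only the neighborhood of a selected method rather than the whole program. A rough sketch, assuming the call edges have already been extracted as caller/callee name pairs (the `CallGraphDot` class is hypothetical): it walks breadth-first from a root for at most `maxDepth` hops and emits Graphviz DOT.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CallGraphDot
{
    // Quote a node name for DOT, escaping embedded double quotes.
    private static string Q(string name) => "\"" + name.Replace("\"", "\\\"") + "\"";

    // Emits a DOT digraph containing only the calls reachable from `root`
    // within `maxDepth` hops (breadth-first), instead of the full call graph.
    public static string Neighborhood(IEnumerable<(string Caller, string Callee)> edges, string root, int maxDepth)
    {
        var byCaller = edges.GroupBy(e => e.Caller)
                            .ToDictionary(g => g.Key, g => g.Select(e => e.Callee).Distinct().ToList());
        var sb = new StringBuilder("digraph calls {\n");
        var seen = new HashSet<string> { root };
        var frontier = new List<string> { root };
        for (int depth = 0; depth < maxDepth && frontier.Count > 0; depth++)
        {
            var next = new List<string>();
            foreach (var caller in frontier)
            {
                if (!byCaller.TryGetValue(caller, out var callees)) continue;
                foreach (var callee in callees)
                {
                    sb.Append("  ").Append(Q(caller)).Append(" -> ").Append(Q(callee)).Append(";\n");
                    if (seen.Add(callee)) next.Add(callee);
                }
            }
            frontier = next;
        }
        return sb.Append("}\n").ToString();
    }
}
```

The output can be rendered with something like `dot -Tsvg`. Keeping `maxDepth` small (2 or 3) is what keeps the picture readable.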

What might be really badass is if you could trace data flow in the graph. For example, let's say we have a call from method A to method B, where one of the arguments is passed straight through without change. It might be nice to indicate that in the graph layout:


| Namespace | ========> | Namespace2 |
| Class     |           | Class      |
| Method    |           | Method     |
| Arg1      |     /---> | Arg1       |
| Arg2      | ---/
Or let's say you can select two variables in the graph and press a button which makes the IDE show you all of the statements that are involved in the data flow between those two variables (or tell you if no data flows between them). This would likely only be precise for stack variables, but it seems like you might be able to do some conservative analysis for object fields as well...
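The simplest version of the pass-through case in the diagram can be sketched over a toy representation of call sites. Everything here (`MethodSig`, `CallSite`, `DataFlow`) is hypothetical, and the check is deliberately conservative: only a bare parameter name passed as an argument counts, and it assumes the parameter is not reassigned before the call, which a real tool would have to verify with the compiler's data-flow analysis.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified IR: a method's parameter names, and a call site's argument expressions.
public sealed record MethodSig(string Name, IReadOnlyList<string> Parameters);
public sealed record CallSite(MethodSig Caller, MethodSig Callee, IReadOnlyList<string> Arguments);

public static class DataFlow
{
    // Yields (callerParam, calleeParam) pairs where a caller parameter is passed straight
    // through, unchanged, as a callee argument. Computed expressions such as "x + 1"
    // are conservatively treated as not being a pass-through.
    public static IEnumerable<(string From, string To)> PassThroughs(CallSite call)
    {
        for (int i = 0; i < call.Arguments.Count && i < call.Callee.Parameters.Count; i++)
        {
            var arg = call.Arguments[i].Trim();
            if (call.Caller.Parameters.Contains(arg))
                yield return (arg, call.Callee.Parameters[i]);
        }
    }
}
```

In the diagram's terms, a call from `A(Arg1, Arg2)` to `B(Arg1)` passing `Arg2` yields the single edge `Arg2 -> Arg1`.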


I like the tag-and-view idea. I was thinking that in C#, you could easily adapt the #region/#endregion directives to function as tagging constructs:


#region tags: spells, attacks
// ...whatever code you want to put here...
#endregion
You might want to automatically assign a GUID to each region as well, so that you could cherry-pick regions instead of being forced to get every region with a specific set of tags.


#region guid: {1BD2771D-D38F-4AB1-8076-8A10D5EBDE51} tags: spells, attacks
// ...whatever code you want to put here...
#endregion
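Since the compiler ignores the text after `#region`, a tool would have to scan the source itself. A minimal sketch of that scan, assuming the exact header syntax proposed above (`#region [guid: {...}] tags: a, b`); `TaggedRegion` and `RegionScanner` are hypothetical names. Untagged regions are tracked only so that nesting still lines up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical result of scanning one tagged region (0-based line indices).
public sealed record TaggedRegion(Guid? Id, IReadOnlyList<string> Tags, int StartLine, int EndLine);

public static class RegionScanner
{
    // Matches the proposed header syntax: "#region [guid: {...}] tags: a, b".
    private static readonly Regex Header = new Regex(
        @"^\s*#region\s+(?:guid:\s*(?<guid>\{[0-9A-Fa-f-]+\})\s+)?tags:\s*(?<tags>.+)$");

    public static List<TaggedRegion> Scan(IReadOnlyList<string> lines)
    {
        var result = new List<TaggedRegion>();
        // Stack of open regions; null entries are ordinary, untagged regions.
        var open = new Stack<(Guid? Id, string[] Tags, int Start)?>();
        for (int i = 0; i < lines.Count; i++)
        {
            var trimmed = lines[i].TrimStart();
            if (trimmed.StartsWith("#endregion"))
            {
                if (open.Count > 0 && open.Pop() is { } r)
                    result.Add(new TaggedRegion(r.Id, r.Tags, r.Start, i));
            }
            else if (trimmed.StartsWith("#region"))
            {
                var m = Header.Match(lines[i]);
                if (!m.Success) { open.Push(null); continue; }
                Guid? id = m.Groups["guid"].Success ? (Guid?)Guid.Parse(m.Groups["guid"].Value) : null;
                var tags = m.Groups["tags"].Value.Split(',')
                    .Select(t => t.Trim()).Where(t => t.Length > 0).ToArray();
                open.Push((id, tags, i));
            }
        }
        return result;
    }
}
```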
Your view files would then just list guid and tag expressions:


guid: {1BD2771D-D38F-4AB1-8076-8A10D5EBDE51}
tags: spells && attacks
(In my example, that region matches the guid AND the tags, but this would obviously not be necessary)
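Evaluating such a view file could look roughly like this. It is a sketch under stated assumptions: `CodeRegion` and `ViewFile` are hypothetical, and the tag expression language is limited to `&&` and `||` without parentheses or negation (`&&` binds tighter). A region matched by several lines is listed once, in first-match order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical region as produced by some earlier scan of the sources.
public sealed record CodeRegion(Guid Id, IReadOnlyCollection<string> Tags);

public static class ViewFile
{
    // Each view line is either "guid: {...}" (one specific region) or
    // "tags: a && b || c" (every region whose tags satisfy the expression).
    public static List<CodeRegion> Select(IEnumerable<string> viewLines, IEnumerable<CodeRegion> regions)
    {
        var all = regions.ToList();
        var selected = new List<CodeRegion>();
        foreach (var raw in viewLines)
        {
            var line = raw.Trim();
            IEnumerable<CodeRegion> hits = Enumerable.Empty<CodeRegion>();
            if (line.StartsWith("guid:"))
            {
                var id = Guid.Parse(line.Substring("guid:".Length).Trim());
                hits = all.Where(r => r.Id == id);
            }
            else if (line.StartsWith("tags:"))
            {
                // OR of AND-clauses, e.g. "spells && attacks || ui".
                var clauses = line.Substring("tags:".Length).Split("||")
                    .Select(c => c.Split("&&").Select(t => t.Trim()).ToList())
                    .ToList();
                hits = all.Where(r => clauses.Any(clause => clause.All(t => r.Tags.Contains(t))));
            }
            foreach (var r in hits)
                if (!selected.Contains(r)) selected.Add(r);
        }
        return selected;
    }
}
```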

Analysis of call graphs ranges from easy to ugly:

- Virtual calls (easy?): Do you show just the statically-known class or include all derived classes?

Let them choose. If it's an interface, then indicate that (with icons); if I know X is going to be a Q, then I'll want to be able to select Q.F() instead of B.F(). I should be able to easily change my choice later, though, in case I was wrong or need to trace the flow through two or more different sets of derived methods. If we're doing this in a "view"-based format, you could have separate chunks of the view for each virtual method I'm tracking.

- Callbacks (easy? hard?): What's the ideal representation? Sequence diagrams? Something else?

Probably sequence diagrams, if you're familiar with them. Alternatively, a call stack.

- Functors/delegates/lambdas/etc: No idea how you'd analyze or represent these nicely.

Same idea as Callbacks.

How do you render the call graph? I wouldn't think that anything less than a Graphviz-style layout would be sufficient, but I've done call graphs with Graphviz, and real-world graphs tend to become completely unreadable very quickly. The density of calls is usually just too high to visualize cleanly.

The real problem I see with this is that you can never eliminate the shit you don't care about while also bookmarking the stuff you DO care about. Ideally, I would want to be able to right-click on a function inside a method and say "add to view." At that point I can see the method in my view alongside all the other methods. If this method is called by another method in the view (either directly or through a potential derived instance), then I would like to see that relationship, either through the implicit ordering of the methods in the view or through a relation graph that I can show and hide, perhaps in a side panel of the editor window.
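The "add to view" idea can be modeled as an ordered set of bookmarked methods, with call relations filtered down to methods that are actually in the view, so everything you haven't bookmarked stays hidden. A minimal sketch; `MethodView` is hypothetical, and the call graph is assumed to come from elsewhere, already including any resolved derived-instance targets.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical "view": an ordered set of methods the user bookmarked via "add to view".
public sealed class MethodView
{
    private readonly List<string> _methods = new();
    public IReadOnlyList<string> Methods => _methods;

    public void Add(string method) { if (!_methods.Contains(method)) _methods.Add(method); }
    public void Remove(string method) => _methods.Remove(method);

    // Only the call relations between methods that are both in the view.
    public IEnumerable<(string Caller, string Callee)> Relations(IEnumerable<(string Caller, string Callee)> callGraph)
    {
        var inView = new HashSet<string>(_methods);
        return callGraph.Where(e => inView.Contains(e.Caller) && inView.Contains(e.Callee)).Distinct();
    }
}
```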

What might be really badass is if you could trace data flow in the graph. For example, let's say we have a call from method A to method B, where one of the arguments is passed straight through without change. It might be nice to indicate that in the graph layout:

| Namespace | ========> | Namespace2 |
| Class     |           | Class      |
| Method    |           | Method     |
| Arg1      |     /---> | Arg1       |
| Arg2      | ---/

This is doable today; it just needs a bit of static analysis. You can do this with Roslyn and the latest C# 6.0 CTP.

I like the tag-and-view idea. I was thinking that in C#, you could easily adapt the #region/#endregion directives to function as tagging constructs:

#region tags: spells, attacks
// ...whatever code you want to put here...
#endregion
You might want to automatically assign a GUID to each region as well, so that you could cherry-pick regions instead of being forced to get every region with a specific set of tags.

#region guid: {1BD2771D-D38F-4AB1-8076-8A10D5EBDE51} tags: spells, attacks
// ...whatever code you want to put here...
#endregion
Your view files would then just list guid and tag expressions:

guid: {1BD2771D-D38F-4AB1-8076-8A10D5EBDE51}
tags: spells && attacks
(In my example, that region matches the guid AND the tags, but this would obviously not be necessary)

Hmm, you could use regions for this, yes. The downside is that the data would not be available after compile time, since regions are just preprocessor directives. Using attributes is another option...

[Tag("fireball","spells","attacks")]
[Tag("fireball, spells, attacks")]
[Tag("fireball"), Tag("spells"), Tag("attacks")]
This would result in the data being available after compilation for use by external automation tools...
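A sketch of what such an attribute might look like, assuming a hypothetical `TagAttribute` that accepts all three spellings above (a `params` array, a comma-separated string, and repeated attributes via `AllowMultiple`). `TagReader` shows how an external tool could read the tags back from the compiled assembly with standard reflection; `SpellBook` is a made-up example class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical attribute; AllowMultiple enables the [Tag("a"), Tag("b")] form.
[AttributeUsage(AttributeTargets.All, AllowMultiple = true, Inherited = false)]
public sealed class TagAttribute : Attribute
{
    public IReadOnlyList<string> Tags { get; }

    // Accepts Tag("a", "b") as well as Tag("a, b") by splitting each argument on commas.
    public TagAttribute(params string[] tags) =>
        Tags = tags.SelectMany(t => t.Split(','))
                   .Select(t => t.Trim())
                   .Where(t => t.Length > 0)
                   .ToList();
}

public static class TagReader
{
    // Attributes live in the assembly's metadata, so no source code is needed here.
    public static ISet<string> TagsOf(MemberInfo member) =>
        member.GetCustomAttributes<TagAttribute>()
              .SelectMany(a => a.Tags)
              .ToHashSet(StringComparer.OrdinalIgnoreCase);

    // Every method in the assembly carrying the given tag.
    public static IEnumerable<MethodInfo> Tagged(Assembly assembly, string tag) =>
        assembly.GetTypes()
                .SelectMany(t => t.GetMethods(BindingFlags.Public | BindingFlags.NonPublic |
                                              BindingFlags.Instance | BindingFlags.Static |
                                              BindingFlags.DeclaredOnly))
                .Where(m => TagsOf(m).Contains(tag));
}

// Hypothetical game code using each of the three spellings.
public static class SpellBook
{
    [Tag("fireball", "spells", "attacks")] public static void Fireball() { }
    [Tag("heal, spells")] public static void Heal() { }
    [Tag("shield"), Tag("spells")] public static void Shield() { }
}
```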

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

Analysis of call graphs ranges from easy to ugly:

- Virtual calls (easy?): Do you show just the statically-known class or include all derived classes?

Let them choose. If it's an interface, then indicate that (with icons); if I know X is going to be a Q, then I'll want to be able to select Q.F() instead of B.F(). I should be able to easily change my choice later, though, in case I was wrong or need to trace the flow through two or more different sets of derived methods. If we're doing this in a "view"-based format, you could have separate chunks of the view for each virtual method I'm tracking.


That's a good point. I typically think of call graphs as something the IDE generates which I can't tweak. If we let the user tweak the view to show what they're interested in, then it gets much better!

(side note: I found something I had been looking for which sounds a lot like this, but is currently debugging-oriented: http://visualstudiogallery.msdn.microsoft.com/4a979842-b9aa-4adf-bfef-83bd428a0acb )

Hmm, you could use regions for this, yes. The downside is that the data would not be available after compile time, since regions are just preprocessor directives. Using attributes is another option...

[Tag("fireball","spells","attacks")]
[Tag("fireball, spells, attacks")]
[Tag("fireball"), Tag("spells"), Tag("attacks")]
This would result in the data being available after compilation for use by external automation tools...


That's another good point; I was only thinking of this from a source perspective. .NET definitely emphasizes including metadata in DLLs so that consumers don't NEED the source code to use them conveniently. Attributes might be a good way to do this (I don't know of any alternatives).

If you had to add those attributes to everything by hand though, it would be seriously tedious. Ideally you could somehow multi-select whatever you want to add tags to and have the IDE add the attributes for you.

Let them choose. If it's an interface, then indicate that (with icons); if I know X is going to be a Q, then I'll want to be able to select Q.F() instead of B.F(). I should be able to easily change my choice later, though, in case I was wrong or need to trace the flow through two or more different sets of derived methods. If we're doing this in a "view"-based format, you could have separate chunks of the view for each virtual method I'm tracking.


That's a good point. I typically think of call graphs as something the IDE generates which I can't tweak. If we let the user tweak the view to show what they're interested in, then it gets much better!

(side note: I found something I had been looking for which sounds a lot like this, but is currently debugging-oriented: http://visualstudiogallery.msdn.microsoft.com/4a979842-b9aa-4adf-bfef-83bd428a0acb )

Yes, that is the essence of what I was (at the moment) thinking of writing up as a VSIX extension, except mine would be more about views on methods in managed source files.

That's another good point; I was only thinking of this from a source perspective. .NET definitely emphasizes including metadata in DLLs so that consumers don't NEED the source code to use them conveniently. Attributes might be a good way to do this (I don't know of any alternatives).

If you had to add those attributes to everything by hand though, it would be seriously tedious. Ideally you could somehow multi-select whatever you want to add tags to and have the IDE add the attributes for you.

At first, yes, but then it becomes just like doc comments. In addition, with a bit of UI magic you could trivially have autocomplete capabilities on your tags.
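The autocomplete part really could be small. A sketch, assuming the IDE already has the list of tags in use across the codebase (for instance, collected from the attributes); `TagCompletion` is hypothetical. Suggesting existing tags, most frequently used first, also nudges people toward consistent spelling.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagCompletion
{
    // Existing tags starting with the typed prefix (case-insensitive),
    // most frequently used first, ties broken alphabetically.
    public static IReadOnlyList<string> Suggest(IEnumerable<string> tagsInUse, string prefix, int max = 5) =>
        tagsInUse.Where(t => t.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                 .GroupBy(t => t, StringComparer.OrdinalIgnoreCase)
                 .OrderByDescending(g => g.Count())
                 .ThenBy(g => g.Key, StringComparer.Ordinal)
                 .Select(g => g.Key)
                 .Take(max)
                 .ToList();
}
```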


As for source control integration and version history: why not? You can run blame on a single line of code, so there's no reason you couldn't run blame on the file for the set of lines that make up the function and simply cut out all lines that are not part of that function.
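The per-function blame described above boils down to finding the function's line span and filtering. Git can already restrict blame to a range with `git blame -L <start>,<end> -- <file>`, so the span alone would be enough to pass along. In the sketch below, `BlameLine` and `FunctionBlame` are hypothetical, and the brace counting is naive (braces inside strings or comments will confuse it); a real tool would ask the compiler for the syntax node's span.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical: one line of `git blame` output after parsing.
public sealed record BlameLine(int LineNumber, string Commit, string Author);

public static class FunctionBlame
{
    // 1-based, inclusive span of the first function whose signature line contains
    // `signature`, found by counting braces from that line onward.
    public static (int Start, int End)? FindSpan(IReadOnlyList<string> lines, string signature)
    {
        int start = -1, depth = 0;
        bool opened = false;
        for (int i = 0; i < lines.Count; i++)
        {
            if (start < 0 && lines[i].Contains(signature)) start = i;
            if (start < 0) continue;
            foreach (char c in lines[i])
            {
                if (c == '{') { depth++; opened = true; }
                else if (c == '}') depth--;
            }
            if (opened && depth == 0) return (start + 1, i + 1);
        }
        return null;
    }

    // Keeps only the blame entries that fall inside the function's span.
    public static List<BlameLine> ForFunction(IEnumerable<BlameLine> blame, (int Start, int End) span) =>
        blame.Where(b => b.LineNumber >= span.Start && b.LineNumber <= span.End).ToList();
}
```

As the reply points out, this only works while the function stays put under the same signature; it does nothing for renames or splits.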


I think the main problem here is that, while you might be able to get something that works in the common case (I change a few lines inside a function, change parameters, etc) it's going to fall apart horribly for any significant refactoring.

Moving a function around in a file isn't too hard to work with if you know what your source language is - even though it'll confuse the hell out of a traditional text diff program. But what happens when you change the function name? Change the name and rearrange them in the file? Split it up into 10 smaller functions? Move it into a larger one?

Not that we shouldn't look into improving our tools (making them at least aware of the language they're diffing would be a huge start) but I think it's going to be a while to get really good diff history if we stick with the file-based-storage method. And we certainly don't want to go to the other extreme and have a line-based-storage method.
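To make the rename problem a bit more concrete: a language-aware tool could link functions across revisions by body similarity instead of by name, so a renamed function still finds its history. The sketch below swaps in a simple technique, Jaccard similarity over token sets, purely as an illustration; `FunctionMatcher` and the 0.6 threshold are hypothetical choices. Splits and merges remain the hard part: here several new functions may map to the same old one, and nothing handles a merge.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FunctionMatcher
{
    // Identifiers, integer literals, and single non-space characters as tokens.
    private static HashSet<string> Tokens(string body) =>
        Regex.Matches(body, @"[A-Za-z_]\w*|\d+|\S")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToHashSet();

    // Jaccard similarity of the two token sets: |A ∩ B| / |A ∪ B|.
    public static double Similarity(string a, string b)
    {
        var ta = Tokens(a);
        var tb = Tokens(b);
        if (ta.Count == 0 && tb.Count == 0) return 1.0;
        int inter = ta.Count(t => tb.Contains(t));
        return (double)inter / (ta.Count + tb.Count - inter);
    }

    // Maps each new function name to the most similar old function, ignoring names.
    // Functions below `threshold` are treated as new and left out of the result.
    public static Dictionary<string, string> Match(
        IReadOnlyDictionary<string, string> oldFns,
        IReadOnlyDictionary<string, string> newFns,
        double threshold = 0.6)
    {
        var result = new Dictionary<string, string>();
        foreach (var (newName, newBody) in newFns)
        {
            var best = oldFns
                .Select(o => (Name: o.Key, Score: Similarity(o.Value, newBody)))
                .OrderByDescending(x => x.Score)
                .FirstOrDefault();
            if (best.Name != null && best.Score >= threshold)
                result[newName] = best.Name;
        }
        return result;
    }
}
```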

Though now I have an idea for a scripting language that re-uses code by storing unique lines in a giant array and just referencing each one in the proper sequence. That can only end horribly.

This topic is closed to new replies.
