Re-Imagineering Unix

Started by
22 comments, last by Oluseyi 20 years, 9 months ago
Reapplying the Unix Philosophy
I think that the 9 Tenets of the Unix Philosophy by Mike Gancarz, and a brief exposition of the nine tenets as well as the ten lesser tenets, are solid principles for the design and implementation of software in general and operating systems in particular. Today we encounter more and more accusations that Unix is not "user-friendly" or is "difficult"; while Unix enthusiasts generally shrug off such comments with disdain, pointing to the simple elegance of the system and the power it provides, how can we fundamentally improve Unix? How can we maintain its power and expressiveness while making it easier to use? No, just throwing a GUI on top is not enough.

GUI and CLI Inadequacy
The big flaw of the graphical user interface is that it leads to the construction of applications with captive user interfaces and a lot of code redundancy. In addition, no one so far has been able to conceive an intuitive way to string GUI applications together to create powerful custom tools the way we can with CLI utilities. The big flaw of the command-line interface, on the other hand, is that unless you already know what its utilities do or where to look for documentation, you're screwed. GUIs can take advantage of exploration and recognition to make themselves more usable while CLIs can't. Is the solution to make GUI tools front ends to CLI utilities? To supposedly get the "best of both worlds" by making applications usable in multiple modalities? I don't think so.

Concept: Micro-Services
CLI utilities fall into two categories in terms of their data input and output properties: generators and filters. (A utility that "consumes" data - accepts input and generates no output - is a virus.) Utilities that we are all familiar with include ls, cp, rm and find.
I've been wondering whether it is not possible to convert the vast majority of system utilities, and even many third-party utes, into "micro-services": headless applets of a System daemon that perform the core processing and return results to whatever application is requesting them. CLI and GUI utilities alike would use the same processing, meaning that less code - including glue - would need to be written for each of them (we'll get to revamping CLIs and GUIs themselves in a sec). As an example, CLI ls would simply pass the filename pattern to System:ls and display the results, while a GUI file browser would do the same but then render icons or fill out detail tables based on the data returned (System:ls would return a pair of unordered lists of files and directories respectively by default, but could be instructed to return more information). Thus the interface tool focuses on interfacing, while the functionality is made a system-accessible service of sorts. Any other application that seeks to obtain a list of files simply makes a request to System:ls. Now how is this better than calling an API function? The invocation mechanism is the same whether you're calling System:ls, System:cp or Oluseyi:Discombobulate - akin to a network send/recv session - and thus there are no special libraries or bindings involved. Furthermore, applications would be able to direct one micro-service to pass its output to another micro-service, thus emulating the command line's piping and redirection features in CLI and GUI apps alike (less of a big deal for CLIs, but fairly significant IMO for GUIs).

Concept: I/O Abstraction
The concept here is that almost every application needs to perform some kind of file I/O sooner or later; the more applications you have that can read and write the same file format (except for plain text), the more copies of functionally equivalent code you have on your disk.
Data falls into relatively few categories (one of which is a hedge, basically for "unclassifiable" data): Text, Image, Audio, Video and Application - the basic MIME types. By structuring I/O hierarchically underneath these headings - thus obtaining text/plain, text/html, text/xml, audio/mp3, audio/ogg, video/divx, video/mpeg4, image/gif, image/png and so forth - it is conceptually possible to create a system where applications deal with the high-level type (e.g. Image) and system codecs deal with converting from a specific format (e.g. image/png) to the high-level type. The upsides of such a system are 1) less identical or functionally equivalent code on your system, which can be telling when totalled; and 2) the addition of a codec can potentially allow all applications that deal with the associated high-level type to read the specific format - adding a PNG codec means that Adobe Photoshop, Internet Explorer, Microsoft Word, MS Paint and all other apps that render and/or write images (using Windows examples) can now read and write PNGs. Essentially, I/O becomes another system API or shell script that everyone can use in developing their app, and continue to gain from even after their app has shipped. Developers no longer need to focus on inanities like supporting obscure formats, and users can use their favorite editor with their favorite format - even if the editor's developer didn't support the format - by adding a codec to their system as a whole.

Concept: GUI Assembly
Taking a page from graphical flow-process tools like National Instruments' various instrumentation lab simulators, and coupling that with the micro-service concept (especially when our micro-services return structured information about themselves upon request), a GUI app could be written that allows for the rapid graphical construction of CLI-style piping- and redirection-laden custom tools/utilities.
It's not as efficient as the real thing, but it allows users unfamiliar with the system to reap many of the same functionality benefits, even if they don't do it at the same speed as CLI wizards. These aren't very concrete ideas, but I just wanted to share and see what others think or come up with.
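The kind of pipeline such a builder would assemble can be sketched in miniature; the stages and helper below are purely illustrative stand-ins for micro-services:

```python
# Hypothetical sketch: a graphical builder's pipeline is just an ordered
# list of stages, wired together the way a shell wires commands with |.
def build_pipeline(*stages):
    def run(data):
        for stage in stages:
            data = stage(data)
        return data
    return run

# Emulating `ls | grep '\.txt$' | sort` with in-process stages:
listing = lambda _: ["b.txt", "a.log", "a.txt"]   # stand-in for System:ls
grep_txt = lambda names: [n for n in names if n.endswith(".txt")]
tool = build_pipeline(listing, grep_txt, sorted)

print(tool(None))  # -> ['a.txt', 'b.txt']
```

A flow-process GUI would draw the same three boxes and two arrows; the data model underneath is no more than this ordered composition.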
GUI and CLI Inadequacy
I've often thought about how one could attempt to adapt the strengths of the Unix command line into a GUI. I haven't read other papers on this topic or really discussed or tried to implement my ideas, for what it's worth.

My favorite idea for an implementation right now is an extremely drag-and-drop-able GUI. This is my favorite idea at the moment mostly because:

  • It could be done with existing toolkits, if an appropriate transport framework exists for the drag-and-dropping, and existing applications would benefit.

  • It's more immediate and compact than "pipeline construction" style programs.



A special "sink" variety of file manager might have to be implemented or plugged onto an existing file manager to replicate some of the file manipulation type of stuff.

The text manipulation tools (awk, sed, et cetera) would probably be the hardest to replicate in a GUI; maybe a "pipeline construction" program would be applicable for this?

Of course, while this approach may be overall more intuitive than a "pipeline construction" approach, it may be more tedious unless a simple way to "replay" (macros, whatever) a series of tasks is devised.

Concept: I/O Abstraction
This is something that belongs in a user-land library, in my opinion (not to say that you wanted it in the kernel, or whatever). Libraries that do this for audio and video (and combined) files already exist to an extent (GStreamer, and to a degree libxine, among others). So, this comes down to the whole "getting everyone to use the same library" deal.
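A minimal sketch of such a user-land registry, keyed by full MIME type as the original post suggests; every name here is hypothetical, not any real library's API:

```python
# Hypothetical sketch of MIME-keyed I/O abstraction: applications work
# with the high-level type; per-format codecs are looked up by the full
# MIME type, so installing one codec benefits every application at once.
_codecs = {}

def register_codec(mime, decode):
    _codecs[mime] = decode

def load(mime, raw):
    """Decode format-specific bytes into the high-level representation."""
    major = mime.split("/")[0]     # e.g. "image/png" -> "image"
    return major, _codecs[mime](raw)

# Installing a codec for one format:
register_codec("text/csv",
               lambda b: [row.split(",") for row in b.decode().splitlines()])

major, rows = load("text/csv", b"a,b\n1,2")
# major == "text", rows == [["a", "b"], ["1", "2"]]
```

The "same library" problem is then just the question of who owns `_codecs` - a shared daemon, a system library, or each desktop environment.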

This is just an "off the top of my head" post, feel free to critique or whatever.

quote:Original post by Oluseyi
In addition, no one so far has been able to conceive an intuitive way to string GUI applications together to create powerful custom tools the way we can with CLI utilities.

It's called Object Linking and Embedding, aka OLE. The very thing that allows you to put an Excel spreadsheet into a Word document that contains a Visio diagram.

I think that making a GUI a frontend for a CLI is a very bad idea. This is essentially what Linux people are doing (KDE, Gnome) and the results speak for themselves. I think there is a much better solution, one that Windows Longhorn will probably expose. If Microsoft does this right, it will blow away the Linux CLI. MS Office components are already scriptable (look up COM automation). Anyone with sufficient knowledge of VB (or any other COM-enabled language) can do very powerful things with Office, a lot more powerful than the Office GUI allows. Now, take this one step further and make the shell a full-blown scripting environment based on a modern scripting language (Python, anyone?). I suppose Microsoft will use C#, although I think a dynamically typed language would be a much better solution. Voila, you now have a very powerful GUI and an even more powerful CLI. Also, since the .NET framework is natively scriptable, you can take full advantage of a huge library in your scripts.
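For what it's worth, a scripting language can already drive real process pipelines today; a minimal sketch using Python's standard subprocess module (assuming a POSIX system with sort and uniq on the PATH):

```python
# A sketch of "the shell as a scripting environment": Python's standard
# subprocess module chaining real commands the way a shell pipe does.
# Assumes a POSIX system with sort and uniq available on the PATH.
import subprocess

def run_pipeline(commands, data):
    """Feed data through a list of commands, piping stdout to stdin."""
    for cmd in commands:
        result = subprocess.run(cmd, input=data,
                                stdout=subprocess.PIPE, check=True)
        data = result.stdout
    return data

out = run_pipeline([["sort"], ["uniq"]], b"b\na\nb\n")
# out == b"a\nb\n"
```

The difference a "modern shell" would make is that intermediate results are live objects rather than byte streams - but the composition model is the same.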
I've already thought about this extensively, and come up with essentially the same ideas.

What you describe is essentially Bonobo and GStreamer, components of Gnome. Don't get me wrong: Gnome is nothing like what you describe, but the architecture is there. Gnome, as it currently stands, is essentially Windows by another name, but the value of the Gnome 2.x series is to provide an upgrade path.

Application developers and users currently cannot understand this kind of UI, so implementing applications such as AbiWord and Gnumeric as applications instead of components makes sense at the moment, especially given the slow, methodical progress that Free and Open Source Software tends to make. When the components exist, they can switch to the new model with a minimum of effort and loss of functionality.

It's kind of creepy that you mention flowcharting applications as one method of constructing these applications. There happens to be such an application for constructing GStreamer pipelines. I also considered this, but realized it was much too slow to begin to compare with command line applications. But this is not a fundamental problem. In fact, it can be possible, with a surplus of information, to make command line applications seem wasteful (in the time it takes to create meaningful applications), in addition to being user-unfriendly. Look to vi and Blender for examples of what I'm talking about. There are fundamental problems with both UIs, but they're actually the same problems that exist in command lines! They don't show the available options and the current state. The only thing that needs to be grafted onto these UIs is a system for graphically representing the state (both UIs are modal, and both use single- and double-keystroke commands to select from the available options in the current mode). I could explain this, but it's somewhat off topic at this point.

Back on topic, there are additional issues you haven't considered: data persistence and network transparency.

Data Persistence:
You had the right idea: all applications should be able to read files using services provided to each component. You didn't consider the flipside of this. Let's say you have a server status application that updates the cells in a spreadsheet component, and you decide you want to add a cell to the bottom that averages some of the other cells, and then send that average to a logging program.

How do you save that mess to disk? If you just saved the spreadsheet, you'd probably just get the numbers as of whenever you saved it, and the numbers would not continue to update. Similar things happen if you try to save the other components individually. So you need to save them all into the same file and preserve their connections.
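One hedged sketch of an answer: persist every component's state plus the connections between them in a single document, so the links survive the round trip. All component names and fields below are invented for illustration:

```python
# Hypothetical sketch: one document holds each component's state and the
# wiring between components, so saving and loading preserves the links.
import json

doc = {
    "components": {
        "status": {"type": "server-monitor", "source": "intranet"},
        "sheet":  {"type": "spreadsheet", "cells": {"B9": "=AVERAGE(B1:B8)"}},
        "logger": {"type": "log-writer", "path": "/var/log/avg.log"},
    },
    "connections": [
        ["status", "sheet"],   # the monitor feeds the spreadsheet
        ["sheet", "logger"],   # the average cell feeds the logger
    ],
}

restored = json.loads(json.dumps(doc))  # the links survive the round trip
```

On load, the runtime would re-instantiate each component and re-wire the connections, so the cells resume updating instead of being frozen numbers.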

I'm pretty sure that's what the GDOM project is about.

Network transparency:
You've already allowed for the creation of a unique identifier for each service. Why not just extend that to the full URI spec? Then you could have components in your pipeline that were not on your computer. They could be rendered over a remote X connection, and you'd never notice the difference, if it weren't for latency. Or these things could be provided to a local component via XML, which would be decoded and GUIfied. It doesn't matter which, because there's no difference between programs and pipelines.

This is almost free from the way everything else is tied together (X and CORBA both have network transparency; Bonobo probably does too). Some stuff that isn't free might be provided by Mono.
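The local-versus-remote decision could be sketched as a tiny dispatcher over URI-style service names; the `service://` scheme here is invented for illustration:

```python
# Sketch of URI-based service naming: a name with a scheme and host is
# dispatched remotely; anything else resolves locally, so callers never
# need to care where a component lives. service:// is a made-up scheme.
from urllib.parse import urlparse

def dispatch(service_id):
    parsed = urlparse(service_id)
    if parsed.scheme and parsed.netloc:
        return ("remote", parsed.netloc, parsed.path.lstrip("/"))
    return ("local", None, service_id)

print(dispatch("service://host.example/System/ls"))
# -> ('remote', 'host.example', 'System/ls')
print(dispatch("System:ls"))
# -> ('local', None, 'System:ls')
```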
---
New info keeps brain running; must gas up!
quote:Original post by CoffeeMug
It's called Object Linking and Embedding, aka OLE. The very thing that allows you to put an Excel spreadsheet into a Word document that contains a Visio diagram.

Why doesn't anyone on Windows actually use OLE (besides Microsoft, of course) already? As you say, the advantages are obvious with Office.

Or do they and just never tell me about it?
---
New info keeps brain running; must gas up!
quote:Original post by Flarelocke
Original post by CoffeeMug
It's called Object Linking and Embedding, aka OLE. The very thing that allows you to put an Excel spreadsheet into a Word document that contains a Visio diagram.

Why doesn't anyone on Windows actually use OLE (besides Microsoft, of course) already? As you say, the advantages are obvious with Office.

Or do they and just never tell me about it?


They do. They just don't advertise it much. Microsoft don't lard their adverts with technical jargon for a very good reason: most people who hold the purse-strings for a business would be put off by terms like "COM+", "ActiveX" and "OLE". These people wouldn't know the difference between the Windows Scripting Host and a game show host.

The people who DO know all this stuff tend to ignore marketing anyway and know where to find the info that cuts to the chase.

As for what else uses OLE: Internet Explorer is another example. In fact, anything that is an "ActiveX" object is scriptable to some extent. And Windows has tons of such objects. Need a rich text box? It's there: just drop it into your app. Need to play an MPEG video in your app? Drop in the Media Player component.

Windows' Scripting Host is capable of running scripts in any supported language, which is one of the main reasons for wanting a CLI in the first place. (The Host comes with support for VBScript and ECMAScript as standard, but support for other scripting languages, including Perl, is already out there.)

Microsoft may have their faults, but they're not stupid. Many users are not IT savvy, so there's no point blinding them with science. The MS publicity machine only ever talks about the tip of the Windows features iceberg.

As for KDE and GNOME, those teams still can't get cut and paste working properly. I wouldn't hold my breath waiting for them to deliver robust drag and drop.

--
Sean Timarco Baggaley (Est. 1971.) Warning: May contain bollocks.
Cut and paste works just fine.

quote:They do. They just don't advertise it much. Microsoft don't lard their adverts with technical jargon for a very good reason: most people who hold the purse-strings for a business would be put off by terms like "COM+", "ActiveX" and "OLE". These people wouldn't know the difference between the Windows Scripting Host and a game show host.
Perhaps I wasn't clear, but I meant: who makes OLE components (that's what it's for, isn't it?) besides Microsoft? Kinda boring if everything that can be included in a program has to be included in the monopoly, too.
---
New info keeps brain running; must gas up!
Just out of curiosity, why do people think that a GUI as a frontend to a CLI program is a bad thing? I mean, if the wheel already exists, you don't need to reinvent it. You just need to make it look nicer.

Granted, this only applies to existing software that works well, and I certainly agree with you if you're talking about new programs, or improving on aging ones or something. But otherwise, I don't see why it's so bad.

The Artist Formerly Known as CmndrM

http://chaos.webhop.org
quote:Original post by Flarelocke
Perhaps I wasn't clear, but I meant: who makes OLE components (that's what it's for, isn't it?) besides Microsoft?

OLE is an old (circa 1994) term for COM (Component Object Model). There's a LOT of software for Windows that takes advantage of COM components. COM components aren't necessarily scriptable, but they can be if the developer chooses to go the extra mile. Most business/financial software is based on COM architecture because many third parties need to take advantage of its functionality programmatically. The reason why many applications are not based on COM mainly has to do with the fact that COM components are ridiculously hard to make in C++. However, I believe everything made in VB is based on COM. With Microsoft pushing .NET the scene is changing. Every class essentially becomes a component that natively supports scripting without COM's pain of macros and dual interfaces. We can't really take advantage of that yet in Unix's CLI sense because the shell framework isn't in place. There are rumors that Windows Longhorn will have a very powerful shell, most likely based on VB or C# (although I think it will support all .NET languages). This will probably be the time when we see a modern CLI, much more powerful and intuitive than the old Unix one.
quote:Original post by Flarelocke
You've already allowed for the creation of a unique identifier for each service. Why not just extend that to the full URI spec?

This is already in place. DCOM (Distributed COM) allows you to do just that. This isn't often used on home machines, but in financial software you often don't know whether the component you're using resides on your local machine or somewhere else on the network.
quote:Original post by Flarelocke
How do you save that mess to disk?

.NET components natively support serialization. In three lines of code you can save very complex hierarchies to disk and load them the next time you need them. Of course, it also provides very powerful interfaces for customization, in case default serialization doesn't suit you.
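As a rough Python analogue of the point being made (this is not the .NET API), the standard pickle module round-trips a nested structure in about three lines:

```python
# Rough analogue of the serialization point above: pickle round-trips
# an arbitrary nested object graph with a pair of calls.
import pickle

graph = {"cells": [1, 2, 3], "deps": {"avg": ["cells"]}}
blob = pickle.dumps(graph)        # "save"
restored = pickle.loads(blob)     # "load"
```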
quote:Original post by Strife
Just out of curiosity, why do people think that a GUI as a frontend to a CLI program is a bad thing?

Because a CLI is designed to be just that: a command line interface. What works well for a CLI doesn't work well for GUIs and vice versa. Look at all the Linux GUI tools that use CLI programs on the backend. They suck, and they do so for a reason. Try a small experiment: take a simple CLI command and try to design a good GUI for it. Chances are your GUI won't be very intuitive, because it has to fit a design with a completely different philosophy.
quote:Original Post by 9 Tenets of the Unix Philosophy
The program is loaded into memory, accomplishes its function, and then gets out of the way to allow the next single-minded program to begin.

This is exactly why a GUI cannot be designed as a frontend for a CLI. The philosophy is entirely different. Note that the philosophy in the quote above might have been OK forty years ago, when computers were mainly used as time-sharing machines for processing data. Today, when personal computers are used to browse the web, write documents, do video conferencing, etc., this philosophy is simply outdated.

I disagree with many other standpoints in that article because they were designed for entirely different purposes. Computers today have different purposes than they did forty years ago; software engineering has also changed.

This topic is closed to new replies.
