Programming scientific GUIs, data and GUI layout?

10 comments, last by Alberth 8 years, 8 months ago

I am looking for advice on books or other resources about programming GUIs specifically for scientific data analysis.

On this subject there seem to be only general books on separate topics, like:

- Scientific programming (such as solving differential equations and plotting results).

- Design patterns, like model view controller for GUI applications.

None of what I have read seems to help when dealing with a real-world problem, such as the following:

Let's say I have a class that represents some data (in this example just an array of numbers) that I load from a text file and store in an object as data.rawData. Now say I want to double the value of each number. What is the best way to do this?

Create a new object and store it in something like dataDoubled.data?

Put it in a new field of the original 'data' object so there is, data.rawData, and data.doubledData?

Now let's say I want to display the numbers of the original and doubled data in a GUI. Then it comes down to design patterns and so on, but everything I have found is too general.

So I guess I am after a book or resource whose focus is on GUIs built for scientific data processing, but which also goes into the actual software architecture behind the scenes. That said, any book that teaches how to write something like Word would be useful.

Thanks


Well, software patterns are somewhat generic by definition; otherwise they would be available as a library. Besides that, architectural patterns like MVC, MVP, MVVM, and the more advanced ones are actually what to look at for desktop applications, including scientific ones. Those patterns are about the separation of business data, its representation, and its manipulation. I suggest you look for comparisons of them, because such comparisons should point out the typical use cases of each. Nevertheless, don't forget that patterns are just guidelines; don't hesitate to diverge when appropriate.

Totally unrelated to the GUI architecture is the question of business data management. You should avoid storing original and derived data in the same object. Treat it like variables in a programming language: you have a variable with the original data, you apply an operator, and that yields a result that is stored in another variable. This is appropriate because you don't know in advance which operator will be applied, how often an operation will be applied, or to which data it will be applied, so the storage system needs to provide as much flexibility as possible. Maybe an operator is allowed to overwrite its source (see below), but the general case of writing to a new variable should always be available, and it must be available anyway if the format of the output differs from that of the input.
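As a rough illustration of the "variables and operators" idea, here is a minimal Python sketch. The names `Dataset` and `apply` are made up for this example and are not taken from any library; the point is only that the operator returns a new object and leaves its input alone.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Dataset:
    """An immutable, named series of values (hypothetical type for illustration)."""
    name: str
    values: Tuple[float, ...]


def apply(op: Callable[[float], float], source: Dataset, name: str) -> Dataset:
    """Apply op to every value and return a NEW dataset; source is left untouched."""
    return Dataset(name=name, values=tuple(op(v) for v in source.values))


raw = Dataset("raw", (1.0, 2.5, 4.0))
doubled = apply(lambda v: 2 * v, raw, "doubled")

print(raw.values)      # the original data is still available
print(doubled.values)  # the derived data lives in its own object
```

Because `Dataset` is frozen, any accidental attempt to assign to `raw.values` raises an error, which is one cheap way to enforce the "never modify the original" rule.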

Regarding the operators themselves … it depends. Do you need a history of applied operations? Does undo need to be supported? Do you need macros / operation recording? Should the operations be re-applied if the input data changes? Do you need a type system to distinguish data types?


You should avoid storing original and derived data in the same object. Treat it like variables in a programming language: you have a variable with the original data, you apply an operator, and that yields a result that is stored in another variable.

Not questioning the soundness of this advice (because I do think it's sound), but would that really be feasible on large data sets? Postprocessing something like finite element result data with 1 million nodes is very standard, and larger models with 10+ million nodes are common too. I can't imagine trying to have 2 copies of that data in memory. I would think the original data is stored on disk and only 1 copy is in memory and gets operated on. If need be, it gets reloaded. But maybe I'm wrong... it's happened once or twice.

10 million isn't that much: 10 million double-precision floats is around 80 MB, so you can store that 10 times in just 1 GB.
Also, you are not likely to have that many things to visualize; there isn't room for them on the screen, so you need to reduce the information.
Also, it's unlikely that you will have that data in one big flat array; more likely it will be segmented or sparse or ... .
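For reference, the arithmetic behind that estimate can be written out as follows. The 80 MB figure corresponds to 8 bytes per value (double precision); with 4-byte single-precision values it would be half that. This is only a back-of-the-envelope sketch of a flat array, ignoring any container overhead.

```python
n_values = 10_000_000

bytes_single = n_values * 4   # 4 bytes per single-precision value
bytes_double = n_values * 8   # 8 bytes per double-precision value

mb_single = bytes_single / 1e6   # 40.0 MB
mb_double = bytes_double / 1e6   # 80.0 MB

# Number of whole double-precision copies that fit in 1 GB (taken as 1e9 bytes)
copies_in_1gb = int(1e9 // bytes_double)

print(mb_single, mb_double, copies_in_1gb)
```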


Anyway, for scientific software, in my experience, the guidelines are 1) correctness of the results, and 2) ease of change.

Correctness goes above anything else; results are worthless if they are not correct or cannot be trusted. Keeping things simple is a good way to get correctness. Never modifying old data is one form of that (instead, build new data from old data and let go of the old data; the memory manager will eventually clean it up).

CPU time and memory aren't particularly interesting, as long as it all works 'fast enough'.

Ease of change is important because basically, the software is part of doing experiments, and you do each experiment one time (sort of). Once the idea is understood, research moves to the next question, which will mean the software has to change along with it.


Patterns aren't very useful for solving concrete problems imho; they are useful as common technical terms so you can discuss ideas of how to solve things. Most aren't that complicated to figure out by yourself. Edit: Instead, just write the code you need to solve the problem, and compare with patterns afterwards.

For GUI programming, scientific visualization is probably close enough to a normal GUI program that you should start with the latter. The main problem is understanding event-driven programming. Find a GUI toolkit for your programming language and do the tutorial: make a button, print 'hello' when you push it, draw lines on the canvas, that sort of thing. It should give you enough ideas of how to scale up to a more complete GUI.
If you use C++ and don't know what GUI library to use, I'd recommend Qt. They have great tutorials and good documentation.
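To give a feel for what "event-driven" means before picking a toolkit, here is a toolkit-free Python sketch. `EventLoop`, `connect`, `post` and `run` are invented names for this illustration and are not the API of Qt or any other real library; a real toolkit does the same kind of thing, but the events come from the mouse and keyboard.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventLoop:
    """Toy stand-in for a GUI toolkit's event system (hypothetical, not a real API)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[], None]]] = defaultdict(list)
        self._queue: List[str] = []

    def connect(self, event: str, handler: Callable[[], None]) -> None:
        """Register a callback to run whenever the named event occurs."""
        self._handlers[event].append(handler)

    def post(self, event: str) -> None:
        """Queue an event; in a real toolkit the user's click would do this."""
        self._queue.append(event)

    def run(self) -> None:
        """Dispatch queued events to their handlers, in order, until none remain."""
        while self._queue:
            event = self._queue.pop(0)
            for handler in self._handlers[event]:
                handler()


log: List[str] = []
loop = EventLoop()
loop.connect("button_clicked", lambda: log.append("hello"))

# Simulate two button presses, then let the loop dispatch them.
loop.post("button_clicked")
loop.post("button_clicked")
loop.run()
print(log)
```

The program never calls the handler directly; it only says what should happen when the event arrives, which is the shift in thinking the toolkit tutorials are meant to teach.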

As for books on how to write a word processor, there is surprisingly little documentation about that. The biggest source is the PhD thesis of the guy who built vi (iirc), and basically the best solution comes down to a large text buffer with one gap in it, at the position of the cursor. If you move the cursor, move the gap (obviously, you can be a bit smarter and delay the move until it is needed). When you type a letter, add it in the gap.
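The gap idea described above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration (it moves the gap immediately rather than lazily, and the class and method names are made up for the example), not how any particular editor implements it.

```python
class GapBuffer:
    """Text buffer with a single gap at the cursor position (minimal sketch)."""

    def __init__(self, text: str = "", gap_size: int = 16) -> None:
        self._buf = list(text) + [""] * gap_size
        self._gap_start = len(text)      # cursor position
        self._gap_end = len(self._buf)   # one past the last gap slot

    def __len__(self) -> int:
        """Number of real characters (buffer size minus gap size)."""
        return len(self._buf) - (self._gap_end - self._gap_start)

    def move_cursor(self, pos: int) -> None:
        """Move the gap so that it starts at text position pos."""
        pos = max(0, min(pos, len(self)))
        while self._gap_start > pos:     # shift characters from before the gap to after it
            self._gap_start -= 1
            self._gap_end -= 1
            self._buf[self._gap_end] = self._buf[self._gap_start]
        while self._gap_start < pos:     # shift characters from after the gap to before it
            self._buf[self._gap_start] = self._buf[self._gap_end]
            self._gap_start += 1
            self._gap_end += 1

    def insert(self, ch: str) -> None:
        """Type one character at the cursor: it simply fills the first gap slot."""
        if self._gap_start == self._gap_end:   # gap is full, so open a new one
            extra = 16
            self._buf[self._gap_end:self._gap_end] = [""] * extra
            self._gap_end += extra
        self._buf[self._gap_start] = ch
        self._gap_start += 1

    def text(self) -> str:
        """Return the text with the gap skipped."""
        return "".join(self._buf[:self._gap_start] + self._buf[self._gap_end:])


buf = GapBuffer("helo")
buf.move_cursor(3)
buf.insert("l")
print(buf.text())
```

Typing at the cursor is cheap because no text has to move; only cursor jumps cost a copy proportional to the distance moved.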

You need to separate your data from the GUI. They should not be mixed. The fact that you are displaying a floating point number or a graph of data does not mean that the GUI needs to understand how the data is stored. You store the data in some manner, modify it, and then use the GUI to display it. If you mix data storage with GUI design, then you've linked them in such a way that modifying one requires modifying the other. This will lead to a buggy mess. You design the GUI around the type of data being displayed, but the GUI still only displays what it is given. It shouldn't care about the underlying data.
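One common way to get that separation is to let the data object announce changes and let the display react to them. The Python sketch below is only one possible arrangement; `SeriesModel` and `TextView` are hypothetical names, and `TextView` stands in for a real widget by building a string instead of drawing anything.

```python
from typing import Callable, List, Sequence


class SeriesModel:
    """Holds the data and notifies listeners on change; knows nothing about widgets."""

    def __init__(self) -> None:
        self._values: List[float] = []
        self._listeners: List[Callable[[List[float]], None]] = []

    def subscribe(self, listener: Callable[[List[float]], None]) -> None:
        self._listeners.append(listener)

    def set_values(self, values: Sequence[float]) -> None:
        self._values = list(values)
        for listener in self._listeners:
            listener(list(self._values))   # hand out a copy, not the internal list


class TextView:
    """Stand-in for a widget: it only displays what it is given."""

    def __init__(self) -> None:
        self.rendered = ""

    def show(self, values: List[float]) -> None:
        self.rendered = ", ".join(f"{v:g}" for v in values)


model = SeriesModel()
view = TextView()
model.subscribe(view.show)

model.set_values([1, 2.5, 4])
print(view.rendered)
```

The storage inside `SeriesModel` could change (a file, a database, a sparse structure) without touching `TextView`, and a plot view could be added as a second subscriber without touching the model.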

As far as the data storage issue goes, keep the original copy on disk. Load from disk and modify in memory. If you save the modified data to disk, either always save in a new file to preserve the original data, or make it a multi-step process (for safety) to overwrite the original.
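A small Python sketch of the "always save to a new file" policy is shown below. The function name `save_derived` and the `_modified` suffix are assumptions made for this example; the one-value-per-line format is likewise just a placeholder. The key detail is opening the target with mode `"x"`, which fails instead of overwriting an existing file.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def save_derived(values: Iterable[float], original: Path, suffix: str = "_modified") -> Path:
    """Write values next to the original file under a new name; never overwrite."""
    target = original.with_name(original.stem + suffix + original.suffix)
    # Mode "x" creates the file and raises FileExistsError if it already exists.
    with open(target, "x") as f:
        for v in values:
            f.write(f"{v}\n")
    return target


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "run1.asc"
    original.write_text("1.0\n2.0\n")

    out = save_derived([2.0, 4.0], original)
    out_name = out.name
    original_unchanged = original.read_text() == "1.0\n2.0\n"

    # A second save to the same target is refused rather than silently overwriting.
    try:
        save_derived([0.0], original)
        refused = False
    except FileExistsError:
        refused = True

print(out_name, original_unchanged, refused)
```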

Honestly, I think you are making this harder than it needs to be. I think you are too focused on the type of data being displayed. The fact that it is scientific data and not an image file or spreadsheet only dictates the fashion in which the data is displayed.

ParaView and the VTK library are a good example you could learn from. ParaView is a scientific visualisation package with a reasonable amount of data manipulation functionality. That functionality is provided by the VTK library; ParaView is essentially a Qt wrapper for VTK. Both ParaView and VTK are open source, and there are a number of books published on them; just search Amazon.

You might be interested in using an IMGUI (immediate-mode GUI) to display your data. If you have that much data you don't really want it stored in the GUI controls as well; that would be largely pointless, and duplication of state is generally a bad thing. Search Google for IMGUI; it might be of interest to you.


10 million isn't that much: 10 million double-precision floats is around 80 MB, so you can store that 10 times in just 1 GB.

It seems you might not be familiar with finite element models. It's more than 10 million floats. I'm sure it's double precision, and each node has 6 degrees of freedom. Plus, each node can be tied to multiple elements, so there's element data there along with different types of stress and strain data. It's not uncommon for the output files to be 150+ GB, depending on what information is output.
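To put rough numbers on that, here is a back-of-the-envelope Python calculation for the nodal degrees of freedom alone. The node count, 6 degrees of freedom and double precision come from the post above; the number of stored result steps (100) is purely an assumed value for illustration, and element, stress and strain data would come on top of this.

```python
n_nodes = 10_000_000
dof_per_node = 6
bytes_per_value = 8          # double precision

# One snapshot of nodal results
bytes_per_step = n_nodes * dof_per_node * bytes_per_value
mb_per_step = bytes_per_step / 1e6

# ASSUMPTION: 100 stored result steps, chosen only to show how the size scales
n_steps = 100
gb_total = bytes_per_step * n_steps / 1e9

print(mb_per_step, gb_total)
```

Even before element data is counted, a single snapshot is already several times larger than the flat-array estimate, and a transient run multiplies that by the number of stored steps.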

You are right, I don't know. It's one of the areas I want to learn more about though :)

Thanks for explaining.

Hi Guys,

Thanks for all the input. There is some very sound advice here and I am learning a lot. I think the biggest problem is, as people have said, that design patterns are more of a guideline, and there is no concrete 'correct' way that covers all the different combinations of data processing that may be undertaken.

Find a GUI toolkit for your programming language and do the tutorial: make a button, print 'hello' when you push it, draw lines on the canvas, that sort of thing. It should give you enough ideas of how to scale up to a more complete GUI.
If you use C++ and don't know what GUI library to use, I'd recommend Qt. They have great tutorials and good documentation.

I am using MATLAB and am quite comfortable with it and with programming a GUI in it using OOP. Reading about patterns like MVC is all well and good, but without a proper example or implementation I really have no idea how to apply them to my own problems. Here is an example of the kind of thing I am stuck on. I attach a screenshot of my GUI as it stands now:

So the first thing the user does is press 'Add raw data', which opens a window where they can choose one or multiple data files. For the purposes of this example, each data file could be an .asc file containing a series of x and y data points. The file names are listed in the tab 'Raw Data', and the selected file is plotted in the graph tab 'View 1'.

The user can select 1 or multiple files in the 'Raw Data' tab and click 'Double data values' to double the y values of the data points. The files that were doubled go into the tab 'Doubled Data', and when they are clicked on, will be displayed in 'View 1'.

The purpose of the View tabs could be different ways to show the data, such as line plot, scatter plot, bar chart...

My questions:

What exactly is the MVC setup for this?

Should the 'Views' depend on the list boxes, or should the list boxes depend on the 'Views'? What I mean is: when a data set is clicked on in 'Raw Data', should it check the View tabs with a switch statement to see which one to display in? And if I then click on a different 'View', should it query the list box to see which file is highlighted, and then display that? But it would need to query both the 'Raw Data' and 'Doubled Data' list boxes, to first see which is 'active' and then select the right file...? Basically there are so many combinations of things that I am struggling to see how best to have them communicate, and this is a tiny basic GUI...

Or am I linking the GUI elements with the data too tightly?
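One possible arrangement for the setup described above, offered as a sketch rather than a definitive answer: neither the list boxes nor the views query each other. Both write to a shared state object, and a single function decides what to plot from that state. The example is in Python rather than MATLAB, and `AppState` and `describe_plot` are hypothetical names; the actual plotting is replaced by returning a description of what would be drawn.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


class AppState:
    """Single source of truth: loaded datasets, current selection, current view kind."""

    def __init__(self) -> None:
        self.datasets: Dict[str, List[float]] = {}
        self.selected: Optional[str] = None
        self.view_kind: str = "line"
        self._listeners: List[Callable[["AppState"], None]] = []

    def subscribe(self, listener: Callable[["AppState"], None]) -> None:
        self._listeners.append(listener)

    def _notify(self) -> None:
        for listener in self._listeners:
            listener(self)

    def add_dataset(self, name: str, values: Sequence[float]) -> None:
        self.datasets[name] = list(values)
        self._notify()

    def select(self, name: str) -> None:
        """Called by whichever list box was clicked, raw or doubled."""
        if name not in self.datasets:
            raise KeyError(name)
        self.selected = name
        self._notify()

    def set_view(self, kind: str) -> None:
        """Called when the user switches view tab."""
        self.view_kind = kind
        self._notify()


def describe_plot(state: AppState) -> Optional[Tuple[str, List[float]]]:
    """What the plot area should show; a real view would draw this instead."""
    if state.selected is None:
        return None
    return (state.view_kind, state.datasets[state.selected])


drawn: List[Optional[Tuple[str, List[float]]]] = []
state = AppState()
state.subscribe(lambda s: drawn.append(describe_plot(s)))

state.add_dataset("a.asc", [1.0, 2.0])
state.add_dataset("a.asc (doubled)", [2.0, 4.0])
state.select("a.asc (doubled)")   # click in the 'Doubled Data' list
state.set_view("scatter")         # switch view tab
print(drawn[-1])
```

With this layout there is no need for a switch statement over tabs or for asking which list box is 'active': the last click simply overwrites `selected`, and every view redraws from the same state.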

I have more people I need to quote and ask things but do not have time for now. I will get back to it. Thanks everyone for your input.

This topic is closed to new replies.
