Telastyn

How would you approach this?


I have a poll for the community at large. I have a project at work where there is a debate going on about the design of an application:

  • The application is a from-scratch design.
  • It is a remote display appliance that will handle 1 or more physical displays.
  • It is controlled by a central system.
  • It will be a Windows 7 box (possibly Windows 7 Embedded).
  • It must run as a non-privileged user.
  • It may only pull from the central system. No pushes allowed.
  • It will be on and active ~20 hours out of every day.
  • It must be remotely upgradable by the central system.
  • The network connection to the central system is... not guaranteed to be reliable, but guaranteed to be at least a run-of-the-mill US cable modem.
  • It needs to download and play content (a variety of video/images/sounds) supplied by the central system.
  • The central system will specify some rules (around 25, 50-character strings) every so often (~15-90 mins) which the appliance will need to convert into a literal list of media items. Assume for this argument that the algorithm is done, requires no extra memory beyond the rules, ~5% cpu, and the list is infinite.
  • Assume for the sake of this discussion that the rendering/playing of content is done and self contained.
  • The appliance must report status (cpu/memory/disk, as well as general software health) back to the central system frequently (<5 mins).
  • The appliance must report exceptional conditions back to the central system.
  • The appliance must report back what items it played at the end of each time segment. This communication cannot be lost.
  • It is important that the displays play correct content, but it is **VITAL that the screens display some content**. If the appliance goes dark, you lose your job. If it shows random errors to the screen, you lose your job.
  • This is a new system. You must use C#, but 4.0 is available, as well as any released tools. You can control (within reason) what the appliance and even the central system has if I've not specified the requirements here. You have an excellent team of about 20 developers, about 4 of whom are dedicated to this appliance, and 1 year.

So, the question surrounds the non-rendering piece of this appliance box. The debate is about how (in general) it should be designed. Should it be a large-ish single process with perhaps a few worker threads? Should it be a series of small processes? Should it be a single process composed of a number of threads?

In general, how would you approach this problem? What things would you want to use? And *WHY*?

I can answer questions about the requirements, but will do my best to conceal my preference to prevent bias.

I don't understand the concurrency requirements here. Why should it be multithreaded/multiprocess in the first place? What of your specs demands concurrent implementation?

From the uptime and remote upgrade requirements I'd go multiple processes. While I don't think it'd be impossible to do with a single process, it'd be severely annoying. Though I have to say the requirement for not losing communication combined with no guarantee of network reliability is pretty much a setup for failure.

"It is important that the displays play correct content, but it is **VITAL that the screens display some content**. If the appliance goes dark, you lose your job. If it shows random errors to the screen, you lose your job."
What will the failover hardware look like?

For the cases when power supplies burn out, capacitors on the motherboard fail, cables get disconnected, etc.

Add an uptime requirement, and the budget increases tenfold, unless the guarantee is about two nines (99%, or roughly 3.65 days of downtime per year).
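To make the "nines" concrete, here is a small arithmetic sketch. It is written in Python purely for illustration (the project itself mandates C#), and it assumes a 365-day year. Two nines allows roughly 3.65 days of downtime per year; a single day per year is closer to 99.7%.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def downtime_hours_per_year(availability: float) -> float:
    """Hours of permitted downtime per year for an availability fraction (e.g. 0.99)."""
    return (1.0 - availability) * HOURS_PER_YEAR


for label, availability in [("two nines", 0.99), ("three nines", 0.999), ("four nines", 0.9999)]:
    hours = downtime_hours_per_year(availability)
    print(f"{label}: {hours:.2f} hours/year ({hours / 24:.2f} days)")
```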

The only alternative is to use a Linux-backed TV with a custom ROM that switches to a pre-defined image in case of failure.

"The appliance must report back what items it played at the end of each time segment. This communication cannot be lost."
Which means multiple communication paths (3G, GSM, satellite, wire, AM/FM), depending on the desired probability of failure.

Add to that the usual failure modes (fire, flood, earthquake, car/airplane/vehicle crash, nuclear/EMP strike) and you're looking at higher security than what nuclear facilities have.

Whenever you have such requirements, always design for probability - everything can and *will* fail.

Until you set realistic goals, the budget for this will need to be in tens of millions per device. And systems that can cope with that do exist, some are even available as open source, but they don't come with hardware and such.


In practice, the above means something like this:
- Local-static/failure placeholder image or video
- Real-time/deadline feedback from the device. If no feedback is received in one hour, assume the device has failed and send a repair team on site.
- Device maintains a local log, mirrored across two or more media devices (hard drive/flash drive or similar). Make sure to include the life-span/MTBF of that media in scheduled maintenance for replacement.
- Hardware watchdog to detect hangs and try to recover to avoid waiting for deadline expiry
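The deadline item in the list above could be sketched on the central side roughly like this. It is only an illustration in Python, not the project's C#; `DeadlineMonitor`, its methods, and the device IDs are hypothetical names, and the one-hour figure is taken from the list.

```python
import time

DEADLINE_SECONDS = 60 * 60  # the one-hour deadline from the list above


class DeadlineMonitor:
    """Hypothetical central-side helper: remembers when each device last checked in."""

    def __init__(self, deadline_seconds=DEADLINE_SECONDS, clock=time.monotonic):
        self._deadline = deadline_seconds
        self._clock = clock  # injectable so the logic can be exercised without waiting
        self._last_seen = {}

    def check_in(self, device_id):
        """Record feedback from a device."""
        self._last_seen[device_id] = self._clock()

    def overdue(self):
        """Return the devices whose last feedback is older than the deadline."""
        now = self._clock()
        return sorted(
            device for device, seen in self._last_seen.items()
            if now - seen > self._deadline
        )
```

Anything returned by `overdue()` would be treated as failed and handed to whatever dispatches the repair team.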

This is as good as it gets.

What is missing from the above, especially on commodity hardware, is self-diagnostics. What happens if the physical screen goes blank? How do you detect it, and how is this reported back? What about local disk health?

"It may only pull from the central system. No pushes allowed."
Then how will it report status back?

"It will be on and active ~20 hours out of every day."
Which means UPS and hot-swap standby replacement hardware.

"Your network connection to the central system is... not guaranteed to be reliable, but guaranteed to be at least a run of the mill US cable modem."
Which means content will be cached locally for the maximum planned outage (see above: the 1 hour deadline). This requires reliable local storage that can support it, but given a simple modem connection, size isn't likely to be a problem.

"It must be remotely upgradable by the central system."
At which point you'll be at the mercy of the Windows Update installer. The system will need a test harness (an identical local replica) for testing Windows patches, as well as adequate on-device storage for this operation. Service packs and other patches required by codecs or other components (.NET or similar) can easily take several gigabytes to install. There is also no good remote recovery in case of a failed update unless you mess with partitions and a custom bootloader.

"Should it be a large-ish single process with perhaps a few worker threads? Should it be a series of small processes? Should it be a single process composed of a number of threads?"
It doesn't really matter regarding any of the above issues. To attempt fast recovery, the application process will be started either as a shell replacement or from a script that loops back to itself and reports shutdown after exit. This helps bring the application back up after a sporadic crash.
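A minimal sketch of the looping launcher idea, assuming the player is an ordinary child process. Python is used only to show the control flow; `supervise`, `report`, and `DEMO_COMMAND` are hypothetical names, and the demo command is a stand-in for the real application.

```python
import subprocess
import sys
import time

# Stand-in for the real player process (hypothetical): a child that exits with code 3.
DEMO_COMMAND = [sys.executable, "-c", "raise SystemExit(3)"]


def supervise(command, report, max_runs=None, pause_seconds=1.0):
    """Run `command` over and over, reporting every exit before starting it again.

    `max_runs=None` loops forever, which is what a launcher on the appliance would do.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        exit_code = subprocess.call(command)  # blocks until the child exits
        runs += 1
        report(exit_code)          # e.g. queue a "shutdown/crash" status message
        time.sleep(pause_seconds)  # brief pause so a crash loop does not spin the CPU
    return runs
```

For example, `supervise(DEMO_COMMAND, print, max_runs=2, pause_seconds=0)` starts the stand-in child twice and reports its exit code each time.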

Application itself can also contain a watchdog to force its termination in case something goes wrong.
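One way such an in-process watchdog might look, sketched in Python rather than the project's C#. The `Watchdog` class is hypothetical; the default action, `os._exit(1)`, ends the process immediately so that an external launcher can restart it.

```python
import os
import threading


class Watchdog:
    """Calls `on_timeout` unless `pet()` is called at least every `timeout_seconds`."""

    def __init__(self, timeout_seconds, on_timeout=lambda: os._exit(1)):
        self._timeout = timeout_seconds
        self._on_timeout = on_timeout
        self._lock = threading.Lock()
        self._timer = None

    def pet(self):
        """Restart the countdown; the main loop calls this while it is healthy."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout, self._on_timeout)
            self._timer.daemon = True
            self._timer.start()

    def stop(self):
        """Cancel the countdown, e.g. during an orderly shutdown."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

The main loop would call `pet()` once per iteration; if it hangs for longer than the timeout, the process is terminated instead of sitting there frozen.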

"In general, how would you approach this problem?"
By clarifying and detailing the reliability requirements. "Never" does not exist or, at the very least, costs infinite money.

But almost all of this is a hardware problem.


"I don't understand the concurrency requirements here. Why should it be multithreaded/multiprocess in the first place? What of your specs demands concurrent implementation?"


There are no specs which demand a concurrent implementation directly. The general argument against a single thread tends to be the difficulty of scheduling all the different things it needs to do, especially since some of them might be long running.


Regarding Antheus' post about proper uptime requirements:

I don't know the actual SLA requirements offhand. For the sake of this argument, two nines seems acceptable. This isn't a medical device or military strategic display. If the device fails, a dozen people go idle, the customer gets egg on their face, and they and we lose a few hundred bucks for every hour it's down. As you point out, if the customer needs more, the cost will likely be passed along and is almost entirely hardware related.

"It may only pull from the central system. No pushes allowed." - here I meant no pushes from the central system. The display appliance may pull data and push status/etc.

Also, power outage/flood/fire/EMP are acceptable outage scenarios. These will be located at places of business, so if something closes the business and nobody can see the displays... it's not a big deal.
Another thing which I perhaps communicated poorly... "The appliance must report back what items it played at the end of each time segment. This communication cannot be lost." - The results can not be lost. If network connectivity is down, it is acceptable to communicate those results once connectivity returns.
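The "results cannot be lost, but may be sent late" requirement is essentially store-and-forward. Here is a hedged sketch in Python (not the project's C#), using SQLite as the local durable store. `ReportOutbox`, the table layout, and the `send` callable are all assumptions made for illustration.

```python
import json
import sqlite3


class ReportOutbox:
    """Persists play reports locally until the central system has accepted them."""

    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self._db.commit()

    def record(self, report):
        """Write the report to disk before any attempt to send it."""
        with self._db:  # commits on success, rolls back on error
            self._db.execute("INSERT INTO outbox (body) VALUES (?)", (json.dumps(report),))

    def flush(self, send):
        """Send pending reports oldest first; stop at the first network failure."""
        sent = 0
        rows = self._db.execute("SELECT id, body FROM outbox ORDER BY id").fetchall()
        for row_id, body in rows:
            try:
                send(json.loads(body))
            except OSError:
                break  # connectivity is down; keep this and later reports for next time
            with self._db:
                self._db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            sent += 1
        return sent
```

Because a report is deleted only after `send` returns, a crash between sending and deleting leads to a resend, so this gives at-least-once delivery and the central system would need to tolerate duplicates.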


Beyond that, many of these failover/robustness points are in the works (or under research), including smart displays that failover to a different input in case of signal loss. But it's certainly great to see the thought process behind them, and the whole of them spelled out so concisely.

I'd personally take a leaf out of Erlang's book on this one. It's a damn shame you're stuck with C# 4.0, because Erlang is really quite suited to this sort of system.

Anyways... my suggestion would be to set things up with two layers of safety: one watchdog layer to ensure the main process runs and does not die (and to report errors should the main process crash), and one layer of redundancy. The redundant layer would handle failover/spillover from the main process, and additionally serve to ensure the watchdog itself remains active.

In other words, envision a tripod. One leg goes down, the other two ensure it restarts cleanly. Two legs go down, you have one left to catch the chain and restart the others. If all three drop simultaneously, you have a catastrophic failure, in which case you have a server-side watchdog which notes (after some time period) that a remote machine has failed to check in, and can fire off automated alerts/etc. as necessary. Provided you can keep the central system stable and up (probably using a similar failover/watchdog strategy) this should ensure maximal uptime without resorting to tricky business with hardware watchdogs etc. (Although if you have support for custom hardware, you can use a microcontroller's built-in watchdog to rig up something nifty... but that's probably well beyond the scope of this project.)
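The tripod's decision logic can be modelled in a few lines. This is a deliberately simplified Python sketch, not a process supervisor; the leg names and `recovery_round` are hypothetical, and real restarts would go through the operating system.

```python
LEGS = ("main", "backup", "watchdog")  # hypothetical names for the three processes


def recovery_round(alive):
    """One supervision pass over the tripod.

    `alive` maps each leg name to True/False.
    Returns (new_state, restarted_legs, needs_remote_alert).
    """
    survivors = [leg for leg in LEGS if alive[leg]]
    if not survivors:
        # All three legs dropped at once: nothing local can recover, so the
        # server-side watchdog has to notice the missed check-in and alert.
        return dict(alive), [], True
    # At least one leg is standing, so it can restart whichever legs are down.
    restarted = [leg for leg in LEGS if not alive[leg]]
    return {leg: True for leg in LEGS}, restarted, False
```

With one or two legs down, the survivors bring the rest back; only the all-down case falls through to the remote alert.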

I would strongly recommend against multithreading, because thread errors have a tendency to drop an entire process when things go badly wrong. IPC is safer anyways, and eliminates the temptation to abuse shared memory between systems, which is an absolute no-no in building failsafe components.

Judicious placement of exception handlers and careful use of monitoring systems should get you a good multi-process solution in short order. I'm amazed you have so much time and manpower to throw at something... well, certainly not trivial, but well-understood and more or less straightforward to implement.

Just to make sure I understand you correctly:

You have 3 parts - the main display; the backup/simple display; and the watchdog. Are you also advocating individual processes for the various things the main display needs to do? I understand you to mean that here.
