pronpu

A New Approach to Databases and Data Processing — Simulating 10Ks of Spaceships on My Laptop


Hi.
Yesterday I published a new blog post on our company blog: http://blog.paralleluniverse.co/post/44146699200/spaceships

It showcases a battle simulation of tens of thousands of spaceships that we've built on top of our spatial DB, which is written in Java. The point of the post is not to show how fast the simulation can run, because there are faster ways to do it. The idea is to show how even a naive solution running on top of the right kind of database can achieve decent performance and good scaling.

I think some of you may find this interesting.

Hmm. I thought that 10k simple spaceships fit in RAM quite well. It feels like the wrong test to showcase your database, because if you wanted to make a game like this, you would not even consider persisting changes to the database on every simulation step. That's what my game-making brain tells me.

Impressive number of updates though. Edited by Nercury
Hey Ron (it is Ron, right?), we exchanged a few emails last year if I remember correctly. I still have most of the same misgivings that I did back then. I reckon that "writing your entire application as stored procedures" is not a process that is likely to work well with most game developers, and besides, the really big problems come in the more complicated logic. Currently what you have there is an example of 'embarrassingly parallel' logic, e.g. separation forces based on a query, broadcasting of forces out to other entities, etc. And yet even then the code is somewhat more complex than we might like: visitors, lambdas, callbacks, etc. In my experience, callback-based code complicates systems significantly. I can't follow the code associated with that SpatialModifyingVisitor to work out which ships it's called on, how it chooses which ship to attack, or when that will occur. Can I only ever access other entities from within the visit function called by a query?

Also, you've not shown yet how the system degrades (if at all) when faced with a lot of non-spatial queries, such as global chat or guild chat, auction houses, Looking For Group systems, etc.

Yep, that's me.

 

Well, I only mentioned stored procedures to show that this approach has grounding in the past. Obviously, we're not talking PL/SQL here, and there is nothing to differentiate those "stored procedures" from regular application code. It is true, however, that you need to use callbacks and lambdas. Those look particularly bad in Java (at least until Java 7), but would be much more natural in, say, Clojure. But conceptually, I see no other choice for writing scalable code. As soon as you submit your code to be executed on some thread pool, you're using a lambda. Making this natural is only a matter of syntax. If you don't use lambdas at all (and in most modern languages every function is a lambda anyway), you're basically back to the old C loop, and back to partitioning your domain into fixed, disjoint regions, each run by a single thread, and that can't scale.
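
To make that concrete, here's a minimal plain-JDK sketch (nothing SpaceBase-specific): the moment you hand work to an executor, you're already packaging your logic as a callback.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SubmitExample {
        public static void main(String[] args) {
            ExecutorService pool = Executors.newFixedThreadPool(4);

            // Handing work to the pool already means packaging the logic as a
            // closure; in Java 7 that's an anonymous class, in newer languages a lambda.
            final int shipId = 42;
            pool.submit(new Runnable() {
                public void run() {
                    System.out.println("updating ship " + shipId);
                }
            });

            pool.shutdown();
        }
    }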

 

So, whether callback-based code always complicates systems, or only does so with "old" languages -- callbacks are here to stay. I hope more languages will transform them syntactically into continuations, since those read more like single-threaded functions and might be easier for the programmer to reason about.

 

I don't think the problem in the demo is embarrassingly parallel at all. Entities, in this case spaceships, are contended all the time and require locking. If you can think of something that would seem less embarrassing :), I'd love to know what it is so we can try to tackle it.

 

And yes, you can only access entities from within visitors in order to allow the database to schedule your code (and prevent deadlocks). The idea is that the database wants to know what data you wish to process, and then decides how to schedule it, and whether it's in conflict with other transactions. But, again, this wouldn't seem as much of a problem if callbacks were represented syntactically as continuations.
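
To sketch the shape of it -- and to be clear, the class and method names below are made up for illustration, not our actual API -- a query hands the matched, locked entities to your visitor, and that's the only place you touch them:

    // Illustrative only -- these class and method names are invented, not SpaceBase's API.
    interface Visitor<T> {
        void visit(T result);               // the DB calls this with each matched, locked entity
    }

    interface SpatialDb {
        <T> void query(Bounds region, Visitor<T> visitor);  // the DB decides when and where to run it
    }

    class Bounds {
        final double x, y, radius;
        Bounds(double x, double y, double radius) { this.x = x; this.y = y; this.radius = radius; }
    }

    class Ship { double x, y, hp; }

    class FireControl {
        static void fireAt(SpatialDb db, Ship me, double range) {
            db.query(new Bounds(me.x, me.y, range), new Visitor<Ship>() {
                public void visit(Ship target) {
                    target.hp -= 10;        // only legal here: the DB holds the write lock
                }
            });
        }
    }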

 

And as for non-spatial queries, those will be supported by our future products. SpaceBase only runs spatial queries. I am very much interested to hear what would be the concurrency challenges for that domain. 

I'd like to see you address how your solution is different from other fast NoSQL databases, like Redis.

In case you haven't looked at it, Redis allows you to run procedures in the actual store as well, and it does hundreds of thousands of queries per second from remote nodes with ease.

I realize that you claim your database is "spatial," but any number of "spatial" approaches can be re-written as traditional key/value (or tuple space, or whatever) equivalents. For example, multi-dimensional hash grids need 2*N (that's linear in N) queries to find all possible neighbors within N dimensions on top of a flat database.
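
To sketch one flat-store encoding of that (the cell size and key layout are arbitrary here, and the exact number of lookups per query depends on which scheme you pick):

    import java.util.ArrayList;
    import java.util.List;

    // Sketch: a 2D hash grid flattened onto a key/value store. The cell size and
    // the "cell:x:y" key format are arbitrary choices for illustration.
    class GridKeys {
        static final double CELL = 100.0;

        static String cellKey(double x, double y) {
            return "cell:" + (long) Math.floor(x / CELL) + ":" + (long) Math.floor(y / CELL);
        }

        // Keys that could contain a neighbour within one cell radius: the entity's
        // own cell plus the surrounding ones. Each key maps to a plain GET/SMEMBERS
        // style lookup in the underlying store.
        static List<String> neighbourKeys(double x, double y) {
            long cx = (long) Math.floor(x / CELL);
            long cy = (long) Math.floor(y / CELL);
            List<String> keys = new ArrayList<String>();
            for (long dx = -1; dx <= 1; dx++) {
                for (long dy = -1; dy <= 1; dy++) {
                    keys.add("cell:" + (cx + dx) + ":" + (cy + dy));
                }
            }
            return keys;
        }
    }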

Finally:

sharding just won’t cut it

That may be true, but not for the reasons you quote :-)

For throughput, your limitation will be memory bus, I/O bus, and network bandwidth, not core bandwidth. One shard per bus is still totally feasible and a fine way to scale. (If not, then please don't tell Facebook or Google or Amazon or ... because they'd be disappointed :-)

Sharding has problems with object/object interactions specifically in the simulation space: it generates lots of messaging if you have lots of shards. Again, that's not limited by cores; that's limited by memory and networking. If you have a globally consistent database (spatial or not), then it will have to solve the same problems as the sharded simulation does, except that a database typically can't make the application-specific simplifying assumptions that a simulation can, so I still think we should view simulations as simulations, separate from databases.

I'd like to see you address how your solution is different from other fast NoSQL databases, like Redis.

 

It is different in that it's not a database that lets you run stored procedures, but one that runs your entire application. That's why I said you can think of it as writing your entire application as stored procedures; but stored procedures are usually thought of as something separate from your app, as a useful tool. Often (as in the case of Redis, too, I believe), they need to be written in a different programming language than the rest of your app. Here, your entire code is written as callbacks (or lambdas or closures -- whatever you want to call them) which you pass to the database, which schedules and runs them.

 

 

One shard per bus is still totally feasible and a fine way to scale. (If not, then please don't tell Facebook or Google or Amazon or ... because they'd be disappointed :-)

 

Once you have lots of messages between shards, then your architecture isn't really sharded: you have a sharded database, and then you need lots and lots of developers -- like they have at Facebook or Google or Amazon -- to work around the sharding with complicated messaging code. I say, let the database handle that for you. It knows at which node each piece of information is best stored, so as to minimize messaging. You can read more about our data grid here, where I show how messaging is minimized.

 

 

 

I realize that you claim your database is "spatial," but any number of "spatial" approaches can be re-written as traditional key/value (or tuple space, or whatever) equivalents.

 

Ah, here you hit the nail on the head. The database can only be smart about partitioning and scheduling if the data structure captures how the domain uses the data. In a spatial setting, most transactions will span a contiguous region of space, and the database can exploit this locality. Geospatial hashing gives a rather poor approximation of this locality. That is why I do not envision a single data structure that can do all I think a database can and should do for all applications. Different types of data and different data-access patterns would require different data structures. But once you pick the right one, you won't need all those hordes of scalability experts to work around the limitations of a "stupid" database.

 

As for simulations, it's true the demo happens to be a simulation, but that's incidental. Like I say in the article, there are better ways to write a simulation using better simplifying assumptions. But the demo is written naively using the database and still gives good performance. I would say, though, that coming to terms with the suggested approach requires thinking differently about what a database is and what it should do. I mention that in the past the database was also considered to be the core of the application, but we've since moved away from that point of view, and started regarding the database as a simple data store. For example, for low latency applications, I do not think that the database even needs to be durable, or even use a disk at all. 

Edited by pronpu

It is true, however, that you need to use callbacks and lambdas. [...] I see no other choice for writing scalable code. As soon as you submit your code to be executed on some thread pool, you're using a lambda. Making this natural is only a matter of syntax.

 

The question is really about what level in the application code the callbacks are. Here, we have callbacks right in the middle of the basic logic for an entity, and that is going to be very tricky for a developer to use. More common is to define the game logic more as atomic blocks of code that can immediately operate on the environment they are given. Complexity is going to be the biggest problem you face in getting any game developers to adopt this technology - the cards are already stacked against you with the Java background and the emphasis on spatial querying (which game developers typically view as incidental).

 

I don't think the problem in the demo is embarrassingly parallel at all. Entities, in this case spaceships, are contended all the time and require locking. If you can think of something that would seem less embarrassing :), I'd love to know what it is so we can try to tackle it.

 

The movement and explosions seem to be about broadcasting forces out, which doesn't really require any locks at all. Then most of the shooting information seems to be read-only apart from the shooting, which also doesn't require exclusive access to the other ship (although you can implement it that way and call a function on it directly, as I think you do here). Perhaps this is a problem with your code being extremely hard to read for someone not familiar with what all the different queries do. Right now I look at that and I can't follow the code flow at all. That's the problem with making something explicitly callback-driven.

 

As an MMO developer I would be more interested in something that makes it very clear where the locking takes place, e.g. a trading situation where both entities need to be locked to safely transfer an item from one to the other. And I'd want to see exactly in the code how it is implemented, how visible that safety is, and how much it differs from a simple sharded solution. Because all that will tell me whether writing my game using your system is going to be much harder than the alternatives or not. Right now, I look at it and it seems like a lot of overengineering for the hypothetical promise of more players per server. The market shows that players will adapt to small world populations much better than developers adapt to increasing development budgets.
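
For concreteness, this is roughly the level of visibility I mean, in plain Java with ordered lock acquisition standing in for whatever your framework does internally:

    import java.util.concurrent.locks.ReentrantLock;

    // Plain-Java sketch of the trade: both entities must be held to move the value
    // safely, and locks are always taken in a global order (by id) so two
    // concurrent trades between the same pair can't deadlock.
    class Player {
        final long id;
        final ReentrantLock lock = new ReentrantLock();
        int gold;
        Player(long id, int gold) { this.id = id; this.gold = gold; }
    }

    class Trade {
        static void transferGold(Player from, Player to, int amount) {
            Player first  = from.id < to.id ? from : to;
            Player second = from.id < to.id ? to : from;
            first.lock.lock();
            try {
                second.lock.lock();
                try {
                    if (from.gold >= amount) {
                        from.gold -= amount;
                        to.gold   += amount;
                    }
                } finally {
                    second.lock.unlock();
                }
            } finally {
                first.lock.unlock();
            }
        }
    }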

Also, have you run this demo on a distributed system? Project Darkstar had the notorious problem of trying something very similar to what you're doing, except that it didn't actually scale when distributed across the network. Logic that is efficient enough when locking entities to pass them from one core to another is not always efficient enough when you have to pass the entity from one process to another, especially over ethernet. A hyper-parallel system can find itself doing nothing most of the time, waiting on network traffic.

 

And as for non-spatial queries, those will be supported by our future products. SpaceBase only runs spatial queries. I am very much interested to hear what would be the concurrency challenges for that domain.

 

The concurrency challenges are whatever your system makes them. :) On a typical sharded system you could implement global chat by simply looping through every entity and sending it a message. A dynamically load-balanced system could broadcast to each of the servers, which can perform the previous broadcast step internally. These are cheap operations on these systems. With your system, what happens if I want to run a query that returns every single entity of a certain type, regardless of distance? Is that going to harm performance, maybe shuffling every single entity to one node? Is there another way I can implement such behaviour?


The question is really about what level in the application code the callbacks are. Here, we have callbacks right in the middle of the basic logic for an entity, and that is going to be very tricky for a developer to use. More common is to define the game logic more as atomic blocks of code that can immediately operate on the environment they are given.

 

That's a valid point. We're working on an API that transforms the callbacks into continuations that would appear single threaded to the programmer.

 

And I'd want to see exactly in the code how it is implemented, how visible that safety is, and how much it differs from a simple sharded solution. Because all that will tell me whether writing my game using your system is going to be much harder than the alternatives or not.

 

Future blog posts will address that. In the meantime, hopefully the user guide and demo code will help, and you can ask us more about it on our new forum.

 

Project Darkstar had the notorious problem of trying something very similar to what you're doing, except that it didn't actually scale when distributed across the network.

 

Project Darkstar was shut down before reaching the point of multi-node distribution.

 

With your system, what happens if I want to run a query that returns every single entity of a certain type, regardless of distance? Is that going to harm performance, maybe shuffling every single entity to one node? Is there another way I can implement such behaviour?

 

SpaceBase only supports spatial queries. Future products will be built for other kinds of queries as well. You can be sure we won't recommend them before they're good enough for military use, let alone games.

 

the cards are already stacked against you with the Java background and the emphasis on spatial querying (which game developers typically view as incidental).

 

Ah, well, unfortunately you are right about that. I hope I will not be offending too many people here by saying this: we've come to realize game developers are the most conservative class of developers in existence. SpaceBase grew out of a project developed for the Israeli Air Force for a mission-critical, real-time C4I system. We're talking to people developing hard real-time missile interception systems, and they are much more open to innovation than game developers. Fact is, companies like Boeing are doing on-board avionics on the JVM, while some game developers are still convinced that they can produce better assembly code than a compiler or a JIT. Some have only recently come to terms with the benefits of an operating system :)

 

What we've seen is this: game developers' culture was formed at a time when every last inch of performance had to be squeezed out of old-gen CPUs in order to render impressive graphics. Now, game devs have quite mastered the GPU, and they think their games are fast because of the impressive graphics they've been able to achieve on the GPU. But the GPU is very much like the old CPUs: it's extremely simple. CPUs, with their pipelines, instruction-level parallelism, multiple pending cache misses and instruction re-ordering are extremely complex. There is no more "close to metal" when it comes to the CPU. Even the metal is more akin to software than to actual metal. But while other disciplines have come to rely on JIT and middleware, game developers are suspicious of both. 

 

For some reason, when you show a new technology to someone at, say, Twitter, they'll immediately start thinking how they can take advantage of it. When you show it to a game developer, he'll say, "where's the demo", and when you show him a demo he'll say, "well, that doesn't quite do what my game does". It's not a matter of games being more challenging. It's about game developer culture of suspicion with a super-heavy dose of NIH syndrome. 

 

The sad result is, and I know it will sound controversial, that game performance now lags behind the performance of apps in most other disciplines. It just doesn't appear that way because the graphics are so fast.

 

And this is why we've come to realize that convincing a game developer of anything is an exercise in futility. The truly challenging stuff takes place in the military and augmented reality communities, while games are about a decade behind state-of-the art. Like I said, it's a cultural thing, and there's little we can do to change that. Just about the only way to convince game developers to adopt a new technology is to not let them know it can be beneficial to games and let them think they've secretly uncovered some arcane knowledge.

 

That's the end of my rant. All I can do is put our stuff out there and if some adventurous game developer wants to try it, they'll see how helpful it is and they'll build a better game. If not - well, we make our money from other industries. You can only do so much to argue the theoretical merit of some technology or approach over another. There are always arguments and counterarguments. There's a point in the discussion when the developer will have to get his or her hands dirty and try the new technology out. Only then can the discussion continue.

 

I truly admire the game development community, and I remember the time when it was at the forefront of software development and showed others the way. I hope it gets to be that way again. In the meantime, I would really enjoy hearing your suggestions, reservations and questions and will try to address them, but in order to make the discussion fruitful - you're just going to have to try the stuff first. 


I'll be interested in hearing further about results as your research continues, great work! 

 

Just a couple of notes:

 

- in terms of measuring scalability, instead of using execution time and CPU count as the two variables, I would recommend using execution time and number of spaceships

 

- I've been using Google's Go language, and so far having tremendously good results.  To me, the language seems very well designed to be both simple in terms of concurrency programming but also extremely efficient.   Behind the scenes, it uses thousands of lightweight "microthreads" multiplexed onto a few actual threads, with the complexity hidden to the programmer, and for the most part it "just works".   

 


Ah, well, unfortunately you are right about that. I hope I will not be offending too many people here by saying this: we've come to realize game developers are the most conservative class of developers in existence. SpaceBase grew out of a project developed for the Israeli Air Force for a mission-critical, real-time C4I system. We're talking to people developing hard real-time missile interception systems, and they are much more open to innovation than game developers. Fact is, companies like Boeing are doing on-board avionics on the JVM, while some game developers are still convinced that they can produce better assembly code than a compiler or a JIT. Some have only recently come to terms with the benefits of an operating system

 

A lot of generalizations. I can offer another anecdote: I am a newbie in game development, and I don't find your technology convincing. What does that prove? Nothing.

 

What we've seen is this: game developers' culture was formed at a time when every last inch of performance had to be squeezed out of old-gen CPUs in order to render impressive graphics. Now, game devs have quite mastered the GPU, and they think their games are fast because of the impressive graphics they've been able to achieve on the GPU.

 

Indeed, the difference between CPU and GPU usage for graphics is non-debatable. I see the difference in both ease of development and performance advantages.

 

But the GPU is very much like the old CPUs: it's extremely simple.

 

The programming interface is simple because you are constrained to do and care about one thing at a time. You have strict constraints on what you can do and where. The GPU itself... ugh. How many APIs and extensions must it support? I would not call it "simple".

 

CPUs, with their pipelines, instruction-level parallelism, multiple pending cache misses and instruction re-ordering are extremely complex. There is no more "close to metal" when it comes to the CPU. Even the metal is more akin to software than to actual metal. But while other disciplines have come to rely on JIT and middleware, game developers are suspicious of both. 

 

I don't follow. All the advanced things that speed up execution of instructions are not slower than not having them. Having cache is faster than not having it. JIT does not magically make cache not miss. I am not suspicious of JIT, I love it. But lower level problems do not disappear with JIT. This is a non-point.

 

For some reason, when you show a new technology to someone at, say, Twitter, they'll immediately start thinking how they can take advantage of it. When you show it to a game developer, he'll say, "where's the demo", and when you show him a demo he'll say, "well, that doesn't quite do what my game does". It's not a matter of games being more challenging. It's about game developer culture of suspicion with a super-heavy dose of NIH syndrome. 

 

Really. Because we write physics engines, audio libraries and graphics drivers ourselves. That was sarcasm...

Yeah, you need to find your target audience and convince them. It just so happens that we care about our precious loop and weigh every little thing we add to it.

 

The sad result is, and I know it will sound controversial, that game performance now lags behind the performance of apps in most other disciplines. It just doesn't appear that way because the graphics are so fast.

 

 

Wow, nice assertion.

 

And this is why we've come to realize that convincing a game developer of anything is an exercise in futility. The truly challenging stuff takes place in the military and augmented reality communities, while games are about a decade behind state-of-the art. Like I said, it's a cultural thing, and there's little we can do to change that. Just about the only way to convince game developers to adopt a new technology is to not let them know it can be beneficial to games and let them think they've secretly uncovered some arcane knowledge.

 

Or you can use your tech and make an unrivaled gaming experience. Convince + Profit. Win Win.

 

That's the end of my rant. All I can do is put our stuff out there and if some adventurous game developer wants to try it, they'll see how helpful it is and they'll build a better game. If not - well, we make our money from other industries. You can only do so much to argue the theoretical merit of some technology or approach over another. There are always arguments and counterarguments. There's a point in the discussion when the developer will have to get his or her hands dirty and try the new technology out. Only then can the discussion continue.

 

OK, seriously: I may try it, but I see no need for it right now. I like the access times of objects in my stack and heap. However, if I ever wanted to run my single game world on multiple servers, I would need a way to replicate data between them, and then I would research various approaches more fully. There are probably only a few developers on this forum who may be interested in this, and they are most likely working at larger companies, with several people already dedicated to solving similar problems. I doubt they haven't already found efficient solutions to much more complicated tasks.

 

I truly admire the game development community, and I remember the time when it was at the forefront of software development and showed others the way. I hope it gets to be that way again. In the meantime, I would really enjoy hearing your suggestions, reservations and questions and will try to address them, but in order to make the discussion fruitful - you're just going to have to try the stuff first. 

 

I will want what you have when you show that it has value. Sell it to someone :D. Demonstrate that you can reap the benefits. That draws attention well :).

I will take another read over your article and maybe more time to study the code. I will post more constructive comments then.


This is basically just data-oriented design, right -- having "the database" optimized for memory accesses, and also having the logic broken down into a series of parts that can be automatically scheduled across N cores? This has been all the rage at the bleeding edge of games for the past 5 years... This isn't a new way of thinking... to get decent performance on the PS3 (which has a NUMA co-CPU), you pretty much have to adopt this way of thinking about code.

There's been plenty of presentations by game developers trying to get these ideas to filter down by boasting about their 25x speed boosts gained from this approach, etc...

 

So, your rant about game-devs might be accurate in some quarters, but is very misplaced in others. Your rants against these generic game developers just make you seem insulting, defensive and unapproachable, sorry :P

 

It also seems misplaced when your article is about running a sim of just 10k entities at 5fps...

 

What we've seen is this: game developers' culture was formed at a time when every last inch of performance had to be squeezed out of old-gen CPUs in order to render impressive graphics. Now, game devs have quite mastered the GPU, and they think their games are fast because of the impressive graphics they've been able to achieve on the GPU. But the GPU is very much like the old CPUs: it's extremely simple. CPUs, with their pipelines, instruction-level parallelism, multiple pending cache misses and instruction re-ordering are extremely complex. There is no more "close to metal" when it comes to the CPU. Even the metal is more akin to software than to actual metal. But while other disciplines have come to rely on JIT and middleware, game developers are suspicious of both. 

You need to look into how modern GPUs really function these days. All of those complications are present, and are much more complex due to the massive amounts of data that's in the pipeline at one time, and the massive amounts of latency that have to be hidden...

 

P.S. All our GPU code is JIT compiled, I would JIT my gameplay code on consoles if I was allowed (it is JITed on PC), and almost every game is built using licensed middleware (Unity, Unreal, etc...)

 

P.P.S. I've worked on military contracts with Boeing and the Australian Air Force, and I was not impressed with their attitudes towards efficiency. Too much bureaucracy and secrecy caused huge barriers between different parts of the implementation, necessitating huge amounts of hardware. We ended up being given a whole PC inside the simulator just to simulate a few of the cockpit displays (which required reverse engineering and re-implementing the same landscape rendering engine from a different PC that we weren't allowed to access) and to send those images to the master PC. Who needs to be efficient when you can just throw more hardware at the problem...

Edited by Hodgman

hard real-time missile interception systems

I think the "conservatism" may be something else. After all, game developers are the first to use the latest and greatest graphics, computing, and input hardware innovations!
In the case of missile intercepts, it would be worth it spending one full computer per missile you plan on intercepting.
Game developers have to serve many thousands of users per computer, or the costs will drive them out of business!

To put a different spin on your thesis: Distributed computing solves approximately three problems:
1) How do I run different parts of the computation in different locations while keeping a cohesive whole? (messaging)
2) How do I move computational parts between systems when needed? (partitioning)
3) How do I decide which computational parts to put where? (load balancing, optimization)

For example, in a system like HLA, each computational node "inserts" itself into a subrange along any number of dimensions, and receives messages within that subrange. The computation is locked to a node, but the subrange of dimensions it interacts between may vary.

In a traditional "seamless world" game system, nodes insert themselves at mostly fixed geographical locations, and the computation for entities moves to the server that is "owner" of that particular piece of the world.
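
A rough sketch of that ownership rule (the region size, node count and region-to-node mapping below are all made up; a real system keeps a configurable region->node table so regions can be migrated):

    // Sketch of fixed geographic ownership: the world is cut into a grid of
    // regions, each region is pinned to a node, and an entity's position decides
    // which node runs it.
    class WorldPartition {
        static final double REGION = 1000.0;
        static final int NODES = 16;

        static int ownerNode(double x, double y) {
            long rx = (long) Math.floor(x / REGION);
            long ry = (long) Math.floor(y / REGION);
            long h = rx * 31 + ry;
            return (int) (((h % NODES) + NODES) % NODES);  // deterministic region -> node
        }
    }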

I think there's value in considering many different approaches to how to do load balancing, but saying that servers have to be written "on the inside" of a specific server framework will very much limit the appeal of the solution. A solution that works as a library/subsystem and lets you (the developer) "own" the rest is much more attractive.

I've been using Google's Go language, and so far having tremendously good results.  To me, the language seems very well designed to be both simple in terms of concurrency programming but also extremely efficient.   Behind the scenes, it uses thousands of lightweight "microthreads" multiplexed onto a few actual threads, with the complexity hidden to the programmer, and for the most part it "just works".   

 

That's right. Erlang and Clojure have similar constructs as well, and Clojure is generally faster than Go if the code is optimized for performance. Both of those languages also have protections against data races that are absent in Go, but Go is great.

 

How many APIs and extensions must it support? I would not call it "simple".

 

I meant to say that a GPU core is simple in the sense that the time it takes to execute a given piece of code is simply related to the number of instructions. This is far from being the case with modern CPUs.

 

All the advanced things that speed up execution of instructions are not slower than not having them. Having cache is faster than not having it. JIT does not magically make cache not miss.

 

It's not slower, quite the opposite. But taking full advantage of it in a large piece of software is no longer achievable by handwriting assembly. There are too many things to take into consideration, and it's better to put our faith in the JIT or the optimizing compiler.

 

So, your rant about game-devs might be accurate in some quarters, but is very misplaced in others.

 

Yeah, sorry. I may have been carried away.

 

P.S. All our GPU code is JIT compiled

 

Cool. What do you use to do it?

 

A solution that works as a library/subsystem and lets you (the developer) "own" the rest is much more attractive.

 

This is generally true until you hit a wall. You run your program "on the inside" of the operating system and let it schedule threads for you because scheduling is usually best left to the framework. It's the same for micro-threaded environments like Erlang and Go. Because the main contribution of this approach is letting the "database" take charge of scheduling for the purpose of multicore utilization, I don't quite see how it can be implemented as simple functions. A scheduler, by definition, calls your code and not the other way around.

 

I think the "conservatism" may be something else. After all, game developers are the first to use the latest and greatest graphics, computing, and input hardware innovations!
...
Game developers have to serve many thousands of users per computer, or the costs will drive them out of business!

 

And yet, when it comes to utilizing multi- and many-core hardware, games are years behind. I've spoken to a senior developer at a very well-known AAA MMO studio, who told me they don't allow their developers to write multi-threaded code. Another architect at a different MMO studio told me that only now they're realizing that they're going to have to abandon the serial game-loop way of thinking, and this is very hard for them to accept.

 

I think that game programmers, unlike their peers in other industries, still try to hold on to the illusion that by somehow being "close-to-metal" (which is an illusion in and of itself, IMHO) and controlling everything they would somehow utilize hardware better. I believe this is a direct result of what you were saying: game developers always needed to make the most out of a given, fixed set of hardware. But while in the past this could have only been achieved by controlling the entire software stack, today the opposite holds: to utilize modern hardware you must relinquish control.

 

In a traditional "seamless world" game system, nodes insert themselves at mostly fixed geographical locations, and the computation for entities moves to the server that is "owner" of that particular piece of the world.

 

Absolutely. I see no other way to reduce communication. You just have to make sure that a node isn't overloaded when too much action is concentrated in the area it owns. The right kind of data structure would automatically take care of this kind of load balancing.

 

The trick to good concurrency is the realization that interesting concurrency must manage contention, and contention equals communication, and communication is slow. Disallowing contention altogether severely limits the application, but a good data structure can minimize communication. I did some calculations here (I may have posted a link to that here before).


A scheduler, by definition, calls your code and not the other way around.

Agreed. But to the programmer, that detail should be hidden. When I'm used to using C with stdlib, or C++ with STL and Boost, or Java with javalibs, I don't need to re-write that depending on whether I run on the inside of the Linux kernel, or the Windows NT kernel, or the QNX kernel, or whatever. The actual API to interact with the scheduler is tiny -- threads and mutexes.

Similarly, the API to interact with your database-partitioned-scheduler should be tiny. The more you can make it look like threads-and-mutexes, or other forms of I/O, and the narrower you can make the interface, the more likely you will be to gain adoption.
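
To sketch what "tiny" might mean here (everything below is invented for illustration): the whole surface could be one submit() that takes the set of entities a piece of work needs and promises to run it with those entities held, wherever in the cluster they happen to live.

    // Hypothetical minimal interface -- roughly the moral equivalent of
    // "lock these, run this, unlock", with placement left to the scheduler.
    interface Work {
        void run();
    }

    interface EntityScheduler {
        void submit(long[] entityIds, Work work);
    }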

game developers don't let developers write multi-threaded code

This is likely for a good reason: They've found that doing so causes bugs that cost schedule time and cause user experience problems, and are very hard to debug. That's a rational response. Another rational response might be to improve working conditions and the attractiveness of the job in general, so you'd attract developers who currently wouldn't consider the games industry. Another rational response might be to adopt various libraries that let them harness multi-threaded or distributed development with minimal friction -- that's where your job lies: Prove that you can deliver low-friction distributed systems development! The approach you've described so far sounds to me like "high friction" but I don't think that would be impossible to fix.

The approach you've described so far sounds to me like "high friction" but I don't think that would be impossible to fix.

 

This actually sounds like some very constructive criticism. Thank you, and we'll get to work on that!

 

Naturally, many new products are a bit rough around the edges at first, though I don't think it's that high friction. You issue a query, the results are passed to a callback, and within that callback the results are locked (for reading or writing, depending on the type of the query). You process the results in the callback and issue further queries if necessary.

But I agree it's a bit cumbersome, especially in Java/C++. We will be integrating a continuations library that will make the callback implicit and the whole thing look more single-threaded. Hopefully this will help reduce the friction...


We will be integrating a continuations library that will make the callback implicit and the whole thing look more single-threaded. Hopefully this will help reduce the friction...

Or it will cause mysterious threading bugs, especially in the case of cancellations :-) Continuations are threads, and users sometimes forget that. And the cancellation case is usually pretty hard to get right.

As long as you're OK with requiring to make all code available to run in all locations at the same version, you could perhaps hide your database distribution behind some messaging framework. Make it look a bit like MPS, perhaps? You can perhaps expose the database as a way to send messages to entities. Entities register in the domains they get load balanced across. The trick is going to be how to "invisibly" take the desire to create an entity with the address tuple A,B,C and make it actually be instantiated on the right node... All with a super small API :-)

And yet, when it comes to utilizing multi- and many-core hardware, games are years behind. I've spoken to a senior developer at a very well-known AAA MMO studio, who told me they don't allow their developers to write multi-threaded code. Another architect at a different MMO studio told me that only now they're realizing that they're going to have to abandon the serial game-loop way of thinking, and this is very hard for them to accept.

Again, this very much depends on who you talk to. You'll get a lot of the same responses in other segments of the software industry too.
In most places that I've worked, yes people are discouraged/banned from writing multi-threaded code based on shared-state concurrency, because it is notoriously difficult. Even veteran engineers make mistakes in this paradigm from time to time, as it's notoriously difficult to prove that your code doesn't have any race conditions or locking/scheduling problems.
In many courses, this particular paradigm is the only method of multi-threading that's taught -- so when you interview a graduate and ask how they'd take advantage of a multi-core games console, they almost always talk about mutexes!
Often, shared-state code like this seems to be correct, and may even pass its unit tests successfully for months or years, before failing randomly and unreproducibly. Many old games have multi-threading bugs that don't manifest on single-core CPUs, but do manifest on true SMP hardware -- e.g. to play Sim City 4 without random crashes, I have to go into the Windows Task Manager and change its thread affinity to restrict it to one core! To avoid this, it's better to simply avoid this unreliable paradigm as much as possible.

Other paradigms, like the one you've presented, aren't as easy to totally screw up: message passing, the Actor Model, functional-style programming, and automatic decomposition of serial code into a directed acyclic graph of task nodes are all used.
 
For example, the "serial game loop" can be written as high level Lua code. Whenever it makes a function call to the underlying engine, this call isn't executed immediately, instead it's queued and a future is returned. At the end of the Lua code, all these queued functions and futures can have their dependencies determined (by which functions have which futures passed in as arguments) and can be converted into a DAG. The DAG can then be linearized into a schedule of individual tasks and fences, spread across many cores.
This lets your high-level programmers continue to write seemingly serial code, which is automatically parallelized, while a few veterans of shared-state concurrency maintain the scheduling logic.
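
A toy sketch of that idea (invented names, and a naive serial executor standing in for the real multi-core dispatch): calls aren't run when issued; each returns a handle, and handles passed as inputs become dependency edges.

    import java.util.ArrayList;
    import java.util.List;

    // "Record the calls, return futures, build a DAG", in miniature.
    class Node {
        final Runnable body;
        final List<Node> inputs;
        boolean done;
        Node(Runnable body, List<Node> inputs) { this.body = body; this.inputs = inputs; }
    }

    class FrameGraph {
        private final List<Node> nodes = new ArrayList<Node>();

        Node call(Runnable body, Node... inputs) {
            List<Node> deps = new ArrayList<Node>();
            for (Node n : inputs) deps.add(n);
            Node node = new Node(body, deps);
            nodes.add(node);
            return node;                 // the "future" the high-level code passes around
        }

        void execute() {
            // naive repeated sweep: run every node whose inputs are all done;
            // a real engine would hand independent nodes to different cores
            boolean progress = true;
            while (progress) {
                progress = false;
                for (Node n : nodes) {
                    if (n.done) continue;
                    boolean ready = true;
                    for (Node dep : n.inputs) {
                        if (!dep.done) { ready = false; break; }
                    }
                    if (ready) { n.body.run(); n.done = true; progress = true; }
                }
            }
        }
    }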

As mentioned before, the PS3 has a slow GPU and a slow/simple (in-order, etc.) single-core CPU (overview). All of its computing power is in the many-core NUMA co-CPU. This means that every PS3 game of any complexity at all is pretty much guaranteed to contain a damn lot of parallel code (and not the shared-state kind, as that does not perform well at all under NUMA).

The PS3 with its many-core hybrid CPU, and the 360 with its hyperthreaded tri-core CPU, have been out for over half a decade, so any game developer working on console games should definitely have had the multi-core lesson sink in by now! Failing to write multi-core-capable code has the effect of reducing the power of these consoles by a huge factor (~3-6x), so it's safe to assume that people who are generalized as being obsessed with squeezing every last drop of performance out of a bit of hardware would be investigating their options here...
 

P.S. All our GPU code is JIT compiled

Cool. What do you use to do it?

Sorry, I meant that everyone's GPU code is JIT compiled.
You control the GPU via an API like D3D, and shader code (e.g. HLSL) which is compiled into an intermediate bytecode format.
The driver then takes the command queue that's produced from D3D, and your shader byte-code, and JIT compiles these into actual proprietary GPU instructions as necessary.

Edited by Hodgman

Or it will cause mysterious threading bugs, especially in the case of cancellations :-) Continuations are threads, and users sometimes forget that. And the cancellation case is usually pretty hard to get right.

 

Well, I stand behind my claim that nowadays performance is gained by relinquishing control. At the end of the day you'll have to trust the framework to do the right thing. We take care of the edge cases for you.

 

As long as you're OK with requiring to make all code available to run in all locations at the same version, you could perhaps hide your database distribution behind some messaging framework. Make it look a bit like MPS, perhaps? You can perhaps expose the database as a way to send messages to entities.

 

What you describe sounds a whole lot like our open-source in-memory data grid, Galaxy, that serves as our low-level distribution layer. 

 

In most places that I've worked, yes people are discouraged/banned from writing multi-threaded code based on shared-state concurrency, because it is notoriously difficult. Even veteran engineers make mistakes in this paradigm from time to time, as it's notoriously difficult to prove that your code doesn't have any race conditions or locking/scheduling problems.

 

Except if you use a transactional framework that takes care of all that for you.

 

Every shared-state concurrency/distribution (we can agree it's the same problem) is an abstraction on top of message passing, that translates contention to messaging. In the CPU, that messaging layer is called cache-coherence, and uses variants of the MESI messaging protocol. Now there are two questions to be asked: 1) is the shared-state abstraction useful (or not, or maybe even harmful), and 2) what is the impact on performance?

 

As for 1), I'd claim that it's not only useful, but often necessary, and, furthermore, often messaging architectures re-implement shared state, only they do it in an error-prone manner. If, for example, you have a seamless world, you do wish to model a shared state (your world). You could implement the shared state on top of messaging yourself, or use a pre-packaged abstraction. The reason shared state causes more trouble "out-of-the-box" than messaging is that messaging makes communication explicit, while shared state hides it. But, if you could use a shared state abstraction that takes care of races and deadlocks for you, then this problem, at least, is solved. You might also be persuaded that shared-state is useful by the fact that two of the best known actor systems, namely Erlang and Akka, both provide shared state in addition to actors. Erlang has ETS (and Mnesia), and Akka has limited STM. The designers of both platforms have realized that message passing alone is insufficient (or degrades at times to hand-rolled implementation of shared-state on top of it). Also, they implement shared state at a lower level (at least on a single machine) than their messaging API so it performs better.

 

And as for 2), I'd say this. Many transactional frameworks (databases, STM) turn a correctness problem into a performance problem. They guarantee correctness but at a high performance cost because they try to be general and address every possible data-access pattern. The aforementioned Project Darkstar took that same path. We, however, take a different path. We do not provide general transactions, but only transactions that are likely within some domain. Our first database/scheduling framework, SpaceBase, assumes a spatial object domain, and allows transactions that span some contiguous region of space. This assumption not only makes internal scheduling decisions simpler (for example, we make sure that we never have to roll back conflicting transactions), it also gives tremendous performance (or scaling, rather), far better than you would have been able to achieve by re-implementing all of this yourself on top of messaging (at least not without a lot of effort).

 

The reason for that is that the framework/database is able to organize the objects in such a way as to minimize contention and therefore messaging. 

 

and not the shared-state kind, as that does not perform well at all under NUMA

 

Again, shared-state is simply an abstraction on top of message passing. NUMA's cross-node messages are relatively slow, so the aim should be to reduce the number of such messages regardless of the abstraction you use. I claim that the database can do a fine job at reducing messaging, and I've proved that in this highscalability article. The idea is that if you use the right kind of data structure (a b-tree in the article), communication is minimized.

 

All modern computer systems are organized as layers of fast-access caches communicating among them via some slow communication buses. The same problem, BTW, applies not only to NUMA but to CPU cores at a lower level, and to cluster nodes at a higher level. A b-tree data structure, like that in the example, makes contention at higher-level tree nodes exponentially less likely than contention at lower nodes, and contention at higher nodes is exponentially more expensive (slower comm channel). The two cancel each other out exactly, making each operation O(1) with a very, very low constant (<< 1) in terms of cross-node communication, no matter the "layer" you're at (CPU cache, NUMA or cluster). 
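
To read that back as a back-of-the-envelope calculation (the exact decay and growth rates depend on the tree's branching factor and the interconnect, so treat this as a sketch rather than our exact analysis): suppose contention with a node at level $k$ of the tree (leaves at $k = 0$) occurs with probability $p_k \approx \alpha^{-k}$ and, when it occurs, costs $c_k \approx c_0\,\beta^{k}$ in cross-node communication, since higher nodes sit behind slower, more shared channels. Then the expected communication per operation is

    \[
    \mathbb{E}[C] \;=\; \sum_{k \ge 0} p_k\, c_k \;\approx\; c_0 \sum_{k \ge 0} \Bigl(\tfrac{\beta}{\alpha}\Bigr)^{k} \;=\; \frac{c_0}{1 - \beta/\alpha} \qquad (\beta < \alpha),
    \]

a constant independent of the number of nodes, and much smaller than 1 when $c_0 \ll 1$ and $\beta \ll \alpha$.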

 

Sorry, I meant that everyone's GPU code is JIT compiled.
You control the GPU via an API like D3D, and shader code (e.g. HLSL) which is compiled into an intermediate bytecode format.

 

Of course. That's still an AOT compiler, not a JIT, as it compiles the shader before it runs for the first time. I was referring to the merits of a true profiling JIT, one that performs optimizations based on the running profile of the code. I thought that somehow you'd done that with shaders.


Every shared-state concurrency/distribution (we can agree it's the same problem)

I disagree! In my view, shared-state concurrency is very different from a cohesive distributed view. Being able to be divergent but eventually consistent on disparate nodes is a hugely important distinction that you don't have in MESI.

Also, Galaxy: I remember you posted about that a while back, and I had about the same thing to say about it then: MESI over a network is probably solving the wrong problem.

Well, I stand behind my claim that nowadays performance is gained by relinquishing control. At the end of the day you'll have to trust the framework to do the right thing. We take care of the edge cases for you.

At the end of the day, all that any program does is transform some data into some other data, by a sequence of processes.

Given enough time, money, talent and patience, it will always be possible to replicate the results of some framework by instead writing everything by hand, even in raw asm etc... The reason this isn't feasible is that the time/money/talent/patience costs are insanely high!
Choosing to use a framework/library/middleware, as always, should be based on a cost-benefit analysis -- e.g. can we get the same performance ourselves, given the same amount of time/money and our current staff?
Often the answer to that is "no", which is why middleware is so damn popular in the games industry (contrary to your assertion). Games can be written from scratch (and sometimes you might have better performance if they were), but it's usually cheaper/faster to just license some middleware that reduces your production time by an order of magnitude.

Except if you use a transactional framework that takes care of all that for you.

That's exactly my point -- writing shared-state multi-threaded code without a framework is difficult and dangerous, and hence discouraged. That's why game developers do use concurrency frameworks, and discourage the "manual authoring" of multi-threaded code outside of one (i.e. we don't like to write multi-threaded code by hand).

That said, transactional frameworks are largely rubbish for many kinds of games, as they often make the decision to give up determinism, which is essential for some games. Typical shared-state designs revolving around mutexes are often the same, with execution speed of different threads determining the order in which simulation operations occur.

On that note, with your space-ship example, it's not obvious what kind of ordering constraints are/can-be applied to the different processes. Are all beams fired before any damages are calculated, etc?
In the Actor Model implementation that I've used for games, these kinds of ordering issues were explicit to the programmer, allowing them to maintain a deterministic simulation, regardless of the number of cores or the scheduling algorithm used. e.g. An actor could specify that within a particular simulation frame, it would process all "FireBeam" messages before it processed any "TakeDamage" messages.
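
A sketch of that per-frame ordering (the names are invented, and enqueue() would need synchronization with real multi-threaded producers): messages are buffered by type and drained in a declared order at the end of the frame, so the outcome doesn't depend on arrival order, core count or scheduling.

    import java.util.ArrayDeque;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Queue;

    abstract class Message { }
    class FireBeam extends Message { final long targetId; FireBeam(long t) { targetId = t; } }
    class TakeDamage extends Message { final int amount; TakeDamage(int a) { amount = a; } }

    class ShipActor {
        private final Map<Class<?>, Queue<Message>> mailbox = new HashMap<Class<?>, Queue<Message>>();
        // declared ordering: all FireBeam messages are handled before any TakeDamage
        private static final Class<?>[] FRAME_ORDER = { FireBeam.class, TakeDamage.class };

        void enqueue(Message m) {
            Queue<Message> q = mailbox.get(m.getClass());
            if (q == null) {
                q = new ArrayDeque<Message>();
                mailbox.put(m.getClass(), q);
            }
            q.add(m);
        }

        void endOfFrame() {
            for (Class<?> type : FRAME_ORDER) {
                Queue<Message> q = mailbox.get(type);
                if (q == null) continue;
                for (Message m; (m = q.poll()) != null; ) {
                    handle(m);
                }
            }
        }

        private void handle(Message m) {
            if (m instanceof FireBeam) { /* resolve the shot */ }
            else if (m instanceof TakeDamage) { /* apply the damage */ }
        }
    }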

As above, I don't know of any big games using anything like STM, instead most use something closer to a stream-processing or flow-based programming model, which is easy to reason about (you can write the main flow in a single-threaded style and have it be automatically decomposed), has few performance pitfalls (no messy rewinding, can be well optimized for data access patterns), is 100% deterministic, and allows for huge data-parallelism and task-parallelism.
STM-like techniques are usually only an internal implementation detail, e.g. the algorithm to push an item into a lock-free queue may be transactional, repeatedly trying again and again until it succeeds without conflict. This queue would then be used to implement a more useful concurrency model.
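
For example, the "retry until there's no conflict" pattern in its simplest form, shown here as a lock-free push onto a Treiber stack head: the CAS either installs the new node or fails because another thread won the race, in which case we re-read and try again.

    import java.util.concurrent.atomic.AtomicReference;

    class LockFreeStack<T> {
        private static final class Node<T> {
            final T value;
            Node<T> next;
            Node(T value) { this.value = value; }
        }

        private final AtomicReference<Node<T>> head = new AtomicReference<Node<T>>();

        public void push(T value) {
            Node<T> node = new Node<T>(value);
            while (true) {
                Node<T> current = head.get();
                node.next = current;
                if (head.compareAndSet(current, node)) {
                    return;              // success: no other thread interfered
                }
                // conflict: someone else changed head; loop and try again
            }
        }
    }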

Every shared-state concurrency/distribution (we can agree it's the same problem) is an abstraction on top of message passing, that translates contention to messaging. In the CPU, that messaging layer is called cache-coherence, and uses variants of the MESI messaging protocol.

I wouldn't agree with that. Multiple threads sharing the same RAM without restriction gives us the "shared state" model of concurrency. When we build a message passing framework, we build it directly on top of this model.
Underneath this model, there is the hardware implementation, which in some cases, yes, uses MESI/etc, which gives us the layers:
1) Message passing between caches.
2) Shared RAM.
3) High level Messages.
The high level message passing layer is still built on top of the shared-state layer, regardless of how layer #1 operates. #1 is just a black box in order to make layer #2 function correctly.

The type of message passing going on in layer #3 is semantically different to layer #1.

Also, in the example of the PS3 that I was using, the SPU co-processors don't have a cache at all, so obviously don't implement MESI. They have their own local-RAM, and can initiate transfers to/from the main shared RAM and their local-RAM (and two concurrent but overlapping transfers will have indeterminable results).
You could argue that these transfers are "messages" and therefore it's built upon message passing, but if you go that far, then "shared-state" ceases to exist at all: interacting with the global state is just "sending a message"! ...and that doesn't seem like a useful thought construct. Edited by Hodgman

I disagree! In my view, shared-state concurrency is very different from a cohesive distributed view. Being able to be divergent but eventually consistent on disparate nodes is a hugely important distinction that you don't have in MESI.

 

What I meant was that concurrency is essentially distribution across nodes with a relatively slow interconnect between them, just like a network. I don't understand how the two can be considered different. As I said before, all computer architecture is built on the idea of local nodes with relatively slow interconnects. Any use of the comm bus, be it between cores, NUMA sockets or cluster nodes, should be minimized in order to increase scaling. The comm bus has two properties that hinder performance: low throughput and high latency. Eventual consistency tries to bypass the latency limitation. However, eventually consistent systems sometimes sacrifice consistency for no good reason: they are architecturally lazy when it comes to object placement, usually employing a very simple scheme such as a DHT. With such a bad placement strategy, sacrificing consistency does give latency benefits. But smarter object placement schemes can achieve the same (amortized, not worst-case) latency while maintaining consistency.
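
To make the placement point concrete, here is a deliberately simplified sketch (hypothetical types, not SpaceBase's API): a DHT scatters objects by hashing their IDs, so objects that constantly interact may land on different nodes, while a locality-aware scheme keys placement on something like a spatial region so they tend to be colocated.

```java
// Deliberately simplified illustration; hypothetical types.
record GameObject(long id, int regionX, int regionY) {}

interface Placement { int nodeFor(GameObject o); }

// DHT-style placement: simple and uniform, but neighbours land on arbitrary
// nodes, so most interactions cross the network.
class HashPlacement implements Placement {
    private final int nodeCount;
    HashPlacement(int nodeCount) { this.nodeCount = nodeCount; }
    public int nodeFor(GameObject o) {
        return Math.floorMod(Long.hashCode(o.id()), nodeCount);
    }
}

// Locality-aware placement: objects in the same region (the ones most likely to
// interact) share a node, so consistency can be kept without paying cross-node latency.
class RegionPlacement implements Placement {
    private final int nodeCount;
    RegionPlacement(int nodeCount) { this.nodeCount = nodeCount; }
    public int nodeFor(GameObject o) {
        int region = 31 * o.regionX() + o.regionY();
        return Math.floorMod(region, nodeCount);
    }
}
```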

 

which is why middleware is so damn popular in the games industry (contrary to your assertion)

 

Is it (I'm asking seriously)? It is certainly true on the client side with game engines, but which server-side frameworks are in common use in games? 

 

On that note, with your space-ship example, it's not obvious what kind of ordering constraints are/can-be applied to the different processes. Are all beams fired before any damages are calculated, etc?

 

The demo was purposefully written to simulate asynchronous requests coming over the network, each being processed as it arrives with no global "ticks".

 

In the Actor Model implementation that I've used for games, these kinds of ordering issues were explicit to the programmer, allowing them to maintain a deterministic simulation, regardless of the number of cores or the scheduling algorithm used.

 

The same could be achieved with SpaceBase (though that was not the intention in the demo). You could collect all requests and then process them in parallel, deterministically. I'm not sure this property is always so important (or important at all), though. BTW, what actor implementation have you used? 
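
As a sketch of what I mean (hypothetical request/handler types, not the demo's code): buffer the incoming requests, impose a canonical order, then fan the work out across cores.

```java
import java.util.*;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch only: hypothetical Request type and handler.
class DeterministicTick {
    private final ConcurrentLinkedQueue<Request> incoming = new ConcurrentLinkedQueue<>();

    void submit(Request r) { incoming.offer(r); }         // arrives asynchronously

    void tick() {
        // Drain everything that arrived before this tick.
        List<Request> batch = new ArrayList<>();
        for (Request r; (r = incoming.poll()) != null; ) batch.add(r);

        // Group by ship; each ship's requests are applied in sequence order,
        // independent ships are processed in parallel, so the outcome no longer
        // depends on arrival order or thread scheduling.
        Map<Long, List<Request>> byShip = new TreeMap<>();
        for (Request r : batch) {
            byShip.computeIfAbsent(r.shipId(), k -> new ArrayList<>()).add(r);
        }
        byShip.values().parallelStream().forEach(requests -> {
            requests.sort(Comparator.comparingLong(Request::sequence));
            requests.forEach(DeterministicTick::apply);
        });
    }

    private static void apply(Request r) { /* update that ship's state */ }
}

record Request(long shipId, long sequence) {}
```

(This only stays deterministic as long as handling one ship's request doesn't mutate another ship's state, or does so through a further ordered step.)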

 

most use something closer to a stream-processing or flow-based programming model, which is easy to reason about (you can write the main flow in a single-threaded style and have it be automatically decomposed)

 

Which frameworks are used for that?

 

I wouldn't agree with that. Multiple threads sharing the same RAM without restriction gives us the "shared state" model of concurrency. When we build a message passing framework, we build it directly on top of this model.

 

Of course. Software level messaging abstraction is implemented on top of a shared RAM abstraction, which is, in turn, implemented on top of hardware level messaging. The messages are different, but the point is that shared state is always an abstraction on top of messaging, and computer architecture is such that this cross-"node" messaging should be minimized. We can call it the "shared-state/message-passing duality". Sometimes it's best to work with one abstraction, and sometimes with the other. However, a shared-state framework can certainly improve performance in many cases, as it has more knowledge about the domain semantics to apply optimizations. 


each being processed as it arrives with no global "ticks"

While the original QuakeWorld did something like this, games have since moved away from it for several reasons, including repeatability, fairness, debuggability, and determinism.
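
For contrast, the usual alternative is a fixed-timestep tick loop, roughly like this generic sketch (no particular engine assumed): all input received since the previous tick is applied at a well-defined simulation step, which is what buys the repeatability and fairness mentioned above.

```java
// A minimal fixed-timestep loop (generic sketch).
class TickLoop {
    static final long TICK_MS = 50;                 // e.g. 20 simulation ticks per second

    void run() throws InterruptedException {
        long tick = 0;
        long nextTickAt = System.currentTimeMillis();
        while (true) {
            drainAndApplyInputs(tick);              // everything received since last tick
            stepSimulation(tick);                   // advance the world by exactly one tick
            tick++;
            nextTickAt += TICK_MS;
            long sleep = nextTickAt - System.currentTimeMillis();
            if (sleep > 0) Thread.sleep(sleep);     // otherwise we're running behind
        }
    }

    void drainAndApplyInputs(long tick) { /* ... */ }
    void stepSimulation(long tick)      { /* ... */ }
}
```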

which server-side frameworks are in common use in games?

If you choose an engine, like UDK, CryEngine, Id Tech, C4, Unity, or whatever, you get a networking system to go with it. The one exception I know of is Gamebryo, although a quick Google says they just recently added networking. Interestingly, I think they merged with an MMO technology company a few years ago, and then dropped the MMO server-side part...

It is true that, if you as a game developer do not choose an engine, but instead assemble libraries (a fine methodology IMO,) there are only low-level networking systems (RakNet, Gamespy, Lidgren, etc.)

If you look at RAD Game Tools, they have nice, focused libraries for specific applications. A game-level networking library in that vein would probably be very welcome. Then we're back at "library" vs "framework." If you're picking a "framework" then, in game terms, that's an "engine" and should solve the soup-to-nuts game tech problem.

If you cannot re-package your technology as a library, perhaps the best option is to make common cause with one of the existing game technologies -- be it Hero Engine, Unreal, Id Tech, or Bigworld.

At the end of the day you'll have to trust the framework to do the right thing. We take care of the edge cases for you.

 

But this doesn't tally with this:

 

 

SpaceBase only supports spatial queries. Future products will be built for other kinds of queries as well.

 

If you see online games as spatial simulations then they are a suitable nail for your spatial database to hammer. But that is one particular reductionist view of games. Much of the game has little or nothing to do with space and doesn't benefit from spatial coherence. Solving the spatial problem well is hard, which you know, and which is why you feel that what you've made has potential worth. But game developers realised that the spatial problem was hard about 10 years ago and designed around it, in a way that suits the rest of their game and which isn't limiting when faced with non-spatial queries. Unfortunately you're now coming along with a solution to a problem most MMO developers don't really believe they have, and asking them to change everything about how they work in order to take advantage of it. As mentioned above, this is 'high friction'.

 

Auction houses, global chat, guilds, private messages to anybody in the world, a Who list, etc. - these are the real "edge cases" in your system and you don't seem to be giving positive responses to how they would be handled. If SpaceBase does not let me implement them, or forces me to have a parallel database that I must maintain separately, then it is not suitable for MMOs. This is something you'll have to think hard about.

 

 

Project Darkstar was shut down before reaching the point of a multi-node distribution.

 

The 'project' was shut down, in terms of funding being withdrawn, but that doesn't mean so much for open-source. The people who worked on it had multi-node up and running, and the contention costs were so high that adding nodes actually reduced performance, hence it wasn't distributed in that state. It turns out that an optimistic locking model degrades significantly when distributed across the network and when the simulation's spatial coherence is not high enough to keep interacting entities colocated, with a resultant cascade effect when entities end up spending more and more of their time on the wire and not in memory.

 

 

For some reason, when you show a new technology to someone at, say, Twitter, they'll immediately start thinking how they can take advantage of it.

 

Startup tech companies almost literally have money to burn in the name of research and discovering cool things. Twitter has been run almost entirely on the back of investors throwing money at it in the hope that they find a way to turn a profit later. Not only can they afford to experiment, but arguably they are being paid to do exactly that.

 

Games companies are not tech startups. They are rarely funded by investors in the same sense and are much more linked to retail sales. There's little room for experimentation or risk-taking. Making games takes a lot of time, and most projects are not profitable anyway. As a result, you almost always need to reuse the code you have already invested in. It doesn't make good business sense to throw out half a decade of past work to take a chance on a new language and indeed a new paradigm for your next game when you have firm market evidence showing that customers are happy to play in the type of system that was designed several years earlier. This may seem conservative when compared to the world of websites that make a loss year after year, but the market conditions for these businesses are very different.


I also think there is a lineage difference. Web developers come out of HTML design and high-level application frameworks. They have never had a multi-year project die because of occasional one-second lag spikes. They generally wouldn't be able to explain how an L1 memory cache is different from an OS-level block buffer cache. They want to use whatever new shiny toy gets their site to look and work better for their test user -- and most of them don't really have that many concurrent users. A lot of interesting products and use cases come out of that lineage, but at the end of the day, the successful ones end up having to hire large teams of experienced low-level people to re-write and operate their site at scale. Those experienced guys, in turn, have often seen any number of "promising frameworks" turn operational infrastructure to mush, and are rightfully conservative.

 

Someone who wants to sell dreams to application developers had better speak application developer language, which is all about getting more stuff out with less effort. Someone who wants to sell systems to low-level operational people, in turn, had better be prepared to justify why they're good, like, say, memcached, rather than terrible like, say, Rails.



Sorry, I meant that everyone's GPU code is JIT compiled.
You control the GPU via an API like D3D, and shader code (e.g. HLSL) which is compiled into an intermediate bytecode format.

Of course. That's still an AOT compiler, not a JIT, as it compiles the shader before it runs for the first time. I was referring to the merits of a true profiling JIT, one that performs optimizations based on the running profile of the code. I thought that somehow you've done that with shaders.


The actual code that's run on the GPU depends on both the shader bytecode and the command stream generated from the API.
e.g. when compiling shader bytecode, the driver might have a choice to optimize for reduced register usage (size), or reduced ALU operations (speed). If the compiler guesses that this shader will be memory-bound because it contains many fetch instructions, then it may choose to optimize for size over speed, because this will allow more threads to be in flight at once (the GPU's version of hyperthreading is flexible in the number of "hardware threads" that can be run, depending on the register usage of the current shader).
However, whether this guess is correct or not depends on the command stream. The API can configure the inputs such that all of the memory fetches will read the same byte of memory, resulting in an epic caching win and no memory bottleneck, or it can configure the inputs such that every memory access is random and uncacheable. If the former, the compiler should have optimized for speed, and if the latter it should have optimized for size.
There are many other examples like this -- e.g. when two different shader programs are executing simultaneously on different sub-sets of the GPU's resources, the best optimization for a shader method may depend on what the "neighbouring" shader is.
Apart from these kinds of JIT-to-optimize cases, there are also times when the driver is outright required to modify/recompile the code -- e.g. depending on what kind of texture you've bound via the API, the instructions to fetch and decode floating point values from it will differ (e.g. for byte-textures you need a "divide by 255"); some GPUs have specialized hardware for this that is configured by the API, while others require the shader assembly to be recompiled with the appropriate instructions based on the API state.
 

Is [middleware actually popular] (I'm asking seriously)? It is certainly true on the client side with game engines, but which server-side frameworks are in common use in games?

Game engines usually aren't just a client-side only thing.
For a single-player game, the client machine has to run both the game client and the game server.
In a peer-to-peer game, the same is true, as often every client is acting as a shared server.
In a client-server game, a popular feature is to allow one player to act as a server, instead of requiring a dedicated server machine.

Given this, it's helpful if you can just write the game once, in such a way that it can be used in all of the above models, by simply changing some replication and authority policies. High-end game engines contain a lot of utilities to help write these kinds of "simultaneously client-only/server-only/client-is-server" code-bases.
So game engines are both client middleware and server middleware (and often you'll supplement an engine with a lot of extra middleware -- the big engines often provide integration/glue code for other popular middleware packages to make it easy to use them alongside their engine).
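
As a loose illustration of that "write it once, flip the policies" idea (names made up, not any particular engine's API):

```java
// The same object code runs as single-player, listen-server host, dedicated
// server, or pure client, with only the role and replication policy changing.
enum NetRole { AUTHORITY, PROXY }            // do we simulate this object, or mirror it?
enum Replication { NONE, TO_OWNER, TO_ALL }  // who receives its state updates?

class NetObject {
    NetRole role;
    Replication replication;

    void update(double dt) {
        if (role == NetRole.AUTHORITY) {
            simulate(dt);                            // dedicated server, listen-server host,
            if (replication != Replication.NONE) {   // or single-player -- same code path
                sendStateToClients();
            }
        } else {
            applyReplicatedState();                  // pure client: mirror what the authority sent
        }
    }

    void simulate(double dt)    { /* ... */ }
    void sendStateToClients()   { /* ... */ }
    void applyReplicatedState() { /* ... */ }
}
```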
 
Often as well as the game client and server, there is a "master server", which players connect to first in order to find a particular server.
e.g. a game server hosts 64 players at once and runs a game simulation, while the master server hosts millions of players at once, but only runs chat and matchmaking logic.
In this space, you'll definitely find a lot of standard technologies like MySQL, memcached, JSON, REST APIs over HTTP, Comet/Push web stuff, MongoDB, etc...
 
MMOs are similar to the latter group, but I've never worked for an MMO developer so I can't comment on their tech.

In the MMO space, I know of quite a few that are largely written in Python (e.g. EVE, any BigWorld ones), and this has come back to bite them in the age of concurrency ("global interpreter lock"...). EVE got blasted recently for being mainly single-core, and part of their official response said (paraphrasing) that multi-core isn't just something we can switch on, and Python makes the transition especially hard on us. So those stories do fit with your anonymous MMO anecdote about finding it hard to move away from the single-threaded game-loop model.
MMOs have largely been confined to the PC realm though, so unlike console developers, they weren't forced to make the multi-core transition back when multi-core consoles became mainstream.
 

BTW, what actor implementation have you used?

A proprietary (read: NIH) one, because the ones we looked at weren't suitable for games.
It worked by:
* It operated in 'cycles' where groups of messages were executed, which would result in objects being created/destroyed/being-sent-messages. Each sim frame, the actor model would continue to run these cycles continuously until no more messages were produced, at which point it would begin the next sim frame.
* Every thread had a thread-local buffer where messages could be written. Whenever you called a method on an actor, the args would be written to the current thread's buffer and a future returned.
* Futures didn't have any kind of complex waiting logic -- the system knew immediately how many cycles in the future the value would be ready. The only way to 'resolve' a future into a value was to use it as an argument to a message. That message would be delayed the appropriate number of cycles so that it wasn't executed until after its arguments existed.
* At the end of each cycle:
*** A new buffer full of messages has been produced by each thread. These were merged into one list, and sorted by actor-ID (and a 2nd sorting key to provide determinism via message priority). That sorted queue was then split into one non-overlapping range of actors per thread, to be executed next cycle.
*** Any objects whose reference count had been decremented would be checked for garbage collection / destruction.
*** Any new objects would have their IDs assigned in a deterministic manner.
 
It was nice simply because it let people, for the most part, continue writing in a style similar to the traditional OOP that they were used to, but the execution would be split across any number of cores (up to the number of active Actors), while still remaining completely deterministic, and with very little synchronisation between threads (most communication was wait-free in optimal conditions, even reference-counting was wait-free and without cache-contention).
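
A very rough sketch of that end-of-cycle step (hypothetical types -- the real system was proprietary): merge the per-thread buffers, sort by actor ID plus message priority for determinism, then hand each thread a disjoint range of actors for the next cycle.

```java
import java.util.*;

// Hypothetical types; sketch of the end-of-cycle merge/sort/partition step.
class CycleScheduler {
    record Message(long actorId, int priority, Runnable body) {}

    List<List<Message>> endOfCycle(List<List<Message>> perThreadBuffers, int threadCount) {
        // 1. Merge all thread-local buffers into one list.
        List<Message> all = new ArrayList<>();
        perThreadBuffers.forEach(all::addAll);

        // 2. Deterministic order: actor ID first, then message priority.
        all.sort(Comparator.comparingLong(Message::actorId)
                           .thenComparingInt(Message::priority));

        // 3. Split into contiguous chunks, never splitting one actor's messages
        //    across threads, so each actor still runs on exactly one thread.
        List<List<Message>> nextCycle = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) nextCycle.add(new ArrayList<>());
        int targetPerThread = Math.max(1, (all.size() + threadCount - 1) / threadCount);
        int bucket = 0;
        for (int i = 0; i < all.size(); i++) {
            Message m = all.get(i);
            boolean actorBoundary = i > 0 && all.get(i - 1).actorId() != m.actorId();
            if (actorBoundary && bucket < threadCount - 1
                    && nextCycle.get(bucket).size() >= targetPerThread) {
                bucket++;
            }
            nextCycle.get(bucket).add(m);
        }
        return nextCycle;
    }
}
```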
 

I'm not sure this property [determinism] is always so important (or important at all), though.

Determinism can be massively important!
A massive part of game development is spent debugging, and a deterministic sim makes reproducing complex cases very simple -- someone who has produced the error can just save their input-replay file. Anyone can then replay this file to see how the error case developed over time, breaking execution and inspecting state at the error itself, but also in the lead-up to the error.
 
Many games rely on deterministic simulation in order to even be feasible from a networking point of view. e.g. RTS games are pretty much the only genre where you can actually have several thousand player-controlled entities interacting in a world. The usual methods of state-replication produce ridiculously high bandwidth requirements when applied to these situations, so it's extremely common for multiplayer RTS games to instead use a lock-step simulation where all the players only share their inputs.
The bandwidth between keyboard and chair is very low, so networking based on this input means that you can have an unlimited number of units on the battlefield with a fairly constant network bandwidth requirement. Players simply buffer and share their inputs, and then apply them at agreed upon times, resulting in the same simulation on all clients.
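
A minimal sketch of that input-sharing scheme (hypothetical types): each client schedules its own input a couple of ticks in the future, broadcasts just that input, and every client applies the same inputs at the same tick.

```java
import java.util.*;

// Hypothetical types; minimal lockstep sketch.
class LockstepClient {
    static final int INPUT_DELAY_TICKS = 2;           // hides network latency

    private final Map<Long, List<PlayerInput>> pending = new HashMap<>();
    private long tick = 0;

    void onLocalInput(PlayerInput input) {
        long applyAt = tick + INPUT_DELAY_TICKS;
        schedule(applyAt, input);
        broadcast(applyAt, input);                    // tiny packet: just the input
    }

    void onRemoteInput(long applyAt, PlayerInput input) { schedule(applyAt, input); }

    void step() {
        // A real implementation would stall here if some player's input for this
        // tick hasn't arrived yet -- hence the familiar "waiting for players" pause.
        List<PlayerInput> inputs = pending.remove(tick);
        if (inputs != null) {
            inputs.sort(Comparator.comparingInt(PlayerInput::playerId));  // deterministic order
            inputs.forEach(this::apply);
        }
        advanceSimulation();                          // must itself be deterministic
        tick++;
    }

    private void schedule(long atTick, PlayerInput input) {
        pending.computeIfAbsent(atTick, k -> new ArrayList<>()).add(input);
    }
    private void broadcast(long atTick, PlayerInput input) { /* send over the network */ }
    private void apply(PlayerInput input)                  { /* e.g. issue a move order */ }
    private void advanceSimulation()                       { /* deterministic world update */ }
}

record PlayerInput(int playerId, String command) {}
```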

Also, many console games don't have the luxury of dedicated servers. Instead, they often use the one-client-is-the-server model. This means that the server's outward bandwidth is restricted to whatever the upload bandwidth is on a residential DSL connection (which is pretty low). This makes networking strategies based around determinism and input sharing quite popular, due to the low bandwidth requirements.

For a general-purpose game engine that can be used for any genre, it's pretty much required to support determinism in case the end user (game developer) requires it. To not support it is to cut off a percentage of your clients.

[edit missed this one]

Which frameworks are used for [stream processing]?

I'm not aware of any popular stream processing middleware, the ones I've been exposed to have either been a component of a game engine, or a home-grown proprietary solution. I guess one of the nice things about stream processing is that you don't need a big complicated framework to implement it, so the damage of NIH isn't as bad.


On a tangent, the difference between (pejorative) "frameworks" and (simple) "libraries" is an interesting one. Technically, they're often both just a code library, but a "framework" usually requires that all the code that you write when using it conforms to its particular worldview. e.g. every class has to follow some pattern, inherit some base, provide some overload, be mutated in some certain way, etc, etc... You can usually tell when you try and use two different "frameworks" together, and end up in a hell of a conflicting mess. Game engines often fit this category -- they take on so many responsibilities that everything you do is tightly coupled to their way of doing things...
In some of the projects that I've worked on, these kinds of frameworks have caused a lot of internal friction, so I've seen many teams resort to attacking code-bases with a figurative axe, breaking "frameworks" down into a large collection of simple, single-responsibility libraries, resulting in a much more flexible code-base.
The reason I bring this up is that the last time I saw this, the instigator was a parallel processing framework, and the need for some "exceptions" to be bolted onto it. Like you said earlier, even functional languages sometimes resort to using the shared-state model! The end result was that we ended up with a few different processing libraries that could be used together easily, without any frameworks tying code down too strongly to a single model.


On the PS3, there is a popular bit of middleware called SPURS, which basically implements a "job" type system -- a job is a function and a bunch of parameters/data-pointers, which you can put into a queue and later check/wait for its completion.
Because of this, many game engines use this model as their low level concurrency layer (implementing "job" systems on 360/PC/etc as well). They then often build another layer on top of this job system that allows for stream processing or other models.
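
In its simplest form, a job system is just this kind of thing (a toy Java sketch using a plain thread pool -- nothing like SPURS internally): a job is a function plus its data, pushed onto a queue; the caller can later wait on its completion.

```java
import java.util.concurrent.*;

// Toy job system: a job is a function plus its data.
class JobSystem {
    private final ExecutorService workers =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    <T> Future<T> kick(Callable<T> job) { return workers.submit(job); }

    void shutdown() { workers.shutdown(); }
}

class JobExample {
    public static void main(String[] args) throws Exception {
        JobSystem jobs = new JobSystem();
        float[] positions = new float[1024];

        // Kick a job with its parameters...
        Future<Float> sumJob = jobs.kick(() -> {
            float sum = 0;
            for (float p : positions) sum += p;   // the "kernel"
            return sum;
        });

        // ...do other work on this thread, then wait for completion.
        float total = sumJob.get();
        System.out.println("sum = " + total);
        jobs.shutdown();
    }
}
```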

One popular game engine that I've used implemented a pretty neat stream processing system where:
* the processing pipeline could be described as data - the kernels would be compiled from this description dynamically.
* the pipeline description was a graph of nodes representing function pointers, and the edges told it how the function's input/output arguments flowed between nodes.
* when compiling, you could tell it how much 'scratch memory'/'node memory' etc it had available, which would affect how it unrolled the loops and constructed kernels etc...
** e.g. maybe it will just call each function in sequence on one piece of data at a time, or maybe it will call function #1 on 64 pieces of data and then function #2 on those 64 results, etc...
* on platforms that allowed for code-generation, it would actually memcpy code around the place to create highly efficient kernels. On other platforms it operated by actually calling function pointers.
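
A much-simplified, hypothetical take on the "pipeline described as data" idea -- the real thing builds a graph and generates kernels, but even a linear chain shows the batching trade-off:

```java
import java.util.*;
import java.util.function.UnaryOperator;

// Hypothetical, heavily simplified: a linear chain of stages instead of a graph.
class StreamPipeline {
    private final List<UnaryOperator<float[]>> stages = new ArrayList<>();

    StreamPipeline then(UnaryOperator<float[]> stage) { stages.add(stage); return this; }

    // batchSize plays the role of the "scratch memory" hint: run every stage over
    // a small chunk before moving on, so the chunk stays in fast local memory.
    void run(float[][] items, int batchSize) {
        for (int start = 0; start < items.length; start += batchSize) {
            int end = Math.min(items.length, start + batchSize);
            for (UnaryOperator<float[]> stage : stages) {
                for (int i = start; i < end; i++) {
                    items[i] = stage.apply(items[i]);
                }
            }
        }
    }
}
```

Usage would be something like new StreamPipeline().then(transform).then(skin).run(vertices, 64), where the real system would instead compile that description into efficient kernels for the target platform.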

You can read a bit about a similar system here: http://www.insomniacgames.com/tech/articles/0907/files/spu_shaders_introduction.pdf
