A New Approach to Databases and Data Processing — Simulating 10Ks of Spaceships on My Laptop



#1 pronpu   Members   -  Reputation: 139


Posted 28 February 2013 - 06:12 AM

Hi.
Yesterday I published a new blog post on our company blog: http://blog.paralleluniverse.co/post/44146699200/spaceships

It showcases a 10K-spaceship battle simulation we've built on top of our spatial DB, which is written in Java. The point of the post is not to show how fast the simulation can run, because there are faster ways to do it. The idea is to show how even a naive solution running on top of the right kind of database can achieve decent performance and good scaling.

I think some of you may find this interesting.




#2 Nercury   Crossbones+   -  Reputation: 770


Posted 28 February 2013 - 08:10 AM

Hum. I thought that 10k simple spaceships fit in RAM quite well. It feels like the wrong test to showcase your database, because if you wanted to make a game like this, you would not even consider persisting changes to a database on every simulation step. That's what my game-making brain tells me.

Impressive number of updates though.

Edited by Nercury, 28 February 2013 - 08:12 AM.


#3 Kylotan   Moderators   -  Reputation: 3338


Posted 28 February 2013 - 09:05 AM

Hey Ron (it is Ron, right?), we exchanged a few emails last year if I remember correctly. I still have most of the same misgivings that I did back then. I reckon that "writing your entire application as stored procedures" is not a process that is likely to work well with most game developers, and besides, the really big problems come in the more complicated logic. Currently what you have there is an example of 'embarrassingly parallel' logic, e.g. separation forces based on a query, broadcasting of forces out to other entities, etc. And yet even then the code is somewhat more complex than we might like: visitors, lambdas, callbacks, etc. In my experience callback-based code complicates systems significantly. I can't follow the code associated with that SpatialModifyingVisitor to work out which ships it's called on, how it chooses which ship to attack, or when that will occur. Can I only ever access other entities from within the visit function called by a query?

Also, you've not shown yet how the system degrades (if at all) when faced with a lot of non-spatial queries, such as global chat or guild chat, auction houses, Looking For Group systems, etc.

#4 pronpu   Members   -  Reputation: 139


Posted 28 February 2013 - 09:51 AM

Yep, that's me.

 

Well, I only mentioned stored procedures to show that this approach has grounding in the past. Obviously, we're not talking PL/SQL here, and there is nothing to differentiate those "stored procedures" from regular application code. It is true, however, that you need to use callbacks and lambdas. Those look particularly bad in Java (at least until Java 7), but would be much more natural in, say, Clojure. But conceptually, I see no other choice for writing scalable code. As soon as you submit your code to be executed on some thread pool, you're using a lambda. Making this natural is only a matter of syntax. If you don't use lambdas at all (and in most modern languages every function is a lambda anyway), you're basically back to the old C loop, and back to partitioning your domain into fixed, disjoint regions, each run by a single thread, and that can't scale.

 

So, whether or not callback-based code always complicates systems, or only does so with "old" languages -- callbacks are here to stay. I hope more languages will transform them syntactically into continuations, as those read more like single-threaded functions and might be easier for the programmer to reason about.

 

I don't think the problem in the demo is embarrassingly parallel at all. Entities, in this case spaceships, are contended all the time and require locking. If you can think of something that would seem less embarrassing :), I'd love to know what it is so we can try to tackle it.

 

And yes, you can only access entities from within visitors, in order to allow the database to schedule your code (and prevent deadlocks). The idea is that the database wants to know what data you wish to process, and then decides how to schedule it and whether it's in conflict with other transactions. But, again, this wouldn't seem as much of a problem if callbacks were represented syntactically as continuations.
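To give a feel for the shape of the interaction, here is a minimal sketch. The type and method names (SpatialStore, queryForUpdate, SpatialVisitor) are purely illustrative, not the actual SpaceBase API; the point is only that the database runs the visitor, and entities are touched exclusively inside it:

    // Illustrative only: hypothetical types, not the real SpaceBase API.
    final class Spaceship {
        double x, y, shields;
        void hit(double damage) { shields -= damage; }
    }

    interface SpatialVisitor<T> {
        void visit(T result);   // called by the database with each matching, locked entity
    }

    interface SpatialStore<T> {
        // The store schedules the visitor; results are locked for writing while visit() runs.
        void queryForUpdate(double x, double y, double radius, SpatialVisitor<T> visitor);
    }

    final class WeaponsLogic {
        void fireAt(SpatialStore<Spaceship> db, double x, double y, double blastRadius) {
            db.queryForUpdate(x, y, blastRadius, target -> {
                // 'target' may only be read/modified here, inside the callback,
                // because this is where the database guarantees exclusive access.
                target.hit(10.0);
            });
        }
    }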

 

And as for non-spatial queries, those will be supported by our future products. SpaceBase only runs spatial queries. I am very much interested to hear what would be the concurrency challenges for that domain. 



#5 hplus0603   Moderators   -  Reputation: 5538


Posted 28 February 2013 - 11:14 AM

I'd like to see you address how your solution is different from other fast NoSQL databases, like Redis.

In case you haven't looked at it, Redis allows you to run procedures in the actual store as well, and it does hundreds of thousands of queries per second from remote nodes with ease.

I realize that you claim your database is "spatial," but any number of "spatial" approaches can be re-written as traditional key/value (or tuple space, or whatever) equivalents. For example, multi-dimensional hash grids need 2*N (that's linear in N) queries to find all possible neighbors within N dimensions on top of a flat database.
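A minimal sketch of what I mean, with a 2D hash grid layered on a plain in-memory key/value map standing in for the flat store (the cell-key scheme is just illustrative; the exact number of lookups depends on how objects are inserted):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // A 2D hash grid on top of a flat key/value store: a "spatial" neighbour
    // query decomposes into a small, fixed number of point lookups, one per
    // candidate cell around the query position.
    final class HashGridOnKeyValue {
        private final double cellSize;
        private final Map<Long, List<String>> store = new HashMap<>();  // stand-in for Redis/any KV store

        HashGridOnKeyValue(double cellSize) { this.cellSize = cellSize; }

        private long key(double x, double y) {
            long cx = (long) Math.floor(x / cellSize);
            long cy = (long) Math.floor(y / cellSize);
            return (cx << 32) ^ (cy & 0xffffffffL);   // pack cell coordinates into one key
        }

        void insert(String entityId, double x, double y) {
            store.computeIfAbsent(key(x, y), k -> new ArrayList<>()).add(entityId);
        }

        // Entities possibly within 'cellSize' of (x, y): look up the 3x3 block of cells.
        List<String> neighbours(double x, double y) {
            List<String> out = new ArrayList<>();
            for (int dx = -1; dx <= 1; dx++)
                for (int dy = -1; dy <= 1; dy++)
                    out.addAll(store.getOrDefault(key(x + dx * cellSize, y + dy * cellSize), List.of()));
            return out;
        }
    }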

Finally:

sharding just won’t cut it

That may be true, but not for the reasons you quote :-)

For throughput, your limitation will be memory bus, I/O bus, and network bandwidth, not core bandwidth. One shard per bus is still totally feasible and a fine way to scale. (If not, then please don't tell Facebook or Google or Amazon or ... because they'd be disappointed :-)

Sharding has problems with object/object interactions in the simulation space specifically; it generates lots of messaging if you have lots of shards. Again, that's not limited by cores; that's limited by memory and networking. If you have a globally consistent database (spatial or not) then it will have to solve the same problems as the sharded simulation, except a database typically can't make the application-specific simplifying assumptions that a simulation can, so I still think we should view simulations as simulations, separate from databases.
enum Bool { True, False, FileNotFound };

#6 pronpu   Members   -  Reputation: 139


Posted 28 February 2013 - 12:26 PM

I'd like to see you address how your solution is different from other fast NoSQL databases, like Redis.

 

It is different in that it's not a database that lets you run stored procedures, but one that runs your entire application. That's why I said you can think of it as writing your entire application as stored procedures, but stored procedures are usually thought of as something separate from your app; as a useful tool. Often (as in the case of Redis, too, I believe), they need to be written in a different programming language than the rest of your app. Here, your entire code is written as callbacks (or lambdas or closures -- whatever you want to call them) which you pass to the database, which schedules and runs them.

 

 

One shard per bus is still totally feasible and a fine way to scale. (If not, then please don't tell Facebook or Google or Amazon or ... because they'd be disappointed :-)

 

Once you have lots of messages between shards, your architecture isn't really sharded: you have a sharded database, and then you need lots and lots of developers -- like they have at Facebook or Google or Amazon -- to work around the sharding with complicated messaging code. I say, let the database handle that for you. It knows at which node each piece of information is best stored, so as to minimize messaging. You can read more about our data grid here, where I show how messaging is minimized.

 

 

 

I realize that you claim your database is "spatial," but any number of "spatial" approaches can be re-written as traditional key/value (or tuple space, or whatever) equivalents.

 

Ah, here you hit the nail on the head. The database can only be smart about partitioning and scheduling if the data structure captures how the domain is used. In a spatial setting, most transactions will span a contiguous region of space, and the database can exploit this locality. Geospatial hashing gives a rather poor approximation of this locality. That is why I do not envision a single data structure that can do all I think a database can and should do for all applications. Different types of data and different data-access patterns would require different data structures. But once you pick the right one, you won't need all those hordes of scalability experts to work around the limitations of a "stupid" database.

 

As for simulations, it's true the demo happens to be a simulation, but that's incidental. Like I say in the article, there are better ways to write a simulation using better simplifying assumptions. But the demo is written naively using the database and still gives good performance. I would say, though, that coming to terms with the suggested approach requires thinking differently about what a database is and what it should do. I mention that in the past the database was also considered to be the core of the application, but we've since moved away from that point of view, and started regarding the database as a simple data store. For example, for low latency applications, I do not think that the database even needs to be durable, or even use a disk at all. 


Edited by pronpu, 28 February 2013 - 12:27 PM.


#7 Kylotan   Moderators   -  Reputation: 3338


Posted 28 February 2013 - 07:10 PM

It is true, however, that you need to use callbacks and lambdas. [...] I see no other choice for writing scalable code. As soon as you submit your code to be executed on some thread pool, you're using a lambda. Making this natural is only a matter of syntax.

 

The question is really about what level in the application code the callbacks are. Here, we have callbacks right in the middle of the basic logic for an entity, and that is going to be very tricky for a developer to use. More common is to define the game logic more as atomic blocks of code that can immediately operate on the environment they are given. Complexity is going to be the biggest problem you face in getting any game developers to adopt this technology - the cards are already stacked against you with the Java background and the emphasis on spatial querying (which game developers typically view as incidental).

 

I don't think the problem in the demo is embarrassingly parallel at all. Entities, in this case spaceships, are contended all the time and require locking. If you can think of something that would seem less embarrassing :), I'd love to know what it is so we can try to tackle it.

 

The movement and explosions seem to be about broadcasting forces out, which doesn't really require any locks at all. Then most of the shooting information seems to be read-only apart from the shooting, which also doesn't require exclusive access to the other ship (although you can implement it that way and call a function on it directly, as I think you do here). Perhaps this is a problem with your code being extremely hard to read for someone not familiar with what all the different queries do. Right now I look at that and I can't follow the code flow at all. That's the problem with making something explicitly callback-driven.

 

As an MMO developer I would be more interested in something that makes it very clear where the locking takes place, e.g. a trading situation where both entities need to be locked to safely transfer an item from one to the other. And I'd want to see exactly in the code how it is implemented, how visible that safety is, and how much it differs from a simple sharded solution. Because all that will tell me whether writing my game using your system is going to be much harder than the alternatives or not. Right now, I look at it and it seems like a lot of overengineering for the hypothetical promise of more players per server. The market shows that players will adapt to small world populations much better than developers adapt to increasing development budgets.
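For reference, the conventional baseline I'd be comparing against is something like this sketch: lock both parties in a canonical order so that two concurrent trades can't deadlock. The types and fields here are hypothetical, just to make the point concrete:

    import java.util.HashSet;
    import java.util.Set;

    // Conventional baseline: a trade that must lock both entities. Locking in
    // a canonical order (lowest id first) prevents the classic deadlock where
    // two concurrent trades grab the same two locks in opposite orders.
    final class Player {
        final long id;
        final Set<String> inventory = new HashSet<>();
        Player(long id) { this.id = id; }
    }

    final class Trading {
        static boolean transfer(Player from, Player to, String item) {
            Player first  = from.id < to.id ? from : to;   // canonical lock order
            Player second = from.id < to.id ? to   : from;
            synchronized (first) {
                synchronized (second) {
                    if (!from.inventory.remove(item)) return false;  // item already gone
                    to.inventory.add(item);
                    return true;
                }
            }
        }
    }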

Also, have you run this demo on a distributed system? Project Darkstar had the notorious problem of trying something very similar to what you're doing, except that it didn't actually scale when distributed across the network. Logic that is efficient enough when locking entities to pass them from one core to another is not always efficient enough when you have to pass the entity from one process to another, especially over ethernet. A hyper-parallel system can find itself doing nothing most of the time, waiting on network traffic.

 

And as for non-spatial queries, those will be supported by our future products. SpaceBase only runs spatial queries. I am very much interested to hear what would be the concurrency challenges for that domain.

 

The concurrency challenges are whatever your system makes them. :) On a typical sharded system you could implement global chat by simply looping through every entity and sending it a message. A dynamically load-balanced system could broadcast to each of the servers, which can perform the previous broadcast step internally. These are cheap operations on these systems. With your system, what happens if I want to run a query that returns every single entity of a certain type, regardless of distance? Is that going to harm performance, maybe shuffling every single entity to one node? Is there another way I can implement such behaviour?



#8 pronpu   Members   -  Reputation: 139


Posted 01 March 2013 - 06:39 AM

The question is really about what level in the application code the callbacks are. Here, we have callbacks right in the middle of the basic logic for an entity, and that is going to be very tricky for a developer to use. More common is to define the game logic more as atomic blocks of code that can immediately operate on the environment they are given.

 

That's a valid point. We're working on an API that transforms the callbacks into continuations that would appear single threaded to the programmer.

 

And I'd want to see exactly in the code how it is implemented, how visible that safety is, and how much it differs from a simple sharded solution. Because all that will tell me whether writing my game using your system is going to be much harder than the alternatives or not.

 

Future blog posts will address that. In the meantime, hopefully the user guide and demo code will help, and you can ask us more about it on our new forum.

 

Project Darkstar had the notorious problem of trying something very similar to what you're doing, except that it didn't actually scale when distributed across the network.

 

Project Darkstar was shut down before it reached the point of multi-node distribution.

 

With your system, what happens if I want to run a query that returns every single entity of a certain type, regardless of distance? Is that going to harm performance, maybe shuffling every single entity to one node? Is there another way I can implement such behaviour?

 

SpaceBase only supports spatial queries. Future products will be built for other kinds of queries as well. You can be sure we won't recommend them before they're good enough for military use, let alone games.

 

the cards are already stacked against you with the Java background and the emphasis on spatial querying (which game developers typically view as incidental).

 

Ah, well, unfortunately you are right about that. I hope I will not be offending too many people here by saying this: we've come to realize game developers are the most conservative class of developers in existence. SpaceBase grew out of a project developed for the Israeli Air Force for a mission-critical, real-time C4I system. We're talking to people developing hard real-time missile interception systems, and they are much more open to innovation than game developers. Fact is, companies like Boeing are doing on-board avionics on the JVM, while some game developers are still convinced that they can produce better assembly code than a compiler or a JIT. Some have only recently come to terms with the benefits of an operating system :)

 

What we've seen is this: game developers' culture was formed at a time when every last inch of performance had to be squeezed out of old-gen CPUs in order to render impressive graphics. Now, game devs have quite mastered the GPU, and they think their games are fast because of the impressive graphics they've been able to achieve on the GPU. But the GPU is very much like the old CPUs: it's extremely simple. CPUs, with their pipelines, instruction-level parallelism, multiple pending cache misses and instruction re-ordering are extremely complex. There is no more "close to metal" when it comes to the CPU. Even the metal is more akin to software than to actual metal. But while other disciplines have come to rely on JIT and middleware, game developers are suspicious of both. 

 

For some reason, when you show a new technology to someone at, say, Twitter, they'll immediately start thinking how they can take advantage of it. When you show it to a game developer, he'll say, "where's the demo", and when you show him a demo he'll say, "well, that doesn't quite do what my game does". It's not a matter of games being more challenging. It's about game developer culture of suspicion with a super-heavy dose of NIH syndrome. 

 

The sad result is, and I know it will sound controversial, that game performance now lags behind the performance of apps in most other disciplines. It just doesn't appear that way because the graphics are so fast.

 

And this is why we've come to realize that convincing a game developer of anything is an exercise in futility. The truly challenging stuff takes place in the military and augmented reality communities, while games are about a decade behind state-of-the art. Like I said, it's a cultural thing, and there's little we can do to change that. Just about the only way to convince game developers to adopt a new technology is to not let them know it can be beneficial to games and let them think they've secretly uncovered some arcane knowledge.

 

That's the end of my rant. All I can do is put our stuff out there and if some adventurous game developer wants to try it, they'll see how helpful it is and they'll build a better game. If not - well, we make our money from other industries. You can only do so much to argue the theoretical merit of some technology or approach over another. There are always arguments and counterarguments. There's a point in the discussion when the developer will have to get his or her hands dirty and try the new technology out. Only then can the discussion continue.

 

I truly admire the game development community, and I remember the time when it was at the forefront of software development and showed others the way. I hope it gets to be that way again. In the meantime, I would really enjoy hearing your suggestions, reservations and questions and will try to address them, but in order to make the discussion fruitful - you're just going to have to try the stuff first. 



#9 starbasecitadel   Members   -  Reputation: 699


Posted 01 March 2013 - 06:52 AM

I'll be interested in hearing further about results as your research continues, great work! 

 

Just a couple of notes:

 

- In terms of measuring scalability, instead of using execution time and CPU count as the two variables, I would recommend execution time and the number of spaceships.

 

- I've been using Google's Go language, and so far I'm having tremendously good results. To me, the language seems very well designed to be both simple in terms of concurrency programming and extremely efficient. Behind the scenes, it uses thousands of lightweight "microthreads" multiplexed onto a few actual threads, with the complexity hidden from the programmer, and for the most part it "just works".

 



#10 Nercury   Crossbones+   -  Reputation: 770


Posted 01 March 2013 - 09:35 AM

Ah, well, unfortunately you are right about that. I hope I will not be offending too many people here by saying this: we've come to realize game developers are the most conservative class of developers in existence. SpaceBase grew out of a project developed for the Israeli Air Force for a mission-critical, real-time C4I system. We're talking to people developing hard real-time missile interception systems, and they are much more open to innovation than game developers. Fact is, companies like Boeing are doing on-board avionics on the JVM, while some game developers are still convinced that they can produce better assembly code than a compiler or a JIT. Some have only recently come to terms with the benefits of an operating system

 

A lot of generalizations. I can tell another anecdote: I am a newbie in game development, and I don't find your technology convincing. What does that prove? Nothing.

 

What we've seen is this: game developers' culture was formed at a time when every last inch of performance had to be squeezed out of old-gen CPUs in order to render impressive graphics. Now, game devs have quite mastered the GPU, and they think their games are fast because of the impressive graphics they've been able to achieve on the GPU.

 

Indeed, the difference between CPU and GPU usage for graphics is non-debatable. I see the difference in both ease of development and performance advantages.

 

But the GPU is very much like the old CPUs: it's extremely simple.

 

The programming interface is simple because you are constrained to do and care about one thing at a time. You have strict constraints on what you can do and where. The GPU itself... ugh. How many APIs and extensions must it support? I would not call it "simple".

 

CPUs, with their pipelines, instruction-level parallelism, multiple pending cache misses and instruction re-ordering are extremely complex. There is no more "close to metal" when it comes to the CPU. Even the metal is more akin to software than to actual metal. But while other disciplines have come to rely on JIT and middleware, game developers are suspicious of both. 

 

I don't follow. All the advanced things that speed up execution of instructions are not slower than not having them. Having cache is faster than not having it. JIT does not magically make cache not miss. I am not suspicious of JIT, I love it. But lower level problems do not disappear with JIT. This is a non-point.

 

For some reason, when you show a new technology to someone at, say, Twitter, they'll immediately start thinking how they can take advantage of it. When you show it to a game developer, he'll say, "where's the demo", and when you show him a demo he'll say, "well, that doesn't quite do what my game does". It's not a matter of games being more challenging. It's about game developer culture of suspicion with a super-heavy dose of NIH syndrome. 

 

Really. Because we write physics engines, audio libraries and graphics drivers ourselves. That was sarcasm...

Yeah, you need to find a target audience and convince them. It just so happens that we care about our precious loop and weigh every little thing we add to it.

 

The sad result is, and I know it will sound controversial, that game performance now lags behind the performance of apps in most other disciplines. It just doesn't appear that way because the graphics are so fast.

 

 

Wow, nice assertion.

 

And this is why we've come to realize that convincing a game developer of anything is an exercise in futility. The truly challenging stuff takes place in the military and augmented reality communities, while games are about a decade behind state-of-the art. Like I said, it's a cultural thing, and there's little we can do to change that. Just about the only way to convince game developers to adopt a new technology is to not let them know it can be beneficial to games and let them think they've secretly uncovered some arcane knowledge.

 

Or you can use your tech and make an unrivaled gaming experience. Convince + Profit. Win Win.

 

That's the end of my rant. All I can do is put our stuff out there and if some adventurous game developer wants to try it, they'll see how helpful it is and they'll build a better game. If not - well, we make our money from other industries. You can only do so much to argue the theoretical merit of some technology or approach over another. There are always arguments and counterarguments. There's a point in the discussion when the developer will have to get his or her hands dirty and try the new technology out. Only then can the discussion continue.

 

OK - seriously: I may try it, but I see no need for it right now. I like the access time of objects in my stack and heap. However, if I ever wanted to run my single game world on multiple servers, I would need a way to replicate data between them. Then I would research various approaches more fully. There are probably only a few developers on this forum who may be interested in this, and they are most likely working in larger companies, with several people already dedicated to solving similar problems. I doubt they haven't already found efficient solutions to much more complicated tasks.

 

I truly admire the game development community, and I remember the time when it was at the forefront of software development and showed others the way. I hope it gets to be that way again. In the meantime, I would really enjoy hearing your suggestions, reservations and questions and will try to address them, but in order to make the discussion fruitful - you're just going to have to try the stuff first. 

 

I will want what you have when you show that the thing you have has value. Sell it to someone :D. Demonstrate that you can reap the benefits. It draws attention well :).

I will take another read over your article and maybe more time to study the code. I will post more constructive comments then.



#11 Hodgman   Moderators   -  Reputation: 31131


Posted 01 March 2013 - 10:37 AM

This is basically just data-oriented design, right -- having "the database" optimized for memory accesses, and also having the logic broken down into a series of parts that can be automatically scheduled across n cores? This has been all the rage at the bleeding edge of games for the past 5 years... This isn't a new way of thinking... to get decent performance on the PS3 (which has a NUMA co-CPU), you pretty much have to adopt this way of thinking about code.

There have been plenty of presentations by game developers trying to get these ideas to filter down by boasting about their 25x speed boosts gained from this approach, etc...

 

So, your rant about game-devs might be accurate in some quarters, but is very misplaced in others. Your rants against these generic game developers just make you seem insulting, defensive and unapproachable, sorry :P

 

It also seems misplaced when your article is about running a sim of just 10k entities at 5fps...

 

What we've seen is this: game developers' culture was formed at a time when every last inch of performance had to be squeezed out of old-gen CPUs in order to render impressive graphics. Now, game devs have quite mastered the GPU, and they think their games are fast because of the impressive graphics they've been able to achieve on the GPU. But the GPU is very much like the old CPUs: it's extremely simple. CPUs, with their pipelines, instruction-level parallelism, multiple pending cache misses and instruction re-ordering are extremely complex. There is no more "close to metal" when it comes to the CPU. Even the metal is more akin to software than to actual metal. But while other disciplines have come to rely on JIT and middleware, game developers are suspicious of both. 

You need to look into how modern GPUs really function these days. All of those complications are present, and are much more complex due to the massive amounts of data that's in the pipeline at one time, and the massive amounts of latency that have to be hidden...

 

P.S. All our GPU code is JIT compiled, I would JIT my gameplay code on consoles if I was allowed (it is JITed on PC), and almost every game is built using licensed middleware (Unity, Unreal, etc...)

 

P.P.S. I've worked on military contracts with Boeing and the Australian Air Force, and I was not impressed with their attitudes towards efficiency. Too much bureaucracy and secrecy caused huge barriers between different parts of the implementation, necessitating huge amounts of hardware. We ended up being given a whole PC inside the simulator just to simulate a few of the cockpit displays (which required reverse engineering and re-implementing the same landscape rendering engine from a different PC that we weren't allowed to access) and to send those images to the master PC. Who needs to be efficient when you can just throw more hardware at the problem...


Edited by Hodgman, 01 March 2013 - 08:26 PM.


#12 hplus0603   Moderators   -  Reputation: 5538


Posted 01 March 2013 - 01:30 PM

hard real-time missile interception systems

I think the "conservatism" may be something else. After all, game developers are the first to use the latest and greatest graphics, computing, and input hardware innovations!
In the case of missile intercepts, it would be worth spending one full computer per missile you plan on intercepting.
Game developers have to serve many thousands of users per computer, or the costs will drive them out of business!

To put a different spin on your thesis: distributed computing solves approximately three problems:
1) How do I run different parts of the computation in different locations while keeping a cohesive whole? (messaging)
2) How do I move computational parts between systems when needed? (partitioning)
3) How do I decide which computational parts to put where? (load balancing, optimization)

For example, in a system like HLA, each computational node "inserts" itself into a subrange along any number of dimensions, and receives messages within that subrange. The computation is locked to a node, but the subrange of dimensions it interacts with may vary.

In a traditional "seamless world" game system, nodes insert themselves at mostly fixed geographical locations, and the computation for entities moves to the server that is "owner" of that particular piece of the world.

I think there's value in considering many different approaches to how to do load balancing, but saying that servers have to be written "on the inside" of a specific server framework will very much limit the appeal of the solution. A solution that works as a library/subsystem and lets you (the developer) "own" the rest is much more attractive.
enum Bool { True, False, FileNotFound };

#13 pronpu   Members   -  Reputation: 139


Posted 02 March 2013 - 01:22 PM

I've been using Google's Go language, and so far I'm having tremendously good results. To me, the language seems very well designed to be both simple in terms of concurrency programming and extremely efficient. Behind the scenes, it uses thousands of lightweight "microthreads" multiplexed onto a few actual threads, with the complexity hidden from the programmer, and for the most part it "just works".

 

That's right. Erlang and Clojure have similar constructs as well, and Clojure is generally faster than Go if the code is optimized for performance. Both of these languages also have protections against data races that are absent in Go, but Go is great.

 

How many APIs and extensions must it support? I would not call it "simple".

 

I meant to say that a GPU core is simple in the sense that the time it takes to execute a given piece of code is simply related to the number of instructions. This is far from being the case with modern CPUs.

 

All the advanced things that speed up execution of instructions are not slower than not having them. Having cache is faster than not having it. JIT does not magically make cache not miss.

 

It's not slower, quite the opposite. But taking full advantage of it in a large piece of software is no longer achievable by handwriting assembly. There are too many things to take into consideration, and it's better to put our faith in the JIT or the optimizing compiler.

 

So, your rant about game-devs might be accurate in some quarters, but is very misplaced in others.

 

Yeah, sorry. I may have been carried away.

 

P.S. All our GPU code is JIT compiled

 

Cool. What do you use to do it?

 

A solution that works as a library/subsystem and lets you (the developer) "own" the rest is much more attractive.

 

This is generally true until you hit a wall. You run your program "on the inside" of the operating system and let it schedule threads for you because scheduling is usually best left to the framework. It's the same for micro-threaded environments like Erlang and Go. Because the main contribution of this approach is letting the "database" take charge of scheduling for the purpose of multicore utilization, I don't quite see how it can be implemented as simple functions. A scheduler, by definition, calls your code and not the other way around.

 

I think the "conservatism" may be something else. After all, game developers are the first to use the latest and greatest graphics, computing, and input hardware innovations!
...
Game developers have to serve many thousands of users per computer, or the costs will drive them out of business!

 

And yet, when it comes to utilizing multi- and many-core hardware, games are years behind. I've spoken to a senior developer at a very well-known AAA MMO studio, who told me they don't allow their developers to write multi-threaded code. Another architect at a different MMO studio told me that only now are they realizing that they're going to have to abandon the serial game-loop way of thinking, and this is very hard for them to accept.

 

I think that game programmers, unlike their peers in other industries, still try to hold on to the illusion that by somehow being "close-to-metal" (which is an illusion in and of itself, IMHO) and controlling everything, they will somehow utilize hardware better. I believe this is a direct result of what you were saying: game developers always needed to make the most out of a given, fixed set of hardware. But while in the past this could only have been achieved by controlling the entire software stack, today the opposite holds: to utilize modern hardware you must relinquish control.

 

In a traditional "seamless world" game system, nodes insert themselves at mostly fixed geographical locations, and the computation for entities moves to the server that is "owner" of that particular piece of the world.

 

Absolutely. I see no other way to reduce communication. You just have to make sure that a node isn't overloaded when too much action is concentrated in the area it owns. The right kind of data structure would automatically take care of this kind of load balancing.

 

The trick to good concurrency is the realization that interesting concurrency must manage contention, and contention equals communication, and communication is slow. Disallowing contention altogether severely limits the application, but a good data structure can minimize communication. I did some calculations here (I may have posted a link to that here before).



#14 hplus0603   Moderators   -  Reputation: 5538


Posted 02 March 2013 - 01:46 PM

A scheduler, by definition, calls your code and not the other way around.

Agreed. But to the programmer, that detail should be hidden. When I'm used to using C with stdlib, or C++ with STL and Boost, or Java with javalibs, I don't need to re-write that depending on whether I run on the inside of the Linux kernel, or the Windows NT kernel, or the QNX kernel, or whatever. The actual API to interact with the scheduler is tiny -- threads and mutexes.

Similarly, the API to interact with your database-partitioned-scheduler should be tiny. The more you can make it look like threads-and-mutexes, or other forms of I/O, and the narrower you can make the interface, the more likely you will be to gain adoption.

game developers don't let developers write multi-threaded code

This is likely for a good reason: They've found that doing so causes bugs that cost schedule time and cause user experience problems, and are very hard to debug. That's a rational response. Another rational response might be to improve working conditions and the attractiveness of the job in general, so you'd attract developers who currently wouldn't consider the games industry. Another rational response might be to adopt various libraries that let them harness multi-threaded or distributed development with minimal friction -- that's where your job lies: Prove that you can deliver low-friction distributed systems development! The approach you've described so far sounds to me like "high friction" but I don't think that would be impossible to fix.
enum Bool { True, False, FileNotFound };

#15 pronpu   Members   -  Reputation: 139


Posted 02 March 2013 - 05:31 PM

The approach you've described so far sounds to me like "high friction" but I don't think that would be impossible to fix.

 

This actually sounds like some very constructive criticism. Thank you, and we'll get to work on that!

 

Naturally, many new products are a bit rough around the edges at first, though I don't think it's that high friction. You issue a query, you get the results passed to a callback, and within that callback the results are locked (for reading or writing, depending on the type of the query). You process the results in the callback and issue further queries if necessary.

But I agree it's a bit cumbersome, especially in Java/C++. We will be integrating a continuations library that will make the callback implicit and the whole thing look more single-threaded. Hopefully this will help reduce the friction...



#16 hplus0603   Moderators   -  Reputation: 5538


Posted 02 March 2013 - 09:45 PM

We will be integrating a continuations library that will make the callback implicit and the whole thing look more single-threaded. Hopefully this will help reduce the friction...

Or it will cause mysterious threading bugs, especially in the case of cancellations :-) Continuations are threads, and users sometimes forget that. And the cancellation case is usually pretty hard to get right.

As long as you're OK with requiring all code to be available to run in all locations at the same version, you could perhaps hide your database distribution behind some messaging framework. Make it look a bit like MPS, perhaps? You can perhaps expose the database as a way to send messages to entities. Entities register in the domains they get load balanced across. The trick is going to be how to "invisibly" take the desire to create an entity with the address tuple A,B,C and make it actually be instantiated on the right node... All with a super small API :-)
enum Bool { True, False, FileNotFound };

#17 Hodgman   Moderators   -  Reputation: 31131


Posted 02 March 2013 - 10:49 PM

And yet, when it comes to utilizing multi- and many-core hardware, games are years behind. I've spoken to a senior developer at a very well-known AAA MMO studio, who told me they don't allow their developers to write multi-threaded code. Another architect at a different MMO studio told me that only now are they realizing that they're going to have to abandon the serial game-loop way of thinking, and this is very hard for them to accept.

Again, this very much depends on who you talk to. You'll get a lot of the same responses in other segments of the software industry too.
In most places that I've worked, yes people are discouraged/banned from writing multi-threaded code based on shared-state concurrency, because it is notoriously difficult. Even veteran engineers make mistakes in this paradigm from time to time, as it's notoriously difficult to prove that your code doesn't have any race conditions or locking/scheduling problems.
In many courses, this particular paradigm is the only method of multi-threading that's taught -- so when you interview a graduate and ask how they'd take advantage of a multi-core games console, they almost always talk about mutexes!
Often, shared-state code like this seems to be correct, and may even pass its unit tests successfully for months or years, before failing randomly and unreproducibly. Many old games have multi-threading bugs that don't manifest on single-core CPUs, but do manifest on true SMP hardware -- e.g. to play Sim City 4 without random crashes, I have to go into the Windows task manager and change its thread affinity to restrict it to one core! To avoid this, it's better to simply avoid this unreliable paradigm as much as possible.

Other paradigms, like the one you've presented, aren't as easy to totally screw up. Message passing, the Actor Model, functional style programming, and automatic decomposition of serial code into a directed-acyclic-graph of task nodes, are used.
 
For example, the "serial game loop" can be written as high level Lua code. Whenever it makes a function call to the underlying engine, this call isn't executed immediately, instead it's queued and a future is returned. At the end of the Lua code, all these queued functions and futures can have their dependencies determined (by which functions have which futures passed in as arguments) and can be converted into a DAG. The DAG can then be linearized into a schedule of individual tasks and fences, spread across many cores.
This lets your high-level programmers continue to write seemingly serial code, which is automatically parallelized, while a few veterans of shared-state concurrency maintain the scheduling logic.
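Roughly the same shape can be sketched with plain Java futures. This is not the Lua system I described, just an illustration of the idea that passing futures as inputs is what records the dependency DAG, which a thread pool then executes (the engine functions here are hypothetical stand-ins):

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // "Engine calls" return futures immediately; passing a future into another
    // call records the dependency. Independent branches run in parallel while
    // the pool honours the resulting DAG.
    final class FrameGraphSketch {
        public static void main(String[] args) {
            ExecutorService pool = Executors.newFixedThreadPool(4);

            CompletableFuture<double[]> positions = CompletableFuture.supplyAsync(
                    () -> integratePhysics(), pool);                  // task A
            CompletableFuture<int[]> visibility = positions.thenApplyAsync(
                    p -> cullAgainstFrustum(p), pool);                // task B depends on A
            CompletableFuture<double[]> audio = positions.thenApplyAsync(
                    p -> computeAudioFalloff(p), pool);               // task C depends on A, runs alongside B
            CompletableFuture<Void> frame = visibility.thenAcceptBothAsync(
                    audio, (v, a) -> submitFrame(v, a), pool);        // task D joins B and C

            frame.join();
            pool.shutdown();
        }

        // Hypothetical stand-ins for engine work.
        static double[] integratePhysics()               { return new double[] {0, 0}; }
        static int[]    cullAgainstFrustum(double[] p)   { return new int[] {0}; }
        static double[] computeAudioFalloff(double[] p)  { return new double[] {1.0}; }
        static void     submitFrame(int[] v, double[] a) { /* render/submit */ }
    }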

As mentioned before, the PS3 has a slow GPU and a slow/simple (in-order, etc.) single-core CPU (overview). All of its computing power is in the many-core NUMA co-CPU. This means that every PS3 game of any complexity at all is pretty much guaranteed to contain a damn lot of parallel code (and not the shared-state kind, as that does not perform well at all under NUMA).

The PS3 with its many-core hybrid CPU, and the 360 with its hyperthreaded tri-core CPU, have been out for over half a decade, so any game developer working on console games should definitely have had the multi-core lesson sink in by now! Failing to write multi-core capable code has the effect of reducing the power of these consoles by a huge factor (~3-6x), so it's safe to assume people who are generalized as being obsessed with squeezing every last drop of performance from a bit of hardware would be investigating their options here...
 

P.S. All our GPU code is JIT compiled

Cool. What do you use to do it?

Sorry, I meant that everyone's GPU code is JIT compiled.
You control the GPU via an API like D3D, and shader code (e.g. HLSL) which is compiled into an intermediate bytecode format.
The driver then takes the command queue that's produced from D3D, and your shader byte-code, and JIT compiles these into actual proprietary GPU instructions as necessary.


Edited by Hodgman, 02 March 2013 - 10:59 PM.


#18 pronpu   Members   -  Reputation: 139


Posted 03 March 2013 - 03:15 AM

Or it will cause mysterious threading bugs, especially in the case of cancellations :-) Continuations are threads, and users sometimes forget that. And the cancellation case is usually pretty hard to get right.

 

Well, I stand behind my claim that nowadays performance is gained by relinquishing control. At the end of the day you'll have to trust the framework to do the right thing. We take care of the edge cases for you.

 

As long as you're OK with requiring all code to be available to run in all locations at the same version, you could perhaps hide your database distribution behind some messaging framework. Make it look a bit like MPS, perhaps? You can perhaps expose the database as a way to send messages to entities.

 

What you describe sounds a whole lot like our open-source in-memory data grid, Galaxy, which serves as our low-level distribution layer.

 

In most places that I've worked, yes people are discouraged/banned from writing multi-threaded code based on shared-state concurrency, because it is notoriously difficult. Even veteran engineers make mistakes in this paradigm from time to time, as it's notoriously difficult to prove that your code doesn't have any race conditions or locking/scheduling problems.

 

Except if you use a transactional framework that takes care of all that for you.

 

Every shared-state concurrency/distribution (we can agree it's the same problem) is an abstraction on top of message passing, that translates contention to messaging. In the CPU, that messaging layer is called cache-coherence, and uses variants of the MESI messaging protocol. Now there are two questions to be asked: 1) is the shared-state abstraction useful (or not, or maybe even harmful), and 2) what is the impact on performance?

 

As for 1), I'd claim that it's not only useful, but often necessary, and, furthermore, messaging architectures often re-implement shared state, only they do it in an error-prone manner. If, for example, you have a seamless world, you do wish to model a shared state (your world). You could implement the shared state on top of messaging yourself, or use a pre-packaged abstraction. The reason shared state causes more trouble "out-of-the-box" than messaging is that messaging makes communication explicit, while shared state hides it. But, if you could use a shared-state abstraction that takes care of races and deadlocks for you, then this problem, at least, is solved. You might also be persuaded that shared state is useful by the fact that two of the best known actor systems, namely Erlang and Akka, both provide shared state in addition to actors. Erlang has ETS (and Mnesia), and Akka has limited STM. The designers of both platforms have realized that message passing alone is insufficient (or degrades at times to a hand-rolled implementation of shared state on top of it). Also, they implement shared state at a lower level (at least on a single machine) than their messaging API, so it performs better.

 

And as for 2), I'd say this. Many transactional frameworks (databases, STM) turn a correctness problem into a performance problem. They guarantee correctness but at a high performance cost, because they try to be general and address every possible data-access pattern. The aforementioned Project Darkstar took that same path. We, however, take a different path. We do not provide general transactions but only transactions that are likely within some domain. Our first database/scheduling framework, SpaceBase, assumes a spatial object domain, and allows transactions that span some contiguous region of space. This assumption not only makes internal scheduling decisions simpler (for example, we make sure that we never have to roll back conflicting transactions), it also gives tremendous performance (or scaling, rather), far better than you would have been able to achieve by re-implementing all of this yourself on top of messaging (at least not without a lot of effort).

 

The reason for that is that the framework/database is able to organize the objects in such a way as to minimize contention and therefore messaging. 

 

and not the shared-state kind, as that does not perform well at all under NUMA

 

Again, shared state is simply an abstraction on top of message passing. NUMA's cross-node messages are relatively slow, so the aim should be to reduce the number of such messages regardless of the abstraction you use. I claim that the database can do a fine job of reducing messaging, and I've proved that in this highscalability article. The idea is that if you use the right kind of data structure (a b-tree in the article), communication is minimized.

 

All modern computer systems are organized as layers of fast-access caches communicating with one another via slow communication buses. The same problem, BTW, applies not only to NUMA but to CPU cores at a lower level, and to cluster nodes at a higher level. A b-tree data structure, like that in the example, makes contention at higher-level tree nodes exponentially less likely than contention at lower nodes, while contention at higher nodes is exponentially more expensive (slower comm channel). The two cancel each other out exactly, making each operation O(1) with a very, very low constant (<< 1) in terms of cross-node communication, no matter the "layer" you're at (CPU cache, NUMA or cluster).
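To make that a bit more concrete, here is a back-of-the-envelope version of the argument, under the simplifying assumptions of uniform access, branching factor b, and L being the small, fixed number of levels that actually cross a slow boundary (core cache, NUMA node, cluster node):

    \Pr[\text{two operations contend at level } k] \approx p\, b^{-k},
    \qquad \text{cost of resolving that contention} \approx c\, b^{k}

    \mathbb{E}[\text{cross-boundary cost per operation}]
    \approx \sum_{k=0}^{L} \left(p\, b^{-k}\right)\left(c\, b^{k}\right) = (L+1)\, p\, c

Since L is fixed by the hardware topology and p*c << 1, the expected cost per operation stays a small constant, well below one cross-boundary message.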

 

Sorry, I meant that everyone's GPU code is JIT compiled.
You control the GPU via an API like D3D, and shader code (e.g. HLSL) which is compiled into an intermediate bytecode format.

 

Of course. That's still an AOT compiler, not a JIT, as it compiles the shader before it runs for the first time. I was referring to the merits of a true profiling JIT, one that performs optimizations based on the running profile of the code. I thought that somehow you'd done that with shaders.



#19 hplus0603   Moderators   -  Reputation: 5538


Posted 03 March 2013 - 08:25 PM

Every shared-state concurrency/distribution (we can agree it's the same problem)

I disagree! In my view, shared-state concurrency is very different from a cohesive distributed view. Being able to be divergent but eventually consistent on disparate nodes is a hugely important distinction that you don't have in MESI.

Also, Galaxy: I remember you posted about that a while back, and I had about the same thing to say about it then: MESI over a network is probably solving the wrong problem.
enum Bool { True, False, FileNotFound };

#20 Hodgman   Moderators   -  Reputation: 31131


Posted 03 March 2013 - 10:20 PM

Well, I stand behind my claim that nowadays performance is gained by relinquishing control. At the end of the day you'll have to trust the framework to do the right thing. We take care of the edge cases for you.

At the end of the day, all that any program does is transform some data into some other data, by a sequence of processes.

Given enough time, money, talent and patience, it will always be possible to replicate the results of some framework by instead writing everything by hand, even in raw asm etc... The reason this isn't feasible is that the time/money/talent/patience costs are insanely high!
Choosing to use a framework/library/middleware, as always, should be based on a cost-benefit analysis -- e.g. can we get the same performance ourselves, given the same amount of time/money and our current staff?
Often the answer to that is "no", which is why middleware is so damn popular in the games industry (contrary to your assertion). Games can be written from scratch (and sometimes you might have better performance if they were), but it's usually cheaper/faster to just license some middleware that reduces your production time by an order of magnitude.

Except if you use a transactional framework that takes care of all that for you.

That's exactly my point -- writing shared-state multi-threaded code without a framework is difficult and dangerous, and hence discouraged. That's why game developers do use concurrency frameworks, and discourage the "manual authoring" of multi-threaded code outside of one (i.e. we don't like to write multi-threaded code by hand).

That said, transactional frameworks are largely rubbish for many kinds of games, as they often make the decision to give up determinism, which is essential for some games. Typical shared-state designs revolving around mutexes are often the same, with execution speed of different threads determining the order in which simulation operations occur.

On that note, with your spaceship example, it's not obvious what kind of ordering constraints are or can be applied to the different processes. Are all beams fired before any damage is calculated, etc?
In the Actor Model implementation that I've used for games, these kinds of ordering issues were explicit to the programmer, allowing them to maintain a deterministic simulation, regardless of the number of cores or the scheduling algorithm used. e.g. An actor could specify that within a particular simulation frame, it would process all "FireBeam" messages before it processed any "TakeDamage" messages.
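A minimal sketch of that kind of per-frame ordering (not the actual framework I used, just an illustration of the idea; the message types are hypothetical):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // The actor drains its mailbox once per simulation frame and processes all
    // FireBeam messages before any TakeDamage messages, so the result is the
    // same no matter which core delivered which message first.
    final class ShipActor {
        interface Msg {}
        record FireBeam(long targetId) implements Msg {}
        record TakeDamage(double amount) implements Msg {}

        private final ConcurrentLinkedQueue<Msg> mailbox = new ConcurrentLinkedQueue<>();
        private double hull = 100.0;

        void send(Msg m) { mailbox.add(m); }   // may be called from any worker thread

        void stepFrame() {
            List<Msg> frame = new ArrayList<>();
            for (Msg m; (m = mailbox.poll()) != null; ) frame.add(m);

            for (Msg m : frame) if (m instanceof FireBeam f)   fire(f.targetId());
            for (Msg m : frame) if (m instanceof TakeDamage d) hull -= d.amount();
        }

        private void fire(long targetId) { /* enqueue a TakeDamage on the target actor */ }
    }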

As above, I don't know of any big games using anything like STM; instead, most use something closer to a stream-processing or flow-based programming model, which is easy to reason about (you can write the main flow in a single-threaded style and have it be automatically decomposed), has few performance pitfalls (no messy rewinding, and it can be well optimized for data access patterns), is 100% deterministic, and allows for huge data-parallelism and task-parallelism.
STM-like techniques are usually only an internal implementation detail, e.g. the algorithm to push an item into a lock-free queue may be transactional, repeatedly trying again and again until it succeeds without conflict. This queue would then be used to implement a more useful concurrency model.
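For illustration, that retry pattern looks something like this (shown on a lock-free stack for brevity; lock-free queues use the same compare-and-set retry idea):

    import java.util.concurrent.atomic.AtomicReference;

    // The push "transaction" is retried until it commits without another
    // thread having moved the head in between.
    final class LockFreeStack<T> {
        private static final class Node<T> {
            final T value;
            Node<T> next;
            Node(T value) { this.value = value; }
        }

        private final AtomicReference<Node<T>> head = new AtomicReference<>();

        void push(T value) {
            Node<T> node = new Node<>(value);
            while (true) {                               // retry until no conflict
                Node<T> current = head.get();
                node.next = current;
                if (head.compareAndSet(current, node))   // fails if another thread won the race
                    return;
            }
        }
    }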

Every shared-state concurrency/distribution (we can agree it's the same problem) is an abstraction on top of message passing, that translates contention to messaging. In the CPU, that messaging layer is called cache-coherence, and uses variants of the MESI messaging protocol.

I wouldn't agree with that. Multiple threads sharing the same RAM without restriction gives us the "shared state" model of concurrency. When we build a message passing framework, we build it directly on top of this model.
Underneath this model, there is the hardware implementation, which in some cases, yes, uses MESI/etc, which gives us the layers:
1) Message passing between caches.
2) Shared RAM.
3) High level Messages.
The high level message passing layer is still built on top of the shared-state layer, regardless of how layer #1 operates. #1 is just a black box in order to make layer #2 function correctly.

The type of message passing going on in layer #3 is semantically different to layer #1.

Also, in the example of the PS3 that I was using, the SPU co-processors don't have a cache at all, so obviously don't implement MESI. They have their own local-RAM, and can initiate transfers to/from the main shared RAM and their local-RAM (and two concurrent but overlapping transfers will have indeterminable results).
You could argue that these transfers are "messages" and therefore it's built upon message passing, but if you go that far, then "shared-state" ceases to exist at all: interacting with the global state is just "sending a message"! ...and that doesn't seem like a useful thought construct.

Edited by Hodgman, 03 March 2013 - 10:51 PM.




