
Matias Goldberg


Community Reputation

9623 Excellent

1 Follower


  1. If we travel back in time to 2010, there wasn't much to choose from. Unity was paid and only starting to get known, and an Unreal Engine license cost tens of thousands of dollars. I don't remember if CryEngine was open to licensing. So the question for games would have unequivocally been "use Ogre3D or roll your own engine".

    We did lose a lot in terms of community, since lots of users started to migrate to all these new options. But the small community we still have usually responds quickly, at least when it comes to 2.1, and I usually respond quickly as well. You still get support, if that's what you're asking. But it's a far cry from our very active community of 8 years ago, and a big community meant your obscure questions were likely to be answered (like "I can't get Ogre to run in a custom Arduino"), which most of us cannot answer or have little experience with.

    As for being an Ogre old timer: if you were familiar with Ogre 1.4/1.6/1.7/1.8/1.9 then we have a porting manual for you (online manual, more up to date; but the porting part hasn't changed). The recommended flow for old timers porting is Ogre 1.x -> 2.0 -> 2.1.

    Ogre 2.0 implemented some architectural changes to core (making SceneNode traversal cache friendly, SIMD and threadable) plus a new Compositor architecture (whose scripts are similar in syntax, thus relatively easy to port). In fact the samples from 2.0 are the same as 1.9; thus Ogre 2.0 still feels very 1.4-1.9. Note that 2.0 is not actively maintained and, as the link explains, is a good "middle step" to move to 2.1.

    Ogre 2.1 added architectural changes to the renderer and material system. If you've used Unity or UE4, the material system will feel very familiar because "it just works" by defining PBR properties such as roughness, diffuse, fresnel, etc. This is very different from 1.x's materials (now called "low level materials" in 2.1; they are still there but mostly used for compositor effects, and are not recommended for scene objects), which required you to manually specify a vertex & pixel shader.

    So by trying them in sequence, 1.x -> 2.0 -> 2.1, you can see the changes progressively. Going from 1.x -> 2.1 directly is possible but feels more shocking. Many concepts are still there: the ResourceGroupManager hasn't changed, and the new "Item" that replaces "Entity" (now v1::Entity) works similarly (it still needs to be attached to a node, most functions have the same name, etc). Perhaps what's most confusing is that we call "HlmsDatablock", or simply "datablock", what you would normally call a "Material", because the name was taken by the old materials.

    Does that answer your question?
  2. Hi! I'm an Ogre3D dev. There are many things that can be said.

    As you said, Ogre3D is a rendering engine rather than a game engine. Although it's fair to say that compared to other rendering frameworks (e.g. bgfx, The Forge / ConfettiFX, D3D12 Mini Engine) we do a lot more, putting us closer to a game engine, but not quite there. We don't do physics, sound or networking. It's great if you want to glue different components together (or write your own), but not so much if you just want to start making a game out of the box ASAP. That's where Unity, UE4, Godot, Wicked, and Skyline shine as game engines. Ogre stays strong in non-game applications, such as simulation and architecture software.

    As to development: Ogre has two main branches, +2.1 and 1.x. I maintain +2.1 while Pavel maintains the 1.x one.

    Ogre 2.1 isn't WIP. In fact there are games released using it, such as Racecraft and Sunset Rangers. Skyline Game Engine is built on top of Ogre 2.1.

    The main reason you haven't seen an official release is that our CMake scripts that generate the SDKs are ancient, and the SDK they generate does not match the folder structure of an out-of-source build. This problem isn't new and it affects Ogre 1.x too, but it is too bothersome and there's a lot of ancient CMake legacy to sort out. Previously another reason was that we were waiting for GLES3 support (to get Android support for 2.1), but it became less of a priority given Android's extremely poor driver quality. But that doesn't mean Ogre is WIP.

    The main highlights of Ogre 2.1:

    • Serious boost in performance. Ogre 1.9 used to be the slowest alternative. Pavel has been doing a lot of good work to improve that in 1.11, but in Ogre 2.1 we made architecture changes to address the problem at the root. It's common to hear of 4x improvements when migrating to 2.1. Nowadays we do very well against the competition on this front. We seem to be popular for VR simulations thanks to that (but keep in mind that VR requires rendering at 90fps for two eyes, and that is hard whatever engine you pick!).
    • PBR material & pipeline.
    • Hlms materials are friendlier, as we handle all shader work for you (unless you specifically don't want us to), unlike 1.9 where you had to either write your own shaders or set up the RTSS component.

    There are many more differences, but I don't want to clutter this forum thread and turn it into a changelog.

    With OpenGL you can target Linux (NVIDIA & Mesa-based drivers); with GL & D3D11 you can target Windows (minimum GPU: D3D10 hardware); with Metal you can target iOS and newer Macs, while you can use GL to target older Macs (our GL for Mac is beta, not to mention Apple itself is deprecating it). We want to target Android, and a community member did an excellent job, but the poor driver quality continues to be an issue. We get deadlocks inside the driver while compiling shaders, incorrect results when using shadow mapping, and crashes while generating mipmaps. We may address this via workarounds for Ogre 2.2, or maybe through Vulkan. But at the current time, if you plan on targeting Android, you'll have to use Ogre 1.11 or something else.

    Of course it's not as easy as Unity, where targeting another platform is just one click away. The rest of your application code has to be able to run on those platforms as well and deal with the differences.

    Ultimately it depends on what you want to deal with. Personally, I and most of the people I work with like to write custom stuff, specific to our needs. That gives us flexibility, better framerates and a distinct look. Or it's because we like to own our tech ("owning" as in: if something is broken we can fix it ourselves because we've got source code access, or we can write our own alternative, or we can swap in another solution).

    UE4 gives you source level access (though it's a huge codebase) but it's not technically free. The standard Unity license doesn't give you source level access. It may be fine for you, but then if a feature you need is broken, you have to patiently wait for the next release for a fix (if it gets fixed, and pray the upgrade doesn't break your game). I admit, though, that it is the one with the most user-friendly interface. Godot is lighter and open source, so that engine would be my choice if I were to go for a game engine.

    Ogre3D gives you more control and power, and generally better performance, which is something to keep in mind if your game wants to have a lot of dynamic objects on screen (e.g. RTS games fit that description). But the downside is that you have to do a lot of tech work yourself. It's not "free" in that sense.

    Right now our development efforts are focused on Ogre 2.2 (which IS WIP), and as our Progress Report from December 2017 says, it centers on an overhaul of the texture code to heavily improve memory consumption and allow for background streaming (among other improvements). But we still add incremental features to 2.1; for example we've recently added approximate fake area lights (including textured area lights; "fake" in the sense that the math is not PBR).

    Cheers
  3. Matias Goldberg

    HLSL's snorm and unorm

    UAVs require this. These modifiers are not meant for local variables, thus it's very likely that a unorm float local variable will just behave like a regular float. As to your out-of-range conversion, saturation happens. Snorm -1 becomes 0 when stored to a unorm buffer. 5.7 becomes 1.0 when stored to u/snorm, and -7 becomes -1 when stored to snorm. I don't remember about NaNs, but I think they get converted to zero. Also watch out for -0 (negative zero), as it becomes +0 when stored as snorm.
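The saturation behaviour described above can be sketched on the CPU. This is a minimal illustration, assuming 8-bit formats and the usual D3D-style rules (NaN becomes 0, clamp, scale, round to nearest); the function names are hypothetical and this is not what any particular driver literally executes.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Float -> 8-bit UNORM: NaN becomes 0, the value is clamped to [0, 1],
// then scaled to [0, 255] and rounded to nearest.
inline std::uint8_t floatToUnorm8( float v )
{
    if( std::isnan( v ) )
        return 0u;
    const float clamped = std::fmin( std::fmax( v, 0.0f ), 1.0f );
    return static_cast<std::uint8_t>( std::lround( clamped * 255.0f ) );
}

// Float -> 8-bit SNORM: NaN becomes 0, the value is clamped to [-1, 1],
// then scaled to [-127, 127] and rounded to nearest.
inline std::int8_t floatToSnorm8( float v )
{
    if( std::isnan( v ) )
        return 0;
    const float clamped = std::fmin( std::fmax( v, -1.0f ), 1.0f );
    return static_cast<std::int8_t>( std::lround( clamped * 127.0f ) );
}

// Decoding back to float, as a shader read would see the stored value.
inline float unorm8ToFloat( std::uint8_t v )
{
    return v / 255.0f;
}

inline float snorm8ToFloat( std::int8_t v )
{
    // Both -128 and -127 decode to -1.0.
    return std::fmax( v / 127.0f, -1.0f );
}
```

Because the stored value is an integer, the sign of a negative zero cannot survive the round trip, which is why -0 reads back as +0.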
  4. Matias Goldberg

    User authentication without storing pw's

    That is because there is a large number of high-profile companies using extremely poor security practices. Yahoo! had their passwords stolen, which were stored using MD5. LinkedIn was using unsalted SHA-1. They failed at basic security implementations, which should be embarrassing given their size. Any strong password implementation should use salted bcrypt passwords, and the password exchange every time you log in must happen with extra randomized salts added. Thus the passwords are never exchanged in plain text, the hashed password exchange is never the same twice (thus preventing spoofing), and if the password database gets stolen, it would take significant time (years) to crack the passwords. The problem with emailing tokens is that emails are sent in plain text and pass through a lot of servers. It's SO easy to steal them without you ever knowing (and without having to hack your email credentials in any way). I don't want this topic to become political, but that's why the investigation about Hillary Clinton is so important: she stored confidential information in her private email account. Regardless of why she did it or whether she should have done that, doing that is extremely insecure. Basically, sending emails is the equivalent of shouting extremely loud in public. Everyone knows. You can use public/private key encryption for emails, however that creates the problem of telling people to send you their public keys, which you'll have to keep in a database that can get stolen. Yes, stealing a public key is worthless. But I could argue that so is stealing a salted bcrypt hashed password. Also, if hackers manage to steal your private key, they can now impersonate you. Yeah, you can try to revoke your private key. But you can just as well ask the users to reset their passwords.
  5. The way I approached this was by conceptually splitting the textures into "data" and "metadata". Metadata is resolution, pixel format and type information (e.g. is this a cubemap texture?). Metadata is usually the first thing that gets loaded. So whenever I need it, I can call texture->waitForMetadata(); which will return when the background thread gets that information. Or I can register a listener to get notified when the metadata is ready. Of course management can get more complex: Do you want metadata to get prioritized (RIP seek times)? Or are you willing to wait for the 30 textures that come before to finish both their data and metadata before the texture you want gets to load its resolution from file? Or maybe you want to design your code so that UI textures get loaded first? A complementary solution is keeping a metadata cache that gets stored on disk, so you can have this info available immediately on the next run. When the metadata cache becomes out of date, your manager updates the cache when it notices the loaded texture didn't match what the cache said. You could also build the cache offline. If you rely on the metadata never being wrong, then you need to write some sort of notification & handling system (tell everyone the data has changed and handle it; or abort the process, notify the user and run an offline cache tool that reparses the file; or look at timestamps to check what needs to be updated, etc).
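A blocking waitForMetadata() like the one mentioned above could look roughly like this. It is only a sketch under assumptions: the class and member names (StreamedTexture, TextureMetadata, setMetadata) are hypothetical, the metadata fields are reduced to three, and the "background thread" is a stand-in that publishes hardcoded values instead of parsing a file.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical subset of what "metadata" would contain.
struct TextureMetadata
{
    std::uint32_t width = 0;
    std::uint32_t height = 0;
    bool isCubemap = false;
};

class StreamedTexture
{
public:
    // Called by the background loader once the file header has been parsed.
    void setMetadata( const TextureMetadata &metadata )
    {
        {
            std::lock_guard<std::mutex> lock( mMutex );
            mMetadata = metadata;
            mMetadataReady = true;
        }
        mCondition.notify_all();
    }

    // Blocks the caller until the metadata has been published.
    TextureMetadata waitForMetadata()
    {
        std::unique_lock<std::mutex> lock( mMutex );
        mCondition.wait( lock, [this] { return mMetadataReady; } );
        return mMetadata;
    }

private:
    std::mutex mMutex;
    std::condition_variable mCondition;
    TextureMetadata mMetadata;
    bool mMetadataReady = false;
};

// Stand-in for the streaming thread: publishes made-up metadata while
// the calling thread waits for it.
inline TextureMetadata loadInBackgroundAndWait( StreamedTexture &texture )
{
    std::thread loader( [&texture] {
        TextureMetadata metadata;
        metadata.width = 512u;
        metadata.height = 256u;
        metadata.isCubemap = false;
        texture.setMetadata( metadata );
    } );
    const TextureMetadata result = texture.waitForMetadata();
    loader.join();
    return result;
}
```

The listener variant would replace the condition variable wait with a callback invoked from setMetadata(); prioritization and the on-disk cache are separate concerns that this sketch leaves out.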
  6. Matias Goldberg

    Semi Fixed TimeStep Question.

    Suppose you minimize the application so everything stops; 30 seconds later the application is restored. The counter will now read that you're lagging 30 seconds behind and that it needs to catch up. Without the std::min, and depending on how your code works, either you end up calling update( 30 seconds ), which will screw up your physics completely; or you end up calling for( int i=0; i<30 seconds * 60fps; ++i )update( 1.0 / 60.0 ); which will cause your process to become unresponsive for some time while it simulates 30 seconds' worth of physics at maximum CPU power in fast forward. Either way, it won't be pretty. The std::min there prevents these problems by limiting the elapsed time to some arbitrary value (MAX_DELTA_TIME), usually smaller than a second. In other words, pretend the last frame didn't take longer than MAX_DELTA_TIME, even if it took more. It's a safety measure. Edit: Just implement the loops described by gafferongames in a simple sample, and play with whatever you don't understand. Take it out, for example, and see what happens. That will be far more educational than anything we can tell you.
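To make the role of the clamp concrete, here is a minimal accumulator-style loop. It is a sketch, not the code from the thread: the 0.25 second value for MAX_DELTA_TIME, the Simulation struct and the advance() helper are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>

constexpr double FIXED_STEP = 1.0 / 60.0; // simulation step, in seconds
constexpr double MAX_DELTA_TIME = 0.25;   // assumed cap, well below a second

// Hypothetical simulation that only counts how many steps it ran.
struct Simulation
{
    double accumulator = 0.0;
    int stepsRun = 0;

    void update( double /*dt*/ ) { ++stepsRun; }
};

// Advances the simulation by the time elapsed since the last frame and
// returns how many fixed steps were executed.
inline int advance( Simulation &sim, double elapsedSeconds )
{
    assert( elapsedSeconds >= 0.0 );

    // Pretend the last frame never took longer than MAX_DELTA_TIME.
    const double frameTime = std::min( elapsedSeconds, MAX_DELTA_TIME );
    sim.accumulator += frameTime;

    int steps = 0;
    while( sim.accumulator >= FIXED_STEP )
    {
        sim.update( FIXED_STEP );
        sim.accumulator -= FIXED_STEP;
        ++steps;
    }
    return steps;
}
```

With these values, a 30 second stall results in at most 15 steps for that frame; without the std::min it would be roughly 1800 steps executed back to back.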
  7. Matias Goldberg

    what is the appeal of fps games?

    Mass Effect was a shooter, but it wasn't a first person shooter. It's a Third Person Shooter, like Tomb Raider, Just Cause, GTA, Hitman, etc. Not quite the same beast. Yup. It's good to point it out because the user base usually takes great pride in the "realism" of such games (is it the users? the marketing? I don't know), when the reality is that "realism" isn't fun. Reality means that, with a bullet travelling at supersonic speed, you first get hit and only then hear the sound coming. A fun game means the sound is simultaneous with the impact, so the player knows where the bullet is coming from in order to shoot back.
  8. Matias Goldberg

    Resolving MSAA - Depth Buffer to Non MSAA

    In my experience sampling from an MSAA surface can be very slow if your sampling pattern is too random (i.e. I attempted this for SSAO and SSR, and it didn't go well). I'm not sure why, whether it's because of cache effects or the GPU simply not being fast at it. But for your scenario that's likely not a problem. If you're taking this approach, using a regular colour buffer (instead of a depth buffer) would be better. Depth buffers have additional baggage you don't need (Z compression, early Z) and will only waste memory and cycles. Btw if you're going to be resolving the MSAA surface, remember that an average of Z values isn't always the best idea. That won't work because the water still needs the depth buffer to avoid being rendered on top of opaque objects that should be obstructing the water. However as MJP pointed out, you can have the depth buffer bound and sample from it at the same time as long as it's marked read-only (IIRC this only works on D3D10.1 HW onwards). This is probably the best option if all you'll be doing is measuring the depth of the ocean at the given pixel. Cheers
  9. Matias Goldberg

    Trying to finding bottlenecks in my renderer

    You're still not using the result. printf() rData[0] through rData[3] so that it is used. And source your input from something unknown, like argv from main or by reading from a file; otherwise the optimizer may run the code at compile time and hardcode everything (since everything could otherwise be resolved at compile time rather than calculated at runtime). Sure. I can save you some time by telling you warp and wavefront are synonyms. A warp is what NVIDIA's marketing dept calls them, a wavefront is what AMD's marketing dept calls them. You can also check my Where do I start Graphics Programming post for resources. Particularly the "Very technical about GPUs." section. You may find the one that says "Latency hiding in GCN" relevant. As for bank conflicts... you got it quite right. Memory is subdivided into banks. If all threads in a wavefront access the same bank, all is ok. If each thread accesses a different bank, all is ok. But if some of the threads access the same bank, then things get slower. I was just trying to point out there's always a more efficient way to do things. But you're making a videogame. At some point you have to say "STOP!" to yourself, or else you'll end up in an endless spiral of constantly reworking things and never finishing your game. There's always a better way. You need to know when something is good enough. I suggest you go for the vertex shader option I gave you (inPos * worldMatrix[vertexId / 6u]). If that gives you enough performance for what you need, move on. Trying Compute Shader approaches also restricts your target (e.g. older GPUs won't be able to run it), which is a terrible idea for a 2D game.
  10. Matias Goldberg

    Trying to finding bottlenecks in my renderer

    I got confirmation from an AMD driver engineer himself. Yes, it's true. However don't count on it. The driver can only merge your instancing into the same wavefront if several conditions are met. I don't know the exact conditions, but they're all related to HW limitations, i.e. if the driver cannot 100% guarantee the GPU can always merge your instances without rendering artifacts, then it won't (even if it were completely safe given the data you're going to be feeding it; the driver doesn't know that a priori, or it would take a considerable amount of CPU cycles to determine so). When it comes to AMD, access to global memory may have channel and bank conflict issues. NVIDIA implements it as a huge register file, so there's always a reason...
  11. Matias Goldberg

    Trying to finding bottlenecks in my renderer

    You don't have to issue one draw call per sprite/batch!!! Create a const buffer with the matrices and index them via SV_VertexID (assuming 6 vertices per sprite):

        cbuffer Matrices : register(b0)
        {
            float4x4 modelMatrix[1024]; // 65536 bytes max CB size per buffer / 64 bytes per matrix = 1024
            // Alternatively:
            // float4x3 modelMatrix[1365]; // 65536 bytes max CB size per buffer / 48 bytes per matrix = 1365.3333
        };

        uint idx = svVertexId / 6u;
        outVertex = mul( modelMatrix[idx], inVertex );

    That means you need a DrawPrimitive call every 1024 sprites (or every 1365 sprites if you use affine matrices). You could make it just a single draw call by using a texture buffer instead (which doesn't have the 64kb limit). This will yield much better performance. Even then, it's not ideal, because accessing a different matrix every 6 threads in a wavefront will lead to bank conflicts. A more optimal path would be to update the vertices using a compute shader that processes all 6 vertices in the same thread, so that each thread in a wavefront accesses a different bank (i.e. one thread per sprite). Instancing will not lead to good performance, as each sprite will very likely be given its own wavefront unless you're lucky (on an AMD GPU, you'll be using 9.4% of processing capacity while the rest is wasted!). See Vertex Shader Tricks by Bill Bilodeau.
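The arithmetic behind those numbers can be checked with a few helpers. This is just a worked example on the CPU side; the helper names are hypothetical, and the 64-thread wavefront used for the utilisation figure is an assumption that happens to reproduce the 9.4% quoted above (6 / 64).

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t MAX_CONST_BUFFER_BYTES = 65536u; // 64kb limit per const buffer
constexpr std::size_t VERTICES_PER_SPRITE = 6u;
constexpr std::size_t WAVEFRONT_SIZE = 64u; // assumed AMD wavefront width

// How many matrices of the given size fit in one const buffer.
constexpr std::size_t matricesPerConstBuffer( std::size_t bytesPerMatrix )
{
    return MAX_CONST_BUFFER_BYTES / bytesPerMatrix;
}

// Draw calls needed when each call can cover at most spritesPerDraw sprites.
constexpr std::size_t drawCallsNeeded( std::size_t numSprites, std::size_t spritesPerDraw )
{
    return ( numSprites + spritesPerDraw - 1u ) / spritesPerDraw;
}

// Same mapping as the shader's svVertexId / 6u.
constexpr std::size_t matrixIndexForVertex( std::size_t vertexId )
{
    return vertexId / VERTICES_PER_SPRITE;
}

// Fraction of a wavefront doing useful work if one sprite gets its own wavefront.
constexpr double wavefrontUtilisation()
{
    return static_cast<double>( VERTICES_PER_SPRITE ) / static_cast<double>( WAVEFRONT_SIZE );
}
```

For instance, with an arbitrary count of 40000 sprites, drawCallsNeeded( 40000, matricesPerConstBuffer( 64 ) ) gives 40 draw calls with float4x4 matrices and 30 with the 48-byte affine layout.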
  12. Matias Goldberg

    Trying to finding bottlenecks in my renderer

    I skimmed through this very long thread only to find that just the last post (Mekamani's) points it out. You're multiplying each vertex against a matrix on your CPU, with no threading. Taking 40ms to do 40k matrix multiplications per frame on a single core sounds about right. That's your problem.
  13. Matias Goldberg

    State of Custom and Commercial Engines as of 2017/2018

    If we go backwards in time, you'll find a lot of games were made in Unreal Engine 3. Nowadays there are more UE4 games (especially indie) because the license price went down from >$50,000 to "free" until you sell enough, at which point they take a 5% cut of gross sales. There were also a lot of games using RenderWare. The names have changed, but the practices haven't. Many games use canned engines, and some games still use their own home-grown engine. It's just that games made with canned engines have a recognizable stamp on them, while games with home-made engines just won't mention it unless they're doing heavy marketing on that, or plan on selling the engine to others. You rarely hear that, for example, Divinity: Original Sin 1 & 2 were made with home-grown engines. The Witcher also uses an in-house engine. Also, having powerful engines like UE4 & Unity become "free" (they're not really free, but still easily accessible) rather than costing tens of thousands of dollars made them more popular among users who would otherwise have been unable to make a game at all.
  14. Adding to that, you have the costs of clothing, make-up, lighting and photo shooting sessions. If something needs to be changed or added, then you need to shoot lots of pictures again. If during that time the actor changed shape (e.g. got fatter / more fit), then you need to reshoot everything. Lost a prop? Reshoot everything again. Midway was the reference when it comes to HD live-action photo shoots (Mortal Kombat, and also their lesser known Batman Forever; that style for that kind of game... let's say it didn't work out well). There's a reason they don't do that anymore. It does not scale. Mortal Kombat 3 Ultimate & MK Trilogy were already pushing it a lot with their endless palette swaps of Scorpion and Sub-Zero (+ the cyborg palette swaps). Also, actors suing the company didn't help (it doesn't matter whether they won or not, or whether they were right; either way it was a lot of legal trouble). The most common lawsuit reason was that the actors claimed they had signed for their likeness to appear in Mortal Kombat 1, but not in the subsequent games. The TL;DR of this thread is: you can do it, but it's a terrible idea.
  15. Yes. Everyone has cleared up that this is a HW limitation. But I don't think anybody has hinted at the obvious: you can create more than one view. The most common scenario is manually managing memory: creating a large pool, and then having different meshes / const buffers / structured buffers living as views into subregions of it. You just can't access all of it at once through a single view. Though, for example, if you have a 6GB buffer, you could create 3 views of 2GB each and bind all 3 to the same shader.
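The "several views over one big buffer" idea boils down to computing offset/size pairs. The sketch below only does that bookkeeping; BufferView and splitIntoViews are hypothetical names, no graphics API is called, and real APIs add alignment requirements on view offsets that are ignored here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of a view into a subregion of a larger buffer.
struct BufferView
{
    std::uint64_t offsetBytes;
    std::uint64_t sizeBytes;
};

// Splits a buffer of bufferBytes into consecutive views, none of which is
// larger than maxViewBytes. The last view may be smaller than the others.
inline std::vector<BufferView> splitIntoViews( std::uint64_t bufferBytes,
                                               std::uint64_t maxViewBytes )
{
    assert( maxViewBytes > 0u );

    std::vector<BufferView> views;
    for( std::uint64_t offset = 0u; offset < bufferBytes; offset += maxViewBytes )
    {
        const std::uint64_t remaining = bufferBytes - offset;
        views.push_back(
            BufferView{ offset, remaining < maxViewBytes ? remaining : maxViewBytes } );
    }
    return views;
}
```

Calling splitIntoViews with a 6GB buffer and a 2GB limit yields three views starting at 0, 2GB and 4GB, matching the example above; each of them would then be created and bound through whatever API is in use.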