L. Spiro

Member
  • Content count

    4316
  • Days Won

    1

L. Spiro last won the day on September 14 2017

L. Spiro had the most liked content!

Community Reputation

25657 Excellent

4 Followers

About L. Spiro

  • Rank
    Crossbones+

Personal Information

  • Interests
    |programmer|
  1. How To Convert double To float

    When dealing with normalized numbers, you just need to make the exponents the same value, which is just a matter of getting the real exponent value from the source by taking the exponent integer and applying the bias, then converting to the new set of bits with the new bias. If the exponents are the same, the rest of the number is converted just by shaving bits off the mantissa. You can round the number by adding the highest bit shaved off. My original implementation had the sign, exponent, and mantissa as separate numbers, so my implementation would have produced incorrect results when the mantissa was all F's and a 1 was added from the shaved bits. My mantissa would wrap to 0 but the exponent wouldn't have increased by 1 as it should. So now the problem with denormalized numbers can be made more general and refer instead to any case where the exponents in the source and destination are different (which will always be the case for denormalized numbers with different amounts of bits for the exponent). The most common case is with denormals, so I will go a bit into that. Denormalized numbers lose the implicit 1 on the mantissa. For normalized cases the mantissa always increases the value based off the exponent (it is always Exponent * [1, 1.999999...]) whereas for denormalized cases the number always decreases based off the exponent (Exponent * [0.999999..., 0]). You can't arrive at a general solution just with bit-shifting tricks now. You have to take the source number and the denormalized exponent bias on the destination to determine what mantissa value will best match the source value when multiplied. In other words, from a double to a float, ::pow( 2, -126 ) * X ≈ SrcValue, solve for X. 
Now you have one case generalized for values where the exponent in both numbers matches and another generalized way to handle denormalized cases, and then you patch for cases where the smallest normalized number is closer to the source number than the highest denormalized number, plus rounding into Inf, and you are working with a single integer so that adding to the mantissa properly rolls up the exponent when necessary. Great! Now throw that all away and copy this guy's code: https://stackoverflow.com/a/3542975 He hard-coded it from 32-bit floats to 16-bit floats. I generalized it to go from 64-bit floats to anything else. It works properly in all cases, including going into Inf and NaN, but doesn't allow specifying a rounding mode explicitly. I will have to add that later. L. Spiro
  2. How To Convert double To float

    Today I got it fully compliant with IEEE standards, with full support for denormalized values. I created some intrinsics that allow you to specify the properties of floats so you can make any kind of float you want as long as no components are larger than in a 64-bit double (a limitation I might handle in the future). Here are some examples.
    as_float10( 1.0 / 3 ) 0.328125 (3EA80000h, 3FD5000000000000h)
    as_float11( 1.0 / 3 ) 0.33203125 (3EAA0000h, 3FD5400000000000h)
    as_float14( 1.0 / 3 ) 0.3330078125 (3EAA8000h, 3FD5500000000000h)
    as_float16( 1.0 / 3 ) 0.333251953125 (3EAAA000h, 3FD5540000000000h)
    as_float32( 1.0 / 3 ) 0.3333333432674407958984375 (3EAAAAABh, 3FD5555560000000h)
    as_float64( 1.0 / 3 ) 0.333333333333333314829616256247390992939472198486328125 (3EAAAAABh, 3FD5555555555555h)
    These are just shortcuts for the common formats you might encounter. For a custom type you can use the full intrinsic:
    as_float( 1, 7, 20, true, 1.0 / 3 ) // as_float( signBits, expBits, manBits, implicitMantissa, value )
    0.33333301544189453125 (3EAAAAA0h, 3FD5555400000000h)
    I will be adding more features to get properties of custom floats too. For example:
    as_float_max( 1, 7, 20, true ) // Gets the maximum value for the given type of float.
    as_float_min( 1, 7, 20, true ) // Gets the min non-0 value for the given type of float.
    as_float_min_n( 1, 7, 20, true ) // Gets the min non-0 normalized value for the given type of float.
    as_float_inf( 1, 7, 20, true ) // Etc.
    Also some options to display the components of floats separately (sign, exponent, and mantissa), and more features. L. Spiro
  3. How To Convert double To float

    You’ve missed the point. The IEEE standard would give me guidelines that I can generalize to any floating-point format of any combination of bits, as demonstrated above. In the example of PI I gave, you can see how it degraded in precision based on the number of bits I assigned to the exponent and mantissa. From double = 3.1415926535897931, to float = 3.1415927410125732, to float16 = 3.1406250000000000. It’s exactly this degradation that is important for programmers in general to see and investigate, and especially for graphics programmers. And again, note that I am not relying on a CPU cast, I am doing the cast manually, so I am not worried about what the compiler will cast etc., and I am not restricted in the floating-point type. In the last example I literally just invented a 38-bit float. That’s the whole point. I need to cast manually because I need to inspect float types that are not natively supported in C/C++. If I were to get clear documentation, or better yet pseudocode showing the micro-instruction process for converting a double to a float, I would be able to generalize it for my purposes to cast to anything. Every number I posted above actually came out of my converter, which I wrote just last night. Even the maximum float value came from my implementation rather than looking at FLT_MAX. The fact that my class generates the same value as FLT_MAX is just because my implementation is 100% correct for normalized numbers. That part is a completely-solved area. I want to look at standards and example implementations so that I can be confident I’ve handled all edge cases specifically dealing with denormalized numbers. L. Spiro
  4. The reason the title surprised you is the same as the reason I can’t get proper results on Google: You’re thinking about a cast, and no matter what search terms I use I keep getting results talking about casts. I enjoy super-low-level programming and I have both a personal interest in and a need for creating a manual cast from a double to a floating-point type given customizable properties (number of exponent bits, number of mantissa bits, is there a sign bit?, is there an implied mantissa bit?, etc.) I’ve already written something that could be called a prototype and it correctly handles all “normal” conversions from a 64-bit double into any other type of floating-point format based on however many bits you want. So as an example, let’s say I have PI as a double constant “3.1415926535897931”. I manually cast it to a 32-bit float by specifying SIGNBIT=TRUE, EXP=8, MANT=24, IMPLICITMAN=TRUE and I get this result: Actual [1,8,24,TRUE]FLOAT Result = 3.1415927410125732 Sign = 0 Exp = 0x0000000000000080 Man = 0x0000000000490fdb So it manually gives the exact same result as a regular cast from double to float, and can round or truncate (here, rounding up was performed). Casting manually seems useless, right? But I want to support arbitrary floating-point types that are not present in C. How about an example from a 16-bit float? Actual [1,5,11,TRUE]FLOAT Result = 3.1406250000000000 Sign = 0 Exp = 0x0000000000000010 Man = 0x0000000000000248 I also want to study arbitrary floating-point formats. For example, the smallest non-0 number a 32-bit float can be is 1.4012984643248171e-45 and the max is 3.4028234663852886e+38. How about for a 16-bit float? [1,5,11,TRUE]FLOAT Smallest non-0: 5.9604644775390625e-08 Max: 65504.000000000000. How about this random format? [1,7,31,TRUE]FLOAT Smallest non-0: 2.0194839173657902e-28 Max: 1.8446744065119617e+19 So a lot of my converter is already working. 
If you are still wondering what the point is, you can understand that a graphics programmer who has to work with 16-bit and 32-bit shader precision, F16 and F32 textures, D24 depth textures, and R11G11B10 float textures can really find something like this useful. There are many floating-point formats out there but not really any tools to investigate those float formats. Now to the Question There are special cases for denormalized numbers that I am not currently handling, and I am temporarily making assumptions about sign bits etc. Anyone have a good link to a break-down of casting from a floating-point value to another type of floating-point value manually? Going over IEEE doesn’t provide example implementations nor does it really dig into the details. The details I often find cover mostly what I have already implemented, which is from a normalized number converted to another normalized number. I don’t really see guides on the best way to implement the cases where either the source or the destination is denormalized. I can implement it “my way” but I definitely want to look at what has been done or at the very least go over specifications to ensure my way fully complies. L. Spiro
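The smallest and largest values quoted in this post follow directly from the bit counts, so they can be checked independently of any converter. The sketch below assumes IEEE-style conventions (bias of 2^(expBits-1) - 1, all-ones exponent reserved for Inf/NaN) and assumes the mantissa count includes the implicit bit, which is what the [1,5,11,TRUE] notation for a 16-bit float suggests. `FloatLimits` and `LimitsOf` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for the limits of a custom float format.
struct FloatLimits {
    double maxFinite;    // largest finite value
    double minNormal;    // smallest positive normalized value
    double minDenormal;  // smallest positive non-zero (denormalized) value
};

// expBits: width of the exponent field.
// manBits: mantissa bits INCLUDING the implicit leading 1 (assumption).
FloatLimits LimitsOf(int expBits, int manBits) {
    assert(expBits >= 2 && expBits <= 11);
    assert(manBits >= 2 && manBits <= 53);

    const int bias = (1 << (expBits - 1)) - 1;
    const int maxExp = ((1 << expBits) - 2) - bias;  // all-ones exponent is Inf/NaN
    const int minExp = 1 - bias;                     // exponent field of 1
    const int stored = manBits - 1;                  // explicitly stored bits

    FloatLimits limits;
    // Largest mantissa is 1.111...1 = 2 - 2^-stored.
    limits.maxFinite = std::ldexp(2.0 - std::ldexp(1.0, -stored), maxExp);
    limits.minNormal = std::ldexp(1.0, minExp);
    // Denormals share minExp but lose the implicit 1, leaving one unit in the last place.
    limits.minDenormal = std::ldexp(1.0, minExp - stored);
    return limits;
}
```

Under these assumptions, `LimitsOf(8, 24)` gives the 32-bit figures, `LimitsOf(5, 11)` the 16-bit ones, and `LimitsOf(7, 31)` the values listed for the made-up format.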
  5. Physically Accurate Material Layering

    If you are still interested, we (at tri-Ace) published this paper on an efficient physically based layering system. http://research.tri-ace.com/Data/s2012_beyond_CourseNotes.pdf The Bouguer-Lambert-Beer law is mentioned and it is explained how we improved upon its performance. How IBL fits in is explained as well. L. Spiro
  6. DirectXMath's XM_CALLCONV

    I would. It is always an error to prioritize subjective aesthetics over functionality. It is always a mistake to ask permission from your compiler to implement a hack or "alternative" code. Checking that something works on your compiler only proves that it works as intended on one compiler. They literally gave a warning that without using this macro as a calling convention your code might not run correctly depending on your compiler and architecture. It is never valid to hinder code's portability simply because of your subjective views on aesthetics. L. Spiro
  7. I'm so confused

    Your topic looked like spam to someone. Don’t take it personally. L. Spiro
  8. Goodbye!

    L. Spiro
  9. Shadow Mapping

    You can only pre-bake a shadow map if neither the light nor any objects in the shadow map are moving. Any kind of moving sun is out. Even if your sun is static, you could only bake the cascades if they cover a specific area and never move (with the player, based on where the player looks, etc.) This largely defeats the purpose of cascades, and I can’t think of any cases where it would be useful to bake them. I can’t remember the site, company, or game, but around 7 years ago this idea was published and explained with a demo. You reproject the shadow look-up coordinates in much the same way as you have to reproject the scene for temporal anti-aliasing. It hasn’t caught on since then due to the difficulty in implementing it and all the edge cases that arise that either slow down the routine if handled fully or lead to artifacts if not. As with any reprojection technique, you have problems with things appearing from out-of-view. But it might be suitable for the farthest cascade of a shadow map. We were considering this but did not implement it. If your world is as large as ours in Final Fantasy XV, then your farthest cascade will be mostly blurry and you can get away with not rendering certain things such as small foliage (another optimization I implemented), so if there is a candidate for this type of reduced updating rates it would be that. L. Spiro
  10. Shadow Mapping

    Yes, but unless you pass extra parameters that means all of your shadows have to have the same resolution. I don’t think NVIDIA is different. In either case, sampling a cube map actually emits a series of intrinsics that give the face index and 2D coordinates. Since consoles expose these intrinsics, my routines for Xbox One and PlayStation 4 are instruction-for-instruction exactly the same as a cube sample, except for one extra instruction to increase my Y coordinate based off the face index. My routine for Windows can’t use the intrinsics but should compile to the same thing. I don’t know of any open-source implementations as mine are derived from looking at shader assembly. Clearing can be done with a single call, which is a win on any platform that clears by just setting a flag, where the time is dominated by jumping back and forth between the driver and user code, etc. Less of a win for platforms that modify each pixel, but still a slight win. Filling requires no render-target swaps. Filtering becomes a win because you can easily use any shadow filtering you wish. As mentioned by JoeJ, you widen the projection for each cube face by a specified amount of pixels, so for example if you have a 512×512 texture and you want to widen the projection by exactly 3 pixels, your field-of-view will be 90.33473583181500191937274374069° instead of 90°. Now you have 3 border pixels to sample for any kind of filtering you wish to use with no complicated math to sample across faces etc. This also allows all of your shadows to have a unified look, as you will no longer have to use one filter for spot lights and a simpler one for point lights. L. Spiro
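The widened field of view can be computed with a one-liner. The formula below is an inference, not something stated in the post: 2 * atan((size + pixels) / size) happens to reproduce the quoted 90.3347...° for a 512-pixel face widened by 3 pixels. `WidenedFovDegrees` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Field of view, in degrees, for a cube face of `faceSize` pixels whose
// projection is widened by `extraPixels` (formula inferred from the example).
double WidenedFovDegrees(int faceSize, double extraPixels) {
    assert(faceSize > 0 && extraPixels >= 0.0);
    const double radToDeg = 180.0 / std::acos(-1.0);
    // An unwidened face has tan(fov / 2) = 1, i.e. exactly 90 degrees.
    return 2.0 * std::atan((faceSize + extraPixels) / faceSize) * radToDeg;
}
```

`WidenedFovDegrees(512, 0.0)` returns 90, and `WidenedFovDegrees(512, 3.0)` comes out at roughly 90.33474.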
  11. Shadow Mapping

    Because, as mentioned, scenes can vary widely, the common way to decide how many shadows you will have is derived from a performance goal on a specific target spec. In other words, determine how many shadows you can have while maintaining X FPS on Y hardware. The reason you should be using an algorithm like this to determine your own metrics is because not only do different scenes in different games come with different performance compromises, your own implementation of shadows may perform very differently from others'. Your question is useful for allowing you to consider how optimized shadow maps must be in other games and for you to consider how much you have to do to get there, but if you were asked right now by a boss to estimate how many shadows you can use you would use the above-mentioned process. To give you actual stats and an idea of the optimizations used, here is what I did on Final Fantasy XV. We had a basic implementation likely matching what you have, with cube textures for point lights and different textures for the rest (4 textures for a cascaded directional light and X spot lights). The first thing I did was improve the culling on the cascaded directional light so that the same objects from the nearest cascade were not being needlessly drawn into the farther cascades. If you aren't doing this, it can lead to huge savings as you can avoid having your main detailed characters being redrawn, complete with re-skinning etc. Next I moved the 6 faces of a cube texture to a single 1X-by-6X texture. So a 512-by-512 cube texture became a single 512-by-3,072 texture. Although you must write your own look-up function that takes 3D coordinates and translates them to a 2D coordinate on this texture, it comes with a few advantages in caching, filtering, clearing, filling, and most importantly it prepares for the next big optimization: a shadow atlas. 
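A CPU-side sketch of such a look-up function is shown below. The real routine would live in a shader, and the post does not give the face order or orientation, so this assumes the usual cube-map convention (+X, -X, +Y, -Y, +Z, -Z, selected by the major axis) with the faces stacked top to bottom along the long side of the strip. `StripCoord` and `CubeDirToStrip` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type for the look-up.
struct StripCoord {
    int face;     // 0..5 in the order +X, -X, +Y, -Y, +Z, -Z (assumed layout)
    double u, v;  // normalized coordinates on the 1X-by-6X strip
};

// Translates a 3D direction into a 2D coordinate on the strip texture.
StripCoord CubeDirToStrip(double x, double y, double z) {
    const double ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
    assert(ax + ay + az > 0.0);  // a zero vector has no direction

    int face;
    double sc, tc, ma;  // per-face 2D coordinates and the major-axis magnitude
    if (ax >= ay && ax >= az) {
        face = x >= 0.0 ? 0 : 1;
        sc = x >= 0.0 ? -z : z;
        tc = -y;
        ma = ax;
    } else if (ay >= az) {
        face = y >= 0.0 ? 2 : 3;
        sc = x;
        tc = y >= 0.0 ? z : -z;
        ma = ay;
    } else {
        face = z >= 0.0 ? 4 : 5;
        sc = z >= 0.0 ? x : -x;
        tc = -y;
        ma = az;
    }

    // Map [-1, 1] on the face to [0, 1].
    const double s = 0.5 * (sc / ma + 1.0);
    const double t = 0.5 * (tc / ma + 1.0);

    // The only step beyond a regular cube sample: offset Y by the face index.
    return {face, s, (face + t) / 6.0};
}
```

If the projection of each face is widened to leave border pixels for filtering, `s` and `t` would also need to be scaled toward the face center by the matching amount; that step is left out here.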
Now that all shadows were being drawn to 2D textures, I created a texture atlas for all the shadows except the cascaded ones. A single large texture for all the point and spot lights. It was 2,048-by-2,048 first but could grow to 4,096-by-2,048 if necessary. Putting all the point and spot shadows into a single texture was a huge gain for many reasons, but one of the main gains was that we had access to all the shadows during a single lighting pass, which meant we could draw all the shadows in a single pass instead of many. At this point our limit was simply how many shadows could be drawn until the texture atlas got filled, sorted by priority largely based on distance. As mentioned by MJP, an important aspect of this is to cull all six faces of a point-light shadow. Any shadow frustums not in view meant less time creating shadow maps and more room for other shadows in the atlas. Next, I wanted the shadow maps to have LOD, as the smaller shadow sizes would allow faster creation, and smaller shadow maps meant more shadows could fit into the atlas. Each shadow frustum (up to 6 for point lights and 1 for each spot light; any shadow frustums fully outside the view frustum were discarded prior to this step, so each remaining one at least partially intersects the camera frustum) was projected onto a small in-memory representation of a screen and clipped by the virtual screen edges. This sounds complicated but it is really simple. The camera's view-projection matrix (followed by the perspective divide) translates points into a [-1,-1]...[1,1] space on your screen, so we simply used that same matrix to transform the shadow frustum points, then clipped anything beyond -1 and 1 in both directions. Now with the outline of the clipped shadow frustum in -1...1 space, taking the area of the created shape and dividing it by 4 (the area of the full -1...1 square) gives you the fraction of the screen it covers (0 = 0% to 1 = 100%). In short, we measured how much each shadow frustum is in view of the camera. 
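The clip-and-measure step can be sketched as below. It assumes the shadow frustum has already been transformed into the -1...1 screen space and reduced to a convex outline, which is the harder part and is not shown. The clipping uses Sutherland-Hodgman, chosen here for illustration; the post does not say which method was used. The result is normalized by the area of the full square (4) so it comes out as a 0-to-1 fraction. `Vec2`, `ClipHalfPlane`, and `ScreenCoverage` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { double x, y; };

// Keeps the part of a convex polygon where a*x + b*y <= 1. With (a, b) set to
// (1,0), (-1,0), (0,1), (0,-1) this covers the four virtual screen edges.
static std::vector<Vec2> ClipHalfPlane(const std::vector<Vec2>& poly, double a, double b) {
    std::vector<Vec2> out;
    for (std::size_t i = 0; i < poly.size(); ++i) {
        const Vec2 p = poly[i];
        const Vec2 q = poly[(i + 1) % poly.size()];
        const double dp = a * p.x + b * p.y - 1.0;  // <= 0 means inside
        const double dq = a * q.x + b * q.y - 1.0;
        if (dp <= 0.0) out.push_back(p);
        if ((dp < 0.0 && dq > 0.0) || (dp > 0.0 && dq < 0.0)) {
            const double t = dp / (dp - dq);  // where the edge crosses the boundary
            out.push_back({p.x + t * (q.x - p.x), p.y + t * (q.y - p.y)});
        }
    }
    return out;
}

// Fraction of the screen (0 = none, 1 = all) covered by a convex outline
// given in -1...1 screen space.
double ScreenCoverage(std::vector<Vec2> poly) {
    const double edges[4][2] = {{1.0, 0.0}, {-1.0, 0.0}, {0.0, 1.0}, {0.0, -1.0}};
    for (const auto& edge : edges) poly = ClipHalfPlane(poly, edge[0], edge[1]);

    // Shoelace formula; the sum is twice the signed area.
    double twiceArea = 0.0;
    for (std::size_t i = 0; i < poly.size(); ++i) {
        const Vec2 p = poly[i];
        const Vec2 q = poly[(i + 1) % poly.size()];
        twiceArea += p.x * q.y - q.x * p.y;
    }
    return std::fabs(twiceArea) * 0.5 / 4.0;  // the full screen has area 2 * 2
}
```

An outline that fills the upper-right quadrant of the screen returns 0.25, and one lying entirely off-screen returns 0.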
Based on this percentage, I would drop the shadow resolution by half, or half again if even less was in view, etc. I believe I put a limit at 64-by-64. If you play Final Fantasy XV, you can see this in action if you know where to look. If you slowly move so that a shadow from a point light takes less and less screen space you might be able to see the resolution drop. Now with the shadow-map LOD system, most shadows are drawn at a lower resolution, only going full-size when you get near and are looking directly at the shadowed area. Because this actually affects so many shadows, the savings are significant. If you decide to keep the same limit on shadows as you had before you will find a huge gain in performance. In our case, we continued allowing the shadow atlas to be filled, so we were able to support double or more shadows with the same performance. Another important optimization is to render static objects to offline shadow maps. A tool generates the shadow maps offline, rendering only static objects (buildings, lamp posts, etc.) into them. At run-time, you create the final shadow map by copying the static shadow map over it and then rendering your dynamic objects (characters, foliage, etc.) into it. This is a major performance improvement again. We already had this for Final Fantasy XV, but since I added the shadow LOD system I had to make the offline static shadows carry mipmaps. It is important to note that the shadow mipmaps are not a downsampling of mip level 0—you have to re-render the scene into each mipmap, again with some lower limit such as 64-by-64. All of this together allowed us probably around 30 shadow maps with the ability to dynamically scale with the scene and without too many restrictions on the artists. Shadow maps were sorted by a priority system so that by the time the shadow atlas was filled, the shadows that had to be culled were distant, off-to-the-side, or otherwise unimportant. L. Spiro
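The resolution drop might look something like the sketch below. The thresholds are invented for illustration, since the post does not say at what coverage each halving happened; only the halving itself and the 64-by-64 floor come from the text. `ShadowResolutionForCoverage` is a hypothetical name, and `coverage` is the fraction of the screen (0 to 1) that the shadow frustum occupies.

```cpp
#include <cassert>

// Picks a shadow-map edge length from how much of the screen the shadow
// frustum covers. Thresholds are hypothetical: each time the coverage falls
// to a quarter of the previous threshold, the edge length is halved.
int ShadowResolutionForCoverage(double coverage, int fullRes, int minRes = 64) {
    assert(minRes > 0 && fullRes >= minRes);
    int res = fullRes;
    double threshold = 0.25;  // assumed: below 25% of the screen, drop once
    while (coverage < threshold && res / 2 >= minRes) {
        res /= 2;           // half the edge length, a quarter of the texels
        threshold *= 0.25;  // so the next drop needs a quarter of the coverage
    }
    return res;
}
```

With these made-up thresholds a 512-by-512 shadow stays at 512 when it fills the screen, drops to 256 at 10% coverage, and bottoms out at 64 once it covers 1% or less.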
  12. Mod for Mac

    You would already have had an answer had you simply downloaded it and checked. L. Spiro
  13. I'm starting as an indie ! Any advices ?

    What is a "graphist"? An artist? You can advertise at literally any stage in development as long as your methods are in line with how much you have to show. Don't hype it up and then only show a concept and a sketch of a map. If you have very little to show, a simple blog post is appropriate. Once you have more, you can advertise it more aggressively. If you want to advertise it across several forums and sites, at the very minimum you must have something playable on your side. That doesn't mean you have to let people play, it means that you have a working prototype or game from which you can take enough screenshots to warrant people's time. I would generally say common sense should guide you. Don't pester people unless it is worth their time. That means that at this time, with what you have, it is definitely not okay to pester people with requests to "share if you find it interesting." L. Spiro
  14. How do you balance gaming and game dev?

    Won't stop me from trying. They are both hit-and-miss in different ways. Frequently indie studios are created by people who don't want to be told what to do; they have their own ideas and this is how to get them done. A "miss" here comes about when the designer doesn't have enough of a clue to know that he doesn't stand out from anyone else in the industry. The quality of the game may suffer from lack of detail, poor controls, poor concept, etc. A "hit" here comes when you actually have someone with a genuinely good idea, who understands difficulty balance, progression, attention to detail, etc. The game usually ends up being something entirely new. In AAA development, a "miss" comes about because the designers and up are concerned about their jobs and profits, so once a winning formula is found they tend to stick to it. You get sequel-after-sequel and games with only slight deviations from the formula. Breaking from the formula is risky, so a "hit" here often means that the concept may not be radically new, but it's executed well, has sturdy tech and graphics behind it, good balance, etc. L. Spiro
  15. Your first project

    Same view on a different day: Maybe you can understand certain Japanese games' backgrounds better. Living in Tokyo gives you all kinds of ideas for games. This is the atmosphere behind my manga. The industry doesn't have to be hell. In fact there is a very simple algorithm that ensures you never get caught in hell no matter what your industry is: While ( JobIsHell ) GetNewJob();. It's so simple even a child could manage it. L. Spiro