This topic is now archived and is closed to further replies.


Inline assembly - how to start



Right, just so you know, I've searched the forums and I've downloaded the Art of Assembly Language Programming, but I'm still not sure if I'm starting in the right place. What I'm trying to do is the following: we're working on the design of a graphics engine at the moment, and for one feature I need to find out whether a special-purpose software rasterizer is faster than hardware. (This renders to off-screen surfaces and is highly specialized, which is why I'm not sure if software might be faster; the issue is mainly the output format.)

To be able to evaluate which features are feasible, I'm writing a tiny test version of the rasterizer. I'm working on the C++ implementation, but I can probably improve performance significantly (this is time-critical code) if I write an inline assembly block for it, especially if I use MMX and CPU-specific SIMD instructions (SSE/3DNow! - I have one floating-point matrix multiplication for each vertex, and a few hundred to a few thousand vertices).

The problem is, the closest I've ever got to assembly language is writing DirectX vertex and pixel shaders, which obviously isn't very close. Now, the Art of Assembly is a nice book and all from what I've seen, but it's 1500 pages and doesn't cover SIMD instructions (for obvious reasons), which makes for a time-consuming learning period, especially for a feature that might be canned once I'm done.

So basically, does anybody have any tips on what I could do to speed up learning? I really only need that one routine in a few CPU-specific versions, and possibly a second routine. I know a number (5? 6? 7?) of higher-level languages, and I know my bit shifts from my NOTs and XORs and my integers from my IEEE floats, so it's not like I'm a total newbie to coding - I've just never needed assembly so far.
Oh, and if it's of any use, I'm using Visual C++ 6 or .NET, whichever is more suitable for inline assembly (I'm assuming it doesn't really matter, though). Huge thanks in advance (this is giving me a serious headache).

- JQ
Full Speed Games. Coming soon.

Iczelion's Win32 Assembly Homepage

Although it might seem like a good idea to write your own rasterization pipeline in asm, keep in mind that Microsoft, SGI, and the video card manufacturers all have teams of experts optimizing driver features as much as possible - gamer demand being a main reason.

It might be a good idea to concentrate on writing strong pixel and vertex shaders, and let the graphics hardware do the dirty work; that's what it's there for. Besides, shuttling data across the system bus won't suit time-critical work as well as the AGP bus does, since AGP is dedicated to communication between the video hardware and the rest of the system.

Storing vertex data on the GPU and performing calculations on dedicated hardware should perform much better than sending data over the system bus, through the CPU, and uploading it to video memory via AGP for display. Keep in mind too that graphics hardware is optimized for "upload once, process onboard" rather than "transmit between video and system memory, using the CPU as the workhorse". (That's the same reason reading from AGP/video memory is significantly slower than writing to it: it's designed that way.)

Good luck, and I''d be interested to see your findings/benchmarks.


First of all, thanks for the link, I'll check it out. I can't believe I missed that in the posts I found using search >_<.
Secondly, I am still considering using hardware to "do the dirty work". In fact, we'll be using hardware acceleration for just about everything. However, that rasterizer would be used for updating some dynamic textures on demand (with some considerable caching, of course). These textures are used for some lighting tricks and only need a few shades of grey, so an 8-bit texture format would make sense. I'm even considering DXT1 for it (white, light grey, dark grey, black), which would reduce the texture "bit depth" to effectively 4 bits/pixel. Neither of those formats is supported as a render target in hardware. I'll also only need simple colour fills (no Gouraud shading).
I do realize that I could write a really simple vertex shader or even pixel shader for it, but that would still give me a 16-bit texture, effectively doubling or quadrupling memory usage. I'm also not sure how older hardware performs when rendering to textures, and I don't yet know what impact the increased texture memory usage has. (We're in a really early feature-collecting/discussing/eliminating phase.)

Anyway, I haven't ruled out using hardware at all - this is mainly an optimization idea, and we need to base the final decision (hardware, software, or can the feature) on some hard facts. We might even have to implement a hybrid version (GPU on some systems, CPU on others), but I'll leave all that to the final timing results...

Thanks again for the input!

- JQ
Full Speed Games. Coming soon.
