
Hello,

My name is Hristo Kolev. I am a student at the “Manchester Midi School” in Manchester, UK, and I will be graduating this August with a degree in “Music Production and Audio Engineering”.

As part of my final research paper, I am exploring the development of game audio over the years and the role of the composer within the development team. The research focuses mainly on the connection between audio and technological developments, and on the skills a musician needs in order to work in the industry. I would be extremely grateful if you would agree to answer a few questions on the topic.

 

Name:                                                                                                                                                  Date:

 

1. How long have you been composing for video games?

 

2. What is the most effective promotional platform for you as a composer?

 

3. What are the essential skills a game composer needs, and how did you gain these skills?

 

4. What are the biggest challenges you have faced in becoming a game audio professional?

 

5. How have composing and sound design changed over the years, and how did you manage to adapt to those changes?

 

6. Do you think that, from a technological standpoint, game audio is reaching its limit for innovation?

 

7. With so many requirements today, such as composing, sound design, an understanding of scripting, programming and coding, Wwise, and FMOD, how do you manage to learn everything and keep track of new innovations, and which of these do you think are most essential for a young composer or audio engineer?

 

Please feel free to add any additional information you find useful. Any advice will be highly appreciated!

 

Kind regards,

Hristo Kolev

 


  • Similar Content

    • By Khairul90
       
      Hey guys, I'm a newbie here and interested in game development. A little background: I went to college for a while and learnt C++ and Java, but due to personal reasons at the time I was not able to get the most out of my classes. You could say I'm a little proficient in C++; I know how to use loops, functions, and a bit of pointers. I'd like to do game development in my free time and build my portfolio at the same time. So how do I go about this?
      I understand that there are many things in game dev, such as cameras, math, AI, and the engine, and I do not know where to start. Should I go back and brush up on my C++, or can I dive into something less intensive and learn and build from there? I'm currently reading Game Engine Architecture, the book by Jason Gregory of Naughty Dog. I'm pretty sure most of the technical stuff would fly over my head, but the concepts are very interesting. One of my goals is to build an engine in the far future.
      Can you guys fill me in on what to do and how to approach this?
       
      Khairul 
    • By ghonchi
      I want to learn game development, modding, and making games, but my PC is very old. I don't know where to start or what software will work on this PC. I don't have money to buy a new PC, so this is what I am stuck with. I don't want the latest HD-graphics type of games; I am more than okay with games like Quake, Half-Life, Counter-Strike, and NOLF, or 2D games. Most of the software I see, like Unity or Unreal, has high system requirements. Please guide me, and don't tell me to buy a new PC, which I can't afford. Also, don't tell me that it's not possible to develop on these specs, because these games run smoothly. Sorry for my English. Thanks
       
      Old Pentium 4 at 1.70 GHz, 1 GB + 512 MB RAM, Windows XP.
       
      Please don't ignore this; it's a matter of life and death for me.
    • By Tonya
      On Tuesday 25th July from 5:45pm, the Sydney Game Engine Developers will be holding their next Meetup in Ultimo. This month guest speakers Aidan Millott and Paulina Morrison Fell are joining forces to give a talk on Producing an Internal Engine.
      Producing engine technology that's shared across 10+ productions is no easy feat. Managing conflicting priorities from multiple games and coordinating shared development of features is only just a flavour of how challenging it can be. In this presentation, Aidan and Paulina share their experiences and techniques used to manage the distributed development of internal projects.
      Aidan Millott is an Area Product Owner at Wargaming.net, where he works with internal stakeholders to improve BigWorld, focusing on development of games by small teams. Prior to this, he was an Executive Producer and Studio Manager at Halfbrick studios, where he released 11 games on several platforms.
      Paulina Morrison Fell works at Wargaming.net as a Senior Project Manager. She works with the Client Engine team to manage the development of features for game productions, liaising with stakeholders from multiple studios. She has worked in a number of successful startups and bigger companies like Disney, where she has helped launch over 10 games for small platforms.
      Agenda for the night is:
      5:45pm - Arrive and network (pizza and drinks provided)
      6:15pm - Presentation by Aidan and Paulina
      7:15pm - Network/Closing
      There are still spots available and we would love for you to join us for this interesting talk.
      RSVP and come along!
    • By frob
      It is good for programmers to understand what goes on inside a processor. The CPU is at the heart of our career.
      What goes on inside the CPU? How long does it take for one instruction to run? What does it mean when a new CPU has a 12-stage pipeline, or 18-stage pipeline, or even a "deep" 31-stage pipeline?
      Programs generally treat the CPU as a black box. Instructions go into the box in order, instructions come out of the box in order, and some processing magic happens inside. As a programmer, it is useful to learn what happens inside the box. This is especially true if you will be working on tasks like program optimization. If you don't know what is going on inside the CPU, how can you optimize for it?
      This article is about what goes on inside the x86 processor's deep pipeline.
      Stuff You Should Already Know
      First, this article assumes you know a bit about programming, and maybe even a little assembly language. If you don't know what I mean when I mention an instruction pointer, this article probably isn't for you. When I talk about registers, instructions, and caches, I assume you already know what they mean, can figure it out, or will look it up.
      Second, this article is a simplification of a complex topic. If you feel I have left out important details, please add them to the comments at the end.
      Third, I am focusing on Intel processors and the x86 family. I know there are many different processor families out there other than x86. I know that AMD introduced many useful features into the x86 family and Intel incorporated them. It is Intel's architecture and Intel's instruction set, and Intel introduced the most major feature being covered, so for simplicity and consistency I'm just going to stick with their processors.
      Fourth, this article is already out of date. Newer processors are in the works and some are due out in a few months. I am very happy that technology is advancing at a rapid pace. I hope that someday all of these steps are completely outdated, replaced with even more amazing advances in computing power.
      The Pipeline Basics
      From an extremely broad perspective the x86 processor family has not changed very much over its 35 year history. There have been many additions but the original design (and nearly all of the original instruction set) is basically intact and visible in the modern processor.
      The original 8086 processor has 14 CPU registers which are still in use today.
      Four are general purpose registers -- AX, BX, CX, and DX. Four are segment registers that are used to help with pointers -- Code Segment (CS), Data Segment (DS), Extra Segment (ES), and Stack Segment (SS). Four are index registers that point to various memory locations -- Source Index (SI), Destination Index (DI), Base Pointer (BP), and Stack Pointer (SP). One register contains bit flags.
      And finally, there is the most important register for this article: The Instruction Pointer (IP).
      The instruction pointer register is a pointer with a special job. The instruction pointer's job is to point to the next instruction to be run. All processors in the x86 family follow the same pattern. First, they follow the instruction pointer and decode the next CPU instruction at that location. After decoding, there is an execute stage where the instruction is run. Some instructions read from memory or write to it, others perform calculations or comparisons or do other work. When the work is done, the instruction goes through a retire stage and the instruction pointer is modified to point to the next instruction.
      This decode, execute, and retire pipeline pattern applies to the original 8086 processor as much as it applies to the latest Core i7 processor.
      Additional pipeline stages have been added over the years, but the pattern remains.
      What Has Changed Over 35 Years
      The original processor was simple by today's standards. The original 8086 processor evaluated the instruction at the current instruction pointer, decoded it, executed it, retired it, and moved on to the next instruction that the instruction pointer pointed to.
      Each new chip in the family added new functionality. Most chips added new instructions. Some chips added new registers. For the purposes of this article, I am focusing on the changes that affect the main flow of instructions through the CPU. Other changes like adding virtual memory or parallel processing are certainly interesting and useful, but not applicable to this article.
      In 1982 an instruction cache was added to the processor. Instead of jumping out to memory at every instruction, the CPU would read several bytes beyond the current instruction pointer. The instruction cache was only a few bytes in size, just large enough to fetch a few instructions, but it dramatically improved performance by removing round trips to memory every few cycles. In 1985, the 386 added cache memory for data as well as expanding the instruction cache. This gave performance improvements by reading several bytes beyond a data request. By this point both the instruction cache and data cache were measured in kilobytes rather than bytes.
      In 1989, the i486 moved to a five-stage pipeline. Instead of having a single instruction inside the CPU, each stage of the pipeline could have an instruction in it. This addition more than doubled the performance of a 386 processor of the same clock rate. The fetch stage extracted an instruction from the cache. (The instruction cache at this time was generally 8 kilobytes.) The second stage would decode the instruction. The third stage would translate memory addresses and displacements needed for the instruction. The fourth stage would execute the instruction. The fifth stage would retire the instruction, writing the results back to registers and memory as needed. By allowing multiple instructions in the processor at once, programs could run much faster.
      1993 saw the introduction of the Pentium processor. The processor family changed from numbers to names as a result of a lawsuit--that's why it is Pentium instead of the 586. The Pentium chip changed the pipeline even more than the i486. The Pentium architecture added a second separate superscalar pipeline. The main pipeline worked like the i486 pipeline, but the second pipeline ran some simpler instructions, such as direct integer math, in parallel and much faster.
      In 1995, Intel released the Pentium Pro processor. This was a radically different processor design. This chip had several features including out-of-order execution processing core (OOO core) and speculative execution. The pipeline was expanded to 12 stages, and it included something termed a 'superpipeline' where many instructions could be processed simultaneously. This OOO core will be covered in depth later in the article.
      There were many major changes between 1995 when the OOO core was introduced and 2002 when our next date appears. Additional registers were added. Instructions that processed multiple values at once (Single Instruction Multiple Data, or SIMD) were introduced. Caches were introduced and existing caches enlarged. Pipeline stages were sometimes split and sometimes consolidated to allow better use in real-world situations. These and other changes were important for overall performance, but they don't really matter very much when it comes to the flow of data through the chip.
      In 2002, the Pentium 4 processor introduced a technology called Hyper-Threading. The OOO core was so successful at improving processing flow that it was able to process instructions faster than they could be sent to the core. For most users the CPU's OOO core was effectively idle much of the time, even under load. To help give a steady flow of instructions to the OOO core they attached a second front-end. The operating system would see two processors rather than one. There were two sets of x86 registers. There were two instruction decoders that looked at two sets of instruction pointers and processed both sets of results. The results were processed by a single, shared OOO core but this was invisible to the programs. Then the results were retired just like before, and the instructions were sent back to the two virtual processors they came from.
      In 2006, Intel released the "Core" microarchitecture. For branding purposes, it was called "Core 2" (because everyone knows two is better than one). In a somewhat surprising move, CPU clock rates were reduced and Hyper-Threading was removed. By slowing down the clock they could expand all the pipeline stages. The OOO core was expanded. Caches and buffers were enlarged. Processors were re-designed focusing on dual-core and quad-core chips with shared caches. In 2008, Intel went with a naming scheme of Core i3, Core i5, and Core i7. These processors re-introduced Hyper-Threading with a shared OOO core. The three different processors differed mainly by the size of the internal caches.
      Future Processors: The next microarchitecture update is currently named Haswell and speculation says it will be released late in 2013. So far the published docs suggest it is a 14-stage OOO core pipeline, so it is likely the data flow will still follow the basic design of the Pentium Pro.
      So what is all this pipeline stuff, what is the OOO core, and how does it help processing speed?
      CPU Instruction Pipelines
      In the most basic form described above, a single instruction goes in, gets processed, and comes out the other side. That is fairly intuitive for most programmers. The i486 has a 5-stage pipeline. The stages are - Fetch, D1 (main decode), D2 (secondary decode, also called translate), EX (execute), WB (write back to registers and memory). One instruction can be in each stage of the pipeline.
      There is a major drawback to a CPU pipeline like this. Imagine the code below.
      Back before CPU pipelines the following three lines were a common way to swap two variables in place.
      XOR a, b
      XOR b, a
      XOR a, b
      The chips starting with the 8086 up through the 386 did not have an internal pipeline. They processed only a single instruction at a time, independently and completely. Three consecutive XOR instructions is not a problem in this architecture. We'll consider what happens in the i486 since it was the first x86 chip with an internal pipeline. It can be a little confusing to watch many things in motion at once, so you may want to refer back to the diagram above.
      The first instruction enters the Fetch stage and we are done with that step.
      On the next step the first instruction moves to D1 stage (main decode) and the second instruction is brought into fetch stage.
      On the third step the first instruction moves to D2 and the second instruction gets moved to D1 and another is fetched.
      On the next step something goes wrong. The first instruction moves to EX ... but the other instructions do not advance. The decoder stops because the second XOR instruction requires the results of the first instruction. The variable (a) is supposed to be used by the second instruction, but it won't be written to until the first instruction is done. So the instructions in the pipeline wait until the first instruction works its way through the EX and WB stages. Only after the first instruction is complete can the second instruction make its way through the pipeline. The third instruction will similarly get stuck, waiting for the second instruction to complete.
      This is called a pipeline stall or a pipeline bubble.
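      To see the same hazard in plain C++ rather than assembly, here is a minimal illustrative sketch (my own example, not from the article): summing with a single accumulator builds one long dependency chain, while splitting the sum across two independent accumulators gives the pipeline independent work to overlap. The function names are hypothetical.

          #include <cstddef>

          // One accumulator: every add must wait for the result of the previous add.
          long long serial_sum(const int* v, std::size_t n) {
              long long s = 0;
              for (std::size_t i = 0; i < n; ++i)
                  s += v[i];                  // serial dependency chain
              return s;
          }

          // Two independent accumulators: the pipeline can overlap the two chains.
          long long split_sum(const int* v, std::size_t n) {
              long long a = 0, b = 0;
              for (std::size_t i = 0; i + 1 < n; i += 2) {
                  a += v[i];
                  b += v[i + 1];
              }
              if (n % 2 != 0)
                  a += v[n - 1];
              return a + b;
          }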
      Another issue with pipelines is that some instructions execute very quickly while others execute very slowly. This was made more visible with the Pentium's dual-pipeline system. The Pentium Pro introduced a 12-stage pipeline. When that number was first announced there was a collective gasp from programmers who understood how the superscalar pipeline worked. If Intel followed the same design with a 12-stage superscalar pipeline then a pipeline stall or slow instruction would seriously harm execution speed. At the same time they announced a radically different internal pipeline, calling it the Out Of Order (OOO) core. It was difficult to understand from the documentation, but Intel assured developers that they would be thrilled with the results.
      Let's have a look at this OOO core pipeline in more depth.
      The Out Of Order Core Pipeline
      The OOO Core pipeline is a case where a picture is worth a thousand words. So let's get some pictures.
      Diagrams of CPU Pipelines
      The i486 had a 5-stage pipeline that worked well. The idea was very common in other processor families and works well in the real world.
      The Pentium pipeline was even better than the i486. It had two instruction pipelines that could run in parallel, and each pipeline could have multiple instructions in different stages. You could have nearly twice as many instructions being processed at the same time.
      Fast instructions waiting behind slow instructions was still a problem with parallel pipelines, and keeping instructions in sequential order was another source of stalls. The pipelines are still linear and can face a performance barrier that cannot be breached. The OOO core is a huge departure from the previous chip designs with their linear paths. It added some complexity and introduced nonlinear paths:
      The first thing that happens is that instructions are fetched from memory into the processor's instruction cache. The decoder on modern processors can detect when a large branch is about to happen (such as a function call) and can begin loading the instructions before they are needed.
      The decoding stage was modified slightly from earlier chips. Instead of just processing a single instruction at the instruction pointer, the Pentium Pro processor could decode up to three instructions per cycle. Today's (circa 2008-2013) processors can decode up to four instructions at once. Decoding produces small fragments of operations called micro-ops or u-ops. Next is a stage (or set of stages) called micro-op translation, followed by register aliasing.
      Many operations are going on at once and we will potentially be doing work out of order, so an instruction could read from a register at the same time another instruction is writing to it. Writing to a register could potentially stomp on a value that another instruction needs. Inside the processor the original registers (such as AX, BX, CX, DX, and so on) are translated (or aliased) into internal registers that are hidden from the programmer.
      The registers and memory addresses need to have their values mapped to a temporary value for processing. Currently 4 micro-ops can go through translation every cycle. After micro-op translation is complete, all of the instruction's micro-ops enter a reorder buffer, or ROB. The ROB currently holds up to 128 micro-ops. On a processor with Hyper-Threading the ROB can also coordinate entries from multiple virtual processors. Both virtual processors come together into a single OOO core at the ROB stage. These micro-ops are now ready for processing. They are placed in the Reservation Station (RS).
      The RS currently can hold 36 micro-ops at any one time. Now the magic of the OOO core happens. The micro-ops are processed simultaneously on multiple execution units, and each execution unit runs as fast as it can. Micro-ops can be processed out of order as long as their data is ready, sometimes skipping over unready micro-ops for a long time while working on other micro-ops that are ready. This way a long operation does not block quick operations and the cost of pipeline stalls is greatly reduced.
      The original Pentium Pro OOO core had six execution units: two integer processors, one floating-point processor, a load unit, a store address unit, and a store data unit. The two integer processors were specialized; one could handle complex integer operations, the other could solve up to two simple operations at once. In an ideal situation the Pentium Pro OOO core execution units could process seven micro-ops in a single clock cycle.
      Today's OOO core still has six execution units. It still has the load address, store address, and store data execution units; the other three have changed somewhat. Each of the three remaining execution units performs either basic math operations or a more complex micro-op, and each is specialized for different micro-ops, allowing it to complete the work faster than if it were general purpose. In an ideal situation today's OOO core can process 11 micro-ops in a single cycle. Eventually the micro-op is run. It goes through a few more small stages (which vary from processor to processor) and eventually gets retired. At this point it is returned to the outside world and the instruction pointer is advanced. From the program's point of view the instruction has simply entered the CPU and exited the other side, exactly the same way it did back on the old 8086.
      If you were following carefully you may have noticed one very important issue in the way it was just described. What happens if there is a change in execution location?
      For example, what happens when the code hits an 'if' statement or a 'switch' statement?
      On the older processors this meant discarding the work in the superscalar pipeline and waiting for the new branch to begin processing. A pipeline stall when the CPU holds one hundred instructions or more is an extreme performance penalty. Every instruction needs to wait while the instructions at the new location are loaded and the pipeline restarted. In this situation the OOO core needs to cancel work in progress, roll back to the earlier state, wait until all the micro-ops are retired, discard them and their results, and then continue at the new location.
      This was a very difficult problem, and it happened frequently. Its performance cost was unacceptable to the engineers. This is where the other major feature of the OOO core comes in: speculative execution was their answer.
      Speculative execution means that when a conditional statement (such as an 'if' block) is encountered the OOO core will simply decode and run all the branches of the code. As soon as the core figures out which branch was the correct one, the results from the unused branches are discarded. This prevents the stall at the small cost of running the code inside the wrong branch. The CPU designers also included a branch prediction cache, which further improved the results when the core was forced to guess among multiple branch locations. We still have CPU stalls from this problem, but the solutions in place have reduced it to the point where it is a rare exception rather than a usual condition. Finally, CPUs with Hyper-Threading enabled will expose two virtual processors for a single shared OOO core. They share a Reorder Buffer and OOO core, appearing as two separate processors to the operating system. That looks like this:
      A processor with Hyper-Threading gives two virtual processors which in turn gives more data to the OOO core. This gives a performance increase during general workloads. A few compute-intensive workflows that are written to take advantage of every processor can saturate the OOO core. During those situations Hyper-Threading can slightly decrease overall performance. Those workflows are relatively rare; Hyper-Threading usually provides consumers with approximately double the speed they would see for their everyday computer tasks.
      An Example
      All of this may seem a little confusing. Hopefully an example will clear everything up.
      From the application's perspective, we are still running on the same instruction pipeline as the old 8086. There is a black box. The instruction pointed to by the instruction pointer is processed by the black box, and when it comes out the results are reflected in memory.
      From the instruction's point of view, however, that black box is quite a ride. Here is today's (circa 2008-2013) CPU ride, as seen by an instruction:
       
      First, you are a program instruction. Your program is being run. You are waiting patiently for the instruction pointer to point to you so you can be processed. When the instruction pointer gets about 4 kilobytes away from you -- about 1500 instructions away -- you get collected into the instruction cache.
      Loading into the cache takes some time, but you are far away from being run. This prefetch is part of the first pipeline stage. The instruction pointer gets closer and closer. When the instruction pointer gets about 24 instructions away, you and five neighbors get pulled into the instruction queue.
      This processor has four decoders. It has room for one complex instruction and up to three simple instructions. You happen to be a complex instruction and are decoded into four micro-ops. Decoding is a multi-step process. Part of the decode process involves a scan to see what data you need and if you are likely to cause a jump to somewhere new. The decoder detects a need for some additional data. Unknown to you, somewhere on the far side of the computer, your data starts getting loaded into the data cache. Your four micro-ops step up to the register alias table. You announce which memory address you read from (it happens to be fs:[eax+18h]) and the chip translates that into temporary addresses for your micro-ops. Your micro-ops enter the reorder buffer, or ROB.
      At the first available opportunity they move to the Reservation Station. The Reservation Station holds instructions that are ready to be run. Your third micro-op is immediately picked up by Execution Port 5. You don't know why it was selected first, but it is gone. A few cycles later your first micro-op rushes to Port 2, the Load Address execution unit. The remaining micro-ops wait as various ports collect other micro-ops. They wait as Port 2 loads data from the memory cache and puts it in temporary memory slots. They wait a long time... A very long time... Other instructions come and go while they wait for their micro-op friend to load the right data. Good thing this processor knows how to handle things out of order.
      Suddenly both of the remaining micro-ops are picked up by Execution Ports 0 and 1. The data load must be complete. The micro-ops are all run, and eventually the four micro-ops meet back in the Reservation Station. As they travel back through the gate the micro-ops hand in their tickets listing their temporary addresses. The micro-ops are collected and joined, and you, as an instruction, feel whole again. The CPU hands you your result, and gently directs you to the exit. There is a short line through a door marked "Retirement". You get in line, and discover you are standing next to the same instructions you came in with. You are even standing in the same order.
      It turns out this out-of-order core really knows its business.
      Each instruction then goes out of the CPU, seeming to exit one at a time, in the same order they were pointed to by the instruction pointer.
      Conclusion
      This little lecture has hopefully shed some light on what happens inside a CPU. It isn't all magic, smoke, and mirrors.
      Getting back to the original questions, we now have some good answers.
      What goes on inside the CPU? There is a complex world where instructions are broken down into micro-operations, processed as soon as possible in any order, then put back together in order and in place. To an outsider it looks like they are being processed sequentially and independently. But now we know that on the inside they are handled out of order, sometimes even running branches of code based on a prediction that they will be useful.
      How long does it take for one instruction to run? While there was a good answer to this in the non-pipelined world, in today's processors the time it takes is based on what instructions are nearby, and on the size and contents of the neighboring caches. There is a minimum amount of time it takes to go through the processor, but that is roughly constant. A good programmer and an optimizing compiler can make many instructions run in roughly amortized zero time: it is not the cost of the individual instruction that dominates, but the time to work through the OOO core and the time spent waiting for cache memory to load and unload.
      What does it mean when a new CPU has a 12-stage pipeline, or 18-stage, or even a "deep" 31-stage pipeline? It means more instructions are invited to the party at once. A very deep pipeline can mean that several hundred instructions can be marked as 'in progress' at once. When everything is going well the OOO core is kept very busy and the processor gains an impressive throughput of instructions. Unfortunately, this also means that a pipeline stall moves from being a mild annoyance like it was in the early days, to becoming a performance nightmare as hundreds of instructions need to wait around for the pipeline to clear out.
       
      How can I apply this to my programs?
      The good news is that CPUs can anticipate most common patterns, and compilers have been optimizing for the OOO core for nearly two decades. The CPU runs best when instructions and data are all in order.
      Always keep your code simple. Simple and straightforward code helps the compiler's optimizer recognize patterns and speed up the results. Don't jump around if you can help it, and if you need to jump, try to jump around exactly the same way every time; a hypothetical sketch of this idea follows below. Complex designs like dynamic jump tables are fancy and can do a lot, but neither the compiler nor the CPU can predict what is coming up, so complex code is very likely to result in stalls and mispredictions.
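      As a concrete sketch of "jump around exactly the same way every time" (a hypothetical example, not from the article): a data-dependent branch over random values mispredicts constantly, while sorting the data first settles the same branch into a pattern the predictor handles well.

          #include <algorithm>
          #include <vector>

          // Sorting makes the taken/not-taken pattern of the branch regular,
          // so the branch predictor rarely guesses wrong.
          long long sum_of_large_values(std::vector<int> values) {
              std::sort(values.begin(), values.end());
              long long total = 0;
              for (int v : values) {
                  if (v >= 128)               // predictable once the data is sorted
                      total += v;
              }
              return total;
          }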
      Likewise, keep your data simple. Keep it in order, adjacent, and contiguous to prevent data stalls. Choosing the right data structures and data layouts can do much to encourage good performance. As long as you keep your code and data simple, you can generally rely on your compiler's optimizer to do the rest.
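      For example, a plain contiguous layout like the sketch below (the Particle type and integrate function are illustrative, not from the article) lets a sequential pass stream through adjacent memory, which is exactly what the caches and prefetcher are built for.

          #include <vector>

          struct Particle { float x, y, z, vx, vy, vz; };

          // A sequential pass over adjacent memory: the prefetcher can stay ahead of us.
          void integrate(std::vector<Particle>& particles, float dt) {
              for (Particle& p : particles) {
                  p.x += p.vx * dt;
                  p.y += p.vy * dt;
                  p.z += p.vz * dt;
              }
          }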
      Thanks for joining me on the ride.
      If you enjoyed this article, you might also enjoy Keith O'Conor's look at 'GPU Performance for Game Artists'.
      Updates 2013-05-17 Removed a word that someone was offended by
    • By frob
      Many of the beginners on the site are pre-college students. Often beginners will learn by reading tutorials on the Internet, copying code from books, and trying out things that they find interesting. This article is part of a series that focuses on giving pre-college developers the basics they need to understand data structures.
      The previous article covered Non-Linear Structures. This article is to help beginners understand how to choose a data structure or container class.
      The Data Structures
      The previous articles in the series listed the most frequently used data structures.
      Here is a recap.
      There are the linear structures: the array, the dynamic array, and the linked list. They are linear because they stay in the order you place them. Arrays are very fast for random-access and have moderately good performance when adding and removing at the end. A linked list is very good when frequently adding and removing in the middle.
      There are linear endpoint data structures: the stack and the queue family. Both work very much the same way as their real-world namesakes. With a stack, such as a stack of plates or a stack of data, you can "push" something onto the top, access the top item, and "pop" an item off. A queue, just like a queue of people or a queue of data, works by adding to the end of the line and removing from the front of the line.
      Then there are the non-linear data structures: the data dictionary, the ordered set, and the unordered set. They are non-linear internally: the order you put items in is basically unrelated to the order you will get them back out. The data dictionary works much like a real-life dictionary. It has a key (the word to look up) and a value (the definition of the word). An ordered set is exactly the same as a data dictionary that contains keys but no values, and it is kept sorted. An unordered set is just a grab-bag of objects. The name is a little misleading, since the items really are ordered; the order just isn't one that is useful to people. All of these structures are ideal for fast lookup.
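      In C++ terms, a small sketch of that mapping (assuming the standard library containers) looks like this:

          #include <map>
          #include <set>
          #include <string>
          #include <unordered_set>

          std::map<std::string, std::string> dictionary; // data dictionary: key -> value, kept sorted by key
          std::set<std::string> ordered_names;           // ordered set: keys only, kept sorted
          std::unordered_set<std::string> name_bag;      // unordered set: keys only, hash order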
      The Impact of a Good Selection
      Most of the time programmers just need to iterate over a collection.
      Generally we don't care what order the collection is in, just that we can start at the beginning and visit each item. In this very common situation, the choice of a data structure really doesn't matter.
      When in doubt, the best choice is usually the dynamic array. It can grow to any capacity, and it is fairly neutral, making it easy to swap out with a different data structure later. But sometimes it matters very much.
      One of the more common problems in games is pathfinding. You need to find a path from A to B. One of the most common pathfinding algorithms is A-star. In the A-star algorithm you have a data structure containing partial paths. The structure is sorted so that the most likely partial path is at the front of the container. That path is evaluated and if it isn't the complete path the algorithm will make that partial path into multiple bigger partial paths, and add them to the container.
      Using a dynamic array for this container would be a bad choice for several reasons. First, removing elements from the beginning of a dynamic array is one of the slowest operations we can perform. Second, re-sorting the dynamic array after every addition can also be slow. If you remember from above, there is a data structure that is optimized for this type of access. We are removing from the front and adding to the back, and automatically sorting based on which path is best. The priority queue is ideal for an A-star path container. It is pre-built and fully debugged.
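      Here is a rough C++ sketch of that container choice; the PartialPath type and its fields are hypothetical, and only the use of std::priority_queue is the point.

          #include <queue>
          #include <vector>

          struct PartialPath {
              float estimated_cost;        // lower is more promising
              std::vector<int> nodes;      // node ids visited so far (illustrative)
          };

          // std::priority_queue is a max-heap, so invert the comparison to keep
          // the cheapest partial path on top.
          struct WorsePath {
              bool operator()(const PartialPath& a, const PartialPath& b) const {
                  return a.estimated_cost > b.estimated_cost;
              }
          };

          using OpenSet = std::priority_queue<PartialPath, std::vector<PartialPath>, WorsePath>;

      The algorithm then repeatedly reads the best partial path with top(), removes it with pop(), and pushes the expanded paths back in.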
      Choosing Between the Patterns
      Choosing your data structure mostly depends on your usage pattern.
      The Dynamic Array -- The Default Choice
      When in doubt, use a dynamic array. In C++ that is a vector. In Java that is an ArrayList. In C# that is a List. The dynamic array generally does the right thing. It has good performance for most operations, and not bad performance for the rest. If you ever discover that you need a different data structure, it is the easiest one to move from.
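      A few lines of C++ to show the idea (purely illustrative):

          #include <vector>

          void dynamic_array_example() {
              std::vector<int> scores;      // the dynamic array: grows as needed
              scores.push_back(42);         // cheap addition at the end
              scores.push_back(17);
              int last = scores.back();     // fast access anywhere, including the end
              (void)last;
          }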
      The Stack -- One End Only
      If you are only adding and removing from a single end, use a stack. That is stack in C++, and a Stack in both Java and C#. There are many algorithms that rely on the stack data structure. The first one that comes to my mind is the two-stack calculator. Numerical problems like Towers of Hanoi can be solved with a stack. You probably won't use either of those in a game. Game tools will frequently parse data. Parsers rely heavily on stack data structures to ensure that pairs of items are paired correctly. If you are working with a wide range of AI types, the stack data structure is incredibly useful for a family of automata called a pushdown automaton.
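      Here is a small sketch of the pairing check mentioned above, using std::stack; the function is a hypothetical example rather than code from a real parser.

          #include <stack>
          #include <string>

          // Returns true when every opening bracket has a matching closing bracket.
          bool brackets_balanced(const std::string& text) {
              std::stack<char> open;
              for (char c : text) {
                  if (c == '(' || c == '[' || c == '{') {
                      open.push(c);
                  } else if (c == ')' || c == ']' || c == '}') {
                      if (open.empty()) return false;
                      char t = open.top();
                      open.pop();
                      if ((c == ')' && t != '(') ||
                          (c == ']' && t != '[') ||
                          (c == '}' && t != '{'))
                          return false;
                  }
              }
              return open.empty();
          }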
      The Queue Family -- First In, First Out.
      If you are adding and removing only at the ends, use either a queue or a double-ended queue. In C++ that is a queue or deque. In Java you can use the Queue or Deque interface; both are implemented by LinkedList. In C# there is a Queue class, but no built-in Deque. If you need to make sure the important stuff gets done first but otherwise everything happens in order, reach for the priority queue. In C++ that is a priority_queue, and in Java it is a PriorityQueue. In C#, you are on your own.
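      In C++, the three containers named above look roughly like this (element types and names are placeholders):

          #include <deque>
          #include <queue>

          void queue_family_example() {
              std::queue<int> messages;          // first in, first out
              messages.push(1);
              messages.pop();                    // removes the oldest element

              std::deque<int> history;           // push and pop at either end
              history.push_back(2);
              history.push_front(1);

              std::priority_queue<int> urgent;   // largest element always on top
              urgent.push(5);
              urgent.push(9);
              int next = urgent.top();           // 9: the most important item comes out first
              (void)next;
          }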
      Non-Linear Structures -- Fast Search
      If you create a stable group of items and mostly perform random lookups, you will want one of the non-linear structures. Some of them hold pairs of data, some of them hold individual data. Some are ordered in a useful manner, others are ordered in a computer-friendly manner. Trying to make a list of all the combinations would be an article in itself. In fact, it was the previous article. For a list of which one meets the specific searching needs, have a look back there.
      The Linked List -- Frequent Modifications with Order Preserved
      If you are frequently modifying the middle of the container, and if you only need to traverse the list sequentially, use a linked list. In C++ it is called a list. In Java and C# it is called a LinkedList. The linked list is a great container when data is always coming and going and must be kept in order, or when you need to periodically sort and move items around.
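      A short C++ sketch of a mid-list insertion (names are illustrative):

          #include <iterator>
          #include <list>

          void linked_list_example() {
              std::list<int> turn_order = {10, 20, 40};
              auto it = std::next(turn_order.begin(), 2); // points at 40
              turn_order.insert(it, 30);                  // cheap insert once you hold an iterator
              turn_order.remove(10);                      // removing does not shift the other elements
          }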
      Conclusion
      Choosing the right data structures can make a big difference in how algorithms will perform. Understanding the major data structures, including their benefits and their drawbacks, can help guide you to using the most efficient structure for whatever you need. I recommend that eventually you study them in depth. A full study of these data structures inside a Computer Science degree program will usually last several weeks. Hopefully you have learned about the major data structures and when to choose one structure over another without the multi-week college level study.
      This finishes off the series of articles. Thanks for reading.