Search the Community

Showing results for tags 'Optimization'.

Found 9 results

  1. Hi, I am trying to implement a custom texture atlas creation tool in C++. Can anyone suggest a fast, open-source API or library for image import and export? The tool will also compress the final output atlas image into multiple formats such as DXT5, PVRTC, and ETC based on user input; what would be the best way to implement that? Thanks
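For the import/export side, the stb single-header libraries (stb_image for loading, stb_image_write for saving) are a commonly used option; for the compression step, dedicated encoders such as AMD's Compressonator (DXT/BCn among others), Imagination's PVRTexTool (PVRTC), and Google's etc2comp (ETC) can be integrated or invoked from your tool. Below is a minimal sketch, assuming the stb headers are available on the include path, of a load-then-save round trip around which an atlas packer could be built; the file names are placeholders.

#include <cstdio>

// stb_image / stb_image_write are single-header libraries; define the
// implementation macros in exactly one translation unit.
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"

int main()
{
    // Load a source sprite (placeholder name), forcing 4-channel RGBA
    // regardless of the file's own format.
    int w = 0, h = 0, channelsInFile = 0;
    unsigned char* pixels = stbi_load("sprite.png", &w, &h, &channelsInFile, 4);
    if (!pixels) {
        std::fprintf(stderr, "load failed: %s\n", stbi_failure_reason());
        return 1;
    }

    // ... pack `pixels` into the atlas canvas at its assigned rectangle ...

    // Write the result out as PNG; the last argument is the row stride in bytes.
    stbi_write_png("atlas.png", w, h, 4, pixels, w * 4);
    stbi_image_free(pixels);
    return 0;
}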
  2. Awoken

    More Adventures in Robust Coding

    Hello GameDev, this entry is going to be a big one for me, and it's going to cover a lot. What I plan to cover from my recent development journey is the following:

1 - Goal of this blog entry.
2 - Lessons learned using Node.js for development and testing, as opposed to the Chrome console.
3 - Linear path algorithm for any surface.
4 - Dynamic pathfinding using nodes for any surface, incorporating user-created dynamic assets.
5 - Short-term goals for the game.

Goal of this blog entry

My goal for this adventure is to create a dynamic pathfinding algorithm so that:
- any AI that is to be moved will be able to compute the shortest path between any two points on the surface of the globe;
- the AI will navigate around bodies of water, vegetation, and dynamic user assets such as buildings and walls;
- the path will be computed in less than 250 milliseconds.

There are a few restrictions the AI will have to follow. In the image above you can see that land masses cut off from one another by rivers and bodies of water are uniquely coloured. If an AI is on a land mass of one colour, for now, it will only be able to move to a location on the same-coloured land mass. However, some land masses take up around 50% of the globe and have very intricate river systems, so the intended goal is for an AI on one end of a larger land mass to find the shortest path to the opposite end within 250 milliseconds. Currently my pathfinding algorithm can find the shortest path in anywhere from 10 ms and up, and when I say up, I mean upwards of 30 seconds. That's because of the way I built the algorithm, which is in the process of being optimised.

Lessons learned using Node.js for development and testing

As of this writing I am using Node.js to test the efficiency of my algorithms, and this has slowed down my development. I am not a programmer by trade; I've taught myself the bulk of what I know, and I often spend my time reinventing the wheel and learning things the hard way. Last year I made the decision to move my project over to Node.js for continued development, since eventually it all had to be ported to Node.js anyway. In hindsight I would have done things differently: I would have continued to use the Chrome console for small-scale testing and development, and only ported the code over to Node.js once it was proven to be robust. If there is one lesson I'd like to pass on to aspiring and new programmers, it's this: use a language and development environment that allows you, the programmer, to jump into the code while it's running and follow each line as it executes, basically debugging. It is so easy to catch errors in logic that way. Right now I'm throwing darts at a dartboard, guessing at what I should send to the console for feedback to help me learn more about my logical errors in Node.js; see "learning things the hard way" above.

Linear path algorithm for any surface

In the blog entry above I go into detail explaining how I create a world. The important thing to take away from it is that every face of the world has information about all surrounding faces that share vertex pairs with it. In addition, every vertex has information about the faces that use it in their draw order, and about all vertices adjacent to it.
Example vertex and face objects look like the following:

Vertices[ 566 ] = {
    ID: 566,
    x: -9.101827364,
    y: 6.112948791,
    z: 0.192387718,
    connectedFaceIDs: [ 90, 93, 94, 1014, 1015, 1016 ],  // clockwise order
    adjacentVertices: [ 64, 65, 567, 568, 299, 298 ]     // clockwise order
}

Face[ 0 ] = {
    ID: 0,
    a: 0,
    b: 14150,
    c: 14149,
    sharedEdgeVertices: [ { a: 14150, b: 14149 }, { a: 0, b: 14150 }, { a: 14149, b: 0 } ],  // named 'cv' in previous blog post
    sharedEdgeFaceIDs: [ 1, 645, 646 ],  // named 's' in previous blog post
    drawOrder: [ 1, 0, 2 ]               // named 'l' in previous blog post
}

It turns out the algorithm is speedy for generating shapes of large sizes. My buddy, who is a Solutions Architect, told me I'm a one-trick pony. Ha! Anyway, this algorithm comes in handy because if I want to identify a linear path along all faces of a surface (marked as a white line in the picture above), I can reduce the number of faces to be tested during raycasting to twice the number of faces the path travels across.

To illustrate, imagine a triangular pizza slice made of two faces, back to back. The tip of the pizza slice touches the centre of the shape you want to trace a linear path along, and the two outer points of the slice protrude out from the surface far enough to entirely clear the shape. When I select my starting and ending points for the linear path, I also retrieve the faces those points fall on. Then I raycast between the sharedEdgeVertices, targeting the pizza slice. If a hit happens along sharedEdgeVertices[ 2 ], then I know the next face to test for the subsequent raycast is face ID 646. I also know that since the pizza slice comes in at sharedEdgeVertices[ 2 ], it is most likely going out at sharedEdgeVertices[ 1 ] or [ 0 ]; if not [ 1 ], then it is almost certainly [ 0 ], and vice versa. Being able to identify a linear path along any surface was the subject of my first Adventure in Robust Coding. Of course there are exceptions that need to be accounted for, such as when the pizza slice straddles the edge of a face, or exits a face exactly at a vertex.

Sometimes, when I'm dealing with distances along the surface of a shape where the pizza slice needs to be made up of more than one set of back-to-back faces, another problem can arise: I ran into the limitations of floating-point numbers, or at least that's what it appears to be to me. I'm sure most of you are familiar with some variation of the infinite chocolate bar puzzle. With floating-point numbers I learned that you can have two faces share two vertices along one edge, raycast at a point directly between the edges of the two connecting faces, and occasionally the raycast will miss both faces. I attribute this in large part to the fact that floating-point numbers capture only an approximation of a point, not the exact point. Much as the infinite chocolate bar puzzle hides a tiny gap along the cut equal in size to the removed piece, a tiny gap between the faces sometimes causes the raycast to miss. If someone understands this better, please correct me.

Dynamic pathfinding using nodes for any surface

Now that I've got the linear path algorithm working in tip-top shape, I use it in conjunction with nodes to create the pathfinding algorithm. First I identify the locations of all the nodes.
I do this using a class I created called OrientationVector; I mention it in the blog post above. Each instance has a position vector, a pointTo vector, and an axis vector. The beauty of this class is that I can merge instances (which averages their position, pointTo, and axis vectors), rotate them about any axis, and move them any distance along their pointTo vector.

To create the shoreline collision geometry and node collision geometry illustrated above, and the node locations along shorelines illustrated below, I use the OrientationVector class. First, the water table for the world is set to an arbitrary value, currently 1.08; if one vertex of a given face falls below the table and one or two vertices are above it, I know the face is a shoreline face. Then I use simple math to determine the two points where the face meets the water and create two OVectors there, each pointing at the other. Then I rotate them about their y axes by 90 and -90 degrees respectively, so that they both face inland. Since neighbouring shoreline faces touch one another, there will be duplicate OVectors at each point along the shore; however, each OVector is created with a pointTo vector relative to its sister OVector. I merge the paired OVectors at each point along the shore, which averages their position, pointTo, and axis, and then move them inland a small distance. The result is the blue arrows above: the locations of three of the thousands of nodes created for a given world. Each node has information about the shoreline collision geometry, the node collision geometry (the geometry connecting nodes), and the nodes to its left and right. Each face of collision geometry is given a node ID to refer to.

So, to create the pathfinding algorithm: I first identify the linear path between the starting and ending points, then test each segment of the linear path against the collision geometry. If I get a hit, I retrieve the node ID, which gives me the location of the node associated with that face of collision geometry. I then travel left and right along connecting nodes, checking whether a new linear path to the end point is possible; if no immediate collision geometry is encountered, the process continues and is repeated as needed. This establishes a list of points marking the beginning of the route, the encountered nodes, and the end. The list is then trimmed by testing linear paths between every third point; if a valid path is found, the middle point is spliced out (see the sketch at the end of this entry). All the trimmed candidate paths are then measured for distance, and the shortest one wins. Below is the code for the algorithm I currently use. It's my first attempt at using classes to create an algorithm; previously I just relied on elaborate arrays. I plan on improving the process described above by keeping track of distance as each path spreads out from its starting location, and only letting the path that is currently shortest go through its next iteration. With this method, once a path to the end is found, I can bet it will be the shortest, so I won't need to compute all possible paths like I do now.

The challenge I've been facing for the past two months is that sometimes the nodes end up in the water. The picture above shows a shoreline where the distance the OVectors travel would place them in the water.
Once a node is in the water, the AI is allowed to move to it; there is then no shoreline collision geometry for it to encounter that would keep it on land, so the AI just walks into the ocean. Big boo! I've been writing variations of the same function to correct the location of the geometry shown in red and yellow below, but what a long process. I've rewritten this function time and time again. I want it to be, as the title of this blog states, robust, but it's slow going. As of today's date, it's not robust, and the optimised pathfinding algorithm hasn't been written either. I'll post updates in this blog entry as I make progress towards my goal, and I'll also mention my best worst-case pathfinding time; hopefully it'll be below 250 ms.

Short-term goals for the game

Badly, SO BADLY, I want to be focusing on game content; that's all I've been thinking about. Argh. But this all has to get wrapped up before I can. I got ahead of myself; I'm guilty of being too eager. But there is no sense building game content on top of an engine that is prone to errors. My immediate goals for the engine are as follows:

// TO DO's //
// Dec 26th 2017 //
/*
 * << IN PROGRESS >> - update path node geometry so no errors occur
 * - improve the pathfinding algorithm with the new technique
 * - improve client AI display: only one geometry for high detail, and one for the tetrahedron
 * - create the ability to select many AI at the same time by drawing a rectangle while holding the mouse button
 * - create an animation server to receive a path and process the animation, and test it in the client with updates
 * - rewrite the geometry-merging function so that the client vertices and faces have a connected target ID
 * - incorporate dynamic asset functionality into the client
 * - create a farm and begin writing AI
 * - program model clusters
 * - synchronize server and client AI; test how many AI there can be and how quickly they can be updated; determine a rough estimate of the number of players the server can support
 */

See the third-last one! That's the one; oh, what a special day that'll be. I've created a project page, please check it out. It gives my best description to date of what the game is going to be about. Originally I was going to name it 'Seed'; a family member made the logo I use as my avatar and came up with the name back in 2014. Then just this week I found out that some studio in Europe is making THE EXACT SAME GAME! WHA??? http://www.pcgamer.com/seed-is-a-hugely-ambitious-in-development-mmo-that-echoes-eve-online-rimworld-and-the-sims/ I'm being facetious, but they're very close to being the same game. Anyway, mine will be better, you read it here first! Hahaha. The project is no longer going to be called Seed; it's instead going to be called what I've always called it and will probably always call it: the game.

[ edit: 02/02/18 Some new screenshots to show off. All the new models were created by Brandross. There are now three earth materials: clay, stone and marble. There are also many types of animals and more tree types. ]

Thanks for reading, and if you've got anything to comment on, I welcome it all. Awoken
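As referenced above, here is a minimal sketch of the route-trimming pass. The entry's own code is JavaScript running under Node.js; this illustration uses C++ to match the other sketches on this page, and hasClearLinearPath is a hypothetical stand-in for the author's linear-path test against collision geometry.

#include <cstddef>
#include <vector>

struct Point3 { float x, y, z; };

// Hypothetical stand-in for the linear-path test described above: returns
// true if the straight surface path from a to b hits no collision geometry.
bool hasClearLinearPath(const Point3& a, const Point3& b);

// Trim a route by testing a direct path between every third point; when one
// exists, the middle point is redundant and is spliced out. Repeats until no
// further point can be removed.
void trimRoute(std::vector<Point3>& route)
{
    bool removedAny = true;
    while (removedAny) {
        removedAny = false;
        for (std::size_t i = 0; i + 2 < route.size(); ++i) {
            if (hasClearLinearPath(route[i], route[i + 2])) {
                route.erase(route.begin() + i + 1);  // splice the middle point
                removedAny = true;
            }
        }
    }
}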
  3. Welcome back, colony managers! Here comes a set of awesome new features improving your space colonization systems. This month we spent a lot of time polishing the game's user interface into its 4th evolution, making it 4K-ready along the way. With the remediation center, a very important infrastructure building made it into this release, and the logistic center got a workover as well, enabling you to fully automate repair and cleaning processes. And there's more...

TL;DR
- New user interface
- Remediation center & carbon sequestration
- Logistic center is now the maintenance station
- Desertification threat and temple power
- Solar park and wind farm alignment
- Mountain variations
- Fixes & improvements

New User Interface

The existing user interface clearly did not meet the demands of our colony & planet simulation. It was very clumsy and reminiscent of a casual mobile phone game. So we rebuilt it, using the smaller and much clearer Roboto font for all text elements, while keeping our futuristic fonts for headers and elements that need highlighting. The new interface has become much easier to read while taking up less screen space, giving even more focus to the planet itself. If you are using a relatively small screen with a high resolution, you can still zoom the interface up to 150% in the options. A side effect of our work is that the new user interface has a higher resolution and thus is ready for your 4K display.

Remediation Center & Carbon Sequestration

The new remediation center building comes with an additional worker drone and uses it to automatically start the clean-up process for nearby fields. A powerful upgrade for the center is carbon sequestration: the technical separation and storage of CO2 emissions from surrounding buildings and power plants. Thanks to underground compression, the exhaust gases do not enter the atmosphere. The second upgrade for this building, "Advanced Remediation Process", halves the cost of soil clean-up in the area.

Logistic Center Is Now the Maintenance Station

The logistic center is now called the "maintenance station" and automatically repairs nearby damaged buildings by default. Its upgrades are the fire station and "Advanced Repair Process", which halves the cost of repair processes in the area.

Desertification Threat and Temple Power

In the course of global warming, new deserts will emerge next to existing ones, and thereby the infertile wasteland grows. The only way to prevent this is to plant forests on desert fields so the desert can't spread; trees will effectively stop the process of desertification. In the illuminati temple you can use gaian energy to create a field of desert anywhere in the world, except on fields with forests on them.

Solar Panels and Wind Farms

They now align to the sun and the wind direction.

Mountain Variations

Each mountain now has two versions to bring more variety into the game's look. These two versions are also placed at the three edges of big mountains, making it easier to see whether a field is blocked by mountains.

Fixes & Improvements

- Fixed animation problems: we had some seriously strange problems with sub-models and animations. Mystery finally concluded!
- Ships and oil platforms are no longer visible under water while being built.
- City expansion now also needs drones.
- Diplomatic relation progress is now shown as a ring (a full ring reaches the next diplomatic level).
- Fixed the orientation angle of volcanoes and huge mountains on small planets.
- Added a sandbox category for mushroom forests.

As always, we wish you good fun, and we hope you let us know if anything comes to mind about the new features! Jens & Martin
  4. Which ASO Tools Are Right for Your Game?

When I started doing app store optimization (ASO) for my games, I was overwhelmed by the number of ASO tools available on the market: App Annie, Mobile Action, Meatti, Sensor Tower, App Radar, Priori Data, ASOdesk, Searchman, TheTool, Keyword Tool, AppKeywords.net, Apptentive, Appbot, AppFollow, Apptopia, APPlyzer, SplitMetrics, StoreMaven, RaiseMetrics, TestNest, SearchAdsHQ, SearchAds by Mobile Action, adAhead, you name it. And as if things were not already complicated enough, these ASO tools provide very different features, pricing, and options. When deciding which ones to use, I didn't know where to start.

How to Choose Your ASO Tools

If you are looking for your best ASO tools, check out my findings below. I will first give a categorization of ASO tools, then follow up with a big list of app store optimization tools. You can then choose your ASO tools based on the category and the details of the individual tools.

Free bonus: click here to get a free comparison spreadsheet of all the top ASO tools. It prints nicely on one page, and you can easily sort the ASO tools by type, price, availability of a free version, etc. It also includes 2 more ASO tools that are not covered in this post.

Types of ASO Tools

ASO tools come in many flavors and packages, and they can be grouped into the following categories:

1) App Keyword Optimization Tools
Tools of this type help you optimize your app keywords to increase your app search traffic. Keyword-related features include app keyword suggestions, keyword optimization, keyword tracking, etc. Mobile Action, Sensor Tower, Meatti, App Radar, Priori Data, ASOdesk, Searchman, TheTool, Keyword Tool, and AppKeywords.net are some good examples.

2) Review & Sentiment Analysis Tools
Tools of this type optimize your user reviews and ratings. Appbot, Mobile Action, Meatti, and TheTool analyze your user ratings and review contents and tell you what your users like and don't like. With this kind of sentiment analysis, you can refine your product development roadmap to earn better ratings. For example, if you find out that a lot of users are complaining about a specific issue, you can prioritize your effort to fix that problem and tell the complaining users about the solution; many users will appreciate your positive reaction to their comments and give you better ratings. Related to this, AppFollow provides features that help you reply to all comments in the App Store and Play Store efficiently. On the other hand, tools like Apptentive help you increase the chance of getting 5-star reviews: they optimize your app's rating prompt process by deciding who, when, and how to present your rating prompts.

3) A/B Testing Tools
A/B testing lets you test your mobile app just like a science experiment. It helps you compare two or more app product pages and determine which one gives you a better download conversion rate. SplitMetrics, StoreMaven, TestNest, and RaiseMetrics are some good A/B testing tools for your app product page.

4) Search Ads Optimization Tools
These tools help you optimize your advertising campaigns on Apple Search Ads. They provide automation features and competitor data that help you run ad campaigns more effectively. Some also integrate with app attribution partners (Adjust, AppsFlyer, Kochava, TUNE, etc.) and allow you to optimize campaigns not only for installs but also for in-app events. SearchAdsHQ, SearchAds by Mobile Action, and adAhead are some good examples.

5) App Store Intelligence Tools
Tools of this type provide estimates of competitor performance and app market trends, such as competitor app downloads, revenue, advertising spend, and market penetration. This information is useful to app product managers and marketing managers for competitive analysis and marketing planning. App Annie, Mobile Action, Sensor Tower, Priori Data, Apptopia, and APPlyzer all offer app store intelligence.

Top ASO Tools

Listed below are the top ASO tools in 2018, organized by the types discussed above. To make the list more authentic, I personally reached out to every one of them and collected their views on how their tools help their users, and I was fortunate enough to receive some great answers! Lastly, I've prepared a one-page, printable comparison spreadsheet of all the ASO tools, which you can easily sort by type, price, availability of a free version, etc.

1) App Keyword Optimization Tools: Mobile Action, Meatti, Sensor Tower, App Radar, Priori Data, ASOdesk, Searchman, TheTool, Keyword Tool, AppKeywords.net
2) Review & Sentiment Analysis Tools: Mobile Action, Meatti, TheTool, Apptentive, Appbot, AppFollow
3) A/B Testing Tools: SplitMetrics, StoreMaven, RaiseMetrics, TestNest
4) Search Ads Optimization Tools: SearchAdsHQ, SearchAds by Mobile Action, adAhead
5) App Store Intelligence Tools: App Annie, Mobile Action, Sensor Tower, Priori Data, Apptopia, APPlyzer

Mobile Action: Mobile Data Intelligence & Actionable Insights
Mobile Action is an intuitive App Store Optimization tool and a data company providing actionable insights for its users. It provides its users with the most accurate data possible, but that's what every ASO tool claims to do; in fact, the difference of Mobile Action is its dedicated customer success team that provides instant support across the entire globe 24/7. Mobile Action got into business as an ASO agency, so we know a great deal about App Store Optimization, and we build our tools from the perspective of an ASO specialist.
Aykut Karaalioglu, CEO
Quick facts: Free version / trial available? Yes. Premium plan starts at $69/month.

Meatti: Boost App Downloads Using Artificial Intelligence
Meatti helps mobile app developers boost app downloads without spending a penny on advertising. Our Meatti platform analyzes data from millions of apps every day; using that data and artificial intelligence, it provides app developers with the best keyword and optimization suggestions to gain more app downloads in a systematic way.
Marcus Kay, CEO
Quick facts: Free version / trial available? Yes. Premium plan starts at $24/month.

Sensor Tower: Data That Drives App Growth
Sensor Tower provides mobile developers with powerful market intelligence and App Store Optimization solutions that enable them to easily surface competitive insights and achieve maximum organic growth on the App Store and Google Play.
Randy Nelson, Head of Mobile Insights
Quick facts: Free version / trial available? Yes. Premium plan starts at $79/month.

App Radar: App Store Optimization Made Easy
App Radar is a search engine optimization tool that helps app developers make their apps more visible within the app stores. With a direct integration into iTunes Connect & Google Play Console, App Radar makes the process of App Store Optimization easier than ever before.
Thomas Kriebernegg, CEO
Quick facts: Free version / trial available? Yes. Premium plan starts at $150/month.

Priori Data: Win Your Mobile Market
Priori Data App Intelligence enables you to research, benchmark, and track your competition all in one place. Create individual or team-viewable watchlists and comparisons of apps in your competitive set, and track their rank, download, revenue, DAU, MAU, ARPDAU, and retention performance on a daily basis. Set up smart alerts to get notified of any major shifts, and receive daily and weekly reports so that you never lose track of the big picture.
Quick facts: Free version / trial available? No. Premium plan starts at $99/month.

ASOdesk: Boost Your Organic Downloads with Data-Driven Marketing Technologies
Our dream is to make our customers more and more successful. App Store Optimization is a never-ending optimization process that can bring millions of free installs. Our clients have many opportunities to make their business more effective; in just a couple of clicks our product is available and ready to help you find new real users.
Sergey Sharov, CEO
Quick facts: Free version / trial available? Yes. Premium plan starts at $41.6/month.

Searchman: App Data Solutions to Accelerate Ecosystem Success
SearchMan is the leading app analytics data & technology company, with over 100,000 companies actively using our solutions to help them succeed in the app economy. SearchMan's parent company, AppGrooves, was founded in the San Francisco Bay Area by former executives of Rakuten, AdMob, Yahoo, and many other startups. Our investors include 500 Startups, Digital Garage, and several internet luminaries whose experience includes Disney, Google, Yahoo, Gree, Ricoh, Hatena, and Rakuten.
Quick facts: Free version / trial available? Yes. Premium plan starts at $25/month.

TheTool: Performance-Based Mobile App Marketing & ASO Tool
TheTool helps developers and marketers track and optimize their App Store Optimization strategy in 91 countries or globally, carry out keyword research, benchmark ASO KPIs against competitors, understand the impact of marketing actions on installs, conversion rate, and revenue, and, ultimately, grow the organic installs of their apps and games. Basically, we help people make more money with apps.
Daniel Peris, CEO
Quick facts: Free version / trial available? Yes. Premium plan starts at €29/year.

Keyword Tool: Find Great Keywords Using Autocomplete
KeywordTool.io helps marketers and app creators discover what app store users are looking for by generating keyword suggestions from the app store's autocomplete. A simple search can yield hundreds of hidden keywords for you to optimize your app towards.
Khai Yong Ng, Head of Growth
Quick facts: Free version / trial available? Yes. Premium plan starts at $48/year.

AppKeywords.net: Sneak into Google Play's Auto-Suggest Feature
When I launched AppKeywords.net back in 2015, it was really hard to get proper data on keywords. Sure, you had a lot of tools giving you some kind of estimates, but you could not be really sure the data was accurate, especially when researching non-English keywords.
Sebastian Knopp, Growth and Product Strategy
Quick facts: Free version / trial available? Yes. Premium plan starts at: Free.

Apptentive: Build a Brand Your Customers Love
Using proactive mobile communication tools, Apptentive empowers companies to better understand more of their customers, at scale, in order to drive app downloads, create seamless customer experiences, and validate product roadmaps. The product gives brands the opportunity to listen to, engage with, and retain their customers through intelligently timed surveys, messages, and prompts. They power millions of customer interactions every month for companies including Buffalo Wild Wings, eBay, Philips, Saks Fifth Avenue, and Zillow.
Robi Ganguly, CEO
Quick facts: Free version / trial available? Yes. Premium plan: custom pricing.

Appbot: App Review & Ratings Analysis for Mobile Teams
Appbot helps developers understand how customers feel about their apps by monitoring and analyzing their app reviews and ratings across all major platforms. Appbot applies proprietary sentiment analysis and clustering techniques to help developers understand current issues and identify quick wins.
Claire Mcgregor, Co-founder
Quick facts: Free version / trial available? Yes. Premium plan starts at $39/month.

AppFollow: Reviews & Updates Monitor for App Store & Google Play
AppFollow was created to support everyone involved in the process of developing and growing mobile apps and games (this year we will expand this support even further), whether developer, CEO, customer support, product manager, ASO expert, or publisher.
Anatoly Sharifulin, CEO & Co-founder
Quick facts: Free version / trial available? Yes. Premium plan: custom pricing.

App Annie: Achieve Success Through Apps
The industry's first app data platform integrates your app data with our comprehensive market data, cutting-edge data science, deep data foundation, and engaging data experience. Through our platform you can get immediate access to all our latest technology innovations and data sets, share the right data with the right people at the right time, pinpoint prime opportunities, and, most crucially, create winning strategies.
Quick facts: Free version / trial available? Yes. Premium plan starts at $15,000+/year (source: TechCrunch).

Apptopia: Grow Your App Business
Apptopia provides competitive intelligence for the mobile app economy. Through intuitive tools, we're able to display actionable data, which means user acquisition managers, product teams, SDK sales teams, growth marketers, and more can make smarter decisions faster. Data we provide includes downloads, revenue, usage, retention, rank, SDK data, audience intelligence, advertising intelligence, and more.
Adam Blacker, Communications Lead and Brand Ambassador
Quick facts: Free version / trial available? Yes. Premium plan starts at $55/month.

APPlyzer: App Market Analysis & App Store Optimization
Applyzer is a leading app industry analysis service that has provided market insights since 2009. Our service offers reliable data to a wide range of customers in the app business: from actionable data for publishers to relevant information for tech investors.
Quick facts: Free version / trial available? Yes. Premium plan starts at €10/month.

SplitMetrics: Optimize Your App Conversion Rates on the App Store and Google Play with A/B Testing
With SplitMetrics, app publishers such as Rovio, Halfbrick, Wargaming, ZeptoLab, and Pocket Gems optimize app store conversions by A/B testing app page elements, from icons and screenshots to subtitles, app previews, etc. To help publishers get the most out of their app marketing efforts, SplitMetrics shares industry benchmarks and a great volume of educational materials, such as the AppGrowthLab course.
Alexandra Lamachenka, Head of Marketing
Quick facts: Free version / trial available? Yes. Premium plan starts at $4,999/year.

StoreMaven: Increase App Store Conversion Rates & Pay Less for Every Install
StoreMaven helps more than 60% of top-grossing app publishers optimize their app store product pages to increase install rates and reduce the cost of user acquisition. Companies like Google, Uber, Facebook, and Zynga rely on StoreMaven's testing and analytics platform to define their ASO and global mobile marketing strategies.
Gad Maor, CEO
Quick facts: Free version / trial available? No. Premium plan: custom pricing.

RaiseMetrics: Raise Your App Store & Google Play Install Rates with A/B Testing
Insight is everything. RaiseMetrics provides a visual understanding of how your audience interacts with your app page, and what you can do to maximize conversions.
Quick facts: Free version / trial available? Yes. Premium plan starts at $99/month.

TestNest: Best Self-Serve App Store and Google Play A/B Testing Platform
Unoptimized app store pages may increase CPIs by up to 40%. A/B test your app listing pages and get more quality users for less, and learn from user behavior analysis to make optimized, data-driven decisions.
Quick facts: Free version / trial available? Yes. Premium plan starts at $149/month.

SearchAdsHQ: Optimize Apple App Store Ads for Revenue, Not Just Downloads
SearchAdsHQ helps app publishers run ROI-driven Apple Search Ads campaigns. To make this possible, the platform connects Apple Search Ads with app attribution partners (Adjust, AppsFlyer, Kochava, TUNE, etc.) and allows you to optimize campaigns not only for installs but also for in-app events: in-app purchases, subscriptions, conversions.
Alexandra Lamachenka, Head of Marketing
Quick facts: Free version / trial available? Yes. Premium plan: custom pricing.

SearchAds by Mobile Action: Make the Most of Your Search Ads and Keep Up with the Competition
Searchads.com was created specifically for Apple Search Ads. As Apple Search Ads is a rather new service, SearchAds tries to cover its shortcomings by providing competitor data, more reactive notifications, and automation features that let users get the most out of the time and resources they have spent on Apple Search Ads.
Aykut Karaalioglu, CEO
Quick facts: Free version / trial available? No. Premium plan: custom pricing.

adAhead: Optimize Apple App Store Ads for Revenue, Not Just Downloads
adAhead is an Apple Search Ads optimization platform that is fully self-managed by mobile app marketers. It provides GEO reports, cohort analysis, keyword reports and charts, a powerful rule manager tool, keyword rank monitoring, a custom ad scheduler, bulk edit, a duplication option for campaigns/ad groups, and a multi-account dashboard. adAhead also provides a fully featured live demo for new visitors.
Yury Listapad, CEO
Quick facts: Free version / trial available? Yes. Premium plan starts at 2.5% of ad spend.

A Side-by-Side Comparison of ASO Tools

The list of ASO tools here is really long. To make comparison easier, I've prepared a one-page comparison spreadsheet with all the ASO tools for you. It is printable, and you can easily sort the ASO tools by type, price, availability of a free version, etc. Click here to get a free comparison spreadsheet of all the top ASO tools; it also includes 2 bonus tools and additional details that I didn't have room to include in this post.

This post originally appeared on Meatti.

Marcus Kay
Marcus is the founder of Meatti, a platform that helps mobile game developers boost app downloads using artificial intelligence. Find him on Twitter, LinkedIn and his blog.
  5. Ruslan Sibgatullin

    How I halved apk size

    Originally posted on Medium.

You coded your game hard for several months (or even years), your artist made a lot of high-quality assets, and the game is finally ready to be launched. Congratulations! You did a great job. Now take a look at the APK size and be prepared to be scared. What is the size: 60, 70, or even 80 megabytes? As strange as it may sound in the era of 128GB smartphones, I have some bad news: that size is too big. That's exactly what happened to me after I finished the game Totem Spirits. In this article I want to share several pieces of advice on how to reduce the size of a release APK file without losing quality. Please note that for development I used the quite popular game development engine libGDX, but the tips below should be applicable to other frameworks as well. Moreover, my case is a rather simple 2D game with a lot of sprites (i.e. images), so it might not be that useful for large 3D products. To keep you motivated to read further, here is the final result: I managed to halve the APK size, from 64MB to 32.36MB.

Memory management

The very first thing that needs to be done properly is memory management. You should only have the necessary objects loaded into memory, and release resources once they are not in use. This topic requires a lot of detail, so I'd rather cover it in a separate article.

Next, I want to analyze the size of the current APK file. My game has four different types of resources:

1. Intro: the resources for the intro screen. The intro background is loaded before the game starts and disposed immediately after loading is done. (~0.5MB)
2. In-menu resources: used in the menu only (location backgrounds, buttons, etc.). Loaded during the intro stage and when a player exits a game level; disposed during "in-game resources" loading. (~7.5MB images + ~5.4MB music)
3. In-game resources: used on game levels only (objects, game backgrounds, etc.). Loaded while a game level loads, disposed when a player exits the level. Note that these resources are not disposed when a player navigates between levels. (~4.5MB images + ~10MB music)
4. Common: used in all three above. Loaded during the intro stage, disposed only when the game is closed. This also includes fonts. (~1.5MB)

The combined size of all resources is ~30MB, so we can conclude that the size of the APK is basically the size of its assets; the code base is only ~3MB. That's why I want to focus on the assets first (still, the code will be discussed too).

Images optimization

The first thing to do is make the images smaller without harming their quality. Fortunately, there are plenty of services that offer exactly this; I used this one. This alone resulted in an 18MB reduction! Compare the two images below (not optimized vs. optimized): the sizes are 312KB and 76KB respectively, so the optimized image is 4 times smaller, yet the human eye can't notice the difference.

Images combination

You should combine near-identical images programmatically rather than shipping multiple copies that differ only slightly (especially if they are quite big); see the sketch at the end of this post. Consider the following example (before and after: God of Fire, God of Water). Rather than having four full-size images with different gods on the same background, I have one big background image and four smaller images of gods that are combined programmatically into one image. Although the reduction here is not so big (~2MB), in some cases this technique can make a real difference.

Images format

I consider this my biggest mistake so far.
I had several images without transparency saved in PNG format. The JPG version of those images is 6 times lighter! Once I converted all images without transparency to JPG, the APK became 5MB smaller.

Music optimization

At first the music quality was 256 kbps. I reduced it to 128 kbps and saved 5MB more. I still think the tracks could be compressed even further; please share in the comments if you have ever used 64 kbps in your games.

Texture packs

This item might be a bit libGDX-specific, although similar functionality should exist in other engines as well. A texture pack is a way to organize a bunch of images into one big pack. In code you then treat each pack as one unit, which is quite handy for memory management. But you should combine images wisely. At first my resources were packed quite badly; once I separated all transparent and non-transparent images into different packs, I gained about 5MB more.

Dependencies and an optimal code base

Now let's look at the other side of the development process: coding. I will not dive into too many details about code-writing here (it deserves a separate article as well), but I still want to share some general rules that I believe apply to any project. The most important thing is to reduce the number of third-party dependencies in the project. Do you really need to add Apache Commons if you use only one method from StringUtils? Or Gson if you just don't like the built-in JSON functionality? Well, you do not. I used libGDX as the game development engine and am quite happy with it; I'm fairly sure I'll use this engine again for my next game. Oh, and do I need to say that your code should be written in the most optimal way? :) Well, I mentioned it.

Although most of the tips I've shared here can be applied late in development, some of them (especially memory management) should be designed in right from the very beginning of a project. Stay tuned for more programming articles!
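As referenced in the images-combination section above, here is a minimal sketch of the underlying composite step. The article's project is Java/libGDX; this C++ illustration assumes a simple RGBA8 pixel buffer, and all names here are illustrative rather than the author's code.

#include <cstdint>
#include <vector>

// A trivial RGBA8 image: 4 bytes per pixel, row-major.
struct Image {
    int width = 0, height = 0;
    std::vector<uint8_t> pixels;  // size == width * height * 4
};

// Composite `overlay` onto `background` at (offsetX, offsetY) using standard
// "over" alpha blending. This reproduces the article's idea of shipping one
// background plus small overlays instead of several full-size images.
void compositeOver(Image& background, const Image& overlay, int offsetX, int offsetY)
{
    for (int y = 0; y < overlay.height; ++y) {
        for (int x = 0; x < overlay.width; ++x) {
            const int bx = x + offsetX, by = y + offsetY;
            if (bx < 0 || by < 0 || bx >= background.width || by >= background.height)
                continue;  // overlay pixel falls outside the background
            const uint8_t* src = &overlay.pixels[(y * overlay.width + x) * 4];
            uint8_t* dst = &background.pixels[(by * background.width + bx) * 4];
            const float a = src[3] / 255.0f;
            for (int c = 0; c < 3; ++c)  // blend R, G, B channels
                dst[c] = static_cast<uint8_t>(src[c] * a + dst[c] * (1.0f - a));
        }
    }
}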
  6. Performance is everybody's responsibility, no matter what your role. When it comes to the GPU, 3D programmers have a lot of control over performance; we can optimize shaders, trade image quality for performance, use smarter rendering techniques... we have plenty of tricks up our sleeves. But there's one thing we don't have direct control over, and that's the game's art. We rely on artists to produce assets that not only look good but are also efficient to render. For artists, a little knowledge of what goes on under the hood can make a big impact on a game's framerate. If you're an artist and want to understand why things like draw calls, LODs, and mipmaps are important for performance, read on!

To appreciate the impact that your art has on the game's performance, you need to know how a mesh makes its way from your modelling package onto the screen in the game. That means having an understanding of the GPU: the chip that powers your graphics card and makes real-time 3D rendering possible in the first place. Armed with that knowledge, we'll look at some common art-related performance issues, why they're a problem, and what you can do about them. Things are quickly going to get pretty technical, but if anything is unclear I'll be more than happy to answer questions in the comments section.

Before we start, I should point out that I am going to deliberately simplify a lot of things for the sake of brevity and clarity. In many cases I'm generalizing, describing only the typical case, or just straight up leaving things out. In particular, for the sake of simplicity the idealized version of the GPU I describe below more closely matches that of the previous (DX9-era) generation. However, when it comes to performance, all of the considerations below still apply to the latest PC & console hardware (although not necessarily all mobile GPUs). Once you understand everything described here, it will be much easier to get to grips with the variations and complexities you'll encounter later, if and when you start to dig deeper.

Part 1: The rendering pipeline from 10,000 feet

For a mesh to be displayed on the screen, it must pass through the GPU to be processed and rendered. Conceptually, this path is very simple: the mesh is loaded, vertices are grouped together as triangles, the triangles are converted into pixels, each pixel is given a colour, and that's the final image. Let's look a little closer at what happens at each stage.

After you export a mesh from your DCC tool of choice (Digital Content Creation: Maya, Max, etc.), the geometry is typically loaded into the game engine in two pieces: a Vertex Buffer (VB) that contains a list of the mesh's vertices and their associated properties (position, UV coordinates, normal, color, etc.), and an Index Buffer (IB) that lists which vertices in the VB are connected to form triangles.

Along with these geometry buffers, the mesh will also have been assigned a material to determine what it looks like and how it behaves under different lighting conditions. To the GPU this material takes the form of custom-written shaders: programs that determine how the vertices are processed, and what colour the resulting pixels will be. When choosing the material for the mesh, you will have set various material parameters (eg. setting a base color value or picking a texture for various maps like albedo, roughness, normal etc.) - these are passed to the shader programs as inputs.
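To make the vertex buffer / index buffer split concrete, here is a minimal sketch, in C++, of how an engine might represent the two buffers on the CPU side before upload. The exact vertex layout varies per engine and mesh; these fields and names are illustrative, not from the article.

#include <cstdint>
#include <vector>

// One entry in the Vertex Buffer: position plus the per-vertex properties
// mentioned above (UVs, normal, color, ...).
struct Vertex {
    float   position[3];
    float   uv[2];
    float   normal[3];
    uint8_t color[4];
};

// The mesh geometry as the engine might hand it to the GPU: the Index Buffer
// groups VB entries into triangles, three indices per triangle.
struct MeshGeometry {
    std::vector<Vertex>   vertexBuffer;
    std::vector<uint32_t> indexBuffer;  // e.g. {0, 1, 2,  2, 1, 3} = two triangles sharing an edge
};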
The mesh and material data get processed by various stages of the GPU pipeline in order to produce pixels in the final render target (an image to which the GPU writes). That render target can then be used as a texture in subsequent shader programs and/or displayed on screen as the final image for the frame. For the purposes of this article, here are the important parts of the GPU pipeline from top to bottom:

Input Assembly. The GPU reads the vertex and index buffers from memory, determines how the vertices are connected to form triangles, and feeds the rest of the pipeline.

Vertex Shading. The vertex shader gets executed once for every vertex in the mesh, running on a single vertex at a time. Its main purpose is to transform the vertex, taking its position and using the current camera and viewport settings to calculate where it will end up on the screen.

Rasterization. Once the vertex shader has been run on each vertex of a triangle and the GPU knows where it will appear on screen, the triangle is rasterized: converted into a collection of individual pixels. Per-vertex values (UV coordinates, vertex color, normal, etc.) are interpolated across the triangle's pixels. So if one vertex of a triangle has a black vertex color and another has white, a pixel rasterized in the middle of the two will get the interpolated vertex color grey.

Pixel Shading. Each rasterized pixel is then run through the pixel shader (although technically at this stage it's not yet a pixel but a 'fragment', which is why you'll sometimes see the pixel shader called a fragment shader). This gives the pixel a color by combining material properties, textures, lights, and other parameters in the programmed way to get a particular look. Since there are so many pixels (a 1080p render target has over two million) and each one needs to be shaded at least once, the pixel shader is usually where the GPU spends a lot of its time.

Render Target Output. Finally the pixel is written to the render target, but not before undergoing some tests to make sure it's valid. For example, in normal rendering you want closer objects to appear in front of farther objects; the depth test can reject pixels that are further away than the pixel already in the render target. But if the pixel passes all the tests (depth, alpha, stencil, etc.), it gets written to the render target in memory.

There's much more to it, but that's the basic flow: the vertex shader is executed on each vertex in the mesh, each 3-vertex triangle is rasterized into pixels, the pixel shader is executed on each rasterized pixel, and the resulting colors are written to a render target.

Under the hood, the shader programs that represent the material are written in a shader programming language such as HLSL. These shaders run on the GPU in much the same way that regular programs run on the CPU: taking in data, running a bunch of simple instructions to change the data, and outputting the result. But while CPU programs are generalized to work on any type of data, shader programs are specifically designed to work on vertices and pixels. These programs are written to give the rendered object the look of the desired material: plastic, metal, velvet, leather, etc.

To give you a concrete example, here's a simple pixel shader that does Lambertian lighting (ie. simple diffuse-only, no specular highlights) with a material color and a texture. As shaders go it's one of the most basic, but you don't need to understand it; it just helps to see what shaders can look like in general.
float3 MaterialColor;
Texture2D MaterialTexture;
SamplerState TexSampler;

float3 LightDirection;
float3 LightColor;

float4 MyPixelShader( float2 vUV : TEXCOORD0, float3 vNorm : NORMAL0 ) : SV_Target
{
    float3 vertexNormal = normalize(vNorm);
    float3 lighting = LightColor * dot( vertexNormal, LightDirection );
    float3 material = MaterialColor * MaterialTexture.Sample( TexSampler, vUV ).rgb;

    float3 color = material * lighting;
    float alpha = 1;

    return float4(color, alpha);
}

A simple pixel shader that does basic lighting. The inputs at the top like MaterialTexture and LightColor are filled in by the CPU, while vUV and vNorm are both vertex properties that were interpolated across the triangle during rasterization.

And the generated shader instructions:

dp3 r0.x, v1.xyzx, v1.xyzx
rsq r0.x, r0.x
mul r0.xyz, r0.xxxx, v1.xyzx
dp3 r0.x, r0.xyzx, cb0[1].xyzx
mul r0.xyz, r0.xxxx, cb0[2].xyzx
sample_indexable(texture2d)(float,float,float,float) r1.xyz, v0.xyxx, t0.xyzw, s0
mul r1.xyz, r1.xyzx, cb0[0].xyzx
mul o0.xyz, r0.xyzx, r1.xyzx
mov o0.w, l(1.000000)
ret

The shader compiler takes the above program and generates these instructions, which are run on the GPU; a longer program produces more instructions, which means more work for the GPU to do.

As an aside, you might notice how isolated the shader steps are: each shader works on a single vertex or pixel without needing to know anything about the surrounding vertices/pixels. This is intentional and allows the GPU to process huge numbers of independent vertices and pixels in parallel, which is part of what makes GPUs so fast at doing graphics work compared to CPUs.

We'll return to the pipeline shortly to see where things might slow down, but first we need to back up a bit and look at how the mesh and material got to the GPU in the first place. This is also where we meet our first performance hurdle: the draw call.

The CPU and Draw Calls

The GPU cannot work alone; it relies on the game code running on the machine's main processor (the CPU) to tell it what to render and how. The CPU and GPU are (usually) separate chips, running independently and in parallel. To hit our target frame rate (most commonly 30 frames per second), both the CPU and GPU have to do all the work to produce a single frame within the time allowed (at 30fps that's just 33 milliseconds per frame).

To achieve this, frames are often pipelined: the CPU will take the whole frame to do its work (process AI, physics, input, animation, etc.) and then send instructions to the GPU at the end of the frame so it can get to work on the next frame. This gives each processor a full 33ms to do its work, at the expense of introducing a frame's worth of latency (delay). This may be an issue for extremely time-sensitive twitchy games like first-person shooters (the Call of Duty series, for example, runs at 60fps to reduce the latency between player input and rendering), but in general the extra frame is not noticeable to the player.

Every 33ms the final render target is copied and displayed on the screen at VSync, the interval during which the monitor looks for a new frame to display. But if the GPU takes longer than 33ms to finish rendering the frame, it will miss this window of opportunity and the monitor won't have any new frame to display. That results in either screen tearing or stuttering and an uneven framerate, which we really want to avoid.
We also get the same result if the CPU takes too long; it has a knock-on effect, since the GPU doesn't get commands quickly enough to do its job in the time allowed. In short, a solid framerate relies on both the CPU and GPU performing well.

Here the CPU takes too long to produce rendering commands for the second frame, so the GPU starts rendering late and thus misses VSync.

To display a mesh, the CPU issues a draw call, which is simply a series of commands that tells the GPU what to draw and how to draw it. As the draw call goes through the GPU pipeline, it uses the various configurable settings specified in the draw call (mostly determined by the mesh's material and its parameters) to decide how the mesh is rendered. These settings, called GPU state, affect all aspects of rendering and consist of everything the GPU needs to know in order to render an object. Most significantly for us, GPU state includes the current vertex/index buffers, the current vertex/pixel shader programs, and all the shader inputs (eg. MaterialTexture or LightColor in the above shader code example).

This means that to change a piece of GPU state (for example changing a texture or switching shaders), a new draw call must be issued. This matters because these draw calls are not free for the CPU. It costs a certain amount of time to set up the desired GPU state changes and then issue the draw call. Beyond whatever work the game engine needs to do for each call, extra error checking and bookkeeping cost is introduced by the graphics driver, an intermediate layer of code written by the GPU vendor (NVIDIA, AMD, etc.) that translates the draw call into low-level hardware instructions. Too many draw calls can put too much of a burden on the CPU and cause serious performance problems; this is also why engines typically try to order their draws so that consecutive draw calls share as much state as possible (see the sketch at the end of this article).

Due to this overhead, we generally set an upper limit to the number of draw calls that are acceptable per frame. If this limit is exceeded during gameplay testing, steps must be taken, such as reducing the number of objects, reducing draw distance, etc. Console games will typically try to keep draw calls in the 2000-3000 range (eg. on Far Cry Primal we tried to keep it below 2500 per frame). That might sound like a lot, but it also includes any special rendering techniques that might be employed; cascaded shadows, for example, can easily double the number of draw calls in a frame.

As mentioned above, GPU state can only be changed by issuing a new draw call. This means that although you may have created a single mesh in your modelling package, if one half of the mesh uses one texture for the albedo map and the other half uses a different texture, it will be rendered as two separate draw calls. The same goes if the mesh is made up of multiple materials; different shaders need to be set, so multiple draw calls must be issued.

In practice, a very common source of state change, and therefore extra draw calls, is switching texture maps. Typically the whole mesh will use the same material (and therefore the same shaders), but different parts of the mesh will use different sets of albedo/normal/roughness maps. With a scene of hundreds or even thousands of objects, using many draw calls for each object will cost a considerable amount of CPU time and so will have a noticeable impact on the framerate of the game. To avoid this, a common solution is to combine all the different texture maps used on a mesh into a single big texture, often called an atlas.
Care must be taken when constructing the atlas so that adjacent textures don't bleed into each other at lower mips, but these problems are relatively minor compared to the gains that can be had in terms of performance.

A texture atlas from Unreal Engine's Infiltrator demo

Many engines also support instancing, sometimes referred to as batching or clustering. This is the ability to use a single draw call to render multiple objects that are mostly identical in terms of shaders and state, and only differ in a restricted set of ways (typically their position and rotation in the world). The engine will usually recognize when multiple identical objects can be rendered using instancing, so it's always preferable to use the same object multiple times in a scene when possible, instead of multiple different objects that will need to be rendered with separate draw calls.
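To give a rough idea of how instancing looks from the shader's side, here's a minimal sketch; the InstanceTransforms buffer and its layout are assumptions for illustration, as engines feed per-instance data to the GPU in different ways. The key point is that each instance fetches its own transform by its instance index, so many copies of the mesh can share one draw call:

StructuredBuffer<float4x4> InstanceTransforms; // one world matrix per instance, filled in by the CPU (assumed layout)
float4x4 ViewProjection;

float4 InstancedVertexShader( float3 vPos : POSITION0, uint instanceID : SV_InstanceID ) : SV_Position
{
    // SV_InstanceID identifies which copy of the mesh this vertex belongs to,
    // so per-instance data replaces what would otherwise be per-draw-call state
    float4 worldPos = mul( float4(vPos, 1), InstanceTransforms[instanceID] );
    return mul( worldPos, ViewProjection );
}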
Another common technique for reducing draw calls is manually merging many different objects that share the same material into a single mesh. This can be effective, but care must be taken to avoid excessive merging, which can actually worsen performance by increasing the amount of work for the GPU.

Before any draw call gets issued, the engine's visibility system will determine whether or not the object will even appear on screen. If not, it's very cheap to just ignore the object at this early stage and not pay for any draw call or GPU work (also known as visibility culling). This is usually done by checking if the object's bounding volume is visible from the camera's point of view, and that it is not completely blocked from view (occluded) by any other objects.

However, when multiple meshes are merged into a single object, their individual bounding volumes must be combined into a single large volume that is big enough to enclose every mesh. This increases the likelihood that the visibility system will be able to see some part of the volume, and so will consider the entire collection visible. That means that it becomes a draw call, and so the vertex shader must be executed on every vertex in the object - even if very few of those vertices actually appear on the screen. This can lead to a lot of GPU time being wasted because the vertices end up not contributing anything to the final image. For these reasons, mesh merging is most effective when it is done on groups of small objects that are close to each other, as they will probably be on-screen at the same time anyway.

A frame from XCOM 2 as captured with RenderDoc. The wireframe (bottom) shows in grey all the extra geometry submitted to the GPU that is outside the view of the in-game camera.

As an illustrative example, take the above capture of XCOM 2, one of my favourite games of the last couple of years. The wireframe shows the entire scene as submitted to the GPU by the engine, with the black area in the middle being the geometry that's actually visible to the game camera. All the surrounding geometry in grey is not visible and will be culled after the vertex shader is executed, which is all wasted GPU time. In particular, note the highlighted red geometry, which is a series of bush meshes, combined and rendered in just a few draw calls. Since the visibility system determined that at least some of the bushes are visible on the screen, they are all rendered and so must all have their vertex shader executed before determining which can be culled... which turns out to be most of them.

Please note this isn't an indictment of XCOM 2 in particular, I just happened to be playing it while writing this article! Every game has this problem, and it's a constant battle to balance the CPU cost of doing more accurate visibility tests, the GPU cost of culling the invisible geometry, and the CPU cost of having more draw calls.

Things are changing when it comes to the cost of draw calls however. As mentioned above, a significant reason for their expense is the overhead of the driver doing translation and error checking. This has long been the case, but the most modern graphics APIs (eg. Direct3D 12 and Vulkan) have been restructured in order to avoid most of this overhead. While this does introduce extra complexity to the game's rendering engine, it can also result in cheaper draw calls, allowing us to render many more objects than was previously possible. Some engines (most notably the latest version used by Assassin's Creed) have even gone in a radically different direction, using the capabilities of the latest GPUs to drive rendering and effectively doing away with draw calls altogether.

The performance impact of having too many draw calls is mostly on the CPU; pretty much all other performance issues related to art assets are on the GPU. We'll now look at what a bottleneck is, where they can happen, and what we can do about them.

Part 2: Common GPU bottlenecks

The very first step in optimization is to identify the current bottleneck so you can take steps to reduce or eliminate it. A bottleneck refers to the section of the pipeline that is slowing everything else down. In the above case where too many draw calls are costing too much, the CPU is the bottleneck. Even if we performed other optimizations that made the GPU faster, it wouldn't matter to the framerate because the CPU is still running too slowly to produce a frame in the required amount of time.

4 draw calls going through the pipeline, each being the rendering of a full mesh containing many triangles. The stages overlap because as soon as one piece of work is finished it can be immediately passed to the next stage (eg. when three vertices are processed by the vertex shader, the triangle can proceed to be rasterized).

You can think of the GPU pipeline as an assembly line. As each stage finishes with its data, it forwards the results to the following stage and proceeds with the next piece of work. Ideally every stage is busy working all the time, and the hardware is being utilized fully and efficiently as represented in the above image - the vertex shader is constantly processing vertices, the rasterizer is constantly rasterizing pixels, and so on. But consider what happens if one stage takes much longer than the others:

What happens here is that an expensive vertex shader can't feed the following stages fast enough, and so becomes the bottleneck. If you had a draw call that behaved like this, making the pixel shader faster is not going to make much of a difference to the time it takes for the entire draw call to be rendered. The only way to make things faster is to reduce the time spent in the vertex shader. How we do that depends on what in the vertex shader stage is actually causing the bottleneck.
You should keep in mind that there will almost always be a bottleneck of some kind - if you eliminate one, another will just take its place. The trick is knowing when you can do something about it, and when you have to live with it because that's just what it costs to render what you want to render. When you optimize, you're really trying to get rid of unnecessary bottlenecks. But how do you identify what the bottleneck is?

Profiling

Profiling tools are absolutely essential for figuring out where all the GPU's time is being spent, and good ones will point you at exactly what you need to change in order for things to go faster. They do this in a variety of ways - some explicitly show a list of bottlenecks, others let you run 'experiments' to see what happens (eg. "how does my draw time change if all the textures are tiny?", which can tell you if you're bound by memory bandwidth or cache usage).

Unfortunately this is where things get a bit hand-wavy, because some of the best performance tools are only available for the consoles and are therefore under NDA. If you're developing for Xbox or PlayStation, bug your friendly neighbourhood graphics programmer to show you these tools. We love it when artists get involved in performance, and will be happy to answer questions and even host tutorials on how to use the tools effectively.

Unity's basic built-in GPU profiler

The PC already has some pretty good (albeit hardware-specific) profiling tools which you can get directly from the GPU vendors, such as NVIDIA's Nsight, AMD's GPU PerfStudio, and Intel's GPA. Then there's RenderDoc, which is currently the best tool for graphics debugging on PC but doesn't have any advanced profiling features. Microsoft is also starting to release its awesome Xbox profiling tool PIX for Windows too, albeit only for D3D12 applications. Assuming they also plan to provide the same bottleneck analysis tools as the Xbox version (tricky with the wide variety of hardware out there), it should be a huge asset to PC developers going forward.

These tools can give you more information about the performance of your art than you will ever need. They can also give you a lot of insight into how a frame is put together in your engine, as well as being awesome debugging tools for when things don't look how they should. Being able to use them is important, as artists need to be responsible for the performance of their art. But you shouldn't be expected to figure it all out on your own - any good engine should provide its own custom tools for analyzing performance, ideally providing metrics and guidelines to help determine if your art assets are within budget. If you want to be more involved with performance but feel you don't have the necessary tools, talk to your programming team. Chances are they already exist - and if they don't, they should be created!

Now that you know how GPUs work and what a bottleneck is, we can finally get to the good stuff. Let's dig into the most common real-world bottlenecks that can show up in the pipeline, how they happen, and what can be done about them.

Shader instructions

Since most of the GPU's work is done with shaders, they're often the source of many of the bottlenecks you'll see. When a bottleneck is identified as shader instructions (sometimes referred to as ALUs, from the Arithmetic Logic Units that actually do the calculations), it's simply a way of saying the vertex or pixel shader is doing a lot of work and the rest of the pipeline is waiting for that work to finish.

Often the vertex or pixel shader program itself is just too complex, containing many instructions and taking a long time to execute. Or maybe the vertex shader is reasonable, but the mesh you're rendering has too many vertices, which adds up to a lot of time spent executing the vertex shader. Or the draw call covers a large area of the screen touching many pixels, and so spends a lot of time in the pixel shader.

Unsurprisingly, the best way to optimize a shader instruction bottleneck is to execute fewer instructions! For pixel shaders that means choosing a simpler material with fewer features, to reduce the number of instructions executed per pixel. For vertex shaders it means simplifying your mesh to reduce the number of vertices that need to be processed, as well as being sure to use LODs (Level Of Detail - simplified versions of your mesh for use when the object is far away and small on the screen).
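To give a feel for what 'a simpler material with fewer features' means at the shader level, here's the earlier pixel shader with a specular feature added; the CameraPosition, SpecularPower and vWorldPos inputs are assumptions made up for this sketch. Everything between the marker comments is extra work executed for every pixel the draw call touches - pick a material without the feature and it all disappears:

float3 MaterialColor;
Texture2D MaterialTexture;
SamplerState TexSampler;
float3 LightDirection;
float3 LightColor;
float3 CameraPosition; // assumed extra inputs needed by the specular feature
float SpecularPower;

float4 MyShinyPixelShader( float2 vUV : TEXCOORD0, float3 vNorm : NORMAL0, float3 vWorldPos : TEXCOORD1 ) : SV_Target
{
    float3 vertexNormal = normalize(vNorm);
    float3 lighting = LightColor * dot( vertexNormal, LightDirection );
    float3 material = MaterialColor * MaterialTexture.Sample( TexSampler, vUV ).rgb;

    // --- extra feature: per-pixel specular highlight ---
    // adds a normalize, a reflect, a dot, a saturate and a pow per pixel
    float3 viewDirection = normalize( CameraPosition - vWorldPos );
    float3 reflected = reflect( LightDirection, vertexNormal );
    float3 specular = LightColor * pow( saturate( dot( reflected, viewDirection ) ), SpecularPower );
    // ---------------------------------------------------

    return float4( material * lighting + specular, 1 );
}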
Sometimes however, shader instruction bottlenecks are instead just an indication of problems in some other area. Issues such as too much overdraw, a misbehaving LOD system, and many others can cause the GPU to do a lot more work than necessary. These problems can be either on the engine side or the content side; careful profiling, examination, and experience will help you figure out what's really going on.

One of the most common of these issues - overdraw - is when the same pixel on the screen needs to be shaded multiple times, because it's touched by multiple draw calls. Overdraw is a problem because it decreases the overall time the GPU has to spend on rendering. If every pixel on the screen has to be shaded twice, the GPU can only spend half the amount of time on each pixel and still maintain the same framerate.

A frame capture from PIX with the corresponding overdraw visualization mode

Sometimes overdraw is unavoidable, such as when rendering translucent objects like particles or glass-like materials; the background object is visible through the foreground, so both need to be rendered. But for opaque objects, overdraw is completely unnecessary, because the pixel shown in the buffer at the end of rendering is the only one that actually needs to be processed. In this case, every overdrawn pixel is just wasted GPU time.

The GPU takes steps to reduce overdraw in opaque objects. The early depth test (which happens before the pixel shader - see the initial pipeline diagram) will skip pixel shading if it determines that the pixel will be hidden by another object. It does that by comparing the pixel being shaded to the depth buffer - a render target where the GPU stores the entire frame's depth so that objects occlude each other properly. But for the early depth test to be effective, the other object must have already been rendered so that it is present in the depth buffer. That means the rendering order of objects is very important.

Ideally every scene would be rendered front-to-back (ie. objects closest to the camera first), so that only the foreground pixels get shaded and the rest get killed by the early depth test, eliminating overdraw entirely. But in the real world that's not always possible, because you can't reorder the triangles inside a draw call during rendering. Complex meshes can occlude themselves multiple times, or mesh merging can result in many overlapping objects being rendered in the "wrong" order, causing overdraw. There's no easy answer for avoiding these cases, and in the latter case it's just another thing to take into consideration when deciding whether or not to merge meshes.

To help early depth testing, some games do a partial depth prepass. This is a preliminary pass where certain large objects that are known to be effective occluders (large buildings, terrain, the main character etc.) are rendered with a simple shader that only outputs to the depth buffer, which is relatively fast as it avoids doing any pixel shader work such as lighting or texturing. This 'primes' the depth buffer and increases the amount of pixel shader work that can be skipped during the full rendering pass later in the frame. The drawback is that rendering the occluding objects twice (once in the depth-only pass and once in the main pass) increases the number of draw calls, plus there's always a chance that the time it takes to render the depth pass itself is more than the time it saves through increased early depth test efficiency. Only profiling in a variety of cases can determine whether or not it's worth it for any given scene.
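The shader side of a depth prepass is about as small as shaders get - a minimal sketch, with names invented for illustration: the vertex shader outputs nothing but a position, no pixel shader is bound, and the GPU just writes depth.

float4x4 WorldViewProjection;

// Depth-only pass: position only - no UVs, normals, or pixel shader work.
// With no pixel shader and no color target bound, only depth gets written.
float4 DepthOnlyVertexShader( float3 vPos : POSITION0 ) : SV_Position
{
    return mul( float4(vPos, 1), WorldViewProjection );
}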
Particle overdraw visualization of an explosion in Prototype 2

One place where overdraw is a particular concern is particle rendering, given that particles are transparent and often overlap a lot. Artists working on particle effects should always have overdraw in mind when producing effects. A dense cloud effect can be produced by emitting lots of small faint overlapping particles, but that's going to drive up the rendering cost of the effect; a better-performing alternative is to emit fewer large particles, and instead rely more on the texture and texture animation to convey the density of the effect. The overall result is often more visually effective anyway, because offline software like FumeFX and Houdini can usually produce much more interesting effects through texture animation than the real-time simulated behaviour of individual particles.

The engine can also take steps to avoid doing more GPU work than necessary for particles. Every rendered pixel that ends up completely transparent is just wasted time, so a common optimization is to perform particle trimming: instead of rendering the particle with two triangles, a custom-fitted polygon is generated that minimizes the empty areas of the texture that are used.

Particle 'cutout' tool in Unreal Engine 4

The same can be done for other partially transparent objects such as vegetation. In fact, for vegetation it's even more important to use custom geometry to eliminate the large amount of empty texture space, as vegetation often uses alpha testing. This is when the alpha channel of the texture is used to decide whether or not to discard the pixel during the pixel shader stage, effectively making it transparent. This is a problem because alpha testing can have the side effect of disabling the early depth test completely (it invalidates certain assumptions that the GPU can make about the pixel), leading to much more unnecessary pixel shader work. Combine this with the fact that vegetation often contains a lot of overdraw anyway - think of all the overlapping leaves on a tree - and it can quickly become very expensive to render if you're not careful.
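In HLSL, alpha testing typically looks something like the following sketch (texture name invented for illustration). The clip() intrinsic discards the pixel when its argument is negative, and it's exactly this conditional discard that can prevent the GPU from doing early depth testing:

Texture2D LeafTexture;
SamplerState TexSampler;

float4 AlphaTestedPixelShader( float2 vUV : TEXCOORD0 ) : SV_Target
{
    float4 texel = LeafTexture.Sample( TexSampler, vUV );
    // Discard mostly-transparent pixels; 0.5 is a typical threshold
    clip( texel.a - 0.5 );
    return texel;
}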
A close relative of overdraw is overshading, which is caused by tiny or thin triangles and can really hurt performance by wasting a significant portion of the GPU's time. Overshading is a consequence of how GPUs process pixels during pixel shading: not one at a time, but in 'quads', blocks of four pixels arranged in a 2x2 pattern. It's done like this so the hardware can do things like comparing UVs between pixels to calculate appropriate mipmap levels. This means that if a triangle only touches a single pixel of a quad (because the triangle is tiny or very thin), the GPU still processes the whole quad and just throws away the other three pixels, wasting 75% of the work. That wasted time can really add up, and is particularly painful for forward (ie. not deferred) renderers that do all lighting and shading in a single pass in the pixel shader. This penalty can be reduced by using properly-tuned LODs; besides saving on vertex shader processing, they can also greatly reduce overshading by having triangles cover more of each quad on average.

A 10x8 pixel buffer with 5x4 quads. The two triangles have poor quad utilization - the left one is too small, the right one too thin. The 10 red quads touched by the triangles need to be completely shaded, even though the 12 green pixels are the only ones that are actually needed. Overall, 70% of the GPU's work is wasted.

(Random trivia: quad overshading is also the reason you'll sometimes see fullscreen post effects use a single large triangle to cover the screen instead of two back-to-back triangles. With two triangles, quads that straddle the shared edge would be wasting some of their work, so avoiding that saves a minor amount of GPU time.)

Beyond overshading, tiny triangles are also a problem because GPUs can only process and rasterize triangles at a certain rate, which is usually relatively low compared to how many pixels they can process in the same amount of time. With too many small triangles, the GPU can't produce pixels fast enough to keep the shader units busy, resulting in stalls and idle time - the real enemy of GPU performance.

Similarly, long thin triangles are bad for performance for another reason beyond quad usage: GPUs rasterize pixels in square or rectangular blocks, not in long strips. Compared to a more regular-shaped triangle with even sides, a long thin triangle makes the GPU do a lot of extra unnecessary work to rasterize it into pixels, potentially causing a bottleneck at the rasterization stage. This is why it's usually recommended that meshes are tessellated into evenly-shaped triangles, even if it increases the polygon count a bit. As with everything else, experimentation and profiling will show the best balance.

Memory Bandwidth and Textures

As illustrated in the above diagram of the GPU pipeline, meshes and textures are stored in memory that is physically separate from the GPU's shader processors. That means that whenever the GPU needs to access some piece of data, like a texture being fetched by a pixel shader, it needs to retrieve it from memory before it can actually use it as part of its calculations.

Memory accesses are analogous to downloading files from the internet. File downloads take a certain amount of time due to the internet connection's bandwidth - the speed at which data can be transferred. That bandwidth is also shared between all downloads - if you can download one file at 6MB/s, two files only download at 3MB/s each. The same is true of memory accesses; index/vertex buffers and textures being accessed by the GPU take time to transfer, and must share memory bandwidth.
The speeds are obviously much higher than internet connections - on paper the PS4's GPU memory bandwidth is 176GB/s - but the idea is the same. A shader that accesses many textures will rely heavily on having enough bandwidth to transfer all the data it needs in the time it needs it.

Shader programs are executed by the GPU with these restrictions in mind. A shader that needs to access a texture will try to start the transfer as early as possible, then do other unrelated work (for example lighting calculations) and hope that the texture data has arrived from memory by the time it gets to the part of the program that needs it. If the data hasn't arrived in time - because the transfer is slowed down by lots of other transfers, or because the shader runs out of other work to do (especially likely for dependent texture fetches) - execution will stop and it will just sit there and wait. This is a memory bandwidth bottleneck; making the rest of the shader faster will not matter if it still needs to stop and wait for data to arrive from memory. The only way to optimize this is to reduce the amount of bandwidth being used, the amount of data being transferred, or both.

Memory bandwidth might even have to be shared with the CPU, or with async compute work that the GPU is doing at the same time. It's a very precious resource. The majority of memory bandwidth is usually taken up by texture transfers, since textures contain so much data. As a result, there are a few different mechanisms in place to reduce the amount of texture data that needs to be shuffled around.

First and foremost is the cache. This is a small piece of high-speed memory that the GPU can access very quickly, used to keep chunks of memory that have been accessed recently in case the GPU needs them again. In the internet connection analogy, the cache is your computer's hard drive that stores the downloaded files for faster access in the future.

When a piece of memory is accessed, like a single texel in a texture, the surrounding texels are also pulled into the cache in the same memory transfer. The next time the GPU looks for one of those texels, it doesn't need to go all the way to memory and can instead fetch it from the cache extremely quickly. This is actually often the common case - when a texel is displayed on the screen in one pixel, it's very likely that the pixel beside it will need to show the same texel, or the texel right beside it in the texture. When that happens, nothing needs to be transferred from memory, no bandwidth is used, and the GPU can access the cached data almost instantly. Caches are therefore vitally important for avoiding memory-related bottlenecks, especially when you take filtering into account - bilinear, trilinear, and anisotropic filtering all require multiple texels to be accessed for each lookup, putting an extra burden on bandwidth usage. High-quality anisotropic filtering is particularly bandwidth-intensive.

Now think about what happens in the cache if you try to display a large texture (eg. 2048x2048) on an object that's very far away and only takes up a few pixels on the screen.
Each pixel will need to fetch from a very different part of the texture, and the cache will be completely ineffective, since it only keeps texels that were close to previous accesses. Every texture access will try to find its result in the cache and fail (a 'cache miss'), so the data must be fetched from memory, incurring the dual costs of bandwidth usage and the time it takes for the data to be transferred. A stall may occur, slowing the whole shader down. It will also cause other (potentially useful) data to be 'evicted' from the cache to make room for the surrounding texels that will never even be used, reducing the overall efficiency of the cache. It's bad news all around, and that's not even to mention the visual quality issues - tiny movements of the camera will cause completely different texels to be sampled, causing aliasing and sparkling.

This is where mipmapping comes to the rescue. When a texture fetch is issued, the GPU can analyze the texture coordinates being used at each pixel, determining when there is a large gap between texture accesses. Instead of incurring the cost of a cache miss for every texel, it accesses a lower mip of the texture that matches the resolution it's looking for. This greatly increases the effectiveness of the cache, reducing memory bandwidth usage and the potential for a bandwidth-related bottleneck. Lower mips are also smaller and need less data to be transferred from memory, further reducing bandwidth usage. And finally, since mips are pre-filtered, their use also vastly reduces aliasing and sparkling. For all of these reasons, it's almost always a good idea to use mipmaps - the advantages are definitely worth the extra memory usage.

A texture on two quads, one close to the camera and one much further away

The same texture with a corresponding mipmap chain, each mip being half the size of the previous one

Lastly, texture compression is an important way of reducing bandwidth and cache usage (in addition to the obvious memory savings from storing less texture data). Using BC (Block Compression, previously known as DXT compression), textures can be reduced to a quarter or even a sixth of their original size in exchange for a minor hit in quality. This is a significant reduction in the amount of data that needs to be transferred and processed, and most GPUs even keep the textures compressed in the cache, leaving more room to store other texture data and increasing overall cache efficiency.

All of the above information should lead to some obvious steps for reducing or eliminating bandwidth bottlenecks when it comes to texture optimization on the art side. Make sure the textures have mips and are compressed. Don't use heavy 8x or 16x anisotropic filtering if 2x is enough, or even trilinear or bilinear if possible. Reduce texture resolution, particularly if the top-level mip is often displayed. Don't use material features that cause texture accesses unless the feature is really needed. And make sure all the data being fetched is actually used - don't sample four RGBA textures when you actually only need the data in the red channel of each; merge those four channels into a single texture and you've removed 75% of the bandwidth usage.
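As a sketch of that last point (the mask contents here are invented for illustration): four greyscale masks that each used to occupy the red channel of its own RGBA texture can be packed into the four channels of a single texture offline, turning four memory transfers into one:

Texture2D PackedMasks; // eg. dirt, wear, AO and blend masks packed into R, G, B and A offline (assumed contents)
SamplerState TexSampler;

float4 PackedMaskPixelShader( float2 vUV : TEXCOORD0 ) : SV_Target
{
    // One sample fetches all four masks; previously this took four samples
    // of four different textures, using four times the bandwidth
    float4 masks = PackedMasks.Sample( TexSampler, vUV );
    return masks;
}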
While textures are the primary users of memory bandwidth, they're by no means the only ones. Mesh data (vertex and index buffers) also needs to be loaded from memory, and you'll notice in the first GPU pipeline diagram that the final render target output is a write to memory. All these transfers usually share the same memory bandwidth. In normal rendering these costs typically aren't noticeable, as the amount of data is relatively small compared to the texture data, but this isn't always the case. Compared to regular draw calls, shadow passes behave quite differently and are much more likely to be bandwidth bound.

A frame from GTA V with shadow maps, courtesy of Adrian Courrèges' great frame analysis

This is because shadow maps are simply depth buffers that represent the distance from the light to the closest mesh, so most of the work that needs to be done for shadow rendering consists of transferring data to and from memory: fetch the vertex/index buffers, do some simple calculations to determine position, and then write the depth of the mesh to the shadow map. Most of the time, a pixel shader isn't even executed, because all the necessary depth information comes from just the vertex data. This leaves very little work to hide the overhead of all the memory transfers, and the likely bottleneck is that the shader just ends up waiting for memory transfers to complete. As a result, shadow passes are particularly sensitive to both vertex/triangle counts and shadow map resolution, as they directly affect the amount of bandwidth that is needed.

The last thing worth mentioning with regards to memory bandwidth is a special case - the Xbox. Both the Xbox 360 and Xbox One have a particular piece of memory embedded close to the GPU, called EDRAM on the 360 and ESRAM on the XB1. It's a relatively small amount of memory (10MB on the 360 and 32MB on the XB1), but big enough to store a few render targets and maybe some frequently-used textures, and with a much higher bandwidth than regular system memory (aka DRAM). Just as important as the speed is the fact that this bandwidth uses a dedicated path, so it doesn't have to be shared with DRAM transfers. It adds complexity to the engine, but when used efficiently it can give some extra headroom in bandwidth-limited situations. As an artist you generally won't have control over what goes into EDRAM/ESRAM, but it's worth knowing of its existence when it comes to profiling. The 3D programming team can give you more details on its use in your particular engine.

And there's more...

As you've probably gathered by now, GPUs are complex pieces of hardware. When fed properly, they are capable of processing an enormous amount of data and performing billions of calculations every second. On the other hand, bad data and poor usage can slow them down to a crawl, having a devastating effect on the game's framerate.

There are many more things that could be discussed or expanded upon, but what's above is a good place to start for any technically-minded artist. Having an understanding of how the GPU works can help you produce art that not only looks great but also performs well... and better performance can let you improve your art even more, making the game look better too.

There's a lot to take in here, but remember that your 3D programming team is always happy to sit down with you and discuss anything that needs more explanation - as am I in the comments section below!

Further Technical Reading

Render Hell - Simon Trümpler
Texture filtering: mipmaps - Shawn Hargreaves
Graphics Gems for Games - Findings from Avalanche Studios - Emil Persson
Triangulation - Emil Persson
How bad are small triangles on GPU and why? - Christophe Riccio
Game Art Tricks - Simon Trümpler
Optimizing the rendering of a particle system - Christer Ericson
Practical Texture Atlases - Ivan-Assen Ivanov
How GPUs Work - David Luebke & Greg Humphreys
Casual Introduction to Low-Level Graphics Programming - Stephanie Hurlburt
Counting Quads - Stephen Hill
Overdraw in Overdrive - Stephen Hill
Life of a triangle - NVIDIA's logical pipeline - NVIDIA
From Shader Code to a Teraflop: How Shader Cores Work - Kayvon Fatahalian
A Trip Through the Graphics Pipeline (2011) - Fabian Giesen

Note: This article was originally published on fragmentbuffer.com, and is republished here with kind permission from the author Keith O'Conor. You can read more of Keith's writing on Twitter (@keithoconor).
  7. Iñigo Quilez presented techniques for raytracing and distance fields in 4096 bytes on the GPU at NVSCENE 2008. The presentation is available as a PDF.
  8. This tutorial video shows you how to use Qualcomm Snapdragon Profiler to detect bottlenecks and optimize your application's performance and power efficiency. In addition, it provides an overview of its three modes: Realtime, Trace Capture, and Snapshot Capture.
  9. Didn't do much today - mainly messed with the player hub and optimized the game code to where I can now hold 60 AIs at once! I will be working on the AI code more tomorrow and will show a video of how many AIs I can fit on one battlefield! Also, for those of y'all who are interested in keeping up to date more consistently, I have a new Twitter account - follow me here: https://twitter.com/ACTNS_Ent