Yalpe

Improve application "cold" load time

Hey, I'm an intern at a company and I have to figure out how to optimize the boot time of their image generator. I've already reduced it by 75% on a warm boot (i.e., boot the application, close it once it's initialized, then fire it up again). It seems that Windows does a lot of caching in the background, because it now takes 10 seconds to fire up on a warm boot and still over 30s on a cold boot. The real problem comes from how the project is laid out. There are over 190 projects that compile into DLLs. We have to keep them that way because the application is distributed (runs on many CPUs). So is there a way to improve DLL load time for cold boots?

I doubt it. As you said, Windows will keep DLLs around in memory after they've been released by other modules, in case they get re-used. There's a registry value called AlwaysUnloadDll or something that you can set to force DLLs to always unload.
The only way I can think of would be to get Windows to touch the DLLs, by writing a program that just loads the DLLs, frees them, then exits.
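
A minimal sketch of such a warm-up program, in Python for brevity. Note the swap: instead of calling LoadLibrary/FreeLibrary on each DLL, it only reads the files once so their contents end up in the OS file cache. That does not run any DLL initialization or resolve imports; it only addresses the disk reads. The function name `warm_file_cache` and the directory layout are assumptions made for illustration.

```python
from pathlib import Path


def warm_file_cache(directory, pattern="*.dll", chunk_size=1 << 20):
    """Read every matching file once so the OS file cache holds its contents.

    Returns the total number of bytes read.
    """
    total = 0
    for path in sorted(Path(directory).glob(pattern)):
        with open(path, "rb") as handle:
            # Read in chunks and discard the data; the read itself is the point.
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return total
```

It would have to run before the application starts (for example at login) to have any effect, and the cached data can still be evicted if memory gets tight.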

Of course there's always a load screen. It doesn't speed up load times, but it's better than staring at nothing: just a simple startup window with a progress bar that advances a step per DLL or whatnot. But if you have a lot of DLLs to load, it's going to take some time. The only other option, as evil mentioned, is having a second 'quicklaunch' app that loads the DLLs and then unloads them on Windows startup. I know OpenOffice has a quicklaunch (and I ~think~ AOL had one too).

Thanks for the fast replies,

I don't know if it would work to have a loader. I'm not really good at distributed computing yet. Locally it would probably help, but on the real simulators I doubt it. Btw, I tried to disable the registry key but there was still a great impact on performance on warm boot. Weird stuff. A load screen isn't really desirable because only the devs use the UI, since it is an image generator. Didn't Adobe add a loader too in Reader 7.0?

Guest Anonymous Poster
Just having the modules in DLLs surely won't help anything even if used on multiple CPUs... or is there something you forgot to say?

Anyways, DLL loading takes a lot of time - when I read the headline I thought I'd step in and recommend minimizing the dependency of DLLs...

If it's at all possible (it should be unless you forgot to tell us something), link statically for faster start up speed.

The only other way to do it is to dynamically load functions as you need them. Typically I wrap DLL calls inside of a class. I have the DLL function pointers as private members, and public member functions that basically check to see if the function is loaded, and if it isn't -- then load it, and finally call the function. That way, if I had 10,000 functions, it'll only load them as I use them. Just an idea.

Most of the DLLs are loaded via LoadLibrary. I'm not sure if I'm right (as I said, I'm no pro in distributed computing) but I assumed you would need DLLs to spread the application over a cluster of machines. It's not only multiple CPUs in one PC.

I think the impression we are getting is that the DLLs are simply stored on other PCs and loaded into this single app, which is definitely not distributed processing. Then again, I know close to nothing about one app being run on multiple PCs (with single multi-CPU machines, it's done via threading iirc).

I'm trying it right now. It seems to help a bit but I'm still trying to determine by how much. I'll come back later with numbers.

Quote:
Original post by Dreq
The only other way to do it is to dynamically load functions as you need them. Typically I wrap DLL calls inside of a class. I have the DLL function pointers as private members, and public member functions that basically check to see if the function is loaded, and if it isn't -- then load it, and finally call the function. That way, if I had 10,000 functions, it'll only load them as I use them. Just an idea.
Waste of time. Windows already delays loading functions from a DLL until first access. You might gain something by delaying loading the DLL itself, but delaying function loads gains you nothing.
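
As a rough illustration of delaying the load of the DLL itself, here is a small Python sketch. `LazyLibrary` and its `loader` parameter are hypothetical names, not part of any real API; the loader is whatever callable actually maps the library into the process.

```python
class LazyLibrary:
    """Defer loading a library until one of its functions is first used."""

    def __init__(self, name, loader):
        self._name = name
        self._loader = loader   # callable(name) -> loaded library object
        self._library = None

    def __getattr__(self, attribute):
        # __getattr__ only runs for names not found on the wrapper itself,
        # i.e. the library's functions; the private fields above bypass it.
        if self._library is None:
            self._library = self._loader(self._name)
        return getattr(self._library, attribute)
```

On Windows, `ctypes.WinDLL` could serve as the loader. For a C++ project the closest built-in equivalent is probably the MSVC linker's /DELAYLOAD option, which defers loading a DLL until the first call into it. Either way this only moves the cost to first use; it pays off only for DLLs that a given run never touches.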

Ok,

I don't have time to rebuild the whole solution right now. I'm pretty sure my results aren't accurate. I proceeded like this.

1- Rebuild solution completely
2- Benchmark cold load time (98s) // I may have made a mistake here; the real figure could be 138s
3- Rebase
4- Reboot
5- Benchmark cold load time (134s) // <-- what the...
6- Benchmark warm load time (24s) // Uh... where did the other 80 go?

I used "rebase -b 0x60000000 *.dll" on my bin directory (no subfolders). I used Dependency Walker to verify that the rebase worked. It did. These times are for a debug build. If anyone has an idea of what's taking so long on cold boot... be my guest.

Is it possible that there is caching on handles as well? I know there is a whole damn lot of .ini files reading and databases (not standard DBs mind you, texture DBs).

My computer is a single-core P4 2.5 GHz with 1 GB of RAM and an R9700 video card. The application takes up to 1.7 GB of memory (including page file). So maybe paging is killing me, I don't know. If Windows leaves a lot of stuff there, that would explain the huge margin.

Thanks for any insight you could give me.

[Edited by - Yalpe on June 12, 2006 2:26:58 PM]

You are setting the base address of each DLL separately, correct? The point of rebasing is to avoid the OS having to relocate each DLL as it's loaded.
What is the combined size of the exe and all the DLLs?

Before you make something faster, first find out what is slow.

So, how do you figure out what part of your code is slow?

Method 1: Guess

Guess. Sure, it is often wrong, and you'll end up working on code that isn't a bottleneck. But it is a common method. :)

Method 2: Profile

I hate profiling programs.

Method 3: The Monte Carlo method:
1> Do the thing that is slow with a debugger attached.
2> At a random moment, hit "break" (pause) in the debugger.
3> Examine the state of the program. Look at the call stack.
4> Start program up again.
5> Repeat steps 2 through 4 between 5 and 20 times.
6> You should now know what chunk of code is taking up 80%+ of your time.

This won't tell you how to make it faster. It will, however, let you know if it is loading DLLs or loading INI or loading DB files that is taking all of the time.
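
To make step 6 concrete, here is a small sketch of tallying the call stacks you wrote down at each pause. The function `tally_samples` and the stack contents in the usage example are made up for illustration; the stacks themselves would come from the debugger.

```python
from collections import Counter


def tally_samples(stacks):
    """Return, per function, the fraction of samples whose stack contains it."""
    counts = Counter()
    for stack in stacks:
        # Count each function once per sample, even if it recurses.
        counts.update(set(stack))
    total = len(stacks)
    return {name: count / total for name, count in counts.most_common()}
```

For example, if `load_ini` shows up in 3 of 5 recorded stacks, `tally_samples` reports 0.6 for it, which suggests that roughly 60% of the wall-clock time is spent somewhere under that call. With only 5 to 20 samples the figures are coarse, but a dominant hotspot still stands out.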

Usually, when something is slow, it is "one thing" that is far slower than the rest, especially in as-yet-unoptimized code. (I believe this is true for the same reason that numbers tend to start with a 1.)

Quote:
Original post by Yalpe
[...]there is a whole damn lot of .ini files reading[...]
If you're using Windows API functions to read the INI, this could be a significant source of load time. In a project I was working on that basically read INI files and then did file work based on them, I got around a 20% speedup (~5 seconds of ~25) by simply making my own INI file handling class that loads and parses the INI file a single time, operates on it, then saves the changes a single time. This was with a single INI file that had maybe 11 sections with 10 entries each, for around 110 lines in the file (not counting whitespace), with the keys being loaded in the order they appear in the file. My loader was VERY simple: it parsed whole lines using the C++ standard library and stored the whole thing in a map<string, map<string, string> >, with the first string being the section, the next string being the key, and the final string being the value. The only downside to my loader was that it didn't preserve comments or key/section order, but that wasn't needed for my project.
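
A minimal sketch of that parse-once approach, written in Python rather than the C++ described above. The nested dict plays the role of the map<string, map<string, string> >; the function names `parse_ini` and `dump_ini` are assumptions for illustration, and comments are dropped just as in the original loader.

```python
def parse_ini(text):
    """Parse INI text into {section: {key: value}} in a single pass."""
    data = {}
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in ";#":
            continue  # blank line or comment: dropped, not preserved
        if line.startswith("[") and line.endswith("]"):
            section = data.setdefault(line[1:-1].strip(), {})
        elif "=" in line and section is not None:
            key, value = line.split("=", 1)
            section[key.strip()] = value.strip()
    return data


def dump_ini(data):
    """Serialise the nested dict back to INI text for a single write."""
    lines = []
    for name, entries in data.items():
        lines.append(f"[{name}]")
        lines.extend(f"{key}={value}" for key, value in entries.items())
        lines.append("")
    return "\n".join(lines)
```

The idea is to read the file once, call `parse_ini`, do all lookups and changes on the dict, and write `dump_ini(data)` back once at the end. Unlike a C++ std::map, a Python dict keeps insertion order, so section and key order happen to survive here.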

I got another similar speedup by making all the file operations async and then just waiting until all are done when that is needed.
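
One way to sketch the "start everything, then wait" pattern is shown below. This is an assumption about the shape of the idea, not the poster's code: it uses a thread pool rather than true asynchronous (overlapped) file I/O, and `read_all` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_all(paths, max_workers=8):
    """Queue every read up front, then block until all have finished."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {path: pool.submit(Path(path).read_bytes) for path in paths}
        # result() blocks until that particular read has completed.
        return {path: future.result() for path, future in futures.items()}
```

Whether this helps depends on the disk: overlapping requests lets the OS reorder them, but many concurrent reads on a single spinning drive can also cause extra seeking, so it is worth measuring.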

Quote:
Original post by Yalpe
[...]I used "rebase -b 0x60000000 *.dll" on my bin directory (no subfolders).[...]
Each DLL is supposed to have a unique address after rebasing; that's the purpose of rebasing :) You should make sure that you've done that. Your example looks like you've given every DLL the same address. You've now gone and made the problem worse :)

Cheers
Chris

Another option may be to use a process dumper. After your exe is started and all DLLs are loaded, dump the address space of your process to a file using one of the many process dumping tools out there. Then loading the application only requires loading the image back into memory.

If you've already built the DLLs, and then start the app, then the DLL files are in the OS file buffer cache, and thus there's less waiting for the hard disk. That would explain the difference between "newly built" vs "newly booted."

Splitting something into 190 DLLs is a really bad idea for performance, though. I'd rather just link everything into a single application, and then use configuration options to activate the parts that you need in each instance of that application.

Ok first,

I've rebuilt the binaries overnight. On reboot this morning I got 140 sec without rebase. So the rebase might have been a good thing after all.

Total size of DLLs and EXEs combined: 85.9 MB.

I'm going to try rebasing with 0x06000000 (96 MB) as the base address. That might be more helpful than rebasing at 0x60000000 (~1.5 GB). I'll let you know how it turns out.

REBASE: Total Size of mapping 0x0000000008600000
REBASE: Range 0x0000000006000000 -0x000000000e600000

According to Dependency Walker everything is fine. No two DLLs share an address.
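
The arithmetic behind that output can be sketched as follows. This is only an illustration of handing out unique, non-overlapping base addresses, not a claim about how the rebase tool assigns them internally; the helper names and the DLL names and sizes in the usage note are invented.

```python
ALIGNMENT = 0x10000  # 64 KB, the Windows allocation granularity


def align_up(value, alignment=ALIGNMENT):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def assign_bases(image_sizes, start):
    """Give each DLL a unique base address, packed upward from start.

    Returns the name -> base mapping and the first free address after it.
    """
    bases = {}
    address = align_up(start)
    for name, size in image_sizes.items():
        bases[name] = address
        address = align_up(address + size)
    return bases, address


def has_overlap(bases, image_sizes):
    """True if any two images would occupy the same address range."""
    ranges = sorted((base, base + image_sizes[name]) for name, base in bases.items())
    return any(end > next_start
               for (_, end), (next_start, _) in zip(ranges, ranges[1:]))
```

With hypothetical sizes such as `{"a.dll": 0x12345, "b.dll": 0x8000}` and a start of 0x06000000, `assign_bases` places a.dll at 0x06000000 and b.dll at 0x06020000. The REBASE output above is consistent with the same kind of packing: 0x06000000 + 0x08600000 = 0x0e600000.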

Now as far as optimizing the code goes, I've come a long way. At first it took over 40 secs to start on a warm boot. Now it takes 12 at most. I've implemented a cache, mostly for INI files. Hard disk access was costly. Profilers don't work too well because my machine is barely able to handle the application alone. I was told that profilers have a hard time with realtime apps. (There are like 40 threads and 7 processes.) I went the old-school way: get the CPU clock time before a function call, then again after, and take the delta. Whatever works!
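
That before/after timing can be wrapped up so it is less tedious to sprinkle around. A small Python sketch follows; the names `timings` and `timed` are made up, and it measures wall-clock time with `time.perf_counter`, which is an assumption about what matters here (time spent waiting on the disk would not show up in a pure CPU-time counter).

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)  # label -> accumulated seconds


@contextmanager
def timed(label):
    """Accumulate the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start
```

Wrapping a suspect call in `with timed("load ini files"):` and printing `timings` after startup gives a per-label breakdown. Repeated blocks with the same label add up, which suits code that is called many times during load.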

I don't know why the project is split into so many DLLs honestly. I've only been working here for a month. This is a 4 year old project, so things could have been different back then.

EDIT: 103 secs with proper rebase. I had the wrong address. Paging is probably eating the rest of the seconds, I'd think... If I launch the application again I get 27s (10s in release). Cookies (rating) for everyone who helped! Now, is it possible that Windows would keep HANDLES as well as DLLs in its cache?

[Edited by - Yalpe on June 13, 2006 6:32:12 AM]
