Best practices for storing binary data?

7 comments, last by Rattenhirn 7 years ago

Disclaimer: I'm talking about a very small scale operation; think less than five people.

I'm currently in the process of migrating my code over to source control (using Amazon's AWS/CodeCommit stuff with Git currently, but I'm not dedicated to it). I'm not sure what to do about my binary dependencies though. At the last place I worked, we stored assets in a separate home-grown version control system and source code/external libraries in a single SVN repository, but we also had our own separate symbol servers and a few hundred grand to throw at hardware.

I'm fairly sure it's most sound to store external DLLs in the main repo, but what about PDBs? One of my external libraries has a PDB that weighs in at a gig-and-a-half. That seems a bit insane for what should be a lightweight code repository. Regarding assets, is it too much to commit them given that I'm working on an indie scope, or should I look into a separate repo?


One of my external libraries has a PDB that weighs in at a gig-and-a-half.


I definitely wouldn't commit that, especially if it is EVER modified. What in the world makes a PDB that large?

git has LFS now, although I have not tried it out seriously at work yet, so I can't vouch for how well it works. Possibly once my current project enters pre-production and we get more than a meg of art.
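For reference, a minimal sketch of what adopting LFS might look like, assuming git-lfs is installed; the tracked file patterns here are only examples:

```python
# Minimal sketch of enabling Git LFS for large binaries (assumes git-lfs is installed).
# The patterns below are illustrative, not a recommendation for any particular project.
import subprocess

def run(cmd):
    # Run a git command and fail loudly if it errors.
    subprocess.run(cmd, check=True)

run(["git", "lfs", "install"])            # set up the LFS hooks for this repo
run(["git", "lfs", "track", "*.pdb"])     # store PDBs as LFS pointers
run(["git", "lfs", "track", "*.dll"])     # same for prebuilt DLLs
run(["git", "add", ".gitattributes"])     # the tracking rules live in .gitattributes
```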

I definitely wouldn't commit that, especially if it is EVER modified. What in the world makes a PDB that large?

CEF. The DLL itself clocks in at ~100MB as well. I tried building it by hand instead of using the prebuilt binaries in the hopes of eventually trimming it down, but my initial attempt produced a DLL that managed to actually crash the VS debugger, so I noped out of that pretty fast.

git has LFS now, although I have not tried it out seriously at work yet, so I can't vouch for how well it works. Possibly once my current project enters pre-production and we get more than a meg of art.

That's sort of the direction I was leaning towards for assets. I have very little actual assets right now so I could easily put it in source control for the moment, so I was more concerned for the long term. Definitely the main crux of this topic though was the PDBs.

A 100MB DLL w/ 1.5GB PDB is definitely out of the ordinary, so general advice wouldn't apply.

In my experience, the typical thing to do is keep all your binary dependencies in Git alongside the code, until you run into a situation like this, at which point you move your "external" directory into an SVN repository that the team checks out alongside the main Git repo... :(

Thanks for the advice. For the moment I've got my PDBs and external library dependencies stored locally (since I'm a one-man team), keep my assets and code in my Git repository, and just move everything into place as part of a build step.

I can always shift things around later once I start scaling up, so rather than go for an optimal zero-to-ready system like the typical company would have, I opted to make something that only puts stuff I actually can't afford to lose in source control.
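As a rough illustration of that build step, something like the following copies the locally stored binaries into the build output; the directory names are hypothetical:

```python
# Rough sketch of a pre-build step that copies locally stored binaries into the
# build output. EXTERNAL_DIR and OUTPUT_DIR are placeholder paths, not real ones.
import shutil
from pathlib import Path

EXTERNAL_DIR = Path("C:/deps/external")   # local only, not in source control
OUTPUT_DIR = Path("build/bin")

OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
for pattern in ("*.dll", "*.pdb"):
    for src in EXTERNAL_DIR.rglob(pattern):
        shutil.copy2(src, OUTPUT_DIR / src.name)  # preserve timestamps/metadata
```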

Another option is to just give in and use Perforce :lol:

There's something to be said for just throwing files that large into S3, and including a script in your repository that runs an aws s3 sync command to keep the local copies up to date...
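Something along these lines, say; the bucket name and local directory are placeholders, and it assumes the AWS CLI is installed and configured:

```python
# Sketch of a repo-checked-in helper that pulls large assets down from S3.
# The bucket/prefix and local directory are hypothetical examples.
import subprocess

BUCKET = "s3://my-game-assets/external"   # placeholder bucket
LOCAL_DIR = "external"

# 'aws s3 sync' only transfers files that are new or have changed.
subprocess.run(["aws", "s3", "sync", BUCKET, LOCAL_DIR], check=True)
```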

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Move away from Git and move to a centralized versioning system like Perforce or Subversion that doesn't have issues with large files?

Also, DLL/EXE PDBs are "special" in that it is probably better to use a symbol server with source indexing instead.
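For instance, a sketch of publishing PDBs to a simple file-share symbol store with symstore.exe (ships with the Windows Debugging Tools); the share path and product name here are placeholders:

```python
# Sketch of pushing build PDBs to a UNC-share symbol store via symstore.exe.
# Assumes symstore is on PATH; the paths and product name are made up.
import subprocess

subprocess.run([
    "symstore", "add",
    "/f", r"build\bin\*.pdb",        # PDBs to add to the store
    "/s", r"\\fileserver\symbols",   # UNC path of the symbol store
    "/t", "MyGame",                  # product name recorded in the store
    "/compress",                     # store compressed copies to save space
], check=True)
```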

Perforce is free for up to 20 users and handles binary data very well.

This topic is closed to new replies.
