Automatic Patching Utility

Started by
11 comments, last by kevmo 20 years, 4 months ago
I don't see how a diff-like algorithm would be bad on any kind of data - it is simply storing additions,modifications, and deletions. It doesn't make any sort of specification on what kind of data you are going to be delta compressing (i think that is the correct term: storing only data that has changed).

EDIT:

On the rsync algorithm (http://olstrans.sourceforge.net/release/OLS2000-rsync/OLS2000-rsync.html):
"It's an algorithm meant for: when you didn't have the forethought to, at one end or the other, have a copy of the old file so you could run a diff".

This isn't the case in a patching utility, as the patch is generated from the old file and new file, and then sent to the client, who then just applies the patch.

According to http://samba.anu.edu.au/rsync/tech_report/, one advantage that rsync gives over running diff on the files is that it uses less CPU time. However, you have to consider the fact that you would only need to run this diff once to make the patch, after that you just distribute the patch. It is less data to store just the diff information in a smart manner rather than all the checksums and modified data.

In short, I disagree that rsync is a viable method for a standalone patch.

[edited by - kevmo on December 17, 2003 10:58:46 PM]
Advertisement
I haven''t done this but I''m interested in this topic as well. I think what Anon user suggested seems like a viable solution.

You find the longest string of matching data. Save the offset and length of it in both programs. Then go through the data before and after the longest matching data only changing the data that is different and adding the data if it is non-existant.
The patcher program would be a list of patchable data. e.g.

class Patch
{
long lOffset;
virtual PatchIt();
....
};

class PatchByte : public Patch
{
.....
unsigned char ucPatchValue;
PatchIt(){.....}

};

class PatchArray : public Patch
{
unsigned char* pData;
long length;
....etc.
}


list patches;

...for all patches
patch it....

I might as well write a patch utility one of these days

Seems like a good challenge...

-Viktor
As a follow-up, I'll briefly described the final design that I think I am going to take.

The situation is this: the user has file A, and the producer of the game wishes to update this file to file B.

To create the patch:
- Use a graph-based approach specified by http://www.justinalia.com/cs/algorithms/diff02.html
- Consider the size of file A in bytes to be n, and the size of file B to be m.
- Create a n X m grid of vertices, with a vertex labeled (x,y), x is in the set of [0,n] and y is in the set of [0,m].
- Add an edge from each vertex to the vertex on its left and on its right, and above it and below it.
- Add a diagonal edge from (x-1,y-1) to (x,y) if the x-th byte in A is equal to the y-th byte in B.
- Perform a weighted shortest path search from (0,0) to (m,n), with the horizontal and vertical edges weighted 1 and the diagonal edges weighted 0.
- What this search represents is the following: A horizontal edge is an addition to the file and a vertical edge is a deletion from the file (think about this for a bit). A change is simply a deletion then addition or vice versa. A diagonal edge means the data isnt stored.
- This means if we store each horizontal and verticle edge in the path along with the coordinates, we know exactly where to change the bytes. Note you also have to store the byte from the new file with a horizontal edge.
- This storage can be improved by storing sequences of edges together, thereby only storing the start of the sequence, and the number of additions/deletions in the sequence, and if the sequence is the addition, the bytes to add.

To apply the patch:
I haven't thought much about it, but it should be trivial given the byte-level diff patch.

For an automatic update service:
Keep a sort of revision control system on the server, storing hashes of all previous versions of the file, and the diff-patch to the next version of the file. To update a given file, look up its hash in the database (if it is not there, it means the file got corrupted, might as well either a) send the whole file over or b) do a rsync-like operation on the bad file). If the hash is there, send all the patches up to the latest version to the client, and apply them sequentially.

Comments/suggestions welcome.

[edited by - kevmo on December 17, 2003 11:41:18 PM]

This topic is closed to new replies.

Advertisement