Patches

Started by
14 comments, last by s3202 21 years, 10 months ago
I've been working on some "patching code" myself. Here's what I do:

Open the OLD file and the NEW file in binary mode.
Read the OLD file filling a dynamically created buffer to hold it.
Compare the size of OLD file and NEW file
You need to know this because if the NEW file is shorter or longer this will have an impact on the "patched" file (especially if it is an .exe)
Open a new DATA file in binary mode
Read in a buffer from the NEW file
Compare byte for byte NEW to OLD

ok here's where we need some code... After some trial and error (doesn't everyone code like that) I created a struct to hold my offset and the bytes that were different, here's the struct:


    static int BLOCKSIZE = 18;typedef struct tagPatchBlock{    DWORD offset;    // how far into the file to write new data.    WORD  count;     // the number of values that are to be written @ offset.    BYTE  values[BLOCKSIZE];    // the values to write.} sPatchBlock    



ok now:

When a difference is detected store the offset in the sPatchBlock struct
If you fill up the values (18 in this case) you save that block to your DATA file like so:
fwrite(&fileblock,sizeof(sPatchBlock),1,PatchFile);
Also if the delta of the next offset is more than one you will save off that fileblock and create a new one.


At first glance it may look like your wasting space in the DATA file... and you would be correct. The solution is to compress the DATA file using zlib or another compression alogo.

The only thing left is a header in the DATA file and creating a new file is easy...

If you have any questions feel free to email me - I'll gladly send you some of my test code.








Dave "Dak Lozar" Loeser †

[edited the fixed text]

[edited by - Dak Lozar on June 2, 2002 11:38:37 AM]
Dave Dak Lozar Loeser
"Software Engineering is a race between the programmers, trying to make bigger and better fool-proof software, and the universe trying to make bigger fools. So far the Universe in winning."--anonymous
Advertisement
Dark Lozar,

The problem with your code is that doesn't work all that well for something like the following:

offset: 0000000000111111111122222222223        0123456789012345678901234567890file 1: {ABCDEFGHIJKLMNOPQRSTUVWXYZ$$$}file 2: {ABCDEFGHI**JKLMNOPRSTUVWXYZ} 
There's a better way to do it than what you propose. You've probably noticed that the two strings above have three substrings and the start and end of file markers (represented here as "{" and "}") in common:

{ ABCDEFGHI JKLMNOP RSTUVWXYZ } 
The offsets for each of these items in each of the two original files are:

file 1: 00, 01 - 09, 10 - 16, 18 - 26, 30file 2: 00, 01 - 09, 12 - 18, 19 - 27, 28 
You'll notice some gaps between the offsets given:

file 1 gaps: 17, 27 - 29file 2 gaps: 10 - 11 
File 1 gaps indicate those characters belonging to file 1 which have been removed from it. File 2 gaps indicate those charcters belonging to file 2 which have been added to file 1.

You use this information to define ranges of offsets where bytes should be removed from the original file (you use the contents of this file as input) and offset/data pairs where bytes should be added to the final file.

[edited by - chronos on June 2, 2002 2:27:49 AM]
hi chronos,

In all my testing (that''s all I''ve done with this code so far), I''ve successfully recreated all data files and executables using this technique.

using your data and assuming that file 1 is OLD file and file2 is NEW file, here''s what I would end up with.


offset: 0000000000111111111122222222223
0123456789012345678901234567890
file 1: {ABCDEFGHIJKLMNOPQRSTUVWXYZ$$$}
file 2: {ABCDEFGHI**JKLMNOPRSTUVWXYZ}

OFFSET: 10
COUNT : 18
VALUES: **JKLMNOPRSTUVWXYZ}


One of the items that I did leave out was the in the header of the data file there is a checksum of the file being patched (OLD) and of course the data file that it is being patched from along with the file size of the resulting NEW file and checksum.

I''m not sure where you''re thinking this technique is lacking - care to elaborate?









Dave "Dak Lozar" Loeser †
Dave Dak Lozar Loeser
"Software Engineering is a race between the programmers, trying to make bigger and better fool-proof software, and the universe trying to make bigger fools. So far the Universe in winning."--anonymous
Dark Lozar,

What I was trying to suggest is that while your code works okay for files where the changes between the two files share the same offsets in each file...
file 1: {ABBA...}file 2: {ACCA...}          ^^  
... it doesn't work very well when the changes cause the offsets to change ...
file 1: {ABABAB...AB}file 2: {IABABAB...AB}         ^^^^^^^^^^^^  
You'll notice in this second example that the only difference is an "I" inserted at the start of the second file. Your algorithm would produce as many chunks as (filesize-1) / 18 + 1, while the algorithm I gave would simply do this:
file 1: {ABABAB...AB}file 2: {IABABAB...AB}  
Both have in common:
{ ABABAB...AB }  
If N is the offset of EOF for file 1 and K is the offset of EOF for file 2:
file 1 offsets: 0, 1 - (N-1), Nfile 2 offsets: 0, 2 - (K-1), K  
We get the following gaps:
file 1 gaps: nonefile 2 gaps: 1  
Instead of so many chunks what you get is:
Remove nothing from file 1.Add "I" at offset 1 to file 2.  
Because offsets will change whenever you insert or remove something from a file (except at the end of it), you'll want to use an algorithm that handles these cases well.

[edited by - chronos on June 2, 2002 4:59:45 PM]
There's something I should clarify about the algorithm I explained above. When looking for characters in common between the two files we must preserve the order in which these characters appear. In other words, we're looking for the "longest common subsequence", as I suggested before.

This means that for a pair of files like these,
{AABBBBAA}{BBAAAABB}   
you don't want to pick common substrings out of order:
{ AA BB BB AA } <-- WRONG         
What you want to do is look for the longest common subsequence. In this particular case we have several such sequences:
{ AA AA }{ AA BB }{ BB AA }{ BB BB }   
Any one of these will work, although some will work better than others (the two in the middle will yield a smaller set of changes in this case).

[edited by - chronos on June 4, 2002 4:36:35 PM]
chronos,
Yup, I see what your saying... and do agree that this is a better technique...

I''ve been relying on compression as a method of keeping the data file (for the patch) small... and so far things are working out.

I am intriqued by the method that you have described and will probably spend some time to see if there are benefits in reducing the size the resulting data file...

Take care,

Dave "Dak Lozar" Loeser †
Dave Dak Lozar Loeser
"Software Engineering is a race between the programmers, trying to make bigger and better fool-proof software, and the universe trying to make bigger fools. So far the Universe in winning."--anonymous

This topic is closed to new replies.

Advertisement