In a nutshell, I now use the visibility cluster information in the BSP to cull large quantities of hidden geometry, which has raised the framerate from 18FPS (base1.bsp) to about 90FPS.
I had a look at the lightmap code again. Some of the lightmaps appeared to be off-centre (most clearly visible when there's a small light bracket on a wall casting a sharp inverted V shadow on the wall underneath it, as the tip of the V drifted to one side). On a whim, I decided that if the size of the lightmap was rounded to the nearest 16 diffuse texture pixels, one could assume that the top-left corner was not at (0,0) but offset by 8 pixels to centre the texture. This is probably utter nonsense, but plugging in the offset results in almost completely smooth lightmaps, like the screenshot above.
Before and after - coloured lightmaps.
I quite like Quake 2's colour lightmaps, and I also quite like the chunky look of the software renderer. I've modified the pixel shader for the best of both worlds. I calculate the three components of the final colour individually, taking the brightness value for the colourmap from one of the three channels in the lightmap.
float4 Result = 1;
ColourMapIndex.y = 1 - tex2D(LightMapTextureSampler, vsout.LightMapTextureCoordinate).r;
Result.r = tex2D(ColourMapSampler, ColourMapIndex).r;
ColourMapIndex.y = 1 - tex2D(LightMapTextureSampler, vsout.LightMapTextureCoordinate).g;
Result.g = tex2D(ColourMapSampler, ColourMapIndex).g;
ColourMapIndex.y = 1 - tex2D(LightMapTextureSampler, vsout.LightMapTextureCoordinate).b;
Result.b = tex2D(ColourMapSampler, ColourMapIndex).b;