Is networking or 3rd-party code involved? Those can cause random shit too.
Anyway, when I run into bugs like this, I usually apply a fallback-like "fix" at the smallest scope where I can locate the bug. I test that this ""fix"" doesn't cause other problems, and then I'm done with it for the time being. I cannot put enough quotation marks around the word "fix".
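A fallback like that can be kept small and explicit: try the preferred code path, catch only the specific failure you've seen, log it so it doesn't silently disappear, and run the alternative. The sketch below is just one way to do it, assuming the failure shows up as an `OSError` (which includes `TimeoutError`). `buffered_read` and `software_timed_read` are hypothetical stand-ins, not any real DAQ driver's API.

```python
import logging
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    exceptions: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Run primary(); if it raises one of `exceptions`, log it and run fallback()."""
    try:
        return primary()
    except exceptions as exc:
        # Log the failure so the "fix" stays visible instead of hiding the bug.
        log.warning("primary path failed (%s), using fallback", exc)
        return fallback()


# Hypothetical stand-ins for the two acquisition modes.
def buffered_read() -> list:
    raise TimeoutError("buffered acquisition did not start")


def software_timed_read() -> list:
    return [0.0, 0.1, 0.2]


samples = with_fallback(buffered_read, software_timed_read)
```

Catching a narrow exception type (rather than everything) means unrelated bugs still crash loudly instead of being swallowed by the fallback.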
It also helps if others test or use the software:
The latest of this random crap I ran into (a measuring amplifier failing to work in buffered data acquisition mode, even after) eventually started showing up in a rather deterministic way (every time) for a specific use case. It turned out that the fix I had applied months earlier worked pretty well, and I could improve it (still no buffered DAQ, but at least the program now properly falls back to software-timed sampling).
The root cause is still not clear. Our best guess is that our special/custom/motherfucker corporate firewall is blocking a port for this program only (or rather, it's opening the port only for one specific other program, which works fine with buffered DAQ). The device can connect and read samples, but buffered DAQ requires another port to be open.
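One cheap way to poke at a "firewall blocks a port" theory is a plain TCP connect test against the port in question. This is a sketch, assuming the device uses TCP (a connect test says nothing about UDP). The host and port in the usage note below are made up. Also, if the firewall rule is per application, a separate script runs as a different program and may see different results than the affected program does.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (incl. timeout/refused) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Usage, with a hypothetical device address: `port_reachable("192.0.2.10", 5025)`. Comparing the result for the port that works (plain sample reads) against the one buffered DAQ needs would at least narrow down whether it's the network or the program.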
Dunno, maybe there's a proper way to test/debug these problems, but I'm not a real programmer, so
Story time over.