First thing is to boil it down a bit. Reference counting is out of the scope of this - it's dealing with what happens when you do decide to delete something that is the issue at hand.
Consider a simplistic rendering loop that renders, presents, and then waits for the device to be idle each frame. After the device is idle, but before anything is rendered, you can be sure that destroying a resource will be safe, as long as you don't try to use it again. So you can either defer all logic that would ever delete a resource until that time, or allow a deletion to be requested at any time but put it in a buffer to be "played back" in the safe window.
Fairly trivially, this can be extended to a pipeline that is multiple frames deep. Let's go with 3, for the sake of example. This means you might have rendering commands on the fly up to 2 frames "behind", and therefore you have to take that into account whenever you want to delete a resource. The easiest thing to do in this case IMO would be to have a growable array for each frame (e.g. 3); they keep track of deletion requests, each corresponding to requests issued when the CPU is processing that frame. Each frame would also have a fence associated with it. Whenever a new frame is started, it waits on its fence, and then before rendering anything, it deletes the appropriate resources and clears the vector for new requests to be added. When submitting the commands for that frame, you say that the fence should be signaled when the submission is complete. The fence ensures that the CPU will never get too far ahead of the GPU and cause the renderer to trip over itself, ensuring that you won't ever try to delete a resource that's being used by the GPU.
This concept of shifting by the number of frames in your pipeline applies to destructive updates, such as rendering to a framebuffer, as well. For N frames, you will need N copies of each attachment, and cycle through to determine which copy you are writing to.