We prototyped a Hearthstone clone in Unity for work (before we knew Blizzard used Unity too). For the most part it worked out really well. Our cards were 3D objects, so we could do tricks with them. But we had issues with card text because we were trying to use Unity's built-in text rendering. You would almost certainly have to go the Blizzard route and build a custom system for card text.
For cards, we had a pool of 40 or so 'blank' cards that were created when the level was loaded. We had an image cache for all the card images: images got loaded from disk as needed, and after that we just passed around the shared texture. Text was Unity's 3D text, and it was not pretty at all (this was our biggest issue for the demo). When a card was drawn from your pile or played by the opponent, we pulled a blank card from the pool and constructed it. Once a card was removed from play, it went back to the pool to be used again.
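Here is a rough sketch of that pool plus image cache idea. It is in Python just to keep it short (in Unity this would be C#), and it is not our actual code: `Card`, `TextureCache`, `CardPool` and the `load_from_disk` callback are all made-up names. Creating an extra card when the pool runs dry is also my assumption, not something we necessarily did.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Card:
    """A reusable 'blank' card; its fields get filled in when it enters play."""
    name: Optional[str] = None
    texture: Optional[object] = None


class TextureCache:
    """Loads each image at most once, then hands out the same shared object."""

    def __init__(self, load_from_disk: Callable[[str], object]) -> None:
        self._load = load_from_disk  # hypothetical loader supplied by the caller
        self._textures: Dict[str, object] = {}

    def get(self, path: str) -> object:
        if path not in self._textures:
            # Only the first request for a path touches the disk.
            self._textures[path] = self._load(path)
        return self._textures[path]


class CardPool:
    """Pre-creates blank cards up front and recycles them."""

    def __init__(self, size: int, textures: TextureCache) -> None:
        self._free: List[Card] = [Card() for _ in range(size)]
        self._textures = textures

    def acquire(self, name: str, image_path: str) -> Card:
        # Assumption: if the pool is empty, create one more card rather than fail.
        card = self._free.pop() if self._free else Card()
        card.name = name
        card.texture = self._textures.get(image_path)
        return card

    def release(self, card: Card) -> None:
        # Blank the card out again before it goes back into the pool.
        card.name = None
        card.texture = None
        self._free.append(card)

    @property
    def available(self) -> int:
        return len(self._free)
```

You would call `acquire` when a card is drawn or played and `release` when it leaves play. Two cards that use the same image end up holding the same texture object, so each image is only loaded once.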
The sound manager for all our games at work is a singleton. Kind of lazy on our part, but it is something there is only ever going to be one of. Still, you can make the point that only your game 'manager' (quoted since manager seems to be a bit taboo around here) needs to know about the sound system, since it is the thing that knows when cards are played or when effects happen. And there is no reason for your game manager to be a singleton if you set up your state management properly.
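To show what the non-singleton version could look like, here is a minimal sketch where the sound system is simply handed to the game manager when it is created. Again, this is Python for brevity rather than Unity C#, and every name in it (`SoundSystem`, `RecordingSound`, `GameManager`, the `"card_played"` clip) is hypothetical.

```python
from typing import List, Protocol


class SoundSystem(Protocol):
    """Anything that can play a named clip."""

    def play(self, clip: str) -> None: ...


class RecordingSound:
    """Stand-in backend that just records which clips were requested."""

    def __init__(self) -> None:
        self.played: List[str] = []

    def play(self, clip: str) -> None:
        self.played.append(clip)


class GameManager:
    """Owns game state and is the only object that talks to the sound system."""

    def __init__(self, sound: SoundSystem) -> None:
        self._sound = sound  # passed in, not fetched from a global instance
        self.board: List[str] = []

    def play_card(self, card_name: str) -> None:
        self.board.append(card_name)
        # The manager knows a card was just played, so it triggers the sound.
        self._sound.play("card_played")
```

Whatever code sets up the match creates one sound object and one `GameManager` and passes the first into the second. Nothing else needs a global handle to either, and swapping in a fake sound backend for testing is trivial.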