You could work on a 2D tiled game; I recommend a horizontal/vertical grid instead of isometric. When you switch to 3D, isometric is just the angle of the camera, whereas setting up 2D isometric diagonals can be a pain for processing at many points.
Since you're considering 3D, I wouldn't plan on using tiles to lock your player's position. In 3D you'll typically be freer to move around, i.e. you will have a tile/texture background, and store an exact location on the map plus a heading. (For an MMO you need a heading, so that a character still maintains a basic trajectory during network lag.)
A heading is typically something like a momentum vector (x, y), and every update you keep adding the momentum to the location. Server updates would include a new location and another heading, even if the heading is (0,0).
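As a rough illustration of the location-plus-heading idea, here is a minimal Python sketch. The names `Entity`, `step` and `apply_server_update` are made up for this example, and it assumes the heading is expressed in map units per update; a real game would probably blend toward the server position instead of snapping to it.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    x: float
    y: float
    vx: float = 0.0  # heading / momentum, in map units per update (assumption)
    vy: float = 0.0

    def step(self) -> None:
        """Advance one update by adding the heading to the location."""
        self.x += self.vx
        self.y += self.vy

    def apply_server_update(self, x: float, y: float, vx: float, vy: float) -> None:
        """Replace the local guess with the state the server sent."""
        self.x, self.y, self.vx, self.vy = x, y, vx, vy


player = Entity(x=10.0, y=5.0, vx=1.5, vy=0.0)

# Three updates pass with no word from the server (lag):
# the character keeps following its last known heading.
for _ in range(3):
    player.step()

# The server catches up: new location, and a heading of (0, 0) meaning "stopped".
player.apply_server_update(14.0, 5.0, 0.0, 0.0)
player.step()  # heading is (0, 0), so the character stays where it is
```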
You'll need a rendering engine, and you'll want to keep it a bit structured. In pong, you can get away with a single X/Y variable for the ball and two Y variables for the paddles, with all other actions simply working on those values directly. For a larger-scale game, you'll need to manage things in classes/structs; most engines refer to these as GameObjects. You'll then store the game objects in lists/arrays, usually one per type (Background, Unit, Effect, Foreground and such).
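To make the GameObject idea a bit more concrete, here is one possible layout in Python. Everything in it (`GameObject`, `Unit`, `Effect`, `World`, the layer names) is a hypothetical structure for illustration, not how any particular engine does it.

```python
from dataclasses import dataclass


@dataclass
class GameObject:
    x: float
    y: float

    def update(self, dt: float) -> None:
        """Default: static objects (e.g. background tiles) do nothing."""


@dataclass
class Unit(GameObject):
    vx: float = 0.0
    vy: float = 0.0

    def update(self, dt: float) -> None:
        # Move by velocity scaled by the elapsed time in seconds.
        self.x += self.vx * dt
        self.y += self.vy * dt


@dataclass
class Effect(GameObject):
    ttl: float = 1.0  # seconds left before the effect disappears

    def update(self, dt: float) -> None:
        self.ttl -= dt


class World:
    def __init__(self) -> None:
        # One list per type; dict order doubles as a simple draw order.
        self.layers = {"background": [], "unit": [], "effect": [], "foreground": []}

    def add(self, layer: str, obj: GameObject) -> None:
        self.layers[layer].append(obj)

    def update(self, dt: float) -> None:
        for objects in self.layers.values():
            for obj in objects:
                obj.update(dt)
        # Drop effects whose lifetime has run out.
        self.layers["effect"] = [e for e in self.layers["effect"] if e.ttl > 0]


world = World()
hero = Unit(x=0.0, y=0.0, vx=2.0, vy=0.0)
spark = Effect(x=1.0, y=1.0, ttl=0.4)
world.add("unit", hero)
world.add("effect", spark)
world.update(0.5)  # hero moves 1 unit to the right, spark expires
```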
You'll want to start gathering some basic physics, like a leash. A leash matches the idea of a real-life leash: one object can move around freely, but once it reaches the leash length (a radius), it starts pulling the object on the other end along. This can be used for camera movement to help keep it smooth. Leashes often include a springiness, where the pulled object moves faster the farther it is from the leash source. Leashes, springs, etc. are good physics effects to know how to use. Look up Cartesian/screen coordinates, which is probably what you used for pong, and learn how to translate those to and from polar coordinates. You should also make sure you're using and are comfortable with vectors (2D, 3D, 4D) and how they can be used for a position or for a difference/movement. These are things that are highly important in 3D and that you can already apply in 2D.
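Here is a small Python sketch of a springy camera leash that also shows the Cartesian/polar round trip. The function names and the `stiffness` parameter are my own choices for the example: `stiffness` is assumed to be between 0 and 1, where 1 behaves like a rigid leash and smaller values give a softer, laggier pull.

```python
import math


def to_polar(dx: float, dy: float) -> tuple[float, float]:
    """Cartesian offset -> (distance, angle in radians)."""
    return math.hypot(dx, dy), math.atan2(dy, dx)


def from_polar(r: float, theta: float) -> tuple[float, float]:
    """(distance, angle in radians) -> Cartesian offset."""
    return r * math.cos(theta), r * math.sin(theta)


def leash_follow(cam, target, leash_length, stiffness=0.2):
    """Return the new camera position after one update.

    Inside the leash radius the camera stays put. Outside it, the camera
    is pulled toward the target by a fraction of the overshoot, so the
    pull gets stronger the farther away the target is.
    """
    dist, angle = to_polar(target[0] - cam[0], target[1] - cam[1])
    if dist <= leash_length:
        return cam
    pull = (dist - leash_length) * stiffness
    px, py = from_polar(pull, angle)
    return (cam[0] + px, cam[1] + py)


camera = (0.0, 0.0)
player_pos = (10.0, 0.0)
for _ in range(5):
    # Each update closes part of the remaining overshoot.
    camera = leash_follow(camera, player_pos, leash_length=4.0, stiffness=0.5)
```

With these numbers the camera settles toward a point 4 units short of the player instead of jumping straight there, which is what makes the motion feel smooth.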
You'll need pathfinding AI, like A*, which is a pretty easy one to start with, so you can navigate a map with obstacles. Otherwise your character will need to be directly controlled, and NPCs will have very limited movement ability. Ever played a game where something was chasing you, but if you ran to the other side of a short wall, the other character just ran into the wall and kept running at it, as if the shortest path to you was the only path? A* (and other pathfinders) are what get used to allow NPCs to walk around an obstacle, even if it means going slightly out of the way.
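For reference, a compact A* on a tile grid could look like the Python sketch below. It assumes 4-way movement with a cost of 1 per step, walls marked with `#`, and a Manhattan-distance heuristic; the map layout and the `astar` name are just for the example.

```python
import heapq


def astar(grid, start, goal):
    """Return a list of (row, col) cells from start to goal, or None.

    grid is a list of equal-length strings where '#' is a wall.
    """
    def h(cell):
        # Manhattan distance: never overestimates with 4-way, cost-1 moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # (estimated total, cost so far, cell)
    came_from = {start: None}
    cost = {start: 0}

    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if current == goal:
            path = []
            while current is not None:  # walk back to the start
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if g > cost[current]:
            continue  # stale heap entry, a cheaper route was found already
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == "#":
                continue
            new_cost = g + 1
            if new_cost < cost.get((nr, nc), float("inf")):
                cost[(nr, nc)] = new_cost
                came_from[(nr, nc)] = current
                heapq.heappush(open_heap, (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return None  # goal is unreachable


# N (the NPC) is chasing P (the player) around a short wall.
level = [
    "......",
    "..#...",
    "N.#.P.",
    "..#...",
]
path = astar(level, start=(2, 0), goal=(2, 4))
```

The straight line from N to P is blocked, so the returned path goes up and over the top of the wall instead of running into it.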
You'll need to learn to program timed events. In pong, every step you were constantly checking the position of the ball against the paddles and which direction it was moving. Fighting engines also need to account for the time between strikes. It would be like pong deciding to hold the ball and then send it across a moment later. Without that timing, an NPC can attack at every update, i.e. 30 times per second. You'll also be adding modifiable variables to characters, like Friction, Speed, Strength and Reaction (typically a measurement of the time between strikes, and/or the ability to defend against an attack of a certain speed). You need to understand that each object potentially has its own modifiable values, e.g. when a unit gets a spell, how does its speed get updated?
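One simple way to handle both the time between strikes and a temporary stat change is to accumulate elapsed time on each object. The Python sketch below is only an illustration: `Fighter`, `reaction`, `apply_haste` and the numbers are invented for the example, and it assumes the game loop passes in the elapsed time `dt` in seconds.

```python
from dataclasses import dataclass


@dataclass
class Fighter:
    base_speed: float = 1.0
    reaction: float = 1.0          # seconds between strikes
    time_since_strike: float = 0.0
    strikes: int = 0
    haste_bonus: float = 0.0       # extra speed from a spell
    haste_left: float = 0.0        # seconds until the spell wears off

    @property
    def speed(self) -> float:
        # Effective speed = the unit's own value plus any active modifier.
        return self.base_speed + self.haste_bonus

    def apply_haste(self, bonus: float, duration: float) -> None:
        self.haste_bonus = bonus
        self.haste_left = duration

    def update(self, dt: float) -> None:
        # Strike only when enough time has built up, not on every update.
        self.time_since_strike += dt
        if self.time_since_strike >= self.reaction:
            self.time_since_strike -= self.reaction
            self.strikes += 1
        # Count the spell down and remove its bonus when it expires.
        if self.haste_left > 0:
            self.haste_left -= dt
            if self.haste_left <= 0:
                self.haste_bonus = 0.0


npc = Fighter()
npc.apply_haste(bonus=0.5, duration=2.0)
speed_while_hasted = npc.speed  # 1.5

# 3.5 seconds of game time at 30 updates per second.
for _ in range(105):
    npc.update(1 / 30)
# The NPC has struck 3 times (not 105), and the haste has worn off.
```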
With pong, you would typically just set a single speed variable, and probably just do a basic direction check at the boundaries. RPGs, with potentially thousands of managed objects in play, require a different approach.
Another key thing, to go along with vectors, is the use of floats instead of ints. With pong, you probably just used integers to set the position of objects on the screen, and integers to move them (i.e. no decimal precision). Floats, on the other hand, can give much smoother motion when used for location/movement/abilities, especially when applying more significant physics than a bounce.
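A tiny Python comparison shows why this matters for slow movement. The speed of 0.4 pixels per frame is an arbitrary value picked for the example; the idea is to keep the position as a float and only round when drawing.

```python
speed = 0.4  # pixels per frame (arbitrary example value)

int_x = 0      # position stored as an integer
float_x = 0.0  # position stored as a float

for _ in range(10):
    # Integer position: the fractional part is thrown away every frame,
    # so the object never leaves pixel 0.
    int_x = int(int_x + speed)
    # Float position: the fractions accumulate between frames.
    float_x += speed

# Convert to a whole pixel only at draw time.
draw_x = round(float_x)
```

After ten frames the integer version is still stuck at 0, while the float version has moved about 4 pixels.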