Wolfdog

Cisco router problem

I had an issue today with a Cisco router, and maybe someone more knowledgeable than me knows the cause. It happened out of nowhere, with no configuration changes. We run a website and a still small but growing virtual world behind a Cisco router. If it matters, we use NAT.

From the client side, the problem appeared as an inability to download larger files (more than roughly 256 KB) from our website. The size that failed varied: sometimes a 1 MB file got through, other times a 64 KB file failed. Anything smaller was just fine. In a browser, the download simply hung after some amount of data had transferred. The same file tended to hang at about the same spot, but different files hung at different points in different browsers.

We noticed that HTTPS worked fine. So we tested using NAT to translate external port 81 (and another random port) to internal port 80. That made no difference. Transfers between machines internally (behind the router) worked fine. I regret not testing a translation to a different port on the web server, as that might have given a few more clues about the cause.

The router itself was not overloaded, at least as far as I could tell. Memory, CPU and interface bandwidth were all fine. The solution was a router "reload".

Has anyone else seen an issue like this? Or have any ideas why this might happen? I'm trying to look into preventing it in the future.

I've seen this exact same behavior happen at the PC end, but no particular router was involved.

The fault in my case was the "checksum offload" feature of all of my Gigabit Ethernet cards (I had three or four different cards that behaved this way). They would receive a packet and, in about 0.05% of cases, calculate a mismatching checksum. The card would then throw the packet away. The sender's TCP stack would resend the packet, but the retransmitted packet was identical, so my NIC computed the same wrong result and kept discarding it no matter how many times it was resent. When I disabled checksum offloading, the CPU always computed a matching checksum.
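
For anyone who wants to check a suspect packet by hand, here is a minimal Python sketch of the standard Internet checksum (the 16-bit ones' complement sum described in RFC 1071) that a NIC with checksum offload is supposed to reproduce. The function names and the sample bytes are only for illustration. A real TCP checksum also covers a pseudo-header (addresses, protocol, length), which this sketch leaves out.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # big-endian 16-bit words
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def checksum_ok(data_with_checksum: bytes) -> bool:
    """A receiver sums data plus the checksum field; a valid packet yields 0."""
    return internet_checksum(data_with_checksum) == 0


# Hypothetical payload (the example words used in RFC 1071).
payload = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
csum = internet_checksum(payload)
packet = payload + csum.to_bytes(2, "big")
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]
```

The check is deterministic, which is why retransmission did not help in my case: the resent packet carries the same bytes, so a card that gets the sum wrong for that bit pattern gets it wrong every time.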

Since it sounds like it's your router causing it, it could very well be something completely different. Perhaps it has some similar kind of checksumming feature. Perhaps your virtual world servers are sending bad checksums (if the bug I encountered is bidirectional, it could h

You say a Cisco router, but have you actually tried downloading the files through another internet gateway/router? It reminds me of a PMTU black hole issue I saw, but that's just one of a few ideas. Do you have access to the Cisco router, and if so, which model is it? When you say download web files, what exactly do you mean? What web server are you using? That it works over HTTPS is a big clue, but I've been out of the game a while and can't think what it points to.
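
If you want to rule the PMTU idea in or out, the usual approach is to send packets with the Don't Fragment bit set at different sizes and see where they stop getting through. Below is a rough Python sketch of the arithmetic and the search only. It assumes plain IPv4 and TCP headers with no options (20 bytes each) and an 8-byte ICMP echo header; `probe_ok` is a hypothetical callback that you would have to implement yourself (for example by running ping with DF set), and the 1400-byte limit is made up.

```python
from typing import Callable, Optional

IPV4_HEADER = 20   # bytes, assuming no IP options
TCP_HEADER = 20    # bytes, assuming no TCP options
ICMP_HEADER = 8    # bytes, ICMP echo header


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def icmp_probe_payload(mtu: int) -> int:
    """Ping payload size that produces an IP packet of exactly mtu bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def find_path_mtu(probe_ok: Callable[[int], bool],
                  low: int = 576, high: int = 1500) -> Optional[int]:
    """Binary search for the largest packet size that probe_ok accepts.

    Assumes that if a size gets through, every smaller size does too.
    Returns None if even the smallest size fails.
    """
    if not probe_ok(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe_ok(mid):
            low = mid
        else:
            high = mid - 1
    return low


# Stand-in for a real probe: pretend the path silently drops anything over 1400.
def fake_probe(size: int) -> bool:
    return size <= 1400


path_mtu = find_path_mtu(fake_probe)
```

If the path MTU found this way is lower than the interface MTU and no ICMP "fragmentation needed" messages come back, full-size segments would vanish while small ones pass, which matches "small files work, large ones hang". It would not by itself explain why HTTPS works.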

It's a Cisco 2801 running IOS 12.4(23). We bought it used because it was at a great price. My hope is that the unit itself is not faulty.

It happened again yesterday after a week of running just fine, and a reload fixed it. Again, there were no configuration changes on either the Apache web server or the router itself. The file I used for testing is a Flash "program" used to create our avatars. The issue affects any file over about 100 KB. HTTPS worked again, as before; HTTP through another port did not.

I do have tcpdump and Wireshark captures from both sides of the transfer. You can download them from <removed>. Using my limited knowledge of how TCP works, I believe that packet 285 on the client lines up with packet 272 on the server. That's when the client side goes quiet for the next 5 seconds. This is where it gets very odd to me. It looks like the router is responding to the server with its own ACK packets, which arrive with a TTL of 255 (no hops). It continues this until packet 373 on the server, which is a retransmission. That one seems to get through, but by now the two sides are completely out of sync with each other, because the client did not get any of the data that the server thinks has already been ACKed.
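
To make the "ACKed but never delivered" idea concrete, here is a rough Python sketch of the comparison I was doing by eye. It assumes the relative sequence numbers and payload lengths have been copied out of Wireshark by hand. The `Segment` type, the function names and the sample numbers are all made up for illustration and are not taken from my capture.

```python
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    seq: int      # relative TCP sequence number of the first payload byte
    length: int   # payload length in bytes


def next_expected_seq(received: Iterable[Segment], start_seq: int = 1) -> int:
    """First byte the client is still missing, given what it captured."""
    expected = start_seq
    for seg in sorted(received):
        if seg.seq > expected:
            break                      # gap: later data does not count yet
        expected = max(expected, seg.seq + seg.length)
    return expected


def phantom_acked_bytes(client_received: Iterable[Segment],
                        acks_seen_by_server: List[int],
                        start_seq: int = 1) -> int:
    """Bytes the server saw ACKed that the client never actually received."""
    expected = next_expected_seq(client_received, start_seq)
    highest_ack = max(acks_seen_by_server, default=start_seq)
    return max(0, highest_ack - expected)


# Hypothetical numbers: the client got two full segments, but the server
# saw ACKs covering four.
client_side = [Segment(1, 1460), Segment(1461, 1460)]
server_side_acks = [1461, 2921, 4381, 5841]
gap = phantom_acked_bytes(client_side, server_side_acks)
```

A result of zero would mean every ACK the server saw is backed by data the client really has. Anything above zero means something between the two capture points acknowledged data on the client's behalf.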

This led me to think that the router was attempting to buffer the data for the client, with the intention of sending it all on down the other side. So far, after hours of digging through Cisco documentation, I cannot find anything that would cause this. I ended up disabling basically everything on the router except the NATP. All traffic shaping has always been disabled, although I do plan to use it in the future.

The client side is completely ruled out because we had customers complaining about the inability to use the flash program. Any help debugging this would be greatly appreciated.

[Edited by - Wolfdog on August 12, 2009 9:04:48 PM]

It may very well be the NATP that actually is faulty. Perhaps a software update would resolve the issue?
It seems, from your description, as if a dropped packet to a client may cause the router to enter that buffering mode, but not actually send the data properly after that?
