coderWalker

Getting the address of a redirected page.

This topic is 2465 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


The goal of my program is to grab a webpage and then generate a list of absolute links to the pages it links to.

The problem I am having is that when a page redirects to another page without the program knowing, all the relative links get resolved incorrectly.

For example:

I give my program this link: http://www.brunswickcc.edu/LinkClick.aspx?link=340&tabid=270
On this page, if it finds the link href="../", meaning one directory up, it errors because there is no directory above the root.

However, this error is invalid, because the page's real location is: http://www.brunswickcc.edu/FacultyStaff/AcademicAdvising/AdvisementChecklists.aspx.
Meaning that "../" is linking one directory up from http://www.brunswickcc.edu/FacultyStaff/AcademicAdvising/, that is, to http://www.brunswickcc.edu/FacultyStaff/, which is a valid address.
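
To make the difference concrete, here is a small sketch of how the same href resolves against the two addresses using System.Uri. The class and method names (RelativeLinkDemo, Resolve) are made up for illustration. Under the standard resolution rules (RFC 3986), one level up from .../FacultyStaff/AcademicAdvising/AdvisementChecklists.aspx is .../FacultyStaff/.

```csharp
using System;

static class RelativeLinkDemo
{
    // Resolve an href found on a page against the address the page was served from.
    // (Hypothetical helper, not part of the original program.)
    public static Uri Resolve(string pageAddress, string href)
    {
        return new Uri(new Uri(pageAddress), href);
    }

    static void Main()
    {
        string requested = "http://www.brunswickcc.edu/LinkClick.aspx?link=340&tabid=270";
        string actual = "http://www.brunswickcc.edu/FacultyStaff/AcademicAdvising/AdvisementChecklists.aspx";

        // Same href, different base address, different absolute link.
        Console.WriteLine(Resolve(requested, "../"));
        Console.WriteLine(Resolve(actual, "../"));
    }
}
```

So the link list is only as good as the base address used to build it.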

So my question is how is my program supposed to know that the page it received is at another location?

I am doing this in C# using the following code to get the HTML:


// Requires: using System.IO; using System.Net;
string html;
WebRequest wrq = WebRequest.Create(address);
using (WebResponse wrs = wrq.GetResponse())
using (StreamReader strdr = new StreamReader(wrs.GetResponseStream()))
{
    // Read the whole response body as text.
    html = strdr.ReadToEnd();
}


Any help is appreciated. Thanks in advance,

CoderWalker

The tool wget can do this for you (in its mirror or archive mode.)
You can either use that tool, or you can read the source (it's open-source) to see how it does it.
Note: it's C, not C#.

None of the links you generate should be absolute links (unless they are external links, of course); that is what is causing the problems you are getting.

When generating links to anywhere inside your website, use the Html.ActionLink(string linkText, string actionName, string controllerName) helper to generate them, so that the correct link is always used.

For example, in an MVC 3 (Razor) view you would use:


<body>
This generates a link:
@Html.ActionLink("Link text", "ActionName", "ControllerName")
</body>


With MVC 2, change that to:

<%= Html.ActionLink("Link text", "ActionName", "ControllerName") %>

Then you should have a controller named ControllerName (by convention, the class ControllerNameController) with an action method inside it called ActionName.

So when the page is generated, the links will always be good links.

Does anyone else know how I could do this?

Do I need to view the HTTP Response?
(I know it's possible because Firefox changes the address in the address bar)

I would like to achieve this without using an external library.



Do I need to view the HTTP Response?


Yes.

Unfortunately, as with all things web, things get convoluted fast.

3xx response codes are one way; the wiki page lists many other methods.
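
For the 3xx case, a minimal sketch using the same WebRequest API as the original post: by default the request follows HTTP redirects on its own, and WebResponse.ResponseUri reports the address that actually answered. The names FinalAddressDemo and Fetch are made up for illustration, and whether this particular site still redirects is an assumption.

```csharp
using System;
using System.IO;
using System.Net;

static class FinalAddressDemo
{
    // Downloads a page and reports the address that actually served it.
    // (Hypothetical helper, not part of the original program.)
    public static string Fetch(string address, out Uri finalAddress)
    {
        WebRequest request = WebRequest.Create(address);
        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            // ResponseUri is the URI that responded, after any HTTP redirects were followed.
            finalAddress = response.ResponseUri;
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        Uri finalAddress;
        string html = Fetch("http://www.brunswickcc.edu/LinkClick.aspx?link=340&tabid=270", out finalAddress);
        Console.WriteLine(finalAddress);

        // Resolve links found in html against finalAddress, not the requested address.
        Console.WriteLine(new Uri(finalAddress, "../"));
    }
}
```

This only covers redirects done at the HTTP level; a meta refresh or a JavaScript redirect would still leave ResponseUri at the requested address.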

The "correct" way to do this from the server is to return a 301 or 302 redirect. The response will contain the new address in the Location: header.
However, there are other ways of redirecting a user, including "meta refresh," actual "Refresh:" headers, and various JavaScript. You may not need to worry about those, though, depending on your application -- if you mirror a site, you mechanically mirror the client-side content that will redirect, not the redirect target.
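
If you want to see the 3xx response and its Location: header yourself instead of letting the request follow it, a rough sketch is to turn off AllowAutoRedirect on the HttpWebRequest. RedirectProbe, ResolveLocation and GetRedirectTarget are hypothetical names for illustration, and this handles a single hop only.

```csharp
using System;
using System.Net;

static class RedirectProbe
{
    // Turns a Location header value (which may be relative) into an absolute URI.
    public static Uri ResolveLocation(Uri requestUri, string location)
    {
        return new Uri(requestUri, location);
    }

    // Returns the redirect target for one request, or null if the server did not redirect.
    public static Uri GetRedirectTarget(string address)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(address);
        request.AllowAutoRedirect = false; // inspect the 3xx response ourselves

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            int status = (int)response.StatusCode;
            string location = response.Headers["Location"];

            // 304 Not Modified is also a 3xx code, so require a Location header as well.
            if (status >= 300 && status < 400 && location != null)
            {
                return ResolveLocation(request.RequestUri, location);
            }
            return null;
        }
    }

    static void Main()
    {
        Uri target = GetRedirectTarget("http://www.brunswickcc.edu/LinkClick.aspx?link=340&tabid=270");
        Console.WriteLine(target == null ? "no HTTP redirect" : target.ToString());
    }
}
```

To follow a chain of redirects you would call this in a loop with a hop limit, which is roughly what the default AllowAutoRedirect behavior already does for you.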
