
Using jQuery to get the HTML from another website: Possible? Legal?


Hello

I am trying to get the HTML from a webpage that is on a different domain. The HTML is parsed to produce a summary of a recipe (recipe name, main ingredients, number of steps) found on that page. The user can then click a link and go to that external webpage to view the full recipe.

I'm aware of the Same-Origin Policy, but does it apply to fetching HTML from a webpage outside my own domain? I imagine it's exactly the same as getting XML, so this is legal and allowed, isn't it?

Is there a way I can get the HTML from a domain other than my own?

Using JavaScript and jQuery, the idea is to limit the number of server requests and the amount of storage by having the user's browser request each recipe and parse the HTML on the client side. This avoids server-side bottlenecks and also means I don't have to go through the server and delete outdated recipe summaries.

I'm open to solutions or suggestions in any programming language or API.

If the other site has a JSONP API set up, you can. Normal XMLHttpRequests, however, are blocked by the same-origin policy in most browsers. Alternatively, you could forward their content from your own server, so that the browser is not requesting from a different domain.
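
To make the forwarding idea concrete, here is a minimal sketch of how client code might decide between a direct request and a request routed through your own server. The `/fetch-page` endpoint and both function names are hypothetical; you would have to implement the endpoint yourself on your server, and nothing here addresses whether you are permitted to reuse the content.

```javascript
// Hypothetical path on your own server that fetches a remote page and relays it.
// This endpoint is an assumption for illustration; it does not exist by default.
const PROXY_PATH = "/fetch-page";

// Two URLs share an origin only if scheme, host, and port all match.
function isSameOrigin(pageUrl, targetUrl) {
  return new URL(pageUrl).origin === new URL(targetUrl).origin;
}

// Return the URL the browser should actually request:
// the target itself when it is same-origin, otherwise the proxy endpoint
// with the target passed as an encoded query parameter.
function requestUrlFor(pageUrl, targetUrl) {
  if (isSameOrigin(pageUrl, targetUrl)) {
    return targetUrl;
  }
  return PROXY_PATH + "?url=" + encodeURIComponent(targetUrl);
}
```

With jQuery, the result could then be passed to `$.get(requestUrlFor(location.href, recipeUrl), callback)`, where `recipeUrl` stands for whichever recipe page you want to summarise. Because the browser only ever talks to your own domain, the same-origin restriction does not come into play, at the cost of the extra server traffic the original post was hoping to avoid.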


Generally, you should check the robots.txt of the target website. If it allows bots to parse the pages, you are allowed.


robots.txt merely hints at where well-behaved bots should look.

Unless you have a written statement granting reproduction rights, you may not republish other people's content. Content on websites is automatically copyrighted by the author.

Exceptions may include content that is explicitly licensed, but even there you run into problems, given that such licenses typically place content in the public domain, which isn't a valid concept in some countries.

Even if an API is provided, it will come with a license. If such a license is missing, reproduction rights are subject to interpretation, but are generally assumed not to be given.

Details like this are important in light of SOPA. That bill is being justified precisely by sites whose owners pretend that if something can be copied it can also be distributed. Under SOPA, if even a single person viewed such content on your site, your domain would be seized, any revenue streams closed, and you would be liable for prosecution under US laws.

So don't pollute the internet with more content farming. Or at least get a real lawyer to explain the gotchas.
