[web] A new job, and many questions

Started by
7 comments, last by demonkoryu 18 years, 1 month ago
Hi, I have my first programming job, developing PHP applications in a travel agency. I'm so happy! "Professional" just sounds cool (even if it's only web dev). [grin] But I have a few problems. We use a 3rd-party Internet Booking Engine (IBE) to sell our trips, which is currently integrated into the site via an iframe. We want to add content to the HTML of the IBE. To do so, I have to replace the iframe and embed the IBE HTML content directly into our site. Currently, I'm doing this:
- Retrieving the IBE start page
- Inlining the JavaScript
- Replacing any relative URLs with absolute ones that call our site with additional parameters giving the IBE's current page URL and form data
So, do you have any advice on how to do this (integrating 3rd-party content, spidering, scraping) cleanly? Are there any libraries/tutorials/sites/blogs/IDEs/... you would recommend to a PHP beginner?
I would be interested to know the answer to this too. I would like to include some of my site's content on other people's websites by including text, possibly through JavaScript.

As your site uses PHP, you might be able to open a connection to the 3rd-party site and retrieve the page; then, if it's in XML, process it and print it out as desired. If it's pure HTML you can still print it out, but processing and restructuring the data will be a lot more limited.
Gavin Coates
IT Engineer / Web Developer / Aviation Consultant
[ Taxiway Alpha ] [ Personal Home Page ]
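To illustrate the XML case described above: a minimal sketch using PHP's SimpleXML extension, assuming the other site offers an XML feed. The feed layout (`offers`, `offer`, `title`, `price`) and the function name `render_offers` are invented for the example; a real feed would define its own element names.

```php
<?php
// Turn a (hypothetical) XML feed of offers into an HTML list.
function render_offers($xml)
{
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return ''; // not well-formed XML, print nothing
    }
    $out = "<ul>\n";
    foreach ($feed->offer as $offer) {
        // Escape the values so feed content cannot inject markup into our page.
        $out .= '<li>' . htmlspecialchars((string) $offer->title)
              . ': ' . htmlspecialchars((string) $offer->price) . "</li>\n";
    }
    return $out . "</ul>\n";
}
```

Once the data is structured like this, picking out and rearranging individual fields is straightforward, which is the advantage over plain HTML mentioned above.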
You could use cURL or even just file() in PHP to get all of the data off of the page. Depending on how their content is laid out, the only challenging part would be parsing the data.

<?php
    // file() returns the page as an array of lines
    $contents = file('http://www.gamedev.net');
    // Do something with contents
    foreach ($contents as $line)
    {
        echo $line;  // I believe each line already retains its \n character
    }
?>
Well, it's HTML, and parsing is a REAL mess, because they use JavaScript to generate HTML with embedded JavaScript which generates HTML and JavaScript. [imwithstupid]
So the problem is
a) dealing with links relative to their site
b) forwarding their form (POST) data
c) adjusting their JavaScript.

In addition, I'm asking for helpful things related to PHP, MySQL, JavaScript.
If you could tell me helpful libraries, sites, ebooks, editors, IDEs, ... I'd be forever grateful. [smile]
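For points a) and b), one possible approach is to parse the fetched page with PHP's DOM extension and rewrite every link and form target so that it points back at your own script. This is only a sketch: `resolve_url`, `rewrite_links`, the `ibe_url` parameter and the example.com addresses are made up for illustration. It ignores `../` segments and cannot see links that the IBE builds in JavaScript at runtime, which is point c).

```php
<?php
// Hypothetical helper: make $url absolute relative to $base (common cases only).
function resolve_url($base, $url)
{
    // Already has a scheme (http:, https:, mailto:, javascript:, ...).
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $url)) {
        return $url;
    }
    $parts = parse_url($base);
    $root = $parts['scheme'] . '://' . $parts['host']
          . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if (substr($url, 0, 2) === '//') {   // protocol-relative
        return $parts['scheme'] . ':' . $url;
    }
    if (substr($url, 0, 1) === '/') {    // relative to the site root
        return $root . $url;
    }
    // Relative to the directory of the base document.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $url;
}

// Point <a href> and <form action> at $proxy, passing the real target along.
function rewrite_links($html, $base, $proxy)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ hides warnings about sloppy real-world markup
    foreach (['a' => 'href', 'form' => 'action'] as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            if (!$node->hasAttribute($attr)) {
                continue;
            }
            $abs = resolve_url($base, $node->getAttribute($attr));
            if (!preg_match('#^https?://#i', $abs)) {
                continue; // leave mailto:, javascript: etc. untouched
            }
            $node->setAttribute($attr, $proxy . '?ibe_url=' . rawurlencode($abs));
        }
    }
    return $doc->saveHTML();
}
```

The proxy script would then read `ibe_url`, fetch that page from the IBE (forwarding any POST data), and run the result through `rewrite_links` again before printing it.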
If you need to retrieve information from a page that needs POST data, then cURL is essential. Are you just trying to get the links out? If so, you should be able to easily search for <a and </a> and grab the content in between.
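A rough sketch of both suggestions, assuming the cURL and DOM extensions are installed. The function names `forward_post` and `extract_links` are invented here, and the timeout is an arbitrary choice. Instead of searching the text for <a and </a> by hand, the second function lets DOMDocument find the anchors, which tends to cope better with untidy markup.

```php
<?php
// Send form fields to $url as a POST request and return the response body
// (or false on failure).
function forward_post($url, array $fields)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_TIMEOUT        => 15,    // seconds; arbitrary
    ]);
    return curl_exec($ch);
}

// Collect the target and text of every <a> element in an HTML string.
function extract_links($html)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ hides warnings about sloppy markup
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}
```

Forwarding a visitor's form submission could then look like `$html = forward_post('http://ibe.example.com/search', $_POST);`, where the address is a placeholder.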
Your best bet is to get them to produce a machine-readable protocol. Using screen-scraping in a production application is an unbelievably bad idea, as the other party could change the format of their HTML at any time without notifying you, and your system would then break.

You should buy their server-server integration product instead.

If they are a payment service provider, you may find that you need to satisfy due diligence and get an SSL certificate in order to do this. Live with it - it's the only way of doing things.

Moreover, screen-scraping is probably against their terms and conditions, so they would be well within their rights to terminate your account.

It's in their interests to make their kit easy to integrate with. I'm sure they have a machine-readable protocol.

Mark
I have no other options, so I'm thinking of either
a) caching and inlining the IBE HTML with PHP or
b) using JavaScript to scrape external content on the client side.
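Option a) could be sketched as a small file cache in PHP. Everything here is an assumption for illustration: the name `cached_fetch`, the md5-based file names, and passing the download step in as a callback (which could wrap file_get_contents or cURL). Pages that depend on a visitor's session or form data should probably not be cached this way.

```php
<?php
// Return the page for $url, re-downloading it only when the cached copy
// is older than $ttl seconds. $fetch is any callable that returns the HTML
// for a URL, or false on failure.
function cached_fetch($url, $ttl, $cacheDir, callable $fetch)
{
    $file = $cacheDir . '/' . md5($url) . '.html';
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file); // cache hit
    }
    $html = $fetch($url);
    if ($html !== false) {
        // LOCK_EX keeps concurrent requests from writing the file at once.
        file_put_contents($file, $html, LOCK_EX);
    }
    return $html;
}
```

A call such as `cached_fetch($url, 300, '/tmp/ibe', 'file_get_contents')` would hit the IBE at most once every five minutes per URL; the directory and lifetime are placeholders.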

I have another question. I'm looking for a "free web service" directory, as I need weather data. Through Google I came across uddi.org, but I still can't find a single listing of cost-free web services. The only ones I know are the Google, eBay and Amazon APIs.
Do you know any such "web service directories"?
tstrimp, thanks for your input, but the problem is not retrieving the page. It's wading through all the JS...

This topic is closed to new replies.
