GET request contains additional unwanted data in XML document

6 comments, last by Bj 9 years, 7 months ago

I have a project where I download an XML file from a server over a raw socket, using an HTTP GET request. This is the file:

http://api.yr.no/weatherapi/locationforecast/1.9/?lat=60.10;lon=9.58;msl=70

However, when I receive the file it contains the string "008000" in several places. The string does not appear when I open the URL in my browser, and it breaks the XML formatting. Example:


<location altitude="70" latitude="6 
008000
0.1000" longitude="9.5800">

I used Wireshark to check whether this was sent by the server or created on my side, and it seems the server does send it:

[Screenshot of the Wireshark capture (HM4LvS0.jpg)]

Any ideas on how to fix this?

My code


 
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.net.*;

public class NetClient
{
    Socket clientSocket = new Socket();
    InetSocketAddress ip = new InetSocketAddress("api.yr.no", 80);

    public String GetData(float _latitude, float _longitude, int _msl)
    {
        try
        {
            byte[] data = new byte[65000];
            String translateddata = "";

            clientSocket.connect(ip);
            DataInputStream inData = new DataInputStream(clientSocket.getInputStream());
            OutputStream outData = clientSocket.getOutputStream();

            // Send a hand-written HTTP/1.1 GET request.
            PrintWriter pw = new PrintWriter(outData, false);
            pw.print("GET " + "/weatherapi/locationforecast/1.9/?lat=" + _latitude + ";lon=" + _longitude + ";msl=" + _msl + " HTTP/1.1\r\n");
            pw.print("Host: api.yr.no\r\n");
            pw.print("Accept: text/xml\r\n");
            // Ask the server to close the connection after the response,
            // so the read loop below sees end-of-stream (-1) instead of blocking.
            pw.print("Connection: close\r\n");
            pw.print("\r\n");
            pw.flush();

            Thread.sleep(1000);

            // Read everything until the server closes the connection. This is the
            // raw HTTP response: status line and headers are included, and nothing
            // here interprets the transfer encoding.
            int bytesread = 0;
            int i = 0;
            while (bytesread != -1)
            {
                bytesread = inData.read(data);
                if (bytesread != -1)
                {
                    // Convert only the bytes read in this pass; new String(data)
                    // would also pick up stale bytes left over from earlier reads.
                    String temp = new String(data, 0, bytesread);
                    translateddata = translateddata + temp;

                    // Dump each read to its own file for debugging.
                    PrintWriter file = new PrintWriter("weather" + i++ + ".txt");
                    file.write(temp);
                    file.close();
                }
            }
            clientSocket.close();
            return translateddata;
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        catch (InterruptedException e)
        {
            e.printStackTrace();
        }
        return "Something went wrong when trying to download data from remote server!";
    }
}
 

Break this problem into smaller problems. First, just download the file and save it as text; then parse it in the program. If there is a problem, you'll be able to track it down. If everything looks correct there, there may be a problem with the URL you sent to the server.
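A minimal sketch of the second step, assuming the XML body has already been saved to a file by itself (no HTTP status line or headers). The class and method names are hypothetical; `DocumentBuilder.parse(File)` is the standard JAXP call.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class SavedXmlParser {
    // Step 2 of "download first, parse second": parse a file that already
    // holds only the XML document (hypothetical helper, not from the thread).
    public static Document parseSavedFile(File xmlFile)
            throws ParserConfigurationException, SAXException, IOException {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xmlFile);
    }
}
```

If parsing fails, the `SAXParseException` it throws reports a line and column (`getLineNumber()`, `getColumnNumber()`), which points at the offending spot in the saved file.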

Did you send something odd in the URL that ended up in the XML as a space? I tried that, and the server caught it and returned an error, but you never know; servers have been known to have bugs. lol

I've used Java and XML for a long time, and never seen random junk injected into the document, so something else is going on. How are you parsing the XML?

I think, therefore I am. I think? - "George Carlin"
My Website: Indie Game Programming

My Twitter: https://twitter.com/indieprogram

My Book: http://amzn.com/1305076532

This is how I parse it:


NetClient client = new NetClient();
String temp = client.GetData(60.10F, 9.58F, 70);

DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder;
try
{
    dBuilder = dbFactory.newDocumentBuilder();
    InputStream is = new ByteArrayInputStream(temp.getBytes());
    Document doc = dBuilder.parse(is);
}
catch (Exception e) // ParserConfigurationException, SAXException or IOException
{
    e.printStackTrace();
}

And thank you for the fast reply :)

I do save it as a .txt file as well, just the raw data received, but it contains the "008000" too, which is why I checked the packets using Wireshark. Perhaps I should also try another server and see if it persists.

I think either there is an invalid character in the file somewhere, or there is a problem parsing it. Based on your parsing code I suspect an invalid character.


Since you do a direct socket connection, could it be chunked encoding? (http://en.wikipedia.org/wiki/Chunked_transfer_encoding)
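With `Transfer-Encoding: chunked`, the body arrives as a series of chunks, each preceded by its size in hexadecimal on its own line and followed by CRLF, ending with a zero-size chunk. Lines like "008000" are exactly those size lines. A minimal decoder sketch follows; the class and method names are hypothetical, and it skips robustness details a real HTTP client would need.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;

public class ChunkedDecoder {

    // Read one line terminated by LF (a preceding CR is dropped) as ASCII.
    // Returns "" at end of stream.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                sb.append((char) b);
            }
        }
        return sb.toString();
    }

    // Consume the status line and headers up to the blank line that ends them.
    // Returns true if the server announced "Transfer-Encoding: chunked".
    public static boolean readHeadersIsChunked(InputStream in) throws IOException {
        boolean chunked = false;
        String line;
        while (!(line = readLine(in)).isEmpty()) {
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("transfer-encoding:") && lower.contains("chunked")) {
                chunked = true;
            }
        }
        return chunked;
    }

    // Decode a chunked body. 'in' must be positioned right after the headers.
    public static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in).trim();
            int semi = sizeLine.indexOf(';');          // drop chunk extensions
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi).trim();
            }
            int size = Integer.parseInt(sizeLine, 16); // e.g. "008000" -> 32768
            if (size == 0) {
                break;                                 // last-chunk marker
            }
            byte[] buf = new byte[size];
            int off = 0;
            while (off < size) {                       // read() may return fewer bytes
                int n = in.read(buf, off, size - off);
                if (n == -1) {
                    throw new IOException("stream ended inside a chunk");
                }
                off += n;
            }
            out.write(buf, 0, size);
            readLine(in);                              // CRLF after the chunk data
        }
        while (!readLine(in).isEmpty()) {
            // skip optional trailer fields up to the final blank line
        }
        return out.toByteArray();
    }
}
```

To use it with the socket code from the question, wrap the socket's input stream in a `BufferedInputStream` (the sketch reads single bytes while parsing lines), call `readHeadersIsChunked`, and then either `decode` the rest or, for a non-chunked response, read to the end as before. Alternatively, sending the request line as `HTTP/1.0` sidesteps the issue, since a server must not use chunked transfer coding in a response to an HTTP/1.0 request.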

Fruny: Ftagn! Ia! Ia! std::time_put_byname! Mglui naflftagn std::codecvt eY'ha-nthlei!,char,mbstate_t>


Now that's a good idea... The 8000 value is very suspicious, isn't it.


http://www.mkyong.com/java/how-to-send-http-request-getpost-in-java/

Try using the HTTP library code instead of directly reading the bytes, and see if anything is different.
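A minimal sketch of the library route using the JDK's own `HttpURLConnection` rather than the Apache HttpClient. `HttpURLConnection` parses the status line and headers and removes the chunked transfer coding itself, so the bytes it returns are just the document. The class and method names are hypothetical, and the thread's 1.9 endpoint is assumed; it may no longer be live.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

public class YrFetcher {

    // Build the request URL the same way the socket code in the question does.
    public static String buildUrl(float lat, float lon, int msl) {
        return "http://api.yr.no/weatherapi/locationforecast/1.9/?lat=" + lat
                + ";lon=" + lon + ";msl=" + msl;
    }

    // Download the response body. HttpURLConnection handles the status line,
    // headers and chunked transfer coding, so only the body bytes come back.
    public static byte[] fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Accept", "text/xml");
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            conn.disconnect();
        }
    }
}
```

The returned bytes can go straight into `DocumentBuilder.parse(new ByteArrayInputStream(bytes))`, which lets the parser detect the document's encoding itself instead of round-tripping through a `String`.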

If you want something fancier, I've used this in the past for RESTful client code with good results.

http://hc.apache.org/httpclient-3.x/


Thank you for the answers. I've uploaded a file with the raw data I receive:

http://speedy.sh/69QPv/weather4.txt

I read your link, Endurion, and it seems you are correct! I had considered this at the start, but when I counted 8000 bytes ahead it landed me in the middle of the XML. The chunk size is hexadecimal, though: 0x8000 is 32768 bytes.
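The arithmetic as a quick Java check: chunk sizes are hexadecimal, so 0x8000 = 8 × 16³ = 32768 (0x7FFF would be 32767). The helper `ChunkSize.parse` is hypothetical.

```java
public class ChunkSize {
    // Parse a chunk-size line from a chunked HTTP body. The size is
    // hexadecimal, and any ";extension" after it is ignored.
    public static int parse(String line) {
        String hex = line.trim();
        int semi = hex.indexOf(';');
        if (semi >= 0) {
            hex = hex.substring(0, semi).trim();
        }
        return Integer.parseInt(hex, 16);
    }

    public static void main(String[] args) {
        System.out.println(parse("008000")); // prints 32768 (0x8000), not 8000
        System.out.println(parse("7fff"));   // prints 32767
    }
}
```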

Thank you Glass_Knife and Endurion for your help :)

