urllib2 - Website loads fine in browser but not in Python


Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.


3 replies to this topic

#1 TheComet   Members   


Posted 08 July 2014 - 05:20 AM

I'm trying to download a page from this website: http://www.digikey.de

If I click on the link from within Firefox, it loads fine.

 

If I try to download the page using urllib2, I keep getting this message:

    <span id="ctl00_mainContentPlaceHolder_lblInvalidRequest"><H2>There was a problem with your request.</H2>

    We are unable to process your request.<br/> Please return to the previous page to try again or contact <a href="mailto:webmaster@digikey.com?subject=Incident Number: 18&#46;ce969d50&#46;1404818243&#46;20b8e71">Digi-Key Webmaster</a> if you feel that you have received this message in error. Please reference the following incident number so we may assist you with this error.
    <br/><br/

Here's the code:

import sys
import urllib2

if __name__ == '__main__':

    # Optional proxy: pass its URL as the first command-line argument.
    if len(sys.argv) > 1:
        proxy = urllib2.ProxyHandler({'http': sys.argv[1]})
        opener = urllib2.build_opener(proxy)
        urllib2.install_opener(opener)

    html = urllib2.urlopen('http://www.digikey.de').read()

    # The error page contains the word 'request'; print the text around it.
    error = html.find('request')
    if error != -1:
        # Clamp the start so a match near the top of the page does not wrap around.
        print html[max(0, error - 400):error + 400]
    else:
        print 'success'

Can anyone explain to me why it's doing this and how I can fix it?


"I would try to find halo source code by bungie best fps engine ever created, u see why call of duty loses speed due to its detail." -- GettingNifty

#2 Key_46   Members   


Posted 08 July 2014 - 09:14 AM

In my case the webserver was not accepting the default User Agent from urllib2. Try changing it:

 

opener.addheaders = [('User-agent', 'Mozilla/5.0')]
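
A minimal sketch of that fix in context, assuming the only thing the server objects to is the default User-Agent. The helper name `make_opener` and its parameters are illustrative and not from this thread; the import fallback is only there so the same snippet also runs where `urllib2` has become `urllib.request` (Python 3).

```python
try:
    import urllib2 as urlrequest          # Python 2, as used in this thread
except ImportError:
    import urllib.request as urlrequest   # Python 3 name for the same module


def make_opener(user_agent='Mozilla/5.0', proxy_url=None):
    """Build an opener that sends a browser-like User-Agent header.

    make_opener and its parameters are hypothetical names for illustration.
    """
    handlers = []
    if proxy_url is not None:
        # Same optional HTTP proxy as in the original script.
        handlers.append(urlrequest.ProxyHandler({'http': proxy_url}))
    opener = urlrequest.build_opener(*handlers)
    # Replace the default header list, which holds only the urllib User-Agent.
    opener.addheaders = [('User-agent', user_agent)]
    return opener
```

Passing the result to `install_opener()` before calling `urlopen()` should make every later request carry the new header, with or without the proxy argument.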



#3 TheComet   Members   


Posted 08 July 2014 - 09:39 AM

Yes, that worked wonderfully. Thanks!

 

For the future, how were you able to diagnose what the problem was?



#4 Key_46   Members   


Posted 08 July 2014 - 09:46 AM

> Yes, that worked wonderfully. Thanks!
>
> For the future, how were you able to diagnose what the problem was?

Since the request worked in my browser, I tried to match the browser's headers exactly. I enabled verbose debug output on the opener to see what it was sending:

 

opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
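
To see why the default request stands out, it can help to check what the library sends when nothing is configured. The sketch below reads the default User-Agent from a fresh opener and wraps the debug opener from the line above; `default_user_agent` and `make_debug_opener` are hypothetical helper names, and the import fallback is only there so the snippet also runs on Python 3.

```python
try:
    import urllib2 as urlrequest          # Python 2, as used in this thread
except ImportError:
    import urllib.request as urlrequest   # Python 3 name for the same module


def default_user_agent():
    """Return the User-Agent a freshly built opener would send."""
    headers = dict(urlrequest.build_opener().addheaders)
    return headers['User-agent']


def make_debug_opener():
    """Return an opener that prints the HTTP exchange to stdout when used."""
    return urlrequest.build_opener(urlrequest.HTTPHandler(debuglevel=1))


if __name__ == '__main__':
    # Typically a string starting with 'Python-urllib/', which some servers reject.
    print(default_user_agent())
```

Comparing that string, and the request lines the debug opener prints, against the headers shown in the browser's network tools is one way to spot the difference. Note that `HTTPHandler` only covers plain `http://` URLs; an `https://` site would need `HTTPSHandler(debuglevel=1)` as well.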





