urllib2 - Website loads fine in browser but not in Python


3 replies to this topic

#1 TheComet   Members   -  Reputation: 1568

Posted 08 July 2014 - 05:20 AM

I'm trying to download a page from this website: http://www.digikey.de

If I click on the link from within Firefox, it loads fine.

 

If I try to download the page using urllib2, I keep getting this message:

    <span id="ctl00_mainContentPlaceHolder_lblInvalidRequest"><H2>There was a problem with your request.</H2>

    We are unable to process your request.<br/> Please return to the previous page to try again or contact <a href="mailto:webmaster@digikey.com?subject=Incident Number: 18&#46;ce969d50&#46;1404818243&#46;20b8e71">Digi-Key Webmaster</a> if you feel that you have received this message in error. Please reference the following incident number so we may assist you with this error.
    <br/><br/

Here's the code:

import urllib2
import sys

if __name__ == '__main__':

    # optional proxy
    if len(sys.argv) > 1:
        proxy = {'http': str(sys.argv[1])}
        proxy = urllib2.ProxyHandler(proxy)
        opener = urllib2.build_opener(proxy)
        urllib2.install_opener(opener)

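    html = urllib2.urlopen('http://www.digikey.de').read()
    # The error page contains the word 'request'; find() returns -1 if it is absent.
    error = html.find('request')
    if error != -1:
        # Clamp the start so a match near the top does not produce a negative index.
        print html[max(0, error - 400):error + 400]
    else:
        print 'success'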

Can anyone explain to me why it's doing this and how I can fix it?


YOUR_OPINION >/dev/null


#2 Key_46   Members   -  Reputation: 418

Posted 08 July 2014 - 09:14 AM

In my case the webserver was not accepting the default User Agent from urllib2. Try changing it:

 

opener.addheaders = [('User-agent', 'Mozilla/5.0')]
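Note that in the code from the first post, `opener` only exists when a proxy argument is passed, so the line above needs an opener to be built in every case. A minimal sketch of how that could look, assuming a helper named `make_opener` (my own name, not part of urllib2) and a fallback import so the same code also runs on Python 3:

```python
# Works on Python 2 (urllib2) and Python 3 (urllib.request).
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2


def make_opener(proxy_url=None, user_agent='Mozilla/5.0'):
    """Hypothetical helper: build an opener that sends a browser-like User-Agent."""
    handlers = []
    if proxy_url:
        # Optional proxy, same idea as in the original script.
        handlers.append(urllib2.ProxyHandler({'http': str(proxy_url)}))
    opener = urllib2.build_opener(*handlers)
    # Replaces the default 'Python-urllib/x.y' User-Agent header.
    opener.addheaders = [('User-agent', user_agent)]
    return opener


# Usage (performs a real network request, so it is left commented out):
# urllib2.install_opener(make_opener())
# html = urllib2.urlopen('http://www.digikey.de').read()
```

Whether 'Mozilla/5.0' is enough depends on the server; some sites may want a fuller browser string.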



#3 TheComet   Members   -  Reputation: 1568

Posted 08 July 2014 - 09:39 AM

Yes, that worked wonderfully. Thanks!

 

For the future, how were you able to diagnose what the problem was?




#4 Key_46   Members   -  Reputation: 418

Posted 08 July 2014 - 09:46 AM

Since the request worked in my browser, I tried to match the browser's headers exactly. I enabled verbose debug output on the opener to see what urllib2 was actually sending:

 

opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
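For anyone who wants to try this, here is a rough sketch of a debug opener, assuming a helper named `make_debug_opener` (a made-up name for illustration). With `debuglevel=1` the underlying HTTP connection prints the outgoing request (`send:` lines) and the response headers to stdout, which you can compare against what the browser sends:

```python
# Works on Python 2 (urllib2) and Python 3 (urllib.request).
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2


def make_debug_opener(user_agent=None):
    """Hypothetical helper: opener that echoes HTTP traffic to stdout."""
    # Only plain-HTTP traffic is covered here; for https URLs an
    # HTTPSHandler(debuglevel=1) would have to be passed as well.
    opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
    if user_agent:
        # Override the default 'Python-urllib/x.y' header to test a fix.
        opener.addheaders = [('User-agent', user_agent)]
    return opener


# Usage (performs a real network request, so it is left commented out):
# make_debug_opener().open('http://www.digikey.de').read()
```

Running it once without and once with a `user_agent` shows which header changes between the failing and the working request.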





