HomePage of Laszlo SZATHMARY | Python / Download an Image

#############################

Here we will see how to:

download an image with urllib
download an image with urllib2
download a protected image with urllib2 and wget
get cookies from Firefox 3 and use cookies.txt with wget

Download an image with urllib

Simple:

import urllib

urllib.urlretrieve("http://japancsaj.com/pic/susu/Susu_2.jpg", "out.jpg")

Fancy:

import urllib

class MyOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'

# create an opener, so we can change its user-agent
urlopen = MyOpener().open
urlretrieve = MyOpener().retrieve

# use the opener's own retrieve(); urllib.urlretrieve would ignore MyOpener
urlretrieve("http://japancsaj.com/pic/susu/Susu_2.jpg", "out.jpg")

This tip comes from the manwer project.
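For Python 3 readers: `urlretrieve` moved to `urllib.request`, and the user-agent can be changed through an opener instead of subclassing. Here is a minimal sketch under that assumption; the helper names `install_user_agent` and `retrieve` are made up for this example, and the user-agent string is simply the one used above.

```python
import urllib.request

# User-agent string copied from the example above.
USER_AGENT = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) "
              "Gecko/20071127 Firefox/2.0.0.11")


def install_user_agent(user_agent=USER_AGENT):
    """Install a global opener that sends the given User-Agent header."""
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    # urlopen() and urlretrieve() go through the installed opener from now on.
    urllib.request.install_opener(opener)
    return opener


def retrieve(url, filename, user_agent=USER_AGENT):
    """Download url into filename with a custom User-Agent (hypothetical helper)."""
    install_user_agent(user_agent)
    return urllib.request.urlretrieve(url, filename)
```

With this in place, `retrieve("http://japancsaj.com/pic/susu/Susu_2.jpg", "out.jpg")` should behave like the fancy version above.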

Download an image with urllib2

import urllib2

req = urllib2.Request("http://japancsaj.com/pic/susu/Susu_2.jpg")
response = urllib2.urlopen(req)

with open('out.jpg', 'wb') as output:
    output.write(response.read())
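Note that `response.read()` pulls the whole image into memory before it is written. For big files you may prefer to stream it. A minimal sketch, assuming Python 3 (where `urllib2` became `urllib.request`); the function name `download` is just chosen for this example:

```python
import shutil
import urllib.request


def download(url, filename):
    """Copy the body of url into filename without loading it all at once."""
    request = urllib.request.Request(url)
    with urllib.request.urlopen(request) as response:
        with open(filename, "wb") as output:
            # copyfileobj reads and writes the data chunk by chunk.
            shutil.copyfileobj(response, output)
    return filename
```

Usage would be `download("http://japancsaj.com/pic/susu/Susu_2.jpg", "out.jpg")`.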

Download a protected image with urllib2

There is a nice site at http://japancsaj.blog.hu/ with great photos. However, if you try to download an image from outside the browser, you get an HTML file instead of the image. The trick is to send a Referer header with the request:

import urllib2

referer = 'http://japancsaj.com'
image   = 'http://japancsaj.com/pic/susu/Susu_2.jpg'

req = urllib2.Request(image)
req.add_header('Referer', referer)   # here is the trick
response = urllib2.urlopen(req)

with open('out.jpg', 'wb') as output:
    output.write(response.read())

The same thing with wget:

wget  --referer=http://japancsaj.com   http://japancsaj.com/pic/susu/Susu_2.jpg
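Since the failure mode here is silent (you get an HTML page saved as out.jpg), it can help to check what actually came back. Below is a minimal Python 3 sketch of the Referer trick plus such a check. It relies on the fact that JPEG files start with the bytes FF D8 FF; the function names are made up for this example.

```python
import urllib.request

JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files start with these three bytes


def referer_request(image_url, referer):
    """Build a request that carries the Referer header."""
    return urllib.request.Request(image_url, headers={"Referer": referer})


def looks_like_jpeg(data):
    """Return True if data starts with the JPEG signature."""
    return data.startswith(JPEG_MAGIC)


def fetch_image(image_url, referer):
    """Download the image and fail loudly if something else was served."""
    with urllib.request.urlopen(referer_request(image_url, referer)) as response:
        data = response.read()
    if not looks_like_jpeg(data):
        raise ValueError("the server did not return a JPEG (maybe an HTML page)")
    return data
```

For example, `fetch_image('http://japancsaj.com/pic/susu/Susu_2.jpg', 'http://japancsaj.com')` would return the image bytes, or raise if the site answered with HTML.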

Get cookies from Firefox 3 and use cookies.txt with wget

To solve the japancsaj mystery, I first tried to combine cookies with wget. The real solution in the end was to use a referer (as seen before), but I'll share what I learnt about cookies; maybe it'll be handy one day.

So, to get cookies from Firefox 3, you can use this script (copied from http://0x7be.de/2008/06/19/firefox-3-und-cookiestxt/):

import sqlite3 as db
import sys

cookiedb = '/home/USERNAME/.mozilla/firefox/PROFILE/cookies.sqlite'
targetfile = '/home/USERNAME/cookies.txt'
what = sys.argv[1]
connection = db.connect(cookiedb)
cursor = connection.cursor()
contents = "host, path, isSecure, expiry, name, value"

# pass the search term as a query parameter instead of pasting it into the SQL
cursor.execute("SELECT " + contents + " FROM moz_cookies WHERE host LIKE ?",
               ('%' + what + '%',))

file = open(targetfile, 'w')
index = 0
for row in cursor.fetchall():
  file.write("%s\tTRUE\t%s\t%s\t%d\t%s\t%s\n" % (row[0], row[1],
             str(bool(row[2])).upper(), row[3], str(row[4]), str(row[5])))
  index += 1

print "Searched for: %s" % what
print "Exported: %d" % index

file.close()
connection.close()
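If you are on Python 3, here is a minimal sketch of the same export as a function. It assumes the `moz_cookies` columns used in the script above (newer Firefox versions may differ). Two things deviate from the script on purpose: the output starts with the `# Netscape HTTP Cookie File` header line, and the second field is TRUE only when the host starts with a dot, because Python's own cookies.txt loader (`http.cookiejar.MozillaCookieJar`) checks both. The function name `export_cookies` is made up.

```python
import sqlite3

COLUMNS = "host, path, isSecure, expiry, name, value"


def export_cookies(connection, host_fragment):
    """Return cookies.txt text for all hosts containing host_fragment."""
    cursor = connection.execute(
        "SELECT " + COLUMNS + " FROM moz_cookies WHERE host LIKE ?",
        ("%" + host_fragment + "%",))
    lines = ["# Netscape HTTP Cookie File\n"]
    for host, path, is_secure, expiry, name, value in cursor:
        lines.append("%s\t%s\t%s\t%s\t%d\t%s\t%s\n" % (
            host,
            str(host.startswith(".")).upper(),  # domain-wide cookie?
            path,
            str(bool(is_secure)).upper(),
            expiry, name, value))
    return "".join(lines)
```

You would call it with `sqlite3.connect(cookiedb)` and write the returned text to the target file.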

Normally, you could also use a Firefox plugin called Export Cookies for this purpose, but it doesn't seem to work with Firefox 3.5. Fortunately, we have this nice Python solution. Next step:

./export-firefox-cookies.py japancsaj

It'll produce a file called cookies.txt (at the path set in targetfile) that contains all the cookies whose host contains 'japancsaj'.

Then:

wget --load-cookies=cookies.txt http://what_to_download

Alternatively, you can ask wget to produce the cookie.txt file itself:

wget --cookies=on --keep-session-cookies --save-cookies=cookie.txt http://first_page
wget --referer=http://first_page --cookies=on --load-cookies=cookie.txt http://second_page
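A cookies.txt file can also be used from Python instead of wget. Here is a minimal Python 3 sketch with `http.cookiejar.MozillaCookieJar`; the function name is made up. One caveat: as far as I can tell, this loader is stricter than wget. It wants the file to start with a `# Netscape HTTP Cookie File` line, and the second field must be TRUE exactly when the host starts with a dot.

```python
import http.cookiejar
import urllib.request


def opener_from_cookies_txt(filename):
    """Return (opener, jar) loaded with the cookies from a cookies.txt file."""
    jar = http.cookiejar.MozillaCookieJar(filename)
    # Keep session and expired cookies too, so nothing is silently dropped.
    jar.load(ignore_discard=True, ignore_expires=True)
    processor = urllib.request.HTTPCookieProcessor(jar)
    return urllib.request.build_opener(processor), jar
```

After that, `opener.open(url)` sends the matching cookies along with the request, much like `wget --load-cookies` does.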