Python provides several ways to download files from the internet, either over HTTP using the urllib package or with the requests library. This tutorial will discuss how to use these libraries to download files from URLs using Python.
REQUESTS
The requests library is one of the most popular libraries in Python. Requests allows you to send HTTP/1.1 requests without the need to manually add query strings to your URLs or form-encode your POST data.
With the requests library, you can do a lot, including:
- adding form data,
- adding multipart files,
- and accessing the response data.
MAKING REQUESTS
The first thing you need to do is install the library, and it's as simple as:
pip install requests
To test whether the installation was successful, you can do a very easy check in your Python interpreter by simply typing:
import requests
If the installation has been successful, there will be no errors.
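You can also confirm which version you have installed; requests exposes its version string as requests.__version__:

```python
import requests

# Print the installed version string to confirm the import works.
print(requests.__version__)
```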
HTTP requests include:
- GET
- POST
- PUT
- DELETE
- OPTIONS
- HEAD
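Each of these verbs maps to a function in requests (requests.get, requests.post, and so on). As a rough sketch, you can also build a request for any verb with requests.Request without sending it; the httpbin.org URL below is just a placeholder, and nothing goes over the network:

```python
import requests

# Build (but do not send) a request for each HTTP verb.
# .prepare() finalizes the method, URL, headers, and body locally.
for method in ["GET", "POST", "PUT", "DELETE", "OPTIONS", "HEAD"]:
    prepared = requests.Request(method, "http://httpbin.org/anything").prepare()
    print(prepared.method, prepared.url)
```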
Making a GET Request
Making requests is very easy as illustrated below.
import requests

req = requests.get("http://www.google.com")
The above command will get the Google web page and store the response in the req variable. We can then go on to inspect other attributes as well.
For instance, to know whether fetching the Google web page was successful, we query the status_code attribute.
import requests

req = requests.get("http://www.google.com")
req.status_code
200  # 200 means a successful request
What if we want to find out the encoding type of the Google web page?
req.encoding
'ISO-8859-1'
You might also want to know the contents of the response.
req.text
Below is a truncated version of the response content:
'<!doctype html><html itemscope="" itemtype="http://ift.tt/KaymKU" lang="en"><head><meta content="Search the world\'s information, including webpages, images, videos and more. Google has many special features to help you find exactly what you\'re looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script>(function(){window.google={kEI:\'_Oq7WZT-LIf28QWv
Making a POST Request
In simple terms, a POST request is used to create or update data. It is especially used in the submission of forms.
Let's assume you have a registration form that takes an email address and a password as input data. When you click the submit button, the POST request will be as shown below.
data = {"email": "info@tutsplus.com", "password": "12345"}
req = requests.post("http://www.google.com", data=data)
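One detail worth noting: params= places the key/value pairs in the URL's query string, while data= form-encodes them into the request body, which is what a form submission normally wants. A small sketch using prepared requests (example.com is a placeholder, and nothing is actually sent):

```python
import requests

data = {"email": "info@tutsplus.com", "password": "12345"}

# params= puts the pairs in the URL's query string ...
as_params = requests.Request("POST", "http://example.com", params=data).prepare()
print(as_params.url)

# ... while data= form-encodes them into the request body.
as_data = requests.Request("POST", "http://example.com", data=data).prepare()
print(as_data.body)
```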
Making a PUT Request
A PUT request is similar to a POST request; it's used to update data. For instance, the example below shows how to make a PUT request.
data = {"name": "tutsplus", "telephone": "12345"}
req = requests.put("http://www.contact.com", data=data)
Making a DELETE Request
A DELETE request, as the name suggests, is used to delete data. Below is an example of a DELETE request.
data = {'name': 'Tutsplus'}
url = "http://ift.tt/2EwdY5u"
response = requests.delete(url, params=data)
urllib Package
urllib is a package that collects several modules for working with URLs, namely:
- urllib.request for opening and reading URLs,
- urllib.error containing the exceptions raised by urllib.request,
- urllib.parse for parsing URLs,
- urllib.robotparser for parsing robots.txt files.
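For instance, urllib.parse can split a URL into its components and build properly escaped query strings, all without touching the network; the URL below is just an illustration:

```python
from urllib.parse import urlencode, urlparse

# Split a URL into its components.
parts = urlparse("https://example.com/search?q=python")
print(parts.scheme, parts.netloc, parts.path, parts.query)

# Build an escaped query string from a dict.
query = urlencode({"q": "download files", "page": 2})
print(query)
```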
urllib.request offers a very simple interface, in the form of the urlopen function, capable of fetching URLs using a variety of different protocols. It also offers a slightly more complex interface for handling basic authentication, cookies, proxies, etc.
How to Fetch URLs With urllib
The simplest way to use urllib.request is as follows:
import urllib.request

with urllib.request.urlopen('http://python.org/') as response:
    html = response.read()
If you wish to retrieve an internet resource and store it, you can do so via the urlretrieve() function.
import urllib.request

filename, headers = urllib.request.urlretrieve('http://python.org/')
html = open(filename)
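The urllib.error module mentioned earlier comes into play when a fetch fails. A minimal sketch, using a host under the reserved .invalid top-level domain so the failure is guaranteed and no real server is contacted:

```python
import urllib.error
import urllib.request

# ".invalid" is a reserved top-level domain that never resolves, so this
# request always fails, letting us demonstrate the exception hierarchy.
try:
    with urllib.request.urlopen("http://no-such-host.invalid/") as response:
        html = response.read()
except urllib.error.HTTPError as e:
    # The server was reached but answered with an error status (404, 500, ...).
    print("Server error:", e.code)
except urllib.error.URLError as e:
    # The server could not be reached at all (DNS failure, refused connection, ...).
    print("Failed to reach server:", e.reason)
```

HTTPError is a subclass of URLError, so it must be caught first.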
Downloading Images With Python
In this example, we want to download the image available at this link using both the requests library and the urllib module.
import urllib.request
import requests

url = 'http://ift.tt/1NJHQZ5'

# downloading with urllib: copy a network object to a local file
urllib.request.urlretrieve(url, "python.png")

# downloading with requests: download the url contents in binary format
r = requests.get(url)
# open a file on your system and write the contents
with open("python1.png", "wb") as code:
    code.write(r.content)
Download PDF Files With Python
In this example, we will download a PDF about Google Trends from this link.
import urllib.request
import requests

url = 'http://ift.tt/1VOKH8L'

# downloading with urllib: copy a network object to a local file
urllib.request.urlretrieve(url, "tutorial.pdf")

# downloading with requests: download the file contents in binary format
r = requests.get(url)
# open a file on your system and write the contents
with open("tutorial1.pdf", "wb") as code:
    code.write(r.content)
Download Zip Files With Python
In this example, we are going to download the contents of a GitHub repository found at this link and store the file locally.
import requests
import urllib.request

url = 'http://ift.tt/2EwAACH'

# downloading with requests: download the file contents in binary format
r = requests.get(url)
# open a file on your system and write the contents
with open("minemaster1.zip", "wb") as code:
    code.write(r.content)

# downloading with urllib: copy a network object to a local file
urllib.request.urlretrieve(url, "minemaster.zip")
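Once the archive is saved, the standard-library zipfile module can list or extract its contents. A self-contained sketch (it builds a tiny archive in memory rather than depending on the download above; with a real download you would pass the saved filename to ZipFile):

```python
import io
import zipfile

# Build a small zip in memory so the example needs no prior download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("readme.txt", "hello from the archive")

# Reading works the same way with a filename such as "minemaster1.zip".
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    print(names)
    text = zf.read("readme.txt").decode()
    print(text)
```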
Download Videos With Python
In this example, we want to download the video lecture available on this page. Note that fetching a YouTube watch URL with these tools returns the page's HTML rather than the video stream itself; downloading the actual video requires a specialized tool.
import requests
import urllib.request

url = 'https://www.youtube.com/watch?v=aDwCCUfNFug'
video_name = url.split('/')[-1]

# using requests
print("Downloading file:%s" % video_name)
# download the url contents in binary format
r = requests.get(url)
# open a file on your system and write the contents
with open('tutorial.mp4', 'wb') as f:
    f.write(r.content)

# using urllib
print("Downloading file:%s" % video_name)
# copy a network object to a local file
urllib.request.urlretrieve(url, "tutorial2.mp4")
Conclusion
This tutorial has covered the most commonly used methods to download files, as well as the most common file formats. Even though you will write less code when using the urllib module, the requests module is preferred due to its simplicity, popularity, and wide array of features, including:
- Keep-Alive & Connection Pooling
- International Domains and URLs
- Sessions with Cookie Persistence
- Browser-style SSL Verification
- Automatic Content Decoding
- Basic/Digest Authentication
- Elegant Key/Value Cookies
- Automatic Decompression
- Unicode Response Bodies
- HTTP(S) Proxy Support
- Multipart File Uploads
- Streaming Downloads
- Connection Timeouts
- Chunked Requests
- .netrc Support