Requests: HTTP for Pythonistas

This post is about an awesome Python library, Requests.



Requests is an elegant and simple HTTP library for Python.
It is one of the most fundamental libraries Python programmers use to interact with the web!

Before starting to explore this awesome library, let’s have a look at its awesomeness: 😉

Requests is one of the most downloaded Python packages of all time, pulling in over 7,000,000 downloads every month. All the cool kids are doing it!
Recreational use of other HTTP libraries may result in dangerous side-effects, including: security vulnerabilities, verbose code, reinventing the wheel, constantly reading documentation, depression, headaches, or even death.
Requests is the only Non-GMO HTTP library for Python, safe for human consumption.
Python HTTP: When in doubt, or when not in doubt, use Requests. Beautiful, simple, Pythonic.

Believe me, the requests library is worthy of all the appreciation quoted above!
So, let’s get started! 🙂

First of all, download and set up the requests library using the following command:

pip install requests
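
To check that the installation worked, you can import the library and print its version. A minimal sanity check (the exact version string depends on your install):

```python
# Confirm that Requests is importable and see which version is installed
import requests

print("Requests version:", requests.__version__)
```

If pip installed the package for a different Python than the one you run, `python -m pip install requests` targets the interpreter you invoke explicitly.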

1. Retrieving HTML content

One of the most basic tasks of the requests library is to retrieve the HTML content from a webpage using its URL.

Here is an example:

import requests

URL = "https://en.wikipedia.org/wiki/Python_(programming_language)"
r = requests.get(URL)

print(r.content)

Ok, so let’s try to understand this code:

  • import requests

    This statement imports the requests library so that it can be used in our Python program.

  • URL = "https://en.wikipedia.org/wiki/Python_(programming_language)"

    Here we have defined the URL of the webpage. (You are free to define the URL of any other website here)

  • r = requests.get(URL)

    Here comes the most important part. We are simply sending an HTTP request to the server. (An HTTP GET request, to be precise. A GET request is used to retrieve information from the server.)
    What this method returns is a Response object, which we save as r.
    This object contains all the information we need about the webpage.

  • print(r.content)

    r.content contains the raw HTML content of the webpage, as bytes (in Python 2, a plain str).
    If you want the HTML content as decoded Unicode text, try this:

    print(r.text)

    It is worth remembering that:
    r.text is the content of the response as Unicode text, and r.content is the content of the response in bytes.
    The latter is the one you need for non-text content such as images and PDFs, as we will see when downloading files.
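
To make the bytes/text distinction concrete, here is an offline sketch. Roughly speaking, Requests decodes r.content into r.text using r.encoding, which it normally takes from the charset in the Content-Type header. The snippet imitates that step by hand with requests.utils.get_encoding_from_headers; the header and body are made up for illustration.

```python
from requests.structures import CaseInsensitiveDict
from requests.utils import get_encoding_from_headers

# Made-up header and body standing in for a real response
headers = CaseInsensitiveDict({"Content-Type": "text/html; charset=UTF-8"})
raw_body = "<p>Python est génial</p>".encode("utf-8")  # like r.content: bytes

# Requests reads the charset from Content-Type to set r.encoding
encoding = get_encoding_from_headers(headers)

# Decoding the bytes yields the str that r.text would give you
text_body = raw_body.decode(encoding)

print(type(raw_body).__name__, encoding)    # bytes UTF-8
print(type(text_body).__name__, text_body)  # str <p>Python est génial</p>
```

When a server's Content-Type has no charset, Requests has to guess, so r.text can occasionally come out garbled. In that case you can set r.encoding yourself (for example r.encoding = "utf-8") before reading r.text.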

Besides the HTML content, we can also get other useful information about the webpage using this attribute:

print(r.headers)

And you get a dictionary-like object of header details. It will look something like this:

{'Content-Length': '65948', 
 'Content-language': 'en', 
 'Last-Modified': 'Sun, 16 Oct 2016 21:35:05 GMT', 
 'X-Client-IP': '59.180.41.18', 
 'Date': 'Mon, 17 Oct 2016 13:00:07 GMT', 
 'Accept-Ranges': 'bytes', 
 'Age': '55481', 
 'Server': 'mw1242.eqiad.wmnet', 
 'Connection': 'keep-alive', 
 'Content-Encoding': 'gzip',
 'Vary': 'Accept-Encoding,Cookie,Authorization',
 'Cache-Control': 'private, s-maxage=0, max-age=0, must-revalidate',
 'Content-Type': 'text/html; charset=UTF-8'}
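
Strictly speaking, r.headers is not a plain dict but a requests.structures.CaseInsensitiveDict, because HTTP header names are case-insensitive. A short offline sketch, reusing two header values from the sample output above:

```python
from requests.structures import CaseInsensitiveDict

# Two of the headers from the sample output above, stored the way r.headers stores them
headers = CaseInsensitiveDict({
    "Content-Type": "text/html; charset=UTF-8",
    "Content-Length": "65948",
})

# Header names are matched case-insensitively
print(headers["content-type"])     # text/html; charset=UTF-8
print(headers.get("X-Not-There"))  # None, just like dict.get()

# Values are strings, so convert them yourself when you need a number
size = int(headers["CONTENT-LENGTH"])
print(size)                        # 65948
```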

What next?
Once we have accessed the HTML content of the webpage, we can extract useful data from it. But since the HTML content is a plain string, we can’t easily collect the data inside the various HTML tags. What we need is a method to convert the HTML string into a nested structure according to its HTML tags.
Parsing HTML like this to pull out data is the heart of a technique called web scraping (to be covered in upcoming blog posts).
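
As a small preview of the idea, here is a sketch that uses only the standard library's html.parser module: a parser subclass that walks through the tags and collects the page title and the link targets. The sample HTML is made up; in practice you would feed it r.text from a requests.get() call. Dedicated scraping libraries make this much more convenient.

```python
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Collect the <title> text and every <a href="..."> in a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")  # attrs is a list of (name, value) pairs
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:  # only keep text that sits inside <title>
            self.title += data


# Made-up HTML; with Requests you would use parser.feed(r.text)
html = """<html><head><title>Python (programming language)</title></head>
<body><a href="/wiki/Guido_van_Rossum">Guido</a> <a href="/wiki/CPython">CPython</a></body></html>"""

parser = TitleAndLinks()
parser.feed(html)
print(parser.title)
print(parser.links)
```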


2. Interacting with APIs

This is really fun!
The Requests module makes interacting with various APIs super easy! 🙂
First of all, what is an API?
You can do a simple Google Search, right? 😉
What is important for you to understand is that an API enables you to access the internal features of a program in a limited fashion. And in most cases, the data provided is in JSON (JavaScript Object Notation) format, which maps naturally onto dictionaries in Python!
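
Here is how that mapping looks with the standard json module, which does essentially what r.json() does to a response body: JSON objects become dicts, arrays become lists, strings become str, numbers become int or float, and null becomes None. The document below is made up for illustration.

```python
import json

# A small, made-up JSON document of the kind an API might return
payload = '{"name": "New Delhi", "main": {"temp": 303.15, "humidity": 40}, "tags": ["hot", "dry"], "alert": null}'

data = json.loads(payload)  # parse the JSON text into Python objects

print(type(data).__name__)       # dict
print(data["main"]["humidity"])  # 40 (nested object -> nested dict)
print(data["tags"][0])           # hot (JSON array -> list)
print(data["alert"])             # None (JSON null -> None)
```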

Let’s start with an example:

import requests

API_ENDPOINT = "http://api.openweathermap.org/data/2.5/weather"
API_KEY = "XXXXXXXXXXXXXXXX"  # Your API key here

place = input("Enter your city name: ")  # use raw_input() on Python 2

params = {'q': place,
          'appid': API_KEY,
          'mode': 'json'
         }
r = requests.get(API_ENDPOINT, params=params)
myweather = r.json()  # decode the JSON response into a Python dict

We are using the OpenWeatherMap API here.
To access the API, you will need to generate an API key by signing up on the OpenWeatherMap website.

Okay, let’s start analyzing the Python code now.

  • API_ENDPOINT = "http://api.openweathermap.org/data/2.5/weather"
    API_KEY = "XXXXXXXXXXXXXXXX"  # Your API key here

    Here we are defining the URL at which we will call the API, as API_ENDPOINT.
    Generate an API key with a free account and save it as a string called API_KEY.

  • place = input("Enter your city name: ")

    Nothing special here. Just prompting the user to enter a city name. (On Python 2, use raw_input() instead.)

  • params = {'q': place,
              'appid': API_KEY,
              'mode': 'json'  # available modes are json, html, xml
              }

    Now, here comes the important part. We are defining a dictionary called params, which has the following keys: q, appid, mode. This is the information we will be passing to the API.
    You have to stick with these key names because the API accepts parameters with only these names. Try opening this URL in your web browser and you will see. 🙂

     http://api.openweathermap.org/data/2.5/weather?q=New+Delhi&appid={Your API key here}&mode=json

    q is the query field, where you pass the city name.
    appid is your API key.
    mode is the format of the API response. This weather API allows 3 modes: json, html, and xml. (The most convenient mode for data parsing in Python is json, so we will go for it.)

  • r = requests.get(API_ENDPOINT, params=params)

    And here we create the response object, r.
    An HTTP GET request is being made to the API servers.
    Notice the one extra parameter we are passing this time, params.
    It contains all the parameters we want to pass to the API.
    params will form the query string of the URL.
    Do this:

    print(r.url)

    You will get a string like this:

     http://api.openweathermap.org/data/2.5/weather?q={place}&appid={Your API key here}&mode=json

    So, the requests library builds the actual URL by appending the parameters defined in params to API_ENDPOINT as a query string: a ‘?’ followed by key=value pairs separated by ‘&’. It also URL-encodes the values, so a space in a city name like “New Delhi” becomes ‘+’.

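    One could argue that you could simply have done something like this: 😉

     URL = "http://api.openweathermap.org/data/2.5/weather?q=%s&appid=%s&mode=json" % (place, API_KEY)
     r = requests.get(URL)

    But believe me, the former method makes the code much more readable and easier to maintain. It is also safer: the string-formatting version does not URL-encode place, so a value containing a character such as ‘&’ would break the query string.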

  • myweather = r.json()

    Ok, once we have obtained the response from the API server, we need to access our data, right?
    Remember r.content? It was simply raw bytes, and one would struggle to extract and format useful data from it.
    The Requests library again comes to your rescue! 🙂
    The r.json() method decodes the JSON response content into a Python dictionary, which we have saved here as myweather.
    Now you can “pythonically” access the data. 🙂
    Here is an example:

    temp = myweather['main']['temp']  # OpenWeatherMap reports Kelvin by default
    humidity = myweather['main']['humidity']
    pressure = myweather['main']['pressure']
    description = myweather['weather'][0]['description']
    print(temp, humidity, pressure, description)
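
You can check exactly which URL Requests builds from params without contacting any server, by preparing a request instead of sending it. A sketch using requests.Request(...).prepare(); the city name and the placeholder key are made up:

```python
from urllib.parse import urlencode

import requests

API_ENDPOINT = "http://api.openweathermap.org/data/2.5/weather"
params = {"q": "New Delhi", "appid": "YOUR_API_KEY", "mode": "json"}  # placeholder key

# Build (but do not send) the GET request that requests.get() would make
prepared = requests.Request("GET", API_ENDPOINT, params=params).prepare()
print(prepared.url)
# http://api.openweathermap.org/data/2.5/weather?q=New+Delhi&appid=YOUR_API_KEY&mode=json

# The query string matches what the standard library's urlencode() produces
print(prepared.url == API_ENDPOINT + "?" + urlencode(params))  # True
```

On a real response, r.url reflects the same construction, although it shows the final URL if the server redirected you.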
    

So, this was an example of how we can access web APIs using the requests library!

But that’s not all though!
There are various APIs which allow users to pass data to them as well!
Let’s have a look at it in the next section. 🙂


3. Making a POST request

By now, it should be clear that:
The HTTP GET request method is designed to retrieve information from the server.

Now, let’s discuss the HTTP POST request method a bit.
The POST request method requests that a web server accept and store the data enclosed in the body of the request message.

Let us see how it is done using the following example:

import requests

API_ENDPOINT = "http://pastebin.com/api/api_post.php"
API_KEY = "XXXXXXXXXXXXXXXXX"  # Your API key here

source_code = input("Enter your code here: ")  # use raw_input() on Python 2

# The Pastebin API expects these exact field names
data = {'api_dev_key': API_KEY,
        'api_option': 'paste',
        'api_paste_code': source_code}

# Send the data in the body of an HTTP POST request
r = requests.post(API_ENDPOINT, data=data)

# The response body is the URL of the new paste
pastebin_url = r.text
print("The pastebin URL is: %s" % pastebin_url)

This example uses the Pastebin API to paste your source_code to pastebin.com.
First of all, you will need to sign up on pastebin.com to get your API key.

Here again, we need to pass some data to the API server, but this time it is passed as the data argument.
And we use the requests.post method this time.
In response, the server processes the data sent to it and returns the pastebin URL of your source_code, which can simply be accessed as r.text.
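
To see what requests.post actually sends when you pass a dict as data, you can again prepare the request without sending it. Requests form-encodes the dict into the request body and sets the Content-Type header to application/x-www-form-urlencoded. The key and the pasted code below are placeholders:

```python
from urllib.parse import parse_qs

import requests

API_ENDPOINT = "http://pastebin.com/api/api_post.php"
data = {"api_dev_key": "YOUR_API_KEY",  # placeholder, not a real key
        "api_option": "paste",
        "api_paste_code": "print('Hello, world!')"}

# Build (but do not send) the same POST request that requests.post() would make
prepared = requests.Request("POST", API_ENDPOINT, data=data).prepare()

print(prepared.method)                   # POST
print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
print(prepared.body)                     # the form-encoded request body

# Decoding the body gives back the original fields
decoded = parse_qs(prepared.body)
print(decoded["api_paste_code"])         # ["print('Hello, world!')"]
```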

The requests.post method can be used for many other tasks as well, like filling in and submitting web forms, posting on your Facebook timeline using the Facebook Graph API, etc.
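
Many web APIs (unlike the Pastebin API above) expect a JSON body instead of form data. For those, Requests offers a json argument: it serializes the dict to JSON and sets Content-Type to application/json. A sketch comparing the two, prepared but never sent; the endpoint URL is a hypothetical placeholder:

```python
import json

import requests

URL = "https://example.com/api/items"  # hypothetical endpoint, never contacted here
payload = {"name": "notebook", "tags": ["python", "requests"]}

form_req = requests.Request("POST", URL, data=payload).prepare()  # form-encoded body
json_req = requests.Request("POST", URL, json=payload).prepare()  # JSON body

print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))         # the payload, round-tripped through JSON
```

Which one to use depends entirely on what the API you are calling expects, so check its documentation.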


4. Downloading files from the web

This is another interesting application of the Requests library.
Given the URL of any content on the web, we can easily download it using the Requests library.

Let’s see how it can be done:

import requests

image_url = "https://www.python.org/static/community_logos/python-logo-master-v3-TM.png"
r = requests.get(image_url)

# Open the file in binary mode ("wb") because r.content is bytes
with open("python_logo.png", "wb") as f:
    f.write(r.content)

The small piece of code written above will download the image from the web.
All we need is the URL of the image source. (You can get the URL of image source by right-clicking on the image and selecting the View Image option.)

Remember what I mentioned about r.content before? It is the content of the response in bytes! So, writing that content to a new file (opened in binary mode) simply saves the image to disk.

And this method works for any type of file: PDFs, MP3 files, and so on.
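
Two small improvements are worth considering when you generalize the snippet above. This sketch uses a hypothetical helper, filename_from_url, that names the local file after the last part of the URL path instead of hard-coding it, and it calls r.raise_for_status() so that a 404 or 500 raises an exception instead of silently saving an HTML error page as your "image". It also passes a timeout, since Requests waits indefinitely by default.

```python
import os
from urllib.parse import unquote, urlparse

import requests


def filename_from_url(url, default="download"):
    """Guess a local filename from the last part of the URL path (hypothetical helper)."""
    path = unquote(urlparse(url).path)  # drop the query string, decode %20 etc.
    return os.path.basename(path) or default


def download(url, timeout=10):
    """Download url into the current directory and return the filename used."""
    r = requests.get(url, timeout=timeout)
    r.raise_for_status()  # raise instead of saving an error page as the "file"
    filename = filename_from_url(url)
    with open(filename, "wb") as f:
        f.write(r.content)
    return filename


print(filename_from_url("https://www.python.org/static/community_logos/python-logo-master-v3-TM.png"))
# python-logo-master-v3-TM.png
```

Calling download(image_url) would then fetch the Python logo and save it as python-logo-master-v3-TM.png.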

But there is an issue with large files which is worth discussing here. 🙂
Since r.content holds the entire file in a single bytes object, downloading a large file this way means loading the whole response body into memory at once, which can eat up a hell of a lot of memory!
The solution is to read the data in chunks, and the requests library lets us do this with the r.iter_content method.
But we also need the data to be loaded from the server in chunks, rather than all at once. This problem is solved this way:

r = requests.get(URL, stream=True)

Setting the stream parameter to True means that only the response headers are downloaded at first, while the connection remains open. This avoids reading the whole body into memory at once for large responses: instead, a fixed-size chunk is loaded each time r.iter_content is iterated.

Let’s go through an example here:

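import requests

file_url = "http://www.tutorialspoint.com/python/python_tutorial.pdf"

# stream=True: download the headers now and the body on demand
r = requests.get(file_url, stream=True)

with open("python.pdf", "wb") as pdf:
    # Read and write the body 1024 bytes at a time
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:  # skip any empty chunks
            pdf.write(chunk)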

The code seems self-explanatory after the discussion above. 🙂

So, this is a much more efficient way to download large files from the web using the requests module.
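
If you want to reuse this pattern, here is one way to package it, under a few assumptions: the function names are hypothetical, the chunk size of 8192 bytes is an arbitrary choice, and using the response as a context manager (which releases the connection when the block ends) requires Requests 2.18 or later. Splitting out write_chunks also makes the file-writing part easy to test without a network.

```python
import requests


def write_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the number of bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip any empty chunks
                f.write(chunk)
                total += len(chunk)
    return total


def download_large_file(url, path, chunk_size=8192, timeout=10):
    """Stream url to path without holding the whole body in memory."""
    # The with-block closes the response and frees the connection when done
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
        return write_chunks(r.iter_content(chunk_size=chunk_size), path)
```

For example, download_large_file(file_url, "python.pdf") would do the same job as the loop above and return the file size in bytes.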


Well, that was a pretty exhaustive review of the Requests library at the beginner level.
There is still much more to it!
And the best resource to explore this library in detail is its documentation itself 🙂 :
Requests: HTTP for Humans

Any suggestions or queries regarding this post are welcome!
