Cineworld API

After working on commercial software for a few years, I really fancied getting my feet wet in open source again. I haven’t had much time for it, but when I found out that my favorite cinema had its own API, I couldn’t resist making a wrapper for it.

It only took a couple of hours, but it was quite fun to make. Where the API didn’t supply the information I needed, I introduced some simple hacks to get it a different way. I’ll go through my methodology in this blog post, along with some simple use cases. So here it is: the Cineworld API Wrapper, written in Python.

First off, there are two modes of operation regarding the API key: you can either store it in a file (which I suggest) or pass it as an argument to the main class. Here is a simple search for finding a film:

Without saving your API key in the cineworld_api_key.py file:

from cineworld import CW
CW('my_api_key').film_search('some movie here')

With your API key saved:

from cineworld import CW
CW().film_search('some movie here')

I’d like to mention that the Cineworld API doesn’t have a way to search by film title, so I had to make my own using their list of films. But before I get into explaining that, I would like to go through some of the easy functions that link directly to the API.

# base function for connecting to the API
def get_list(self, datatype, url, **kwargs):
    search_url = [url, '?']
    kwargs.update({'key': self.api_key})
    search_url.append(urlencode(kwargs))
    data = json.loads(urlopen(''.join(search_url)).read())
    return data[datatype]

# gets a list of all Cineworld cinemas and allows further customization
# of the list using arguments located in the API documentation
def get_cinemas(self, **kwargs):
    return self.get_list('cinemas', self.cinemas_url, **kwargs)

# gets a list of all films currently playing in Cineworld cinemas and allows
# further customization of the list using arguments located in the API documentation
def get_films(self, **kwargs):
    return self.get_list('films', self.films_url, **kwargs)

# cache the result of the list of films in case of
# multiple searches on the same object
def get_film_list(self):
    self.film_list = self.get_films()
    return self.film_list

# gets a list of all dates when films are playing at Cineworld cinemas and allows
# further customization of the list using arguments located in the API documentation
def get_dates(self, **kwargs):
    return self.get_list('dates', self.dates_url, **kwargs)

# not well documented but I assume it's for more specialized performances i.e. not films
def get_performances(self, **kwargs):
    return self.get_list('performances', self.performances_url, **kwargs)

Each function above accesses its respective API call and lets the user specify their own arguments through the **kwargs parameter. These are the only functions that actually call the API; the rest of the functionality is built entirely on them. A unique identifier can be passed to the get_cinemas and get_films functions to get specific film information.
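To make the URL construction in get_list a bit more concrete, here is a minimal standalone sketch of just that step, without the network call. It is written for Python 3 (so urlencode comes from urllib.parse), and the function name build_request_url and the example.com base URL are placeholders I made up for illustration, not part of the wrapper or of the Cineworld API.

```python
from urllib.parse import urlencode

def build_request_url(url, api_key, **kwargs):
    """Append the caller's arguments plus the API key as a query string."""
    # Copy the keyword arguments and add the key, as get_list does with update()
    params = dict(kwargs, key=api_key)
    return url + '?' + urlencode(params)

# Hypothetical base URL and key, used only to show the shape of the result
example_url = build_request_url(
    'https://api.example.com/films', 'my_api_key', cinema=79, date='20240103')
print(example_url)
```

Any keyword argument the caller supplies ends up in the query string unchanged, which is why the thin wrapper functions can expose every option the API documents without listing them one by one.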

Now the problem is how to get a specific film if a user wants one: they would first have to call the get_films function and then manually pick out their film. The list returned from the films API does contain a unique identifier for each film, called the EDI number. However, EDI numbers are not as popular as, say, the IMDb or TMDb movie identifiers. The only way to get information on a specific film is to look up its name and extract its EDI number. The problem was that Cineworld’s film names could differ slightly from those on IMDb or TMDb, so I couldn’t do an exact string match. They could be shortened versions of the film names, or have additional identifiers such as 3D attached to the name.

So I pretty much had to implement my own search functionality, using the film list generated by Cineworld as the base information. Performance wasn’t really an issue, as there were at most about 20 films in the list, so I didn’t need a heavyweight search indexer like Lucene and could easily do the search in Python.

What I needed was a fuzzy searching algorithm. Luckily, I had just read a SeatGeek blog post with a really great introductory tutorial on fuzzy searching. They also released the code they used as an open source Python library, amazingly called fuzzywuzzy. It gave me an easy way to match a reasonable approximation of a film title against the film name that Cineworld would give me.

from fuzzywuzzy.fuzz import WRatio
from operator import itemgetter

def film_search(self, title):
    films = []
    # check for cache or update
    if not hasattr(self, 'film_list'):
        self.get_film_list()
    # iterate over films and check for fuzzy string match
    for film in self.film_list:
        strength = WRatio(title, film['title'])
        if strength > 80:
            film.update({u'strength': strength})
            films.append(film)
    # sort films by the strength of the fuzzy string match
    films_sorted = sorted(films, key=itemgetter('strength'), reverse=True)
    return films_sorted

This function returns a list of films ordered by the strength of the fuzzy string match. Generally we will only need the top result, and possibly the second when the string matches a film that has both a 2D and a 3D version. So this is going to be the main function for getting hold of a film’s ID, which can then be used to find show times and so on.
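If you want to try the filter-and-sort idea without installing fuzzywuzzy, here is a rough sketch that swaps in the standard library’s difflib.SequenceMatcher for WRatio. The scores it produces are not the same as WRatio’s, so the threshold of 60 is just a guess for this scorer, and the sample film list (including the EDI values) is invented for illustration.

```python
from difflib import SequenceMatcher
from operator import itemgetter

def similarity(a, b):
    """Return a 0-100 similarity score, ignoring case."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return int(round(100 * ratio))

def search_titles(query, films, threshold=60):
    """Keep films scoring above the threshold, strongest match first."""
    matches = []
    for film in films:
        score = similarity(query, film['title'])
        if score > threshold:
            # Copy the film so the original list is left untouched
            matches.append(dict(film, strength=score))
    return sorted(matches, key=itemgetter('strength'), reverse=True)

# Invented sample data; real entries come from the films API
sample_films = [
    {'edi': 1, 'title': '2D - The Hobbit'},
    {'edi': 2, 'title': '3D - The Hobbit'},
    {'edi': 3, 'title': 'Skyfall'},
]
results = search_titles('The Hobbit', sample_films)
for film in results:
    print(film['edi'], film['title'], film['strength'])
```

With this sample data both the 2D and the 3D listing come back with the same score, which is exactly the case where the second result matters as well as the first.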

I needed a way to get the current box office films to place on a site. Unfortunately, Cineworld have quite a few unorthodox films that play sometimes, like kids’ cartoons on Saturday mornings and Bollywood films on Thursdays. Not that those films aren’t important, but I wouldn’t say they were really box office films. Wednesday, on the other hand, being Orange Wednesdays, generally had all of the box office films playing. Also, the way the Cineworld site seemed to work, looking forward to Wednesday was a good way to get the most up-to-date films.

So I made a function that looks forward to the next Wednesday and returns a list of films playing that night. I also picked a single cinema that was likely to have a large number of films due to its size: the O2 in Greenwich. Finally, I made sure that I wouldn’t get both the 3D and the 2D version of a film, so that each film name is returned only once, by using a simple filter to remove any 3D films and then removing the 2D text at the beginning of the title.

import datetime

# uses a certain cinema (the O2) and a certain day when non-specialist films
# show (Wednesday) to get a list of the latest box office films
def get_box_office_films(self):
    today = datetime.date.today()
    # weekday() is 0 for Monday, so 2 is Wednesday; on a Wednesday this gives today
    days_ahead = (2 - today.weekday()) % 7
    next_wednesday = (today + datetime.timedelta(days_ahead)).strftime('%Y%m%d')
    films = self.get_films(cinema=79, date=next_wednesday)

    # drop the 3D listings, then strip the '2D - ' prefix from the remaining titles
    films = [film for film in films if '3D' not in film['title']]
    for film in films:
        if film['title'].startswith('2D - '):
            film['title'] = film['title'][5:]
    return films
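
The date arithmetic and the title clean-up above are easy to check on their own, away from the API. Here is a small sketch that pulls them out into two helpers; next_wednesday and strip_format_prefix are names I made up for this example and are not functions in the wrapper.

```python
import datetime

def next_wednesday(today):
    """Return the first Wednesday on or after the given date."""
    # weekday() is 0 for Monday, so Wednesday is 2
    days_ahead = (2 - today.weekday()) % 7
    return today + datetime.timedelta(days=days_ahead)

def strip_format_prefix(title, prefix='2D - '):
    """Remove a leading format marker such as '2D - ' from a title."""
    return title[len(prefix):] if title.startswith(prefix) else title

monday = datetime.date(2024, 1, 1)  # 1 January 2024 was a Monday
print(next_wednesday(monday).strftime('%Y%m%d'))  # prints 20240103
print(strip_format_prefix('2D - Some Film'))      # prints Some Film
```

Note that when the given date is already a Wednesday the modulo gives zero, so the function returns that same day rather than skipping a week ahead.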

It’s pretty easy to see that, starting from quite a basic API, you can create a lot more functionality by making only the slightest of approximations.
