The Python ecosystem for beginners, part 2

Welcome to Part 2 of my post on the scientific Python ecosystem (Part 1 is here). I will describe a few more of the most common and useful libraries that make up the typical Python scientific computing stack. This is by no means an exhaustive list, and the open source community is continually developing new libraries.

Matplotlib – high quality 2D and 3D plotting

Matplotlib is a plotting library that aims to make it easy to produce publication-quality plots. In typical Python style, Matplotlib code can be very succinct and yet yield complete, high-quality plots. The library can generate many types of 2D graphs: regular plots, histograms, scatterplots, pie charts, statistical plots, and contour plots, to name a few.

Matplotlib is organized in a hierarchical manner that allows the user to quickly and easily create plots using high-level commands, while simultaneously allowing power users to delve into the object-oriented programming layer to control minute details of individual plots, should they choose to do so.
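To make the two layers concrete, here is a minimal sketch that draws one plot with the high-level pyplot commands and a second one through the object-oriented layer. The data, styling choices and output file names are my own illustrative assumptions, and I select the non-interactive "Agg" backend only so the script also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available, so render off-screen

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 200)

# High-level layer: a few pyplot commands produce a complete plot.
plt.figure()
plt.plot(x, np.sin(x), label="sin(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.savefig("sine_pyplot.png")  # hypothetical output file name
plt.close()

# Object-oriented layer: keep references to the Figure, Axes and Line2D
# objects and adjust their details one by one.
fig, ax = plt.subplots(figsize=(6, 4))
(line,) = ax.plot(x, np.cos(x))
line.set_linewidth(2.5)
line.set_linestyle("--")
ax.set_title("cos(x)")
ax.grid(True)
fig.savefig("cosine_oo.png", dpi=150)  # hypothetical output file name
plt.close(fig)
```

Both halves use the same machinery underneath. The pyplot calls simply manage a "current" figure and axes for you, which is why the first half never names them.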

Traits – interactive class instances and GUI building

Traits is a powerful package that extends Python type attributes in interesting and useful ways. For instance, Python objects such as classes can have attribute “traits” that allow for initialization (set a default value), notification (tell another part of the program that a value has changed) and visualization (respond to GUI inputs). Although it is possible to achieve this using Python properties, Traits removes much of the boilerplate code and streamlines the process.

Chaco – interactive 2D plotting

Chaco is a plotting application toolkit for building rich, interactive plots.   Chaco works with Traits to build object-oriented models of plots that can accept and react to inputs from the GUI.

Cython – speed up your code with C

The easiest way to think about Cython is as a superset of the Python language. That is, all of the normal Python language is there, along with additional syntax that lets your code call back and forth to C/C++ libraries seamlessly. In Cython, you can also add static type declarations to Python functions to get C-level speedups in computation. Cython code is translated into C code, which is then compiled for execution. Unlike weave, which allows inline C code but requires that the Python code be re-compiled for C during every execution, Cython code is compiled only once (unless there are changes later), meaning that an end user does not need to bother with recompiling to run the code as a standalone program.

When Cython is used for numerical computation, speedups of 2000X or more over the pure Python equivalent are not uncommon, although the actual gain depends heavily on the code being compiled.
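To give a feel for static type declarations, here is a minimal sketch written in Cython's "pure Python mode", where the types are expressed as ordinary annotations so the file stays valid Python. The harmonic_sum function is a made-up example of mine. It assumes the Cython package is installed so that "import cython" works. Run through the normal interpreter, the annotations have no effect. Compiled with Cython, they become C variables.

```python
import cython  # provided by the Cython package; also importable uncompiled


def harmonic_sum(n: cython.int) -> cython.double:
    """Return 1 + 1/2 + ... + 1/n (hypothetical example function)."""
    # Static type declarations: when compiled, these become C variables,
    # so the loop below runs without Python object overhead.
    i: cython.int
    total: cython.double = 0.0
    for i in range(1, n + 1):
        total += 1.0 / i
    return total


# cython.compiled is True only when this module was built by Cython.
mode = "compiled" if cython.compiled else "interpreted"
print(mode, harmonic_sum(1000))
```

The more traditional route is a separate .pyx file using cdef declarations. The idea is the same: tell the compiler the types up front so it can generate tight C loops.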

SciKit Learn – interactive machine learning

SciKit Learn (styled scikit-learn by the project) is a machine-learning library for Python. It is based on NumPy, SciPy and matplotlib. There are many algorithms available for performing machine-learning tasks, falling into four main areas: classification, clustering, regression, and dimensionality reduction (for example, principal component analysis).
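Here is a short sketch that touches two of those areas, dimensionality reduction and classification. The choice of the bundled iris dataset, a two-component PCA and a logistic regression classifier is mine, purely for illustration. Other estimators follow the same fit/transform/predict pattern.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small dataset that ships with the library: 150 flowers, 4 features.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Dimensionality reduction: project the 4 features onto 2 principal components.
pca = PCA(n_components=2)
X_train_2d = pca.fit_transform(X_train)
X_test_2d = pca.transform(X_test)  # reuse the projection learned on training data

# Classification: fit on the reduced training data, then score on held-out data.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_2d, y_train)
accuracy = clf.score(X_test_2d, y_test)

print(f"Held-out accuracy: {accuracy:.2f}")
```

Note that the PCA is fitted on the training split only and then applied to the test split, so the held-out score is not influenced by the test data.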