Introduction to Processing

In this tutorial we will learn how to load data from an external file into Processing and create a graph of the 10 most populated cities in the world.

Introduction
Originally created as a tool to teach programming to visual artists, the Processing language now has applications in several areas, including data visualization. Its simplified syntax makes it a good choice for people who are new to coding, and its open source IDE can be downloaded for free.

Processing Environment
When you open the program, you will find something like a text editor. That’s where we will type commands – no, there is not a “toolbar” here, since the original goal of Processing is to teach programming fundamentals.

To run the script that we are going to write, just click the play button in the top left corner of the window. A new window will open, empty for now, since we haven’t written any commands yet. Close it (or press the stop button) and save your file before you start programming. By default its name is the creation date, but you can choose any other name.

Basic Commands
To begin to understand Processing syntax, type the code below and run the program:

size(1200, 200);
ellipse(600, 100, 100, 100);
You will see a window like this:

What does this mean?

These commands are called functions. We used two of them: size and ellipse. The first sets the size (in pixels) of the window we are going to draw in. The second draws an ellipse. After each name, inside the parentheses, come the parameters: the information the program needs in order to carry out the function. In our case:

size (width of the document, height of the document);
ellipse (horizontal coordinate of the ellipse, vertical coordinate of the ellipse, width of the ellipse, height of the ellipse);

In other words, the second line can be read as “draw an ellipse centered 600 pixels from the left edge and 100 pixels from the top, 100 pixels wide and 100 pixels high”. Try changing these numbers to get a better feel for how they work.

Finally, some details we need to be aware of:

  • parameters must always be separated by commas;
  • every command must end with a semicolon (;);
  • spaces or line breaks around these symbols make no difference, but the symbols themselves must be there!

If you’re curious and want to try other ways before going ahead, take a look at the following commands:

line()
point()
rect()
triangle()

You will find instructions for each one on the Processing reference web site.

Formatting data

We will use data from the 2013 Demographia World Urban Areas to draw a graph ranking the 10 most populated cities in the world (table 1).

A simple and practical data format that Processing can read is TSV (tab-separated values). Basically, it is a text format that works like a table, with values separated by tabs instead of cells.
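To make the format concrete, here is a minimal sketch in plain Java (the language Processing is built on) that splits a tab-separated line into its cells. Only the “area” column name comes from this tutorial; the other column names and every value are made-up placeholders, not real data.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: "city" and "population" are assumed column names,
// and the row values are placeholders rather than real figures.
final class TsvSketch {
    // Split one line of a TSV file into cells; each tab marks a cell boundary.
    static List<String> splitRow(String row) {
        // The -1 limit keeps trailing empty cells instead of dropping them.
        return Arrays.asList(row.split("\t", -1));
    }

    public static void main(String[] args) {
        String header = "city\tpopulation\tarea";
        String row = "Example City\t1000000\t2500";
        System.out.println(splitRow(header));
        System.out.println(splitRow(row));
    }
}
```

Each line of the file is one table row, and the first line can hold the column names.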

Transferring data from a PDF file into a table is not always an easy task, since the line breaks rarely match the cell layout that we need.

In the image above, we will only need the columns Urban Area, Population Estimate and Land Area km2. You can download the formatted file here or try to create your own.

  • Windows: Use Notepad and save the file as “cities.tsv”.
  • Mac: Use TextEdit. Press command + shift + T to convert the file to plain text. When you save, select Unicode (UTF-8) in the encoding option.

    Loading data

    Drag the file to the Processing window and you will see the message “One file added to the sketch”. This creates a copy of cities.tsv in the folder that Processing created for your sketch. To read it in the script, add the following code and run:

    size(1200, 200);
    Table myTable = loadTable("cities.tsv", "header");

    for (int i = 0; i < myTable.getRowCount(); i = i + 1) {
        TableRow line = myTable.getRow(i);
        rect(120*i, 50, line.getInt("area"), line.getInt("area"));
    }

    Viewing area

    OK, what is this code doing? Let’s go through it line by line:

    // Load the file into a table. “header” means that the first line of the file is treated as a header.
    Table myTable = loadTable("cities.tsv", "header");

    // Create an action that will be repeated several times, increasing our counter (i) by one each time.
    // In short, it is as if the program counted “0, 1, 2, 3…” and stopped before reaching “10”, which is the number of rows in our table.
    // As our count of 10 elements starts at 0, it goes up to 9! Each time it counts, it performs the actions below:
    for (int i = 0; i < myTable.getRowCount(); i = i + 1) {

    // Select the row corresponding to the count (i).
    TableRow line = myTable.getRow(i);

    // Draw a rectangle at y position 50, with x given by the formula “120 times the count”.
    // This means that, in sequence, the x coordinate will be 0, 120, 240, 360 … up to 1080.
    // The width and height of each rectangle are both set to the value read from the “area” column.
    rect(120*i, 50, line.getInt("area"), line.getInt("area"));
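To check the counting described above without opening a drawing window, the same loop can be sketched in plain Java. This is only an illustration: the class and method names are invented for the example, and the row count of 10 is assumed from the size of the table used in this tutorial.

```java
// Illustration only: LoopSketch and xFor are hypothetical names.
final class LoopSketch {
    // x coordinate of the rectangle drawn on pass i of the loop (120 pixels apart).
    static int xFor(int i) {
        return 120 * i;
    }

    public static void main(String[] args) {
        int rowCount = 10; // assumption: the table has 10 data rows
        // i takes the values 0 through 9; the loop stops before i reaches 10.
        for (int i = 0; i < rowCount; i = i + 1) {
            System.out.println("i = " + i + ", x = " + xFor(i));
        }
    }
}
```

The last pass uses i = 9, which gives x = 1080, so the tenth rectangle still starts inside the 1200 pixel wide window.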

    In the resulting image, we can’t see the rectangles in full, because the area values we have for each city are very large:

    A good way to solve this is to change the rectangle command to:

    rect(120*i, 50, sqrt(line.getInt("area")), sqrt(line.getInt("area")));

    The sqrt function calculates the square root of a value. For us, it’s a quick and useful way to reduce the sizes:

Additionally, this change makes the areas of the rectangles, rather than their height and width, proportional to the values that we are representing. This is necessary to avoid distortion whenever we use a shape such as a circle or a rectangle. If we had simply divided the area value by 100, for example, we would get a disproportionate image:

rect(120*i, 50, line.getInt("area")/100, line.getInt("area")/100);
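The difference between the two approaches can be checked with a little arithmetic. The plain Java sketch below compares two placeholder values, one four times the other, and reports how many times larger the bigger square is drawn under each rule. The names and numbers are invented for illustration, and it uses decimal division, whereas dividing two int values in the sketch above would discard the remainder.

```java
// Illustration only: names and values are hypothetical, not taken from the data file.
final class ScaleSketch {
    // Area of a square drawn with the given side length.
    static double drawnArea(double side) {
        return side * side;
    }

    // How many times larger the second square is drawn when side = sqrt(value).
    static double areaRatioSqrt(double small, double large) {
        return drawnArea(Math.sqrt(large)) / drawnArea(Math.sqrt(small));
    }

    // How many times larger the second square is drawn when side = value / 100.
    static double areaRatioDivision(double small, double large) {
        return drawnArea(large / 100.0) / drawnArea(small / 100.0);
    }

    public static void main(String[] args) {
        double small = 1000.0; // placeholder value
        double large = 4000.0; // placeholder value, 4 times the small one
        System.out.println("sqrt rule:     " + areaRatioSqrt(small, large));
        System.out.println("division rule: " + areaRatioDivision(small, large));
    }
}
```

With the square root, a value four times larger is drawn with about four times the area. With the division, the same value is drawn with sixteen times the area, which exaggerates the difference between cities.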
