GraphLab is a Python library that provides many features out of the box, which makes it a great library for learning the foundations of Machine Learning. Many courses teach several algorithms with a bunch of different tools and examples that don’t reflect the real world. If you are new to Machine Learning, GraphLab (powered by Dato) is a great library to start with.
In this post I’ll give some intuition about SFrames and show a few simple data visualization examples using IPython Notebooks.
First, go ahead and download GraphLab Create (https://dato.com/products/create/). If you are a student, you can use GraphLab Create free of charge for one year for academic purposes (https://dato.com/download/academic.html). After downloading and installing GraphLab Create, launch an IPython Notebook. Here is the simple data set that I’ll use for the rest of this post: people-example.csv
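As a rough illustration of what loading does, here is a plain-Python sketch that reads a CSV file such as people-example.csv into a list of row dictionaries using only the standard library. The helpers `parse_rows` and `load_rows` are hypothetical and not part of GraphLab. In GraphLab Create itself the equivalent step is `sf = graphlab.SFrame('people-example.csv')`, which gives you a columnar, disk-backed SFrame rather than a Python list.

```python
import csv


def parse_rows(lines):
    """Turn CSV text lines (header line first) into a list of dicts, one per row."""
    return [dict(row) for row in csv.DictReader(lines)]


def load_rows(path):
    """Read a CSV file from disk into a list of row dicts.

    Plain-Python stand-in only: GraphLab Create would return an SFrame instead, e.g.
        sf = graphlab.SFrame('people-example.csv')
    """
    with open(path, newline="", encoding="utf-8") as f:
        return parse_rows(f)
```

Calling `load_rows('people-example.csv')` returns one dictionary per data row, keyed by the column names in the header.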
After loading the file into an SFrame, you will see output very similar to the following:
Now that we have loaded our data, let’s start with the basics.
Here is my output:
The sf.head() function also returns the first few rows of the SFrame, and sf.tail() returns the last few. Because our dataset contains only a handful of records, typing sf, sf.head(), and sf.tail() all produce the same output.
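To make the point about small datasets concrete, here is a plain-Python sketch of what head and tail do on a list of rows. These functions are stand-ins, not GraphLab’s implementation; the default of 10 rows mirrors GraphLab’s `SFrame.head(n=10)` signature as far as I know. When a dataset has no more than n rows, both functions return every row, which is why all three outputs look identical here.

```python
def head(rows, n=10):
    """Return the first n rows (plain-Python analogue of sf.head())."""
    return rows[:n]


def tail(rows, n=10):
    """Return the last n rows (plain-Python analogue of sf.tail())."""
    # rows[-0:] would return the whole list, so handle n <= 0 explicitly.
    return rows[-n:] if n > 0 else []
```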
GraphLab Canvas is a built-in visualization tool that comes with GraphLab Create.
The output will redirect you to the Canvas web application.
Here is my output:
You can click on each column to see its most frequent items, and the Table view shows your data in a clean, well-structured layout. SFrames are disk-backed rather than held entirely in memory, so you may even be able to browse a billion rows in GraphLab Canvas.
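Canvas computes the per-column summaries for you. To show the kind of summary it displays, here is a plain-Python sketch that counts the most frequent values in one column with `collections.Counter`. The function name `most_frequent` and its default of 5 values are my own choices for illustration; this is not how Canvas is implemented (in GraphLab Create you open Canvas with `sf.show()`).

```python
from collections import Counter


def most_frequent(rows, column, k=5):
    """Return the k most common values in `column` as (value, count) pairs."""
    counts = Counter(row[column] for row in rows)
    return counts.most_common(k)
```

Running this on the Country column would list USA and United States as separate values, because they are different strings.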
Here are some more simple operations:
Create new columns in our SFrame
You can create a new column from existing ones; for example, a column that joins the First Name and Last Name columns into a full name.
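Here is a plain-Python sketch of the same idea on a list of row dictionaries. The column name `Full Name` and the helper `add_full_name` are hypothetical. In GraphLab Create the whole column is built in a single expression, `sf['Full Name'] = sf['First Name'] + ' ' + sf['Last Name']`, instead of looping over rows.

```python
def add_full_name(rows, new_column="Full Name"):
    """Add a column that joins 'First Name' and 'Last Name' with a space."""
    for row in rows:
        row[new_column] = row["First Name"] + " " + row["Last Name"]
    return rows
```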
You may have noticed that the Country column contains United States in some rows and USA in others. We could write a function and call it in a for loop to fix each row, but GraphLab Create offers a cleaner way to do this.
Advanced Transformation of our Data
Let’s write a function that will change ‘USA’ to ‘United States’.
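A minimal version of such a function might look like this: it maps ‘USA’ to ‘United States’ and returns every other value unchanged. The name `transform_country` is my own choice.

```python
def transform_country(country):
    """Map 'USA' to 'United States'; return any other value unchanged."""
    if country == "USA":
        return "United States"
    return country


print(transform_country("USA"))     # United States
print(transform_country("Canada"))  # Canada
```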
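If you now call it with any country other than ‘USA’, it returns the value unchanged.
But if you call it with ‘USA’, it returns ‘United States’.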
Let’s apply this to all the rows in our dataset.
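In GraphLab Create this is a single column-wise call, `sf['Country'] = sf['Country'].apply(transform_country)`. Below is a plain-Python sketch of what that call does, applied row by row to a list of dictionaries; the helper `apply_to_column` is hypothetical and `transform_country` is repeated so the block runs on its own.

```python
def transform_country(country):
    """Map 'USA' to 'United States'; return any other value unchanged."""
    return "United States" if country == "USA" else country


def apply_to_column(rows, column, func):
    """Replace every value in `column` with func(value), one row at a time."""
    for row in rows:
        row[column] = func(row[column])
    return rows
```

In GraphLab, `apply` returns a new column rather than changing the existing one, which is why its result is assigned back to `sf['Country']`.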
Now print the dataset again by typing sf.
Now we have cleaned our data and added a new column very easily using GraphLab Create!