User Tools

Site Tools


labs:sqlalchemy

This is an old revision of the document!


Python and SQLAlchemy

In this lab you will get to know the basics of Python and some of its most useful libraries. After completing this lab, you will be awarded the title of SQL Alchemist.

Getting Started

Ubuntu comes pre-installed with Python 2 and 3. It can also easily be installed using apt-get. However, these are system wide installations that are used by all Python programs. Since Python heavily relies on external libraries, and not all libraries are compatible with all versions of Python, it is often necessary to use different versions of Python for different projects. Thus, the use of Virtual Environments is highly encouraged. Every virtual environment has its own Python distributions and installed libraries. Thus, different projects can be cleanly separated. There are several tools to create and manage virtual environments. In this lab we will use Anaconda.

Follow the link to figure out how to create and activate a new virtual environment. Specify the Python version to be 3.6 and install the following libraries using pip:

  • sqlalchemy
  • sqlite3
  • pandas
  • numpy
  • matplotlib

The command you need to use is:

 conda create --name labEnvironment python=3.6 sqlalchemy sqlite pandas numpy matplotlib

When programming with Python, one has the choice to use IDEs (Integrated Development Enviroments), text editors, an interactive python interpreter or more advanced tools like Jupyter Notebooks. In this lab, we will use the latter.

Let's verify that everything is working.

  1. Open a terminal window and activate your virtual environment
     source activate labEnvironment 
  2. List all installed packages and verify that sqlalchemy, sqlite3, pandas, numpy and matplotlib are instaleld
    conda list
  3. Check that the Python version of the virtual environment is indeed 3.6
    python -V
  4. Open the interactive Python interpreter
    ipython
  5. Print something
    print("Something")

If everything is set up correctly, you should see the print statement's output in the terminal.

  1. Open a text editor
  2. Add a print statement
    print("Something Else")
  3. Save the file as print.py
  4. Inside the terminal, navigate to the location where you saved print.py and execute the script
    python print.py

If everything is set up correctly, you should see the print statement's output in the terminal.

Now that all is set up, you are ready for the next step.

Python Quickstart

Python is easy to learn, especially if you already know some programming. There are no pointers, you do not need to specify explicit types, no need to compile, etc. Many beginners start learning programming with Python for these and other reasons.

In order to help you getting started, we will go through some of the basics together.

Files of Python code are called scripts, since Python is a scripting language. The file extension is .py. In this Lab we will mostly use a standard text editor to edit our Python script. As with other programming languages, there exist IDEs (Integrated Development Environments) such as Eclipse for Java and C++. There are also many more advanced text editors such as Sublime Text that use Extensions to make Python programming easier and even allow the execution of scripts directly from the editor. Furthermore, there exists Jupyter Notebook, which basically provides a web interface for your Python programming, where you can write code and documentation and also browse your data in a convenient way. For many applications, this is the way to go. However, we are just getting started with Python and will thus stick to the basics.

With a barebones editor such as gedit, we need to open a terminal/command line and use following command to run a Python script:

python myFirstScript.py

Very often when programming, we get confused by all the different data types, or we just want to try if an algorithm actually works. This involves a lot of trial and error. It can be annoying to re-run a Python script every time we change a little bit of code just to see another runtime error. During this prototyping/trial-and-error stage, it is often convenient to use an interactive Python shell. In the Linux command line, type the following command to open the Python shell:

ipython

Now, you can just type Python code. The cool thing is that whenever you make a mistake, i.e., there is a runtime error, you will see that something went wrong, but the interactive shell will stay open and all the code you have typed up to that point is still in memory. Thus, all variables are still assigned and you can just try again.

Lets create a list with some numbers:

myList = [1,2,3,4]

Access the first element of the list:

myList[0]

Now, try to access a non-existing element:

myList[5]

You will see an IndexError. However, you can now just continue using your list without having to start all over.

myList[2]

Python's core has some useful functions, for example the print() method. We can print a string as follows:

print("blabla") or print('blabla')

We can also concatenate strings and numbers and print them together:

numDogs=5
print("I have ", numDogs, " dogs.")

Python does not use parantheses {} to delimit code blocks, but instead relies on indendation. The standard is to use one tab or four white spaces. The two indendation methods should not be mixed. Let's create a simple if statement. You have to press enter after every line. After the last line, you need to press enter twice in order to execute the code.

numNeighbourDogs = 10 
numMyDogs = 2
if (numMyDogs > numNeighbourDogs):
    print("Muahaha I win!")
else:
    print("Sh*t, I need to get more dogs!")

When you press enter after if (numMyDogs > numNeighbourDogs):, iPython will create the correct indent itself. However, after print(“Muahaha I win!”), you need to press backspace four times to delete the indent, since if and else need to be on the same indendation-level.

Now, let's try to use different indendation-depths after if and else:

numNeighbourDogs = 10 
numMyDogs = 2
if (numMyDogs > numNeighbourDogs):
  print("Muahaha I win!") #two white-spaces indentation
else:
     print("Sh*t, I need to get more dogs!") #five white-spaces indentation

As you see, the code still runs without errors. Now, let's try to put if and else themselves onto different indentation levels.

numNeighbourDogs = 10 
numMyDogs = 2
if (numMyDogs > numNeighbourDogs):
    print("Muahaha I win!")
 else: # 1 white-space indendation
    print("Sh*t, I need to get more dogs!")

This time we get an IndentationError.

Standard Python has several different data structures. The most important ones are: lists, tubles and dictionaries. A list is basically an array. A list can itself contain more lists. A list can contain different types of data types, i.e., strings and integers. Lists are defined and accessed using square brackets [ ]. Let's create a new list to show these concepts:

 myList = [1, 2, ["this", "is also", "a list"], "three"]

You can access all list elements normally using square brackets. If you want to access an element of the list at index 2, you just use another pair of square brackets:

 myList[2][0] 

We can also assign new values to list element. Lists also support functions such as append, count, sort, etc. Use the dir() command to get a list of member functions of the List object. This works for any object in Python. Sometimes this is faster than Googling or looking through the documentation.

 dir(myList) 

The next datatype are tuples. Tuples are basically immutable one-dimensional arrays.

myTuple = (1,2,3) 
myTuple[0]
myTuple[0] = 2 # Results in TypeError because a tuple is immutable

Dictionaries are associative array, i.e., hash tables with key-value pairs. Again, data types can be mixed.

dogAges = {"Fluffy": 12, "Lessie": 5, "Pluto": 10} 
dogAges["Fluffy"] # Use Key to access Value 

We have already seen an if statement. Now we will see how easy it is to iterate through lists using a for loop.

myList = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for number in myList:
    print("The current number is: ", number)

In Python we can of course also define functions. We are now going to write a small function that takes two values and prints the bigger one. If they are the same, we print both.

 def compare_and_print(a, b):
    if (a>b):
        print(a)
    elif (a==b):
        print(a, " ", b)
    else:
        print(b)

Try calling the function to make sure it is doing what it should.

Like in Java and C++, we can also define classes and instantiate them as objects. Let's stick with the dog theme and create a class Dog. We overwrite the init function (basically the constructor) such that we can pass arguments to constructor and assign member variables at creation time. self refers to the class itself and is used to access member variables inside the class itself.

class Dog:
    def __init__(self, name, breed, age):
        self.name=name
        self.breed=breed
        self.age=age
        
    def bark(self):
        if (self.name=="Snoop"):
            print("I ain't no dawg, dawg!")
        else:
            print("WUFF WUFF!")
        
    def bite(self):
        print("*BITE*")

We can now instantiate a new Dog and call its methods and get its member variables, as well as assign new values. For example, we can change the name of our dog.

doggy = Dog("Snoop", "Rapper", 45)
doggy.name
doggy.bark()
doggy.bite()
doggy.name="Pluto"

So far we have only used Python's built-in functionalities. However, most if its power comes from external libraries/packages. In order to include such libraries wie use the import statement. Let's use the NumPy package for some simple scientific computations. With NumPy we can do many of the things that Matlab can do.

import numpy as np 

In NumPy, the main object is the homogeneous multidimensional array. In contrast to Python lists, all the elements in a NumPy array have to be of the same type. The most common way to create NumPy arrays is to use the numpy.array() function and pass a Python list to it.

myArray = np.array([4,3,2])
myArray[0]

Let's look at some of the functions that can be called on NumPy arrays:

dir(myArray)
myArray.max()
myArray.min()
myArray.mean()
myArray.sort()
myArray.sum()
myArray.transpose()
# ... and many more 

Instead of calling these methods on the Numpy array object, we can also use the Numpy functions and pass the array as an argument. For example: np.max(myArray).

We can also instantiate arrays of certain lengths and automatically fill them with zeros, ones or random numbers:

a = np.zeros(10)
b = np.ones(10)
c = np.random.rand(10)

To instantiate matrices, i.e., multidimensional arrays, we just specify the number of elements for each dimension:

 m = np.random.rand(10,10)

Again, we can perform all kinds of operations on this matrix. For example, we can compute its eigenvalues and eigenvectors:

eigvals, eigvecs = np.linalg.eig(m)

The function returns an array with all eigenvalues (we assigned it to the variable eigvals) and an array of eigenvectors, which we assigned to eigvecs.

Now, let's plot the eigenvalues in the complex plane. First, we need to get the real and imaginary values of each eigenvalue

eigvals_real = eigvals.real
eigvals_imag = eigvals.imag

Now we need a plotting library. The most popular one is called Matplotlib, and works very similar to Matlabs plotting functionality (hence the name).

import matplotlib.pyplot as plt

Now, we use the scatter command to create a 2D plot. The first argument corresponds to the x-axis, and the second argument to the y-axis.

plt.scatter(eigvals_real, eigvals_imag)
plt.show()

Finally, let's make our plot look a little nicer. We want the origin to be in the middle and the maximum value of the axes should be a little bigger than the largest eigenvalue. We will also add axes labels and a title. We will also add a grid.

fig = plt.scatter(eigvals_real, eigvals_imag)
x_abs_lim=np.max(np.abs(eigvals_real))+1
y_abs_lim=np.max(np.abs(eigvals_imag))+1
plt.xlim(-x_abs_lim, x_abs_lim)
plt.ylim(-y_abs_lim, y_abs_lim)
plt.xlabel("Real")
plt.ylabel("Imag")
plt.title("Eigenvalues of a Random Matrix")
plt.grid()
plt.show()

This was only a short overview of Pythons capabilities. There are many good tutorials online for those who want to dive deeper. However, learning by doing is probably the best way to learn Python. Also, do not be afraid to Google. Most often, this is by far the easiest and fastest way to get something done, especially in a programming language you are not yet familiar with.

SQL Alchemy

For the rest of this lab you will be working with the files in the local-data folder in your home directory. Open a terminal and navigate to that directory. Inside you find following files:

  1. lab.py : Python file where you need to add your own code
  2. ti2db.sqlite : A database of movies containing several tables.
  3. romances.csv : A list of the best romance movies of all time.

In addition to those files, we suggest you to open an interactive Python shell. It is often helpful for debugging, especially when one is unsure about the correct syntax. Furthermore, you will need to use Google to complete the tasks.

Open lab.py in a text editor and start filling in the missing code. For simplicity's sake, all the code is in one single file. Thus, the entire script will be executed every time you run it. You can comment out parts of the code you have already finished, e.g., some SQL queries, if you do not want to execute them every time. All the incomplete code is currently commented out, such that the script can be run without errors.

Before you start, create a copy of ti2db.sqlite, just in case it gets messed up.

cp ti2db.sqlite ti2db_backup.sqlite 

You can have a look at the ti2db.sqlite database in the terminal:

sqlite3 ti2db.sqlite

This opens a connection to the sqlite database and you can now enter standard SQLite commands.

Most answers to the exercises can be found at http://docs.sqlalchemy.org/en/latest/orm/tutorial.html.

labs/sqlalchemy.1495106816.txt.gz · Last modified: 2020/08/31 21:03 (external edit)