Developing Fundamental AI Programming Skills for Drug Discovery¶
This guide contains the content for Jupyter notebooks and portal access at TACC and Python fundamentals and foundations for AI. The objectives of these sections are to introduce the use of Jupyter notebooks at TACC, and to introduce the most fundamental Python skills that are necessary for domain scientists to be successful in AI coding projects.
Onboarding to TACC¶
The Texas Advanced Computing Center (TACC) at UT Austin designs and operates some of the world’s most powerful computing resources. The center’s mission is to enable discoveries that advance science and society through the application of advanced computing technologies.
We will be using cloud resources at TACC as our development environment. We will access the cloud resources via our SSH clients and TACC account credentials, as well as through a visualization portal.
Attention
Everyone please apply for a TACC account now using this link. If you already have a TACC account, you can just use that. Send your TACC username to wallen [at] tacc [dot] utexas [dot] edu as soon as possible.
About TACC¶
TACC is a Research Center, part of UT Austin, and located at the JJ Pickle Research Campus.

A short 7.7 mile walk from main campus!¶

One of two TACC buildings located at JJ Pickle.¶

A tall guy standing among taller Frontera racks.¶
TACC at a Glance
Other TACC Services
Portals and gateways
Web service APIs
Rich software stacks
Consulting
Curation and analysis
Code optimization
Training and outreach
=> Learn more

Snapshot of a few of TACC’s portal projects.¶
TACC Partnerships
NSF: Leadership Class Computing Facility (LCCF)
NSF: Extreme Science and Engineering Discovery Environment (XSEDE)
UT Research Cyberinfrastructure (UTRC)
TX Lonestar Education and Research Network (LEARN)
Industry, STAR Program
International, The International Collaboratory for Emerging Technologies
=> Learn more
Attention
Did you already e-mail me your TACC username?
Before We Continue¶
Using your SSH client, please try to log in to Longhorn. Make sure to use your own
username in place of username
:
[local]$ ssh username@longhorn.tacc.utexas.edu
To access the system:
1) If not using ssh-keys, please enter your TACC password at the password prompt
2) At the TACC Token prompt, enter your 6-digit code followed by <return>.
Password:
TACC Token Code:
Last login: Mon Dec 6 15:36:46 2021 from 192.168.64.11
------------------------------------------------------------------------------
Welcome to the Longhorn Supercomputer
Texas Advanced Computing Center, The University of Texas at Austin
------------------------------------------------------------------------------
** Unauthorized use/access is prohibited. **
If you log on to this computer system, you acknowledge your awareness
of and concurrence with the UT Austin Acceptable Use Policy. The
University will prosecute violators to the full extent of the law.
TACC Usage Policies:
http://www.tacc.utexas.edu/user-services/usage-policies/
______________________________________________________________________________
Welcome to Longhorn, *please* read these important system notes:
--> Longhorn user documentation is available at:
https://portal.tacc.utexas.edu/user-guides/longhorn
|\ /|
| \___________________________.-----.___________________________/ |
'.__________________________ __________________________.'
_ <_/| |\_>
| | ___ __ _ ___ | | _ _ ___ ____ __ _
| | / _ \| \| |/ __) | | | |_| |/ _ \| .. \| \| |
| |___| (_) | \ \ | (_-- ( ) | _ | (_) | '' /| \ \ |
|_____|\___/|_|\__|\___/ '---' |_| |_|\___/|_|\_\|_|\__|
---------------------- Project balances for user username ---------------------
| Name Avail SUs Expires | Name Avail SUs Expires |
| ASC21018 12000 2022-06-30 | SD2E-Communit 5548 2021-12-31 |
| TACC-SCI 221130 2025-06-30 | TRA21002 2040 2021-12-31 |
------------------------- Disk quotas for user username -----------------------
| Disk Usage (GB) Limit %Used File Usage Limit %Used |
| /work 436.5 1024.0 42.63 1445104 3000000 48.17 |
| /home 28.1 40.0 70.26 192235 512000 37.55 |
| /scratch 571.3 0.0 0.00 10273017 0 0.00 |
-------------------------------------------------------------------------------
login2.longhorn(1000)$ # success!
Also, please try to log in to this Vis Portal using your TACC username and password:
https://vis02.tacc.utexas.edu/

Successful login.¶
Jupyter Notebooks¶
What are Jupyter Notebooks?¶
Jupyter Notebooks are a web-based, interactive computing tool for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results. They support interactive data science and scientific computing across all programming languages, including Python.
How do Jupyter Notebooks Work?¶
An open notebook has exactly one interactive session connected to a kernel which will execute code sent by the user and communicate back results. This kernel remains active if the web browser window is closed, and reopening the same notebook from the dashboard will reconnect the web application to the same kernel. The kernel’s state, including imported libraries and declared variables, persists between cells.
How do You Access Jupyter Notebooks?¶
The Jupyter website allows you to download Jupyter Notebook and/or try it in your browser.
The TACC Analysis Portal allows TACC users to run Jupyter through your browser on Frontera, Longhorn, Maverick2, or Stampede2, so long as you have an allocation on that system.
The TACC DesignSafe Portal allows anyone with a TACC account to use Jupyter. (Log in, choose Workspace => Tools and Applications => Jupyter)
TACC provides small, short lived Jupyter notebooks to anyone with a TACC account via the TACC Cloud.
Public Jupyter notebooks are available through sites like Google Colaboratory.
Service exist to share notebooks in a browser including Binder.
Interacting with Jupyter Notebooks¶
First, choose one of the methods above and start a new Jupyter Notebook. I will be following method #2 - the TACC Analysis Portal. For this workshop, use the following settings:
The original, and more common “tree” interface to Jupyter will display a file browser (in this case to files in your Longhorn /home directory):
New Python3 kernels and Terminals can be launched by clicking the “New” button on the right side:
Running kernels can be accessed through the “Running” tab on the top. By default,
the Jupyter Notebooks will be saved in your /home directory with a .ipynb
extension. (Re-opening the notebook will open all the cells, and restore the state
of the kernel - more on this later):
The terminal is a standard, fully-functional terminal. Very useful for debugging
and interacting with files / directories. For example, you can easily wget
a
data set or unzip a file:
The Jupyter notebook consists of a toolbar (top) and an unlimited number of cells (bottom):
Cells can be either “code cells” or “markdown cells”. Code cells allow you to enter and run code. Markdown cells use the markdown markup language to comment on or narrate what is happening in the notebook:
Miscellaneous Tips and Tricks¶
Shortcuts:
Shift+Enter: run cell
Ctrl+Enter: run cell in place
Alt+Enter: run cell, insert below
Esc / Enter: toggle between command and edit mode
Run a command within a Jupyter notebook (prefix with !
):
In[]: ! pip list
In[]: ! pip install --user names
The file browser is not (by default) aware of your /work or /scratch directories on Longhorn. The easiest thing to do is open up a terminal and make symbolic links to your /work and /scratch directories in your /home directory
[longhorn]$ cd
[longhorn]$ pwd
/home/012345/username
[longhorn]$ ln -s $WORK WORK
[longhorn]$ ln -s $SCRATCH SCRATCH
Exercise¶
Make a copy of a previously-prepared demo notebook in your /home directory. Open the notebook with Jupyter. Execute the cells within and think about why it is organized the way it is. To copy the notebook:
[longhorn]$ cd # cd to /home directory
[longhorn]$ cp /scratch/03439/wallen/AI-Drug-Discovery/notebook_demo.ipynb .
Once you run each cell, save the notebook with a new name. Re-open the original notebook as well as the new notebook. Does saving the notebook save the state of the kernel as well?
Linux Refresher¶
After working through this material, attendees should be able to:
Describe basic functions of essential Linux commands
Use Linux commands to navigate a file system and manipulate files
Transfer data to / from a remote Linux file system
Edit files directly on a Linux system using a command line utility (e.g. vim, nano, emacs)
Topics covered in this module include:
Creating and navigating folders (
pwd
,ls
,mkdir
,cd
,rmdir
)Creating and manipulating files (
touch
,rm
,mv
,cp
)Looking at the contents of files (
cat
,more
,less
,head
,tail
,grep
)Network and file transfers (
hostname
,whoami
,logout
,ssh
,scp
,rsync
)Text editing with vim (insert mode, normal mode, navigating, saving, quitting)
Log in to Longhorn¶
To log in to the Longhorn cluster, follow the instructions for your operating system or ssh client below.
Mac / Linux
Open the application 'Terminal'
ssh username@longhorn.tacc.utexas.edu
(enter password)
(enter 6-digit token)
Windows
Open the application 'PuTTY'
enter Host Name: longhorn.tacc.utexas.edu
(click 'Open')
(enter username)
(enter password)
(enter 6-digit token)
If you can’t access Longhorn yet, a local or web-based Linux environment will work for this guide. However, you will need to access Longhorn for future materials.
Try this Linux environment in a browser.
Creating and Manipulating Files¶
We have seen how to navigate around the filesystem and perform operations with
folders. But, what about files? Just like on Windows or Mac, we can easily
create new files, copy files, rename files, and move files to different
locations. First, we will navigate to the home directory and create a few new
folders and files with the mkdir
and touch
commands:
$ cd # cd on an empty line will automatically take you back to the home directory
$ pwd
/home/03439/wallen
$ mkdir folder1
$ mkdir folder2
$ mkdir folder3
$ touch file_a
$ touch file_b
$ touch file_c
$ ls
file_a file_b file_c folder1 folder2 folder3
These files we have created are all empty. Removing a file is done with the
rm
(remove) command. Please note that on Linux file systems, there is no
“Recycle Bin”. Any file or folder removed is gone forever and often
un-recoverable:
$ touch junkfile
$ rm junkfile
Moving files with the mv
command and copying files with the cp
command
works similarly to how you would expect on a Windows or Mac machine. The context
around the move or copy operation determines what the result will be. For
example, we could move and/or copy files into folders:
$ mv file_a folder1/
$ mv file_b folder2/
$ cp file_c folder3/
Before listing the results with ls
, try to guess what the result will be.
$ ls
file_c folder1 folder2 folder3
$ ls folder1
file_a
$ ls folder2
file_b
$ ls folder3
file_c
Two files have been moved into folders, and file_c
has been copied - so
there is still a copy of file_c
in the home directory. Move and copy
commands can also be used to change the name of a file:
$ cp file_c file_c_copy
$ mv file_c file_c_new_name
By now, you may have found that Linux is very unforgiving with typos. Generous
use of the <Tab>
key to auto-complete file and folder names, as well as the
<UpArrow>
to cycle back through command history, will greatly improve the
experience. As a general rule, try not to use spaces or strange characters in
files or folder names. Stick to:
A-Z # capital letters
a-z # lowercase letters
0-9 # digits
- # hyphen
_ # underscore
. # period
Before we move on, let’s clean up once again by removing the files and folders we have created. Do you remember the command for removing non-empty folders?
$ rm -r folder1
$ rm -r folder2
$ rm -r folder3
How do we remove file_c_copy
and file_c_new_name
?
$ rm file_c_copy
$ rm file_c_new_name
Looking at the Contents of Files¶
Everything we have seen so far has been with empty files and folders. We will
now start looking at some real data. Navigate to your home directory, then issue
the following cp
command to copy a public file on the server to your local
space:
$ cd ~ # the tilde ~ is also a shortcut referring to your home directory
$ pwd
/home/03439/wallen
$ cp /usr/share/dict/words .
$ ls
words
Try to use <Tab>
to autocomplete the name of the file. Also, please notice
the single dot .
at the end of the copy command, which indicates that you
want to cp the file to .
, this present location (your home directory).
This words
file is a standard file that can be found on most Linux operating
systems. It contains 479,828 words, each word on its own line. To see the
contents of a file, use the cat
command to print it to screen:
$ cat words
1080
10-point
10th
11-point
12-point
16-point
18-point
1st
2
20-point
This is a long file! Printing everything to screen is much too fast and not very
useful. We can use a few other commands to look at the contents of the file with
more
control:
$ more words
Press the <Enter>
key to scroll through line-by-line, or the <Space>
key
to scroll through page-by-page. Press q
to quit the view, or <Ctrl+c>
to
force a quit if things freeze up. A %
indicator at the bottom of the screen
shows your progress through the file. This is still a little bit messy and fills
up the screen. The less
command has the same effect, but is a little bit
cleaner:
$ less words
Scrolling through the data is the same, but now we can also search the data.
Press the /
forward slash key, and type a word that you would like to search
for. The screen will jump down to the first match of that word. The n
key
will cycle through other matches, if they exist.
Finally, you can view just the beginning or the end of a file with the head
and tail
commands. For example:
$ head words
$ tail words
The >
and >>
shortcuts in Linux indicate that you would like to redirect
the output of one of the commands above. Instead of printing to screen, the
output can be redirected into a file:
$ cat words > words_new.txt
$ head words > first_10_lines.txt
A single greater than sign >
will redirect and overwrite any contents in
the target file. A double greater than sign >>
will redirect and append
any output to the end of the target file.
One final useful way to look at the contents of files is with the grep
command. grep
searches a file for a specific pattern, and returns all lines
that match the pattern. For example:
$ grep "banana" words
banana
bananaquit
bananas
cassabanana
Although it is not always necessary, it is safe to put the search term in quotes.
Network and File Transfers¶
In order to login or transfer files to a remote Linux file system, you must know the hostname (unique network identifier) and the username. If you are already on a Linux file system, those are easy to determine using the following commands:
$ whoami
wallen
$ hostname -f
login1.longhorn.tacc.utexas.edu
Given that information, a user would remotely login to this Linux machine using ssh in a Terminal:
[local]$ ssh wallen@longhorn.tacc.utexas.edu
enter password
enter 6-digit token
[longhorn]$
Windows users would typically use the program PuTTY (or another SSH client)
to perform this operation. Logging out of a remote system is done using the
logout
command, or the shortcut <Ctrl+d>
:
[longhorn]$ logout
[local]$
Copying files from your local computer to your home folder on Longhorn would require
the scp
command (Windows users use a client “WinSCP”):
[local]$ scp my_file wallen@longhorn.tacc.utexas.edu:/home/03439/wallen/
enter password
enter 6-digit token
In this command, you specify the name of the file you want to transfer
(my_file
), the username (wallen
), the hostname
(longhorn.tacc.utexas.edu
), and the path you want to put the file
(/home/03439/wallen/
). Take careful notice of the separators including spaces,
the @
symbol, and the :
.
Copy files from Longhorn to your local computer using the following:
[local]$ scp wallen@longhorn.tacc.utexas.edu:/home/03439/wallen/my_file ./
enter password
enter 6-digit token
Instead of files, full directories can be copied using the “recursive” flag
(scp -r ...
). The rsync
tool is an advanced copy tool that is useful for
synching data between two sites. Although we will not go into depth here,
example rsync
usage is as follows:
$ rsync -azv local remote
$ rsync -azv remote local
This is just the basics of copying files. See example scp usage and example rsync usage for more info.
Text Editing with VIM¶
VIM is a text editor used on Linux file systems.
Open a file (or create a new file if it does not exist):
$ vim file_name
There are two “modes” in VIM that we will talk about today. They are called
“insert mode” and “normal mode”. In insert mode, the user is typing text into a
file as seen through the terminal (think about typing text into TextEdit or
Notepad). In normal mode, the user can perform other functions like save, quit,
cut and paste, find and replace, etc. (think about clicking the menu options in
TextEdit or Notepad). The two main keys to remember to toggle between the modes
are i
and Esc
.
Entering VIM insert mode:
> i
Entering VIM normal mode:
> Esc
A summary of the most important keys to know for normal mode are:
# Navigating the file:
arrow keys move up, down, left, right
Ctrl+u page up
Ctrl+d page down
0 move to beginning of line
$ move to end of line
gg move to beginning of file
G move to end of file
:N move to line N
# Saving and quitting:
:q quit editing the file
:q! quit editing the file without saving
:w save the file, continue editing
:wq save and quit
Review of Topics Covered¶
Part 1: Creating and navigating folders
Command |
Effect |
---|---|
|
print working directory |
|
list files and directories |
|
list files in column format |
|
make a new directory |
|
navigate into a directory |
|
remove an empty directory |
|
remove a directory and its contents |
|
refers to the present location |
|
refers to the parent directory |
Part 2: Creating and manipulating files
Command |
Effect |
---|---|
|
create a new file |
|
remove a file |
|
remove a directory and its contents |
|
move a file into a directory |
|
change the name of a file |
|
change the name of a directory |
|
copy a file |
|
copy a directory |
|
autocomplete file or folder names |
|
cycle through command history |
Part 3: Looking at the contents of files
Command |
Effect |
---|---|
|
print file contents to screen |
|
redirect output to new file |
|
scroll through file contents |
|
scroll through file contents |
|
output beginning of file |
|
output end of a file |
|
search for ‘pattern’ in a file |
|
shortcut for home directory |
|
force interrupt |
|
redirect and overwrite |
|
redirect and append |
Part 4: Network and file transfers
Command |
Effect |
---|---|
|
print hostname |
|
print username |
|
remote login |
|
logout |
|
copy a file from local to remote |
|
copy a file from remote to local |
|
sync files between local and remote |
|
sync files between remote and local |
|
logout of host |
Part 5: Text editing with VIM
Command |
Effect |
---|---|
|
open “file.txt” and edit with |
|
toggle to insert mode |
|
toggle to normal mode |
|
navigate the file |
|
quit ending the file |
|
quit editing the file without saving |
|
save the file, continue editing |
|
save and quit |
Additional Resources¶
This is a good summary of the important commands you need to know
Practice VIM on the command line by typing
vimtutor
Python Essentials¶
After working through this material, attendees should be able to:
Write and execute Python code on a remote server
Use variables, lists, and dictionaries in Python
Write conditionals using a variety of comparison operators
Write useful while and for loops
Arrange code into clean, well organized functions
Read input from and write output to a file
Import and use standard and non-standard Python libraries
Topics covered in this module include:
Data types and variables (ints, floats, bools, strings, type(), print())
Arithmetic operations (+, -, *, /, **, %, //)
Lists and dictionaries (creating, interpreting, appending)
Conditionals and control loops (comparison operators, if/elif/else, while, for, break, continue, pass)
Functions (defining, passing arguments, returning values)
File handling (open, with, read(), readline(), strip(), write())
Importing libraries (import, random, names, pip)
Log in to Longhorn¶
To log in to longhorn.tacc.utexas.edu
, follow the instructions for your operating
system or ssh client below.
Mac / Linux
Open the application 'Terminal'
ssh username@longhorn.tacc.utexas.edu
(enter password)
(enter 6-digit token)
Windows
Open the application 'PuTTY'
enter Host Name: longhorn.tacc.utexas.edu
(click 'Open')
(enter username)
(enter password)
(enter 6-digit token)
If you can’t access Longhorn yet, a local or web-based Python 3 environment will work for this guide. However, you will need to access Longhorn for future materials.
Try this Python 3 environment in a browser.
Note
For the first few sections below, we will be using the Python interpreter in interactive mode to try out different things. Later on when we get to more complex code, we will be saving the code in files (scripts) and invoking the interpreter non-interactively.
Data Types and Variables¶
Start up the interactive Python interpreter:
[longhorn]$ python3
Python 3.6.8 (default, Apr 25 2019, 20:47:23)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Tip
To exit the interpreter, type quit()
.
The most common data types in Python are similar to other programming languages. For this workshop, we probably only need to worry about integers, floats, booleans, and strings.
Assign some values to variables by doing the following:
>>> my_int = 5
>>> my_float = 5.0
>>> my_bool = True # or False, notice capital letters
>>> my_string = 'Hello, world!'
In Python, you don’t have to declare type. Python figures out the type
automatically. Check using the type()
function:
>>> type(my_int)
<class 'int'>
>>> type(my_float)
<class 'float'>
>>> type(my_bool)
<class 'bool'>
>>> type(my_string)
<class 'str'>
Print the values of each variable using the print()
function:
>> print(my_int)
5
>> print('my_int')
my_int
(Try printing the others as well). And, notice what happens when we print with
and without single quotes? What is the difference between my_int
and
'my_int'
?
You can convert between types using a few different functions. For example, when you read in data from a file, numbers are often read as strings. Thus, you may want to convert the string to integer or float as appropriate:
>>> str(my_int) # convert int to string
>>> str(my_float) # convert float to string
>>> int(my_string) # convert string to int
>>> float(my_string) # convert string to float
>>>
>>> value = 5
>>> print(value)
5
>>> type(value)
<class 'int'>
>>> new_value = str(value)
>>> print(new_value)
'5'
>>> type(new_value)
<class 'str'>
Arithmetic Operations¶
Next, we will look at some basic arithmetic. You are probably familiar with the standard operations from other languages:
Operator Function Example Result
+ Addition 1+1 2
- Subtraction 9-5 4
* Multiplication 2*2 4
/ Division 8/4 2
** Exponentiation 3**2 9
% Modulus 5%2 1
// Floor division 5//2 2
Try a few things to see how they work:
>>> print(2+2)
>>> print(355/113)
>>> print(10%9)
>>> print(3+5*2)
>>> print('hello' + 'world')
>>> print('some' + 1)
>>> print('number' * 5)
Also, carefully consider how arithmetic options may affect type:
>>> number1 = 5.0/2
>>> type(number1)
<class 'float'>
>>> print(number1)
2.5
>>> number2 = 5/2
>>> type(number2)
<class 'float'>
>>> print(number2)
2.5
>>> print(int(number2))
2
Lists and Dictionaries¶
Lists are a data structure in Python that can contain multiple elements. They are ordered, they can contain duplicate values, and they can be modified. Declare a list with square brackets as follows:
>>> my_shape_list = ['circle', 'triangle', 'square', 'diamond']
>>> type(my_shape_list)
<class 'list'>
>>> print(my_shape_list)
['circle', 'triangle', 'square', 'diamond']
Access individual list elements:
>>> print(my_shape_list[0])
circle
>>> type(my_shape_list[0])
<class 'str'>
>>> print(my_shape_list[2])
square
Create an empty list and add things to it:
>>> my_number_list = []
>>> my_number_list.append(5) # 'append()' is a method of the list class
>>> my_number_list.append(6)
>>> my_number_list.append(2)
>>> my_number_list.append(2**2)
>>> print(my_number_list)
[5, 6, 2, 4]
>>> type(my_number_list)
<class 'list'>
>>> type(my_number_list[1])
<class 'int'>
Lists are not restricted to containing one data type. Combine the lists together to demonstrate:
>>> my_big_list = my_shape_list + my_number_list
>>> print(my_big_list)
['circle', 'triangle', 'square', 'diamond', 5, 6, 2, 4]
Another way to access the contents of lists is by slicing. Slicing supports a
start index, stop index, and step taking the form: mylist[start:stop:step]
.
Only the first colon is required. If you omit the start, stop, or :step, it is
assumed you mean the beginning, end, and a step of 1, respectively. Here are
some examples of slicing:
>>> mylist = ['thing1', 'thing2', 'thing3', 'thing4', 'thing5']
>>> print(mylist[0:2]) # returns the first two things
['thing1', 'thing2']
>>> print(mylist[:2]) # if you omit the start index, it assumes the beginning
['thing1', 'thing2']
>>> print(mylist[-2:]) # returns the last two things (omit the stop index and it assumes the end)
['thing4', 'thing5']
>>> print(mylist[:]) # returns the entire list
['thing1', 'thing2', 'thing3', 'thing4', 'thing5']
>>> print(mylist[::2]) # return every other thing (step = 2)
['thing1', 'thing3', 'thing5']
Note
If you slice from a list, it returns an object of type list. If you access a list element by its index, it returns an object of whatever type that element is. The choice of whether to slice from a list, or iterate over a list by index, will depend on what you want to do with the data.
Dictionaries are another data structure in Python that contain key:value pairs. They are always unordered, they cannot contain duplicate keys, and they can be modified. Create a new dictionary using curly brackets:
>>> my_shape_dict = {
... 'most_favorite': 'square',
... 'least_favorite': 'circle',
... 'pointiest': 'triangle',
... 'roundest': 'circle'
... }
>>> type(my_shape_dict)
<class 'dict'>
>>> print(my_shape_dict)
{'most_favorite': 'square', 'least_favorite': 'circle', 'pointiest': 'triangle', 'roundest': 'circle'}
>>> print(my_shape_dict['most_favorite'])
square
As your preferences change over time, so to can values stored in dictionaries:
>>> my_shape_dict['most_favorite'] = 'rectangle'
>>> print(my_shape_dict['most_favorite'])
rectangle
Add new key:value pairs to the dictionary as follows:
>>> my_shape_dict['funniest'] = 'squircle'
>>> print(my_shape_dict['funniest'])
squircle
Many other methods exist to access, manipulate, interpolate, copy, etc., lists and dictionaries. We will learn more about them out as we encounter them later in this course.
Conditionals and Control Loops¶
Python comparison operators allow you to add conditions into your code in
the form of if
/ elif
/ else
statements. Valid comparison operators
include:
Operator Comparison Example Result
== Equal 1==2 False
!= Not equal 1!=2 True
> Greater than 1>2 False
< Less than 1<2 True
>= Greater than or equal to 1>=2 False
<= Less Than or equal to 1<=2 True
A valid conditional statement might look like:
>>> num1 = 10
>>> num2 = 20
>>>
>>> if (num1 > num2): # notice the colon
... print('num1 is larger') # notice the indent
... elif (num2 > num1):
... print('num2 is larger')
... else:
... print('num1 and num2 are equal')
In addition, conditional statements can be combined with logical operators. Valid logical operators include:
Operator Description Example
and Returns True if both are True a < b and c < d
or Returns True if at least one is True a < b or c < d
not Negate the result not( a < b )
For example, consider the following code:
>>> num1 = 10
>>> num2 = 20
>>>
>>> if (num1 < 100 and num2 < 100):
... print('both are less than 100')
... else:
... print('at least one of them is not less than 100')
While loops also execute according to conditionals. They will continue to execute as long as a condition is True. For example:
>>> i = 0
>>>
>>> while (i < 10):
... print( f'i = {i}' ) # literal string interpolation
... i = i + 1
The break
statement can also be used to escape loops:
>>> i = 0
>>>
>>> while (i < 10):
... print( f'i = {i}' )
... i = i + 1
... if (i==5):
... break
... else:
... continue
For loops in Python are useful when you need to execute the same set of instructions over and over again. They are especially great for iterating over lists:
>>> my_shape_list = ['circle', 'triangle', 'square', 'diamond']
>>>
>>> for shape in my_shape_list:
... print(shape)
>>>
>>> for shape in my_shape_list:
... if (shape == 'circle'):
... pass # do nothing
... else:
... print(shape)
You can also use the range()
function to iterate over a range of numbers:
>>> for x in range(10):
... print(x)
>>>
>>> for x in range(10, 100, 5):
... print(x)
>>>
>>> for a in range(3):
... for b in range(3):
... for c in range(3):
... print( f'{a} + {b} + {c} = {a+b+c}' )
Note
The code is getting a little bit more complicated now. It will be better to stop running in the interpreter’s interactive mode, and start writing our code in Python scripts.
Functions¶
Functions are blocks of codes that are run only when we call them. We can pass data into functions, and have functions return data to us. Functions are absolutely essential to keeping code clean and organized.
On the command line, use a text editor to start writing a Python script:
[longhorn]$ vim function_test.py
Enter the following text into the script:
1 2 3 4 | def hello_world():
print('Hello, world!')
hello_world()
|
After saving and quiting the file, execute the script (Python code is not
compiled - just run the raw script with the python3
executable):
[longhorn]$ python3 function_test.py
Hello, world!
Note
Future examples from this point on will assume familiarity with using the text editor and executing the script. We will just be showing the contents of the script and console output.
More advanced functions can take parameters and return results:
1 2 3 4 5 | def add5(value):
return(value + 5)
final_number = add5(10)
print(final_number)
|
15
Pass multiple parameters to a function:
1 2 3 4 5 | def add5_after_multiplying(value1, value2):
return( (value1 * value2) + 5)
final_number = add5_after_multiplying(10, 2)
print(final_number)
|
25
It is a good idea to put your list operations into a function in case you plan to iterate over multiple lists:
1 2 3 4 5 6 7 8 9 10 | def print_ts(mylist):
for x in mylist:
if (x[0] == 't'): # a string (x) can be interpreted as a list of chars!
print(x)
list1 = ['circle', 'triangle', 'square', 'diamond']
list2 = ['one', 'two', 'three', 'four']
print_ts(list1)
print_ts(list2)
|
triangle
two
three
There are many more ways to call functions, including handing an arbitrary number of arguments, passing keyword / unordered arguments, assigning default values to arguments, and more.
File Handling¶
The open()
function does all of the file handling in Python. It takes two
arguments - the filename and the mode. The possible modes are read (r
),
write (w
), append (a
), or create (x
).
For example, to read a file do the following:
1 2 3 | with open('/usr/share/dict/words', 'r') as f:
for x in range(5):
print(f.readline())
|
1080
10-point
10th
11-point
12-point
Tip
By opening the file with the with
statement above, you get built in
exception handling, and it automatically will close the file handle for you.
It is generally recommended as the best practice for file handling.
You may have noticed in the above that there seems to be an extra space between
each word. What is actually happening is that the file being read has newline
characters on the end of each line (\n
). When read into the Python script,
the original new line is being printed, followed by another newline added by the
print()
function. Stripping the newline character from the original string
is the easiest way to solve this problem:
1 2 3 | with open('/usr/share/dict/words', 'r') as f:
for x in range(5):
print(f.readline().strip('\n'))
|
1080
10-point
10th
11-point
12-point
Read the whole file and store it as a list:
1 2 3 4 5 6 7 | words = []
with open('/usr/share/dict/words', 'r') as f:
words = f.read().splitlines() # careful of memory usage
for x in range(5):
print(words[x])
|
1080
10-point
10th
11-point
12-point
Write output to a new file on the file system; make sure you are attempting to write somwhere where you have permissions to write:
1 2 3 4 5 | my_shapes = ['circle', 'triangle', 'square', 'diamond']
with open('my_shapes.txt', 'w') as f:
for shape in my_shapes:
f.write(shape)
|
(in my_shapes.txt)
circletrianglesquarediamond
You may notice the output file is lacking in newlines this time. Try adding newline characters to your output:
1 2 3 4 5 | my_shapes = ['circle', 'triangle', 'square', 'diamond']
with open('my_shapes.txt', 'w') as f:
for shape in my_shapes:
f.write( f'{shape}\n' )
|
(in my_shapes.txt)
circle
triangle
square
diamond
Now notice that the original line in the output file is gone - it has been
overwritten. Be careful if you are using write (w
) vs. append (a
).
Importing Libraries¶
The Python built-in functions, some of which we have seen above, are useful but limited. Part of what makes Python so powerful is the huge number and variety of libraries that can be imported. For example, if you want to work with random numbers, you have to import the ‘random’ library into your code, which has a method for generating random numbers called ‘random’.
1 2 3 4 | import random
for i in range(5):
print(random.random())
|
0.47115888799541383
0.5202615354150987
0.8892412583071456
0.7467080997595558
0.025668541754695906
More information about using the random
library can be found in the
Python docs
Some libraries that you might want to use are not included in the official
Python distribution - called the Python Standard Library. Libraries written
by the user community can often be found on PyPI.org and
downloaded to your local environment using a tool called pip3
.
For example, if you wanted to download the names library and use it in your Python code, you would do the following:
[longhorn]$ pip3 install --user names
Collecting names
Downloading https://files.pythonhosted.org/packages/44/4e/f9cb7ef2df0250f4ba3334fbdabaa94f9c88097089763d8e85ada8092f84/names-0.3.0.tar.gz (789kB)
100% |████████████████████████████████| 798kB 1.1MB/s
Installing collected packages: names
Running setup.py install for names ... done
Successfully installed names-0.3.0
Notice the library is installed above with the --user
flag. Longhorn
is a shared system and non-privileged users can not download or install packages
in root locations. The --user
flag instructs pip3
to install the library
in your own home directory.
1 2 3 4 | import names
for i in range(5):
print(names.get_full_name())
|
Johnny Campbell
Lawrence Webb
Johnathan Holmes
Mary Wang
Jonathan Henry
Exercises¶
Test your understanding of the materials above by attempting the following exercises.
Create a list of ~10 different integers. Write a function (using modulus and conditionals) to determine if each integer is even or odd. Print to screen each digit followed by the word ‘even’ or ‘odd’ as appropriate.
Using nested for loops and if statements, write a program that iterates over every integer from 3 to 100 (inclusive) and prints out the number only if it is a prime number.
Create three lists containing 10 integers each. The first list should contain all the integers sequentially from 1 to 10 (inclusive). The second list should contain the squares of each element in the first list. The third list should contain the cubes of each element in the first list. Print all three lists side-by-side in three columns. E.g. the first row should contain 1, 1, 1 and the second row should contain 2, 4, 8.
Write a script to read in /usr/share/dict/words and print just the last 10 lines of the file. Write another script to only print words beginning with the letters “pyt”.
NumPy, SciPy, and Matplotlib Primer¶
NumPy, SciPy, and Matplotlib are three complementary and important Python libraries that are useful for exploratory data analysis and machine learning. This primer is a high-level introduction to each library, providing short examples for each.
NumPy¶
NumPy provides a multidimensional array object for very fast operations on numerical data (e.g. arithmetic, logical, sorting, selecting, transforming, statistical, etc.).
Much faster than doing similar operations with normal Python data types because it uses pre-complied, vectorized C code behind the scenes.
1 2 3 4 5 6 7 | import numpy as np
a = np.arange(15).reshape(3, 5)
print(a)
print(a.shape)
print(a.ndim)
|
### Output:
[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]]
(3, 5)
2
A few very easy array operation examples:
1 2 3 4 5 6 7 8 | import numpy as np
a = np.arange(15).reshape(3, 5)
print(a.max())
print(a.min())
print(a.sum())
print(a*2)
|
### Output:
14
0
105
[[ 0 2 4 6 8]
[10 12 14 16 18]
[20 22 24 26 28]]
SciPy¶
SciPy provides mathematical algorithms and convenience functions for working with NumPy arrays.
It comes with a host of high level commands for different scientific computing domains, including clustering, Fourier transforms, integration, interpolation, linear algebra, signal processing, among others.
The following short code block defines a function to be integrated (called the
“integrand”) as f(x) = ax^2 + b
. The SciPy quad
function evaluates the
integral over a given range, in this case from 0 to 1. The output is a tuple
where the first item is the value of the integral, and the second item is the
estimated error.
1 2 3 4 5 6 7 8 | from scipy.integrate import quad
def integrand(x, a, b):
return a*x**2 + b
a = 2
b = 1
I = quad(integrand, 0, 1, args=(a,b))
print(I)
|
### Output:
(1.6666666666666667, 1.8503717077085944e-14)
Matplotlib¶
Matplotlib provides utilities for creating static, animated, and interactive visualizations of data.
You can use it to create many different types of plots (line, histogram, scatter, countour, 3D, etc.) with full control over all labels, colors, styles, etc. Look online for examples of the kind of plot you want to make, and undoubtedly there will be some Matplotlib samples available.
1 2 3 4 5 6 | import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 2*np.pi, 50)
plt.plot(x, np.sin(x))
plt.show()
|
1 2 3 4 5 6 | import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 2*np.pi, 50)
plt.plot(x, np.sin(x), 'r-o', x, np.sin(2*x), 'g--')
plt.show()
|
1 2 | # save the image to file instead
plt.savefig('my_sinwave.png')
|
Additional Resources¶
Additional Resources¶
Python for Machine Learning: https://learn.tacc.utexas.edu/mod/page/view.php?id=62
AI for Everyone: https://www.coursera.org/learn/ai-for-everyone/home/welcome