hello world!
Statistical Programming
Prof. Dr. Martin Spindler
Martin Spindler:
Dr. Jan Rabenseifner:
Jan Teichert-Kluge:
Valerian Fourel:
We really appreciate active participation and interaction!
You will be given several statistical programming exercises that you have to solve with Python
You can group up (4 students per group) and work together. Each group submits one solution. Please provide your names on your solution
Your solution:
… (.pdf) summarizing your solution to the statistical problem
I’d encourage you to start early, submit your solution early, and use Quarto
After completing the course, you will be able to read code and write your own program using Python
In addition to a standard programming course, you’ll learn how to use Python for statistical problems
You can see this as a rough introduction to the basics of data science
The term statistical programming refers to the process of writing code in a programming language in order to perform a statistical analysis. Various software packages are available for this; they differ, e.g., in terms of programming effort, efficiency and implemented methods. The most widely used tools for statistical analysis with machine learning methods are R and Python.
Statistical programming combines two elements:

source: c-sharpcorner.com
In case you get frustrated, read this nice little blog post about this (medium.com).
In case you find errors and typos in the lecture notes, please report them in the form on the course website.
Option 1: Anaconda (Traditional)
Option 2: UV (Modern & Fast)
Editor: Positron
Why UV?
Installation
Traditional Workflow (pip/conda)
# Create virtual environment
python -m venv myenv
source myenv/bin/activate # or myenv\Scripts\activate on Windows
# Install packages
pip install numpy pandas matplotlib
# Wait... wait... wait...
# Manage dependencies
pip freeze > requirements.txt
# Reproduce environment
pip install -r requirements.txt
Modern Workflow (UV)
# Everything in one command
uv add numpy pandas matplotlib
# Done in seconds!
# Dependencies auto-managed
# pyproject.toml + uv.lock created
# Reproduce environment
uv sync
# Fast and deterministic
Key Advantages
Project created with UV
my_project/
├── pyproject.toml # Project metadata & dependencies
├── uv.lock # Locked versions (like package-lock.json)
├── .python-version # Python version for project
├── src/
│ └── my_project/
│ └── __init__.py
└── README.md
pyproject.toml (Modern standard)
Key Benefits
Why Use AI Coding Assistants?
Example Use Cases
# Type a comment, AI suggests implementation
# Function to calculate mean and std of a list
# AI suggests:
def calculate_stats(data):
    mean = sum(data) / len(data)
    variance = sum((x - mean)**2 for x in data) / len(data)  # population variance (divides by n)
    std = variance ** 0.5
    return mean, std
Popular Tools
GitHub Copilot
Tips
Claude Code (CLI Tool)
Setup & Usage
# Install Claude Code CLI
npm install -g @anthropic-ai/claude-code
# Basic usage
claude "add type hints to all functions"
# Interactive mode
claude
When to Use
✅ Good Practices
❌ Avoid These Pitfalls
Example 1: Data Analysis Task
# Prompt to AI: "Load CSV file and compute
# descriptive statistics for all numeric columns"
# AI generates:
import pandas as pd

def analyze_csv(filepath):
    """Load CSV and compute statistics."""
    df = pd.read_csv(filepath)
    numeric_cols = df.select_dtypes(
        include=['number']).columns
    stats = df[numeric_cols].describe()
    return stats
# You learn: pandas data type selection,
# describe() method, clean function structure
Example 2: Debugging Help
# Your buggy code:
def calculate_mean(numbers):
    return sum(numbers) / len(numbers)
# Fails with empty list (ZeroDivisionError)!
# Ask AI: "Fix this function to handle edge cases"
# AI suggests:
def calculate_mean(numbers):
    """Calculate mean, handling edge cases."""
    if not numbers:
        return 0  # or raise ValueError
    return sum(numbers) / len(numbers)
# You learn: Input validation,
# error handling patterns
Example 3: Writing Tests
Ask AI: “Write pytest tests for calculate_mean”
AI generates tests for normal cases, edge cases, and error handling.
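A sketch of what such pytest tests could look like, assuming the edge-case version of calculate_mean from Example 2 (returns 0 for an empty list). The module name test_stats.py and the test names are hypothetical; pytest simply collects every function whose name starts with test_.

```python
# Hypothetical test module, e.g. test_stats.py
import math


def calculate_mean(numbers):
    """Calculate mean, handling edge cases (version from Example 2)."""
    if not numbers:
        return 0  # or raise ValueError
    return sum(numbers) / len(numbers)


def test_mean_of_integers():
    # Normal case: (1 + 2 + 3) / 3 == 2
    assert calculate_mean([1, 2, 3]) == 2


def test_mean_of_floats():
    # Floating-point results are compared with a tolerance
    assert math.isclose(calculate_mean([0.1, 0.2, 0.3]), 0.2)


def test_single_element():
    assert calculate_mean([5]) == 5


def test_empty_list_returns_zero():
    # Edge case: this version returns 0 instead of raising ZeroDivisionError
    assert calculate_mean([]) == 0
```

Run the tests with pytest test_stats.py.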
Effective Learning Strategy
Prompt Engineering for Coding
Good prompts get better results:
❌ Bad: "write code for data analysis"
✅ Good: "Write a Python function that:
1. Loads a CSV file using pandas
2. Handles missing values by dropping rows
3. Calculates mean and median for numeric columns
4. Returns results as a dictionary
5. Includes type hints and docstring"
Iterative refinement
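One possible first draft for the "good" prompt above might look like the sketch below; the function name summarize_csv and the exact return layout are assumptions, and in practice you would review such a draft and refine the prompt iteratively.

```python
import io
from typing import TextIO, Union

import pandas as pd


def summarize_csv(source: Union[str, TextIO]) -> dict[str, dict[str, float]]:
    """Load a CSV file and return mean and median of each numeric column.

    Rows that contain any missing value are dropped before summarizing.

    Args:
        source: Path to a CSV file (or an open text buffer).

    Returns:
        Dictionary mapping column name -> {"mean": ..., "median": ...}.
    """
    df = pd.read_csv(source)                      # 1. load with pandas
    df = df.dropna()                              # 2. drop rows with missing values
    numeric = df.select_dtypes(include="number")  # keep numeric columns only
    # 3. + 4. mean and median per numeric column, returned as a dictionary
    return {
        col: {"mean": float(numeric[col].mean()), "median": float(numeric[col].median())}
        for col in numeric.columns
    }


# Usage with a small in-memory CSV (column c is text, so it is skipped)
csv_text = "a,b,c\n1,2,x\n3,,y\n5,6,z\n"
result = summarize_csv(io.StringIO(csv_text))
print(result)  # {'a': {'mean': 3.0, 'median': 3.0}, 'b': {'mean': 4.0, 'median': 4.0}}
```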
| Tool | Best For | Free? | Integration |
|---|---|---|---|
| GitHub Copilot | Real-time suggestions | Students/Edu | VS Code, JetBrains |
| Claude Code | Autonomous tasks | Free tier | CLI, VS Code |
| ChatGPT | Learning & explanations | Free tier | Web, API, Codex in VS Code |
| Cursor | AI-first editing | Free tier | Standalone IDE |
… python and then run the following code
However, if you want to write more complex code, it is worth organizing your code in scripts (.py files)
IDEs provide extra functionality that helps you write and organize your code (and software project)

Quarto is a new publication tool that combines the advantages of notebooks and IDEs
You can use it not only to generate (Jupyter) notebooks, but also to generate books, websites and slideshows
The more familiar you are with Python, the easier you will find it to also use other tools like Jupyter and Quarto
We will focus on using VS Code in this course. If you are interested, we can demonstrate Jupyter and Quarto later in the course
We recommend that you use Quarto for your assignment solution
It’s time for our first code example
… hello_world.py (click on File > New File)
… .py files
Insert the following code in the file
… (Shift + Enter) or click on the Run button in the top right
It’s time for our first code example
… hello_world.py
… python and paste the code
… Enter
Alternatively, you can click on the Interactive Window button in the top right and paste the code there
It’s time for our first code example
… cmd or open Anaconda Prompt
… hello_world.py using cd
… python hello_world.py
You just ran your first Python code example!
A program is a sequence of instructions that specifies how to perform a computation (mathematical or symbolic)
Basic instructions in virtually any language
Programming: Process of breaking a large, complex task into smaller and smaller subtasks until the subtask is simple enough to be performed with one of these basic instructions (Downey, 2015, P. 2)
… #, are ignored by the interpreter and are only intended for human readers
Value: One of the fundamental things (like a letter or a number) that a program manipulates
Values are categorized in different classes
… 4)
… "hello world!" or 'banana')
… 3.2)
… = token (does not mean equal!)
… def, and, class, …
Statement: Instruction that the Python interpreter can execute, for example, assignments, while, if, for, import
Expressions: Combination of values, variables, operators and calls to functions
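A minimal sketch of values, their classes, assignment and expressions; the variable names are purely illustrative.

```python
# Values belong to different classes (types)
print(type(4))         # <class 'int'>
print(type("banana"))  # <class 'str'>
print(type(3.2))       # <class 'float'>

# Assignment: the = token binds a name to a value (it is not a test for equality)
n_apples = 4
n_pears = 3

# Expressions combine values, variables, operators and function calls
total = n_apples + n_pears * 2        # * binds tighter than +, so 4 + 6
print(total)                          # 10
print(7 / 2, 7 // 2, 7 % 2, 2 ** 3)   # 3.5 3 1 8
```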
… +, -, *, /, **, %, //
Strings comprise characters, i.e., single symbols of a chosen font
Strings are immutable
We can access single characters in a string via their index
… [n:m]: n is included and m is excluded
… [:m], [n:], [:]
… [-2:-1]?
… [0]!
Int
ro to Python
Intro to Pytho
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[13], line 1
----> 1 my_string[0] = "A"
TypeError: 'str' object does not support item assignment
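A short sketch of indexing, slicing and immutability. It assumes my_string = "Intro to Python", which is inferred from the outputs above rather than stated in the notes.

```python
my_string = "Intro to Python"

print(my_string[0])      # I      (indexing starts at 0)
print(my_string[:3])     # Int    (index 3 is excluded)
print(my_string[3:])     # ro to Python
print(my_string[:-1])    # Intro to Pytho (negative indices count from the end)
print(my_string[-2:-1])  # o

# Strings are immutable: build a new string instead of assigning to an index
try:
    my_string[0] = "A"
except TypeError as err:
    print(err)           # 'str' object does not support item assignment
new_string = "A" + my_string[1:]
print(new_string)        # Antro to Python
```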
… in and not in
… .find()
… .split()
… ==, > and <
… in and not
… find
Peter Wright
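A quick sketch of the membership operators, .find(), .split() and string comparison; the example sentence is illustrative.

```python
sentence = "Intro to Python"

print("Python" in sentence)      # True
print("Java" not in sentence)    # True
print(sentence.find("to"))       # 6   (index of the first match)
print(sentence.find("R"))        # -1  (not found; the search is case-sensitive)
print(sentence.split())          # ['Intro', 'to', 'Python']  (split on whitespace)
print("apple" < "banana")        # True  (lexicographic comparison)
print("Apple" == "apple")        # False (comparison is case-sensitive)
```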
… +
… *
… left (<), center (^), right (>)
Nested tuples are possible
Tuples are immutable (like strings)
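A small sketch covering the alignment specifiers mentioned above and basic tuple behavior (nesting, unpacking, immutability); all names and values are illustrative.

```python
# Format specifiers: left (<), center (^) and right (>) alignment in a field of width 8
aligned = f"|{'left':<8}|{'center':^8}|{'right':>8}|"
print(aligned)                 # |left    | center |   right|

point = (3, 4)                 # a tuple
nested = (1, (2, 3), "a")      # nested tuples are possible
print(nested[1][0])            # 2

x, y = point                   # tuple unpacking
print(x + y)                   # 7

try:
    point[0] = 10              # tuples are immutable, like strings
except TypeError as err:
    print(err)                 # 'tuple' object does not support item assignment
```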
… in and not in
… + and repetition *
… len() for length
Empty lists with []
Nested lists, e.g., [1, 2, [4, 5]]
Remove elements from lists using del
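The list operations above in one sketch; the mountain names are illustrative data.

```python
mountains = ["Mount Everest", "K2"]
more = ["Kangchenjunga"]

combined = mountains + more       # concatenation with +
print(combined)                   # ['Mount Everest', 'K2', 'Kangchenjunga']
print([0] * 3)                    # [0, 0, 0]  (repetition with *)
print(len(combined))              # 3
print("K2" in combined)           # True

empty = []                        # empty list
nested = [1, 2, [4, 5]]           # nested list
print(nested[2][1])               # 5

del combined[0]                   # remove an element by index
print(combined)                   # ['K2', 'Kangchenjunga']
```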
Method: A function attached to an object. Invoking (= activating) a method causes the object to respond in some way, in Python using . notation.
Object here: list
Methods:
.append(element): Adds element to the end of the list
.insert(position, element): Adds element at position and shifts remaining elements up
.count(element): Counts how often element appears in the list
.extend(newlist): Puts newlist at the end of the list
.index(element): Finds the index of the first occurrence of element in the list
.reverse(): Reverses the order of the list
.sort(): Sorts the list
.remove(element): Removes the first occurrence of element
['Mount Denmark', 'K2', 'Kangchenjunga']
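A sketch walking through the list methods above, plus the difference between an alias and a copy; the list content is illustrative.

```python
peaks = ["K2", "Lhotse"]
peaks.append("Makalu")            # ['K2', 'Lhotse', 'Makalu']
peaks.insert(0, "Everest")        # ['Everest', 'K2', 'Lhotse', 'Makalu']
peaks.extend(["K2", "Cho Oyu"])   # ['Everest', 'K2', 'Lhotse', 'Makalu', 'K2', 'Cho Oyu']
print(peaks.count("K2"))          # 2
print(peaks.index("K2"))          # 1  (first occurrence)
peaks.remove("K2")                # removes only the first "K2"
peaks.sort()                      # sorts in place and returns None
print(peaks)                      # ['Cho Oyu', 'Everest', 'K2', 'Lhotse', 'Makalu']
peaks.reverse()
print(peaks)                      # ['Makalu', 'Lhotse', 'K2', 'Everest', 'Cho Oyu']

# Assignment creates an alias; .copy() (or [:]) creates a new list
alias = peaks
clone = peaks.copy()
print(alias is peaks, clone is peaks)   # True False
```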
… .copy() or use slicing [:]
… is
Dictionaries are mappings from keys of immutable type to values of any (heterogeneous) type
Use key:value pairs to define dictionaries and add pairs with [] or .update({key:value}).
{'brand': 'Ford', 'model': 'Mustang', 'year': 1964, 'hp': 210}
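A sketch that would produce the dictionary shown above, assuming the "hp" entry was added with [] (the notes do not show how it was created), followed by .update(), key lookup and the main dictionary methods.

```python
car = {"brand": "Ford", "model": "Mustang", "year": 1964}
car["hp"] = 210                    # add a pair with []
print(car)   # {'brand': 'Ford', 'model': 'Mustang', 'year': 1964, 'hp': 210}

car.update({"year": 1965, "color": "red"})   # update an existing pair and add a new one
print(car["year"])                 # 1965  (fast lookup by key)

print("brand" in car)              # True   (in tests keys ...)
print("Ford" in car)               # False  (... not values)
print(list(car.keys()))            # ['brand', 'model', 'year', 'hp', 'color']
print({"a": 1, "b": 2} == {"b": 2, "a": 1})  # True (order is irrelevant for equality)
```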
Access to dictionaries is very fast
Order of pairs does not matter when comparing dictionaries (although dictionaries remember insertion order since Python 3.7)
… .sort()!
.keys(): returns a view of the underlying keys
.values(): returns a view of the underlying values
.items(): returns a view of the key:value pairs
in and not in test only for keys (!)
.copy(): Creates a copy of a dictionary
.update(): Creates new entries in the dictionary or updates existing ones. Allows multiple creations or updates.
One of the most useful features of programming languages is their ability to take small building blocks and compose them into larger chunks (Wentworth et al., 2017, P. 19)
Bugs: Programming errors
Debugging: Process of tracking down errors
Programming, and especially debugging, sometimes brings out strong emotions. If you are struggling with a difficult bug, you might feel angry, despondent, or embarrassed.
[…]
Preparing for these reactions might help you deal with them. One approach is to think of the computer as an employee with certain strengths, like speed and precision, and particular weaknesses, like lack of empathy and inability to grasp the big picture (Downey, 2015, P. 6)
Function: Named sequence of statements that performs a computation
Structure of functions in Python:
def <functionname>(<parameters>):
Call function by name
20
… y. Good luck!
You can set default values for function arguments
None
… return statement
100
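A sketch of default arguments and return values; the functions power and greet are hypothetical, but their outputs illustrate the 100 and None shown above.

```python
def power(base, exponent=2):
    """Return base raised to exponent; exponent defaults to 2."""
    return base ** exponent


def greet(name):
    print("Hello,", name)   # no return statement


print(power(10))        # 100  (default exponent used)
print(power(2, 3))      # 8
result = greet("Ada")   # prints: Hello, Ada
print(result)           # None  (a function without return gives None)
```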
Remember: Local variables exist only inside functions. Parameters are local variables.
Local variables exist only while the function is being executed. We can override this with global.
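A minimal sketch of local versus global variables; the names counter, add_local and increment are hypothetical.

```python
counter = 0                 # global variable


def add_local(x):
    total = x + 1           # total is local: it disappears after the call
    return total


def increment():
    global counter          # refer to the module-level name instead of a new local one
    counter += 1


print(add_local(1))         # 2
increment()
increment()
print(counter)              # 2
```

Without the global statement, counter += 1 inside increment would raise an UnboundLocalError, because Python would treat counter as a local variable.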
print() for debugging
… input or print in function bodies unless necessary (or for debugging)
… lambda as keyword to create lambda functions
2
0
2
0
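A sketch of lambda functions, both bound to a name and used as short arguments to other functions; the data is illustrative.

```python
square = lambda x: x ** 2          # anonymous function bound to a name
print(square(4))                   # 16

# Lambdas are handy as short arguments to other functions
words = ["pear", "fig", "banana"]
print(sorted(words, key=lambda w: len(w)))        # ['fig', 'pear', 'banana']
print(list(map(lambda x: x % 2, [2, 3, 4, 5])))   # [0, 1, 0, 1]
```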
… .apply() for data frames
… lowercase_letters_with_underscores (CamelCase is for classes)
… import and function definitions at the top of a file
Make code easier to read and debug
Make programs smaller by avoiding code duplication
Well-written functions can be reused
Modules contain a collection of functions
Modules play an important role for Python
Execute code depending on a condition: if
Boolean expressions: An expression that is either true or false
… ==
… !=
… >, <
… >=, <=
… and, or, not
… if statement
Source: Wikipedia
Conditional execution based on an if statement
Condition: Boolean expression after if
Placeholder statement: pass (block of statements must never be empty!)
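A sketch of conditional execution with if / elif / else, a Boolean expression, and pass as a placeholder; classify is a hypothetical helper.

```python
def classify(x):
    """Return a label for x using if / elif / else."""
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    else:
        return "zero"


print(classify(3), classify(-1), classify(0))   # positive negative zero
print(3 > 2 and not 1 == 2)                     # True

if classify(5) == "positive":
    pass  # placeholder: a block must never be empty
```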
Statement to exit the loop: break
Iteration: Run a block of statements repeatedly based on for and while
Reassignment: Reassign the value of some variable (use with caution!), e.g.,
… range()
… while statement:
… += and -= operators in Python
Do we have oregano ?
Do we have tomatoes ?
Do we have mozzarella ?
range() function
Do we have mozzarella ?
Do we have tomatoes ?
Do we have oregano ?
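A sketch that reproduces the two outputs above, assuming the ingredients are stored in a list (the notes show only the printed lines), followed by a while loop that uses += and break.

```python
ingredients = ["oregano", "tomatoes", "mozzarella"]

# for loop over the elements of a list
for item in ingredients:
    print("Do we have", item, "?")

# for loop over indices created by range(start, stop, step), here backwards
for i in range(len(ingredients) - 1, -1, -1):
    print("Do we have", ingredients[i], "?")

# while loop with reassignment via += and an early exit with break
total = 0
n = 0
while True:
    n += 1
    total += n
    if total > 10:
        break
print(n, total)   # 5 15
```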
range() function
All previous programs stored data in RAM
In order to make data accessible independently from the code, we need to write it to a storage medium
Sets of data are stored at specific locations in so-called files
We have to open and close files explicitly for writing or reading
with statement: secures closing of the file
open function with parameters filename and mode
w means opening for writing
output is the file handle (not the same as the file)
We call methods/functions to modify the file via the handle, but changes happen on the file
test_output.txt is created or, if it already exists, replaced with a new one (Caution!)
“A handle is somewhat like a TV remote control” (Wentworth et al. (2015), P. 140)
… r (reading mode)
The file system is organized in terms of directories, which contain files and further directories
So far, we have been using the current directory of the respective Python-file
To access files in different directories, we have to specify the full path:
… c:/temp/file.txt
… /home/Python/file.txt
Recall: Methods for strings and lists
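A sketch of writing and reading test_output.txt with the with statement. To avoid overwriting a real file, it assumes a temporary directory (tempfile) and builds the full path with os.path.join.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Full path to the file inside the temporary directory
    path = os.path.join(folder, "test_output.txt")

    # Mode "w": create the file, or replace it if it already exists
    with open(path, "w") as output:        # output is the file handle
        output.write("first line\n")
        output.write("second line\n")
    # the with statement has closed the file here

    # Mode "r": read the file back line by line, using a string method
    with open(path, "r") as handle:
        lines = [line.rstrip("\n") for line in handle]

print(lines)   # ['first line', 'second line']
```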
Modules contain definitions and statements for specific parts of programs
Anaconda comes with a lot of extra modules
import includes all definitions and statements from the module called random
rng is the random number generator.
randrange() returns a random integer.
rng is only pseudo-random, i.e., generation based on a deterministic algorithm
seed value: Starting point, ensures repeatability for testing purposes
The math module is a collection of common mathematical functions
We will find out how to create our own module later!
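A sketch of the random and math modules: a generator object rng, a seed value for repeatability, and randrange() for simulated dice rolls (the dice example is illustrative).

```python
import math
import random

rng = random.Random()            # a random number generator object
rng.seed(42)                     # fix the seed value for repeatability
first = [rng.randrange(1, 7) for _ in range(5)]   # five simulated dice rolls (1..6)

rng.seed(42)                     # same seed ...
second = [rng.randrange(1, 7) for _ in range(5)]
print(first == second)           # True  (... same pseudo-random sequence)

print(math.sqrt(16), math.pi)    # 4.0 3.141592653589793
```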
Popular examples for modules (more on this in part 2 of this course)
numpy
pandas
matplotlib
scikit-learn
With UV (Recommended)
# Add a package to your project
uv add <module_name>
# Add multiple packages
uv add numpy pandas matplotlib
# Add for development only
uv add --dev pytest ruff
With Anaconda/pip (Traditional)
# Using conda
conda install <module_name>
# Using pip
pip install <module_name>
# Multiple packages
pip install numpy pandas matplotlib
So far, we have seen the built-in data types available in Python
However, we can also define our own data type, just as we can define our own functions
So far: We used functions in order to process data (= procedural programming)
Object-oriented programming(OOP): Objects contain data and functionality
OOP makes maintenance and modification of (large) projects much easier
Class: User-defined compound data types
Define a class
CamelCase)
__init__: Initializer method, which is called every time a new instance of the class is created
self: Refers to the newly created object
Matriculation number
-------------
0
0
Student() is called a constructor
Constructor and initialization method lead to an instance: “Create a new object and set it to default values”
12345678
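A sketch of a class with an initializer and a method. The attribute names name and matriculation_number, and the describe() method, are assumptions for illustration, loosely matching the outputs above.

```python
class Student:
    """A student with a name and a matriculation number (illustrative attributes)."""

    def __init__(self, name="", matriculation_number=0):
        # self refers to the newly created object
        self.name = name
        self.matriculation_number = matriculation_number

    def describe(self):
        return f"{self.name} ({self.matriculation_number})"


default_student = Student()                  # constructor call with default values
print(default_student.matriculation_number)  # 0

anna = Student("Anna", 12345678)
print(anna.describe())                       # Anna (12345678)
```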
k-nearest neighbor
Passing an object happens by reference (an alias is created)
Functions/methods might return objects
Objects are in some state, which can be updated from time to time, and objects are mutable
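A sketch showing that passing an object creates an alias, so a function can change the object's state; the minimal Student class and rename() are hypothetical.

```python
class Student:
    """Minimal illustrative class."""

    def __init__(self, name):
        self.name = name


def rename(student, new_name):
    # student is an alias for the caller's object, so this changes its state
    student.name = new_name


s = Student("Anna")
alias = s                  # no copy: both names refer to the same object
rename(alias, "Ben")
print(s.name)              # Ben
print(alias is s)          # True
```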
Examples:
Runtime errors create exception objects
Python terminates and prints out the traceback, which ends with a message describing the exception that occurred
Examples:
Sometimes: Execute an operation that causes an exception, but we don’t want to terminate the program \(\Rightarrow\) try
try has four separate clauses (= parts)
try: As little as possible in this part (otherwise, you may catch unexpected exceptions)
else
finally
import math
user_input = input("Any floating-point number: ")
try:
    # Could fail (possibly >1 statement)
    user_input_float = float(user_input)
except ValueError:
    # Executed if "ValueError" is raised.
    # Different exceptions can be handled in one try-statement
    print("Not a floating point number")
else:
    # Executed if no exception was raised.
    # Note: math.sqrt() raises a ValueError for negative numbers, which is not caught here
    print("The square root of {0} is {1}.".format(
        user_input_float, math.sqrt(user_input_float)))
finally:
    # Executed in any case.
    print("Done!")
Write your own exceptions
For known error conditions, we can raise an exception
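A sketch of raising an exception for a known error condition and defining your own exception class; NegativeValueError and safe_sqrt are hypothetical names.

```python
class NegativeValueError(ValueError):
    """Raised when a non-negative number is required (hypothetical example)."""


def safe_sqrt(x):
    if x < 0:
        # Known error condition: raise an exception with a helpful message
        raise NegativeValueError(f"cannot take the square root of {x}")
    return x ** 0.5


try:
    safe_sqrt(-4)
except NegativeValueError as err:
    print("Caught:", err)   # Caught: cannot take the square root of -4

print(safe_sqrt(9))         # 3.0
```

Deriving from ValueError means existing except ValueError handlers still catch the new exception.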
So far, we have mostly used base Python
Important modules that facilitate working with Python in practice
NumPy - handling arrays and linear algebra
pandas - data frames
matplotlib - visualization
Source: Jake VanderPlas (PyCon, 2017)
NumPy (Numerical Python) is a module for handling arrays
It provides various mathematical operations (linear algebra)
Example with base Python
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[73], line 2
      1 number_list = [1, 2, 3, 4, 5]
----> 2 mean(number_list)
NameError: name 'mean' is not defined
… NumPy
Python’s lists can be slow to process; using arrays can improve speed substantially
NumPy’s array object (ndarray) provides many supporting functions
Arrays (and NumPy) are very common in data science
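A sketch contrasting base Python with NumPy for the mean computation that failed above; it assumes NumPy is installed and imported as np.

```python
import numpy as np

number_list = [1, 2, 3, 4, 5]

# Base Python: there is no built-in mean(), so compute it by hand
print(sum(number_list) / len(number_list))   # 3.0

# NumPy: convert to an ndarray and use its vectorized methods
arr = np.array(number_list)
print(arr.mean())                            # 3.0
print(np.mean(number_list))                  # 3.0 (lists are accepted too)
```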
# With UV (recommended)
uv add numpy
# With conda/pip (traditional)
conda install numpy
pip install numpy
… np)
… ndarray from lists, tuples or array-like objects
… array
[[[1 2 3]
[4 5 6]]
[[1 2 3]
[4 5 6]]]
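A sketch of creating arrays from lists and tuples, including a 3-D array like the one printed above, with indexing and [start:end:step] slicing.

```python
import numpy as np

a = np.array([1, 2, 3])                      # from a list
b = np.array((4, 5, 6))                      # from a tuple
c = np.array([[[1, 2, 3], [4, 5, 6]],
              [[1, 2, 3], [4, 5, 6]]])       # 3-D array (2 x 2 x 3)

print(type(a))          # <class 'numpy.ndarray'>
print(c.ndim, c.shape)  # 3 (2, 2, 3)
print(c[0, 1, 2])       # 6
print(a[::2])           # [1 3]   (slicing with [start:end:step])
```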
… [start:end:step]
NumPy provides some additional data types. Each type is referred to by a character, for example i for integer
b - boolean, f - float, S - string
… .dtype
[b'1' b'2' b'3']
|S1
… .astype()
We can change the shape of an array
Reshape a 1-D array to a 2-D array
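A sketch of the type characters, .astype() and .reshape() (including -1); the example values are illustrative.

```python
import numpy as np

s = np.array([1, 2, 3], dtype="S")   # S: byte-string type
print(s, s.dtype)                    # [b'1' b'2' b'3'] |S1

f = np.array([1.1, 2.1, 3.1])
i = f.astype("i")                    # float -> integer with .astype() (decimals are cut off)
print(i)                             # [1 2 3]

flat = np.arange(12)                 # 1-D array 0, 1, ..., 11
grid = flat.reshape(3, 4)            # 1-D -> 2-D: 3 rows, 4 columns
print(grid.shape)                    # (3, 4)
print(flat.reshape(2, -1).shape)     # (2, 6): -1 lets NumPy infer that dimension
```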
… -1 in .reshape()
Joining \(=\) merging two or more arrays into a single array
In NumPy, we use concatenate to join arrays based on axes
axis argument indicates along which dimension arrays should be joined, axis = 0 along rows, axis = 1 along columns
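A sketch of np.concatenate() with the axis argument, using two small 2 x 2 arrays.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows = np.concatenate((a, b), axis=0)   # one below the other: shape (4, 2)
cols = np.concatenate((a, b), axis=1)   # side by side: shape (2, 4)
print(rows.shape, cols.shape)           # (4, 2) (2, 4)
print(cols)
# [[1 2 5 6]
#  [3 4 7 8]]
```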


stack() is basically the same as concatenate(), except that it is done along a new axis
… hstack() and vstack() instead
… array_split(ary, indices_or_sections, axis=0)
… hsplit(), vsplit()
… where()
… searchsorted()
… sort() (returns copy!)
NumPy provides ufuncs (Universal Functions) that work with ndarray objects \(\Rightarrow\) Speed up calculations (vectorization)
[ 10 0 12 133]
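A sketch of stacking, splitting, searching, sorting and a ufunc call; the arrays are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.stack((a, b), axis=0).shape)    # (2, 3)  along a new axis
print(np.hstack((a, b)))                 # [1 2 3 4 5 6]
print(np.vstack((a, b)).shape)           # (2, 3)

parts = np.array_split(np.arange(7), 3)  # unequal split is allowed
print([p.tolist() for p in parts])       # [[0, 1, 2], [3, 4], [5, 6]]

x = np.array([4, 1, 3, 1])
print(np.where(x == 1)[0].tolist())      # [1, 3]  (indices where the condition holds)
print(np.sort(x), x)                     # [1 1 3 4] [4 1 3 1]  (np.sort returns a copy)
print(np.searchsorted([1, 3, 5, 7], 4))  # 2  (insert position that keeps the order)

# ufunc: element-wise addition without a Python loop
print(np.add([1, 2, 3], [10, 20, 30]))   # [11 22 33]
```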
| Operation | Function |
|---|---|
| + | np.add() |
| - | np.subtract() |
| * | np.multiply() |
| / | np.divide() |
| ** | np.power() |
| % | np.mod() |
| \(\ldots\) | \(\ldots\) |
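A sketch checking that each operator in the table gives the same result as its ufunc; the arrays are illustrative.

```python
import numpy as np

x = np.array([10, 20, 30])
y = np.array([3, 4, 5])

# Each arithmetic operator on arrays corresponds to a ufunc
pairs = [
    (x + y, np.add(x, y)),
    (x - y, np.subtract(x, y)),
    (x * y, np.multiply(x, y)),
    (x / y, np.divide(x, y)),
    (x ** y, np.power(x, y)),
    (x % y, np.mod(x, y)),
]
print(all(np.array_equal(op, fn) for op, fn in pairs))  # True
print(np.subtract(x, y))  # [ 7 16 25]
print(np.mod(x, y))       # [1 0 0]
```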
Linear algebra is used in many algorithmic problems
Element-by-element operations
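A sketch contrasting element-by-element operations with matrix operations from linear algebra; A, B and the right-hand side are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A * B)                     # element-by-element product
# [[ 5 12]
#  [21 32]]
print(A @ B)                     # matrix product (same as np.dot(A, B))
# [[19 22]
#  [43 50]]

print(np.linalg.det(A))          # approximately -2.0 (floating point)
x = np.linalg.solve(A, np.array([5, 11]))   # solve the linear system A x = b
print(x)                         # approximately [1. 2.]
```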
pandas is a Python module that helps to handle data in an easy and intuitive way
It provides data structures that are very useful if you work with real-world data in Python
pandas can be used to handle
pandas is built on top of NumPy for a good integration with scientific computation
pandas is very good in terms of
… NumPy)
pandas builds on two basic data structures
Series - 1-dimensional data, like a vector / one-dimensional array
DataFrame - 2-dimensional data, tabular data in rows and columns
… DataFrame … Series (which in turn are containers of int, str, \(\ldots\))
… pandas
… pandas
… pandas
# With UV (recommended)
uv add pandas
# With conda/pip (traditional)
conda install pandas
pip install pandas
… pandas, common alias pd (we’ll also load NumPy)
… Series object
0 1.0
1 2.0
2 3.0
3 NaN
4 6.0
5 8.0
dtype: float64
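A sketch creating a Series like the one printed above; np.nan marks the missing value, which forces a floating-point dtype.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 6, 8])   # NaN marks a missing value
print(s.dtype)          # float64
print(s.isna().sum())   # 1
print(s.mean())         # 4.0 (missing values are skipped by default)
```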
… Series object … d, hence the index argument has no effect
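A sketch of building a Series from a dictionary, with and without an explicit index, and of .reindex(); the dictionary d here is illustrative.

```python
import pandas as pd

d = {"a": 0.0, "b": 1.0, "c": 2.0}

print(pd.Series(d))                    # index built from the keys a, b, c

# With an explicit index, values are looked up by key; missing keys give NaN
s = pd.Series(d, index=["b", "c", "d", "a"])
print(s)
# b    1.0
# c    2.0
# d    NaN
# a    0.0
# dtype: float64

# .reindex() conforms an existing Series to a new index in the same way
r = s.reindex(["a", "e"])
print(r.isna().tolist())               # [False, True]
```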
The Series is first built from the dictionary and then reindexed -> NaN as result
… Series object
… Series with Series.reindex(). Can you see what happens here?
… DataFrame object
… DataFrame object
… DataFrame object
… NumPy array
| | A | B | C | D |
|---|---|---|---|---|
| 2022-04-01 | 0.227769 | -0.755529 | 1.144946 | -0.352005 |
| 2022-04-02 | -0.482710 | 0.655263 | 0.632421 | -0.622162 |
| 2022-04-03 | -0.143393 | 0.871788 | 0.332175 | 0.591673 |
| 2022-04-04 | -2.419224 | 0.254834 | -1.100392 | -0.307889 |
| 2022-04-05 | 1.465222 | -0.118808 | 0.329490 | 0.774558 |
| 2022-04-06 | -0.079353 | 0.772174 | -0.178963 | 0.195591 |
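A sketch of how a DataFrame like the table above can be built from a NumPy array with a date index. It assumes a seeded standard-normal generator, so the values will differ from the ones shown.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2022-04-01", periods=6)      # daily DatetimeIndex
rng = np.random.default_rng(seed=0)                 # seeded, so values are reproducible
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

print(df.shape)          # (6, 4)
print(df.index[0])       # 2022-04-01 00:00:00
```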
… .head() and .tail(), respectively
 A B C D
2022-04-01 0.227769 -0.755529 1.144946 -0.352005
2022-04-02 -0.482710 0.655263 0.632421 -0.622162
2022-04-03 -0.143393 0.871788 0.332175 0.591673
A B C D
2022-04-05 1.465222 -0.118808 0.329490 0.774558
2022-04-06 -0.079353 0.772174 -0.178963 0.195591
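A sketch of .head() and .tail() on a small illustrative DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"A": range(6), "B": range(10, 16)})

print(df.head(3))                 # first 3 rows (default is 5)
print(df.tail(2))                 # last 2 rows
print(df.tail(2).index.tolist())  # [4, 5]
```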
… NumPy array (recommended)
[[ 0.22776912 -0.75552917 1.14494597 -0.35200459]
[-0.48271037 0.65526347 0.63242091 -0.62216216]
[-0.1433934 0.87178817 0.33217522 0.5916732 ]
[-2.41922433 0.25483371 -1.10039219 -0.30788933]
[ 1.46522216 -0.1188076 0.32948986 0.77455794]
[-0.07935271 0.77217429 -0.17896296 0.19559059]]
<class 'numpy.ndarray'>
[[ 0.22776912 -0.75552917 1.14494597 -0.35200459]
[-0.48271037 0.65526347 0.63242091 -0.62216216]
[-0.1433934 0.87178817 0.33217522 0.5916732 ]
[-2.41922433 0.25483371 -1.10039219 -0.30788933]
[ 1.46522216 -0.1188076 0.32948986 0.77455794]
[-0.07935271 0.77217429 -0.17896296 0.19559059]]
<class 'numpy.ndarray'>
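A sketch of converting a DataFrame to a NumPy array, assuming the two outputs above come from .to_numpy() and the older .values attribute.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})

arr = df.to_numpy()      # recommended way to get a NumPy array
old = df.values          # older attribute; gives the same values here
print(type(arr))         # <class 'numpy.ndarray'>
print(arr.shape)         # (2, 2)
print(np.array_equal(arr, old))  # True
```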
| | A | B | C | D |
|---|---|---|---|---|
| count | 6.000000 | 6.000000 | 6.000000 | 6.000000 |
| mean | -0.238615 | 0.279954 | 0.193279 | 0.046628 |
| std | 1.262509 | 0.626941 | 0.767921 | 0.562320 |
| min | -2.419224 | -0.755529 | -1.100392 | -0.622162 |
| 25% | -0.397881 | -0.025397 | -0.051850 | -0.340976 |
| 50% | -0.111373 | 0.455049 | 0.330833 | -0.056149 |
| 75% | 0.150989 | 0.742947 | 0.557359 | 0.492653 |
| max | 1.465222 | 0.871788 | 1.144946 | 0.774558 |
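A sketch of .describe() on a small illustrative DataFrame, reading single statistics back from the summary table.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]})

summary = df.describe()          # count, mean, std, min, quartiles, max per column
print(summary.loc["mean"])       # mean of A is 2.5, mean of B is 25.0
print(summary.loc["50%", "B"])   # 25.0 (the median)
```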
Wentworth et al. (2015), Chapter 15
Online documentation of
matplotlib, https://matplotlib.org/
seaborn, https://seaborn.pydata.org/
bokeh, https://bokeh.org/
plotly, https://plotly.com/graphing-libraries/