Appendix A — Python

Python is a popular and beginner-friendly programming language known for its simplicity and readability. It’s widely used in web development, data science, automation, artificial intelligence, and more. Whether you’re analyzing data, building a website, or creating software, Python is a powerful and versatile tool to learn.

A.1 Colab

During this course we are going to use Google Colab which is a hosted Jupyter notebook service that requires no setup to use and provides free access to computing resources, including GPUs and TPUs. That is, Colab runs in your browser and you do not have to install anything on your computer. With a Jupyter notebook you can weave you code together with text.

To be familiar with Colab do the following:

  1. If you do not have a Google account create one. Note if you have a gMail then you already have a Google account.
  2. Open and do this tutorial.

A.2 Laptop setup

If you prefer to have Python installed on your laptop, then do the following:

  1. Go to the official Python website: https://www.python.org/downloads
  2. Click the download button for your operating system (Windows, macOS, or Linux).
  3. Run the installer.
  4. Important: Make sure to check the box that says “Add Python to PATH” during installation.
  5. Follow the prompts to complete the installation.

To verify the installation, open a terminal or command prompt and type:

python --version

To install an IDE for working with your code, you may install Visual Studio Code (VSCode):

  1. Go to the VSCode website: https://code.visualstudio.com

  2. Download and install the version for your operating system.

  3. Open VSCode after installation.

  4. Install the Python extension in VSCode:

    • Go to the Extensions tab (or press Ctrl+Shift+X)
    • Search for “Python” and install the one published by Microsoft

You’re now ready to start coding in Python!

A.3 Learning Python

If you are new to Python you may have a look at some DataCamp Courses. First you HAVE TO SIGNUP using this link. Afterwards have a look at these courses:

  1. Introduction to Python - Learn the basics of Python programming, including variables, data types, and control flow.

  2. Intermediate Python - Build on your basic knowledge with functions, loops, and working with Python libraries.

  3. Data Manipulation with pandas - pandas is the world’s most popular Python library, used for everything from data manipulation to data analysis.

A.4 Coding/naming convention

The main reason for using a consistent set of coding conventions is to standardize the structure and coding style of an application so that you and others can easily read and understand the code. Good coding conventions result in precise, readable, and unambiguous source code that is consistent with other language conventions and as intuitive as possible.

Different ways of naming you variables exists. You are advised to adopt a naming convention. PEP 8 is the official style guide for Python code. It provides conventions for writing readable, consistent, and maintainable Python programs. Adhering to PEP 8 helps teams collaborate more effectively and improves code quality across projects. An overview is given below.

Naming Conventions

Type Convention Example
Variable lower_case_with_underscores total_count
Function lower_case_with_underscores process_data()
Class CapitalizedWords DataProcessor
Constant ALL_CAPS_WITH_UNDERSCORES MAX_SIZE
Method (in class) lower_case_with_underscores compute_value()
Module/Package lowercase or lower_case_with_underscores utils, data_loader
Private/Internal _single_leading_underscore _helper_function()
Strong “private” __double_leading_underscore __internal_method()
Magic method (dunder) __double_leading_and_trailing__ __init__, __str__

Indentation

  • Use 4 spaces per indentation level.
  • Do not use tabs.
  • Use hanging indents for long statements, and align with the opening delimiter.

Line Length

  • Limit lines to 79 characters.
  • For docstrings and comments, limit to 72 characters.

Whitespace

  • Avoid extra spaces:
    • No space inside parentheses/brackets/braces: func(a, b)
    • No space before commas, colons, or semicolons.
    • One space after commas and around binary operators: x = a + b
  • Use blank lines to separate functions and classes.

Comments

  • Use comments to explain why, not just what.
  • Block comments:
    • Use full sentences.
    • Start with a capital letter and end with a period.
  • Inline comments:
    • Use sparingly.
    • Leave two spaces before the #, and one space after.

Docstrings

  • Use triple double quotes: """This is a docstring."""

  • All public modules, functions, classes, and methods should have docstrings.

  • First line should be a short summary.

  • Leave a blank line after the summary if more detail follows.

  • Example:

    def add(a, b):
        """
        Return the sum of two numbers.
    
        Parameters:
        a (int or float): The first number.
        b (int or float): The second number.
    
        Returns:
        int or float: The sum of a and b.
        """
        return a + b

A.5 Debugging

Code development and data analysis always require a bit of trial and error. This Colab notebook briefly cover some options.

A.6 R and Python packages

If you are used to do data transformation in R then this table may be useful.

R Package Purpose Python Equivalent(s) Notes
dplyr Data manipulation (filter, mutate, etc.) pandas, dfply, siuba pandas is standard; dfply and siuba mimic dplyr syntax with pipes and tidy verbs
ggplot2 Data visualization (Grammar of Graphics) plotnine Closest match in syntax and philosophy; uses + for layers like ggplot2
tidyr Data tidying (pivoting, reshaping) pandas, pyjanitor pandas handles pivot, melt; pyjanitor adds more tidy-like helpers
readr Read/write CSV pandas Use read_csv(), to_csv()
jsonlite Read/write JSON json (standard), pandas json for raw files; pandas.read_json() for tabular JSON
readxl / writexl Read/write Excel pandas, openpyxl, xlsxwriter pandas integrates with Excel libraries for reading/writing .xlsx
googlesheets4 Google Sheets I/O gspread, gspread-pandas Python requires Google API setup; gspread-pandas integrates with DataFrames