Python Workout Notes

Notes on Testing with pytest

Showing print and logging output in your tests

pytest captures stdout and stderr by default. To see logging output live, pass -o log_cli=true, as below. To also see print output, pass -s (short for --capture=no).
Source: Stack Overflow

PYTHONPATH=. pytest -o log_cli=true test/files/test_files.py::test_passwd_to_dict

Using List Comprehensions with assert

Reference on using list comprehensions in assertions: https://edricteo.com/list-comprehension-addiction/

import pytest


@pytest.mark.parametrize(
    "inputs, expected",
    [
        (
            ("Chocolate", "Vanilla", "Strawberry"),
            ["Chocolate", "Vanilla", "Strawberry"],
        )
    ],
)
def test_create_scoops_with_different_iterables(inputs, expected):
    # create_values is the function under test (defined elsewhere)
    assert all([value in expected for value in create_values(inputs)])

Mocks

Mocking opening a file

Example of how to mock a file.

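from io import StringIO
from unittest import mock

# final_line and sum_multi_columns are the functions under test (defined elsewhere)


def test_final_line():
    # mock_open returns a fake file handle whose reads return read_data
    mock_open = mock.mock_open(read_data='a\nab\nabc\nabcd')
    with mock.patch("builtins.open", mock_open):
        result = final_line('file_path')
    assert result == 'abcd'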


# or 

def test_multi_columns_multi_rows():
    fake_tsv = StringIO('1\n'
                        '1\t2\n'
                        '1\t2\t3\n'
                        '1\t2\t3\t4')
    with mock.patch("builtins.open", return_value=fake_tsv):
        assert sum_multi_columns('file_path') == 32
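A self-contained version of the two techniques above, with a hypothetical final_line implementation so the mocks have something to exercise. Iterating over a mock_open handle (as this final_line does with `for line in f`) needs Python 3.8 or later.

```python
from io import StringIO
from unittest import mock


def final_line(path):
    """Hypothetical function under test: return the last line of a file."""
    last = ''
    with open(path) as f:
        for line in f:
            last = line
    return last.rstrip('\n')


def test_final_line_with_mock_open():
    # mock_open builds a fake file handle whose reads return read_data
    fake_open = mock.mock_open(read_data='a\nab\nabc\nabcd')
    with mock.patch('builtins.open', fake_open):
        assert final_line('file_path') == 'abcd'
    # The mock also records how open() was called
    fake_open.assert_called_once_with('file_path')


def test_final_line_with_stringio():
    # Alternatively, make open() return an in-memory StringIO
    with mock.patch('builtins.open', return_value=StringIO('x\ny\nz\n')):
        assert final_line('file_path') == 'z'
```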

Another example using StringIO:

def test_passwd_to_dict():
    fake_passwd = StringIO(
        '###############\n'
        '# User Database\n'
        '###############\n'
        '               \n'
        'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\n'
        'root:*:0:0:System Administrator:/var/root:/bin/sh\n'
        'funnyhaha.org\n'
        'daemon:*:1:1:System Services:/var/root:/usr/bin/false\n')
    with mock.patch("builtins.open", return_value=fake_passwd):
        assert passwd_to_dict('file_path') == {'nobody': '-2', 'root': '0',
                                               'daemon': '1'}

Mock Writing to a File

Reference: https://stackoverflow.com/a/55657594/12207563

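from unittest import mock  # standard-library version of the `mock` package

from logfile import LogFile  # module name assumed; import LogFile from wherever it is defined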


def test_logfile():
    open_mock = mock.mock_open()
    with mock.patch("builtins.open", open_mock, create=True):
        lf = LogFile('dummy.log')
        lf.write('foobarbaz')
    open_mock.assert_called_with("dummy.log", "w")
    open_mock.return_value.write.assert_called_once_with("foobarbaz")
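To run the test above in isolation, here is a sketch with a hypothetical LogFile class standing in for the real one (its behavior, opening the file in "w" mode on each write, is an assumption chosen to match the assertions). mock_open's handle returns itself from `__enter__`, so the `f` inside `with` is `open_mock.return_value`.

```python
from unittest import mock


class LogFile:
    """Hypothetical class: writes a message to the given file."""

    def __init__(self, filename):
        self.filename = filename

    def write(self, message):
        with open(self.filename, 'w') as f:
            f.write(message)


def test_logfile():
    open_mock = mock.mock_open()
    with mock.patch('builtins.open', open_mock):
        lf = LogFile('dummy.log')
        lf.write('foobarbaz')
    # Check how open() was called and what was written to the fake handle
    open_mock.assert_called_with('dummy.log', 'w')
    open_mock.return_value.write.assert_called_once_with('foobarbaz')
```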

Lists and Tuples

Dicts and Sets

Files

Functions

Working with inner functions and closures can be quite surprising and confusing at first. That's particularly true because our instinct is to believe that when a function returns, its local variables and state all go away. Indeed, that's normally true, but remember that in Python, an object isn't released and garbage-collected as long as there's at least one reference to it. If the inner function still refers to variables from the scope in which it was defined, those variables (kept in closure cells) stick around as long as the inner function exists.
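A small sketch of a closure keeping a local variable alive after the outer function has returned (make_counter and increment are illustrative names, not from the book):

```python
def make_counter():
    count = 0  # local to make_counter, but captured by increment

    def increment():
        nonlocal count  # rebind the captured variable rather than create a new local
        count += 1
        return count

    return increment


counter = make_counter()  # make_counter has returned...
print(counter())  # 1
print(counter())  # 2  ...yet count is still alive
# The captured variable lives in a closure cell attached to the inner function
print(counter.__closure__[0].cell_contents)  # 2
```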

Comprehensions

List Comprehension vs For loop

On using comprehensions versus for loops: "When you want to transform an iterable into a list, you should use a comprehension. But if you just want to execute something for each element of an iterable, then a traditional for loop is better."

tl;dr: use a comprehension for transforming values. Transformations (taking values in a list, string, dict, or other iterable and producing a new list based on them) are common in programming. You might need to transform filenames into file objects, or words into their lengths, or usernames into user IDs. In all of these cases, a comprehension is the most Pythonic solution.

Consider what your goal is, and whether you're better served by a comprehension or a for loop.
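A short sketch contrasting the two goals, using the "words into their lengths" case from above (the word list is made up):

```python
words = ['this', 'is', 'a', 'test']

# Goal: a new list derived from an existing iterable -> comprehension
lengths = [len(word) for word in words]
print(lengths)  # [4, 2, 1, 4]

# Goal: do something for each element (a side effect), no new list -> for loop
for word in words:
    print(f'{word} has {len(word)} letters')
```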

Going from list comprehensions to generator expressions:

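A generator expression looks like a list comprehension but uses parentheses rather than square brackets. We can pass a generator expression to str.join, just as we could pass a list comprehension. In general this saves memory, because items are produced one at a time instead of being stored in a list. (In CPython, str.join builds a list from its argument internally anyway, so the saving is small in this particular case.)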

nums = [1, 2, 3]

# List comprehension
", ".join([str(num + 1) for num in nums])  # '2, 3, 4'

# Remove the square brackets and it becomes a generator expression
", ".join(str(num + 1) for num in nums)    # '2, 3, 4'

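To see the memory difference, here is a sketch comparing the size of a list with the size of an equivalent generator object (sys.getsizeof measures only the container itself, which is the point here), and feeding a generator expression to sum, which consumes items one at a time:

```python
import sys

nums = range(1_000_000)
as_list = [n * 2 for n in nums]  # builds every element up front
as_gen = (n * 2 for n in nums)   # builds nothing until iterated

print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))  # True
# sum pulls one item at a time from the generator expression
print(sum(n * 2 for n in nums) == sum(as_list))  # True
```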
map versus comprehensions

map pros: map can take multiple iterables as input and apply a function that takes one element from each of them.

import operator

letters = 'abcd'
numbers = range(1, 5)

x = map(operator.mul, letters, numbers)
print(' '.join(x))  # a bb ccc dddd

This can also be done with a generator expression, but it's a bit more complex, as we need zip to iterate through both iterables in parallel.

import operator

letters = 'abcd'
numbers = range(1, 5)

print(' '.join(operator.mul(one_letter, one_number)
               for one_letter, one_number in zip(letters, numbers)))

Modules

What’s the difference between a module and a package?

Importing a package with __init__.py

Suppose you import a package wholesale:

import mypackage

If the package directory contains __init__.py, importing mypackage effectively means that __init__.py is loaded, and thus executed. You can, inside of that file, import one or more of the modules within the package.
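A sketch that builds a throwaway package in a temporary directory (the names mypackage and helpers are illustrative) to show that importing the package runs __init__.py, and that __init__.py can import modules from the package:

```python
import importlib
import pathlib
import sys
import tempfile

# Build mypackage/__init__.py and mypackage/helpers.py in a temp directory
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / 'mypackage'
pkg.mkdir()
(pkg / 'helpers.py').write_text("def greet():\n    return 'hello from helpers'\n")
# This file is executed on `import mypackage`
(pkg / '__init__.py').write_text(
    "print('running mypackage/__init__.py')\n"
    "from . import helpers\n"
)

sys.path.insert(0, str(root))  # make the temp directory importable
importlib.invalidate_caches()  # make sure the import system sees the new files

import mypackage  # prints: running mypackage/__init__.py

print(mypackage.helpers.greet())  # hello from helpers
```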

Creating a Distribution Package

A distribution package is a wrapper around a Python package containing information about the author, compatible versions, and licensing, as well as automated tests, dependencies, and installation instructions.

Creating a distribution package traditionally means creating a file called setup.py; newer projects usually declare the same metadata in pyproject.toml instead. The Python docs include a tutorial on how to create a package.

Objects

Class vs Instance Attributes

A Python class attribute is an attribute of the class, rather than an attribute of an instance of a class. Python doesn’t have constants, but we can simulate them with class attributes.

class MyClass(object):
    class_var = 1  # class attribute

    def __init__(self, i_var):
        self.i_var = i_var  # instance attribute

Note that all instances of the class have access to class_var, and that it can also be accessed as an attribute of the class itself:


foo = MyClass(2)
bar = MyClass(3)

foo.class_var, foo.i_var
## (1, 2)
bar.class_var, bar.i_var
## (1, 3)
MyClass.class_var  ## <- This is key
## 1
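A sketch of the usual gotcha that follows from this: changing a class attribute through the class affects every instance, but assigning to it through an instance creates a separate instance attribute that shadows the class one.

```python
class MyClass:
    class_var = 1  # class attribute, shared by all instances

    def __init__(self, i_var):
        self.i_var = i_var  # instance attribute


foo = MyClass(2)
bar = MyClass(3)

MyClass.class_var = 10  # change it on the class: every instance sees it
print(foo.class_var, bar.class_var)  # 10 10

foo.class_var = 99  # assignment via an instance creates an instance attribute
print(foo.class_var, bar.class_var, MyClass.class_var)  # 99 10 10
print('class_var' in vars(foo), 'class_var' in vars(bar))  # True False
```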

Reference: Python Class Attributes: An Overly Thorough Guide

Inheritance

Example

class Person():
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f'Hello, {self.name}'


class Employee(Person):
    def __init__(self, name, id_number):
        self.name = name
        self.id_number = id_number

What does self do?

What does __init__ do?

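__init__ is not a constructor: by the time it runs, the new object already exists (Python creates it with __new__). __init__ simply adds attributes to that object.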

Keeping code DRY with super

There's one weird thing about my implementation of Employee, namely that I set self.name in __init__. If you're coming from a language like Java, you might be wondering why I have to set it at all, since Person.__init__ already sets it. But that's just the thing: in Python, __init__ really needs to execute for it to set the attribute. If we were to remove the setting of self.name from Employee.__init__, the attribute would never be set. By the ICPO rule (Python looks for an attribute on the instance, then its class, then the parent classes, then object), only one __init__ would ever be called: the one closest to the instance. Since Employee.__init__ is closer to the instance than Person.__init__, the latter is never called.
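To see this, here is a sketch where Employee.__init__ neither sets self.name nor calls Person.__init__ (the name and ID values are made up):

```python
class Person:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f'Hello, {self.name}'


class Employee(Person):
    def __init__(self, name, id_number):
        # Deliberately neither sets self.name nor calls Person.__init__
        self.id_number = id_number


e = Employee('Alice', 42)
print(vars(e))  # {'id_number': 42}: name was never set
try:
    e.greet()  # greet is found on Person, but self.name doesn't exist
except AttributeError as err:
    print('AttributeError:', err)
```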

Solution: the super built-in lets us invoke a method on a parent class without explicitly naming that parent.

class Employee(Person):
    def __init__(self, name, id_number):
        super().__init__(name)
        self.id_number = id_number
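A runnable version of the super-based Employee, showing that Person.__init__ now runs and that greet works. Employee.__mro__ lists the order in which Python searches classes for attributes.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f'Hello, {self.name}'


class Employee(Person):
    def __init__(self, name, id_number):
        super().__init__(name)  # runs Person.__init__, which sets self.name
        self.id_number = id_number


e = Employee('Alice', 42)
print(e.greet())  # Hello, Alice
print([cls.__name__ for cls in Employee.__mro__])  # ['Employee', 'Person', 'object']
```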

Abstract Base Classes

Abstract base classes are classes that are never instantiated on their own; instead, various subclasses inherit from them.
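The notes don't say how this is enforced; one way is Python's built-in abc module, where a class that inherits from ABC and has an unimplemented @abstractmethod raises TypeError when instantiated. A minimal sketch with a hypothetical Shape/Square hierarchy:

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        """Subclasses must implement this."""

    def describe(self):
        # Shared behavior that relies on the subclass's area()
        return f'{type(self).__name__} with area {self.area()}'


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


try:
    Shape()  # abstract: can't be instantiated
except TypeError as err:
    print('Cannot instantiate:', err)

print(Square(3).describe())  # Square with area 9
```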

Subclass attribute vs init method

Limits of OOP principles in Python

Whether it's right or wrong, directly accessing data in other objects is fairly common in the Python world. Because all data is public (i.e., there's no private or protected), it's considered a good and reasonable thing to just scoop the data out of objects. That said, this also means that whoever writes a class has a responsibility to document it and to keep the API alive, or to document elements that may be deprecated or removed in the future.

In many languages (unlike Python), object-oriented programming is forced on you, so you're constantly trying to fit your programming into its syntax and structure.


Iterators and Generators

There are at least three different ways to create an iterator:

  1. Add the appropriate methods to a class
  2. Write a generator function
  3. Use a generator expression
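A sketch of the three approaches side by side, each producing the same countdown (Countdown and countdown_gen are illustrative names):

```python
class Countdown:
    """1. A class that implements the iterator protocol itself."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


def countdown_gen(start):
    """2. A generator function."""
    while start > 0:
        yield start
        start -= 1


# 3. A generator expression
countdown_expr = (n for n in range(3, 0, -1))

print(list(Countdown(3)), list(countdown_gen(3)), list(countdown_expr))
# [3, 2, 1] [3, 2, 1] [3, 2, 1]
```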

The iterator protocol is both common and useful in Python. By now, it's a bit of a chicken-and-egg situation: is it worth adding the iterator protocol to your objects because so many programs expect objects to support it? Or do programs use the iterator protocol because so many objects support it? The answer might not be clear, but the implications are. If you have a collection of data, or something that can be interpreted as a collection, then it's worth adding the appropriate methods to your class. And if you're not creating a new class, you can still take advantage of iterables with generator functions and expressions.

The book covers the following topics.

Iterator protocol

  1. __iter__ returns an iterator
  2. __next__ must be defined on the iterator
  3. The StopIteration exception, which the iterator raises to signal the end of iteration

How a for loop actually works

  1. Calls the iter built-in on the object, which invokes the object's __iter__ method and returns an iterator (iter raises TypeError if the object isn't iterable).
  2. On each pass, the for loop calls the next built-in on that iterator, which invokes the iterator's __next__ method.
  3. When __next__ raises StopIteration, the loop exits.
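The steps above can be sketched as a while loop that does by hand what for does automatically (manual_for is a hypothetical helper, not a built-in):

```python
def manual_for(iterable, body):
    """Call body(item) for each item, mimicking a for loop."""
    iterator = iter(iterable)  # step 1: iter() calls __iter__
    while True:
        try:
            item = next(iterator)  # step 2: next() calls __next__
        except StopIteration:  # step 3: no more items, leave the loop
            break
        body(item)


collected = []
manual_for('abc', collected.append)
print(collected)  # ['a', 'b', 'c']
```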

Common Qs

  1. “Why isn’t there an index?”
  2. “Why do different objects behave differently in for loops?”

How to make a class iterable

  1. Define an __iter__ method that takes only self as an argument and returns self.
  2. Define a __next__ method that takes only self as an argument. It should either return a value or raise StopIteration when it runs out of values.

Example of a class with its own iterator

class LoudIterator():
    def __init__(self, data):
        print('\tNow in __init__')
        self.data = data
        self.index = 0

    def __iter__(self):
        print('\tNow in __iter__')
        return self

    def __next__(self):
        print('\tNow in __next__')
        if self.index >= len(self.data):
            print(
                f'\tself.index ({self.index}) is too big; exiting')
            raise StopIteration

        value = self.data[self.index]
        self.index += 1
        print(f'\tGot value {value}, incremented index to {self.index}')
        return value


for one_item in LoudIterator('abc'):
    print(one_item)

# prints
"""
        Now in __init__
        Now in __iter__
        Now in __next__
        Got value a, incremented index to 1
a
        Now in __next__
        Got value b, incremented index to 2
b
        Now in __next__
        Got value c, incremented index to 3
c
        Now in __next__
        self.index (3) is too big; exiting
"""

Generator Functions

A generator function looks like a regular function, but calling it returns a generator object, which acts like an iterator.

Calling the function below doesn't execute its body; instead, it returns a generator object.

def foo():
    yield 1
    yield 2
    yield 3

The generator object can be assigned to a variable and put in a for loop. With each iteration, the function runs until the next yield statement, hands back the value given to yield, then pauses until the next iteration. When the generator function exits, StopIteration is raised automatically, ending the loop.

g = foo()
for i in g:
    print(i)
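The same generator driven by hand with next(), which makes the pause-and-resume behavior and the final StopIteration visible:

```python
def foo():
    yield 1
    yield 2
    yield 3


g = foo()  # nothing in foo's body has run yet
print(type(g).__name__)  # generator
print(next(g))  # 1  (runs until the first yield)
print(next(g))  # 2  (resumes after the first yield)
print(next(g))  # 3
try:
    next(g)  # the body finishes, so StopIteration is raised
except StopIteration:
    print('generator exhausted')
```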

Iterable vs Iterator

Iterator term chart

Term | What is it? | Example | To learn more
--- | --- | --- | ---
iter | A built-in function that returns an object's iterator | iter('abcd') | http://mng.bz/jgja
next | A built-in function that requests the next object from an iterator | next(i) | http://mng.bz/WPBg
StopIteration | An exception raised to signal that an iterator has no more values, ending a loop | raise StopIteration | http://mng.bz/8p0K
enumerate | Numbers the elements of an iterable | for i, c in enumerate('ab'): print(f'{i}: {c}') | http://mng.bz/qM1K
Iterables | A category of data in Python | Iterables can be put in for loops or passed to many functions. | http://mng.bz/EdDq
itertools | A module with many classes for implementing iterables | import itertools | http://mng.bz/NK4E
range | Returns an iterable sequence of integers | range(10, 50, 3)  # every 3rd integer, from 10 to (not including) 50 | http://mng.bz/B2DJ
os.listdir | Returns a list of the entries (files and subdirectories) in a directory | os.listdir('/etc/') | http://mng.bz/YreB
os.walk | Walks a directory tree, yielding (dirpath, dirnames, filenames) for each directory | os.walk('/etc/') | http://mng.bz/D2Ky
yield | Returns control to the loop temporarily, optionally returning a value | yield 5 | http://mng.bz/lG9j
os.path.join | Returns a string based on the path components | os.path.join('etc', 'passwd') | http://mng.bz/oPPM
time.perf_counter | Returns a high-resolution timer value in seconds (a float); only the difference between two calls is meaningful | time.perf_counter() | http://mng.bz/B21v
zip | Takes n iterables as arguments and returns an iterator of tuples of length n | zip('abc', [10, 20, 30])  # yields ('a', 10), ('b', 20), ('c', 30) | http://mng.bz/Jyzv

Iterator gotcha: __iter__ in multi-class cases

Problem: if MyEnumerate's __iter__ returns self, the B loop below prints nothing, because both loops use the same iterator object and loop A has already exhausted it.

e = MyEnumerate('abc')

print('** A **')
for index, one_item in e:
    print(f'{index}: {one_item}')

print('** B **')
for index, one_item in e:
    print(f'{index}: {one_item}')
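To reproduce the problem, here is a minimal sketch of a MyEnumerate whose __iter__ returns self (this implementation is an assumption, since the class isn't shown in these notes):

```python
class MyEnumerate:
    """Buggy version: the object is its own iterator, so it can be consumed only once."""

    def __init__(self, data):
        self.data = data
        self.index = 0

    def __iter__(self):
        return self  # same object, with the same index, every time

    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        value = (self.index, self.data[self.index])
        self.index += 1
        return value


e = MyEnumerate('abc')
print(list(e))  # [(0, 'a'), (1, 'b'), (2, 'c')]
print(list(e))  # [] because index is already at the end
```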

Solution: Implement __iter__ on the main class, but its job is to return a new instance of the helper class.

# in MyEnumerate

def __iter__(self):
    return MyEnumerateIterator(self.data)

Then we define MyEnumerateIterator, a new and separate class, whose __init__ looks much like the one we already defined for MyEnumerate and whose __next__ is taken directly from MyEnumerate.

Advantages to this design:

  1. We can put our iterable in as many for loops as we want, without worrying that it has already been exhausted.
  2. It's more organized, as we're keeping the iteration logic (i.e., __next__) in a separate class.
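A sketch of the two-class design (the method bodies are an assumed implementation): MyEnumerate only stores the data, and each call to its __iter__ hands out a fresh MyEnumerateIterator with its own index.

```python
class MyEnumerate:
    """Iterable: stores the data and returns a new iterator for each loop."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return MyEnumerateIterator(self.data)


class MyEnumerateIterator:
    """Iterator: holds the per-loop state (the index)."""

    def __init__(self, data):
        self.data = data
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        value = (self.index, self.data[self.index])
        self.index += 1
        return value


e = MyEnumerate('abc')
print(list(e))  # [(0, 'a'), (1, 'b'), (2, 'c')]
print(list(e))  # [(0, 'a'), (1, 'b'), (2, 'c')] again: each loop gets its own iterator
```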

Stopping Generator Functions

itertools

Python comes with the itertools module, which makes it easy to create many types of iterators.

The chain tool allows you to chain together various types of iterables.

from itertools import chain

print([i for i in chain('abc', [1, 2, 3], {'a': 1, 'b': 2})])
# ['a', 'b', 'c', 1, 2, 3, 'a', 'b']
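chain is lazy: it yields items on demand rather than building a combined list. When the iterables themselves come from one iterable (for example, a list of lists), chain.from_iterable flattens them by one level. A short sketch:

```python
from itertools import chain

combined = chain('ab', range(2))
print(next(combined))  # a  (only the first item has been produced)
print(list(combined))  # ['b', 0, 1]

# Flatten one level of nesting
nested = [[1, 2], (3,), 'xy']
print(list(chain.from_iterable(nested)))  # [1, 2, 3, 'x', 'y']
```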