Chapter 05

05. High Performance Python

Goals

This chapter covers native Python and several external modules in the context of high performance. The code will run on a single CPU, across multiple coroutines, across multiple CPUs, and across multiple computers. Profiling output will guide the optimizations. The main reference is High Performance Python, 2nd Edition [1].

Introduction

High-performance programming involves minimizing the time it takes to move and manipulate data in a computer. This can be achieved by reducing overhead (writing more efficient code) or by finding better algorithms (changing the operations themselves). Reducing overhead effectively requires a good understanding of the underlying hardware.
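The two routes mentioned above can be illustrated with a small timing sketch. The three function names are hypothetical and chosen only for this example: the first two compute the same sum with different amounts of interpreter overhead, while the third replaces the algorithm with a closed-form formula. Absolute timings depend on the machine and the Python version, so none are shown.

```python
import timeit


def sum_loop(n):
    """Explicit Python loop: one bytecode-level addition per element."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_builtin(n):
    """Same algorithm, but the loop runs inside the built-in sum()."""
    return sum(range(n))


def sum_formula(n):
    """Better algorithm: closed form for 0 + 1 + ... + (n - 1)."""
    return n * (n - 1) // 2


n = 100_000
# All three approaches must agree before comparing their speed.
assert sum_loop(n) == sum_builtin(n) == sum_formula(n)

for fn in (sum_loop, sum_builtin, sum_formula):
    seconds = timeit.timeit(lambda: fn(n), number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 calls")
```

On a typical CPython installation the built-in version should be noticeably faster than the explicit loop, and the formula faster still, but this is worth measuring rather than assuming.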

Python’s abstractions hide the direct interactions with hardware, making it seem like a futile exercise to optimize hardware efficiency. However, understanding the hardware’s limitations and how Python’s abstractions handle data movement can lead to writing high-performance Python programs.
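As a rough illustration of how Python's abstractions affect the data that actually has to be moved, the sketch below compares the memory footprint of a list of floats with a packed array from the standard library. It assumes CPython, where a list stores pointers to separately allocated float objects; the variable names are illustrative only, and the exact byte counts vary by platform and version.

```python
import sys
from array import array

n = 1_000
values = [float(i) for i in range(n)]  # list of pointers to float objects
packed = array("d", values)            # contiguous block of C doubles

# sys.getsizeof(list) counts only the pointer table, so the size of
# each float object is added separately (a CPython-specific estimate).
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
array_bytes = sys.getsizeof(packed)

print(f"list of floats: {list_bytes} bytes")
print(f"array('d'):     {array_bytes} bytes")
print(f"bytes per item in the array: {packed.itemsize}")
```

The packed array holds the same numbers in less memory and in one contiguous block, which is the kind of layout that caches and buses handle most efficiently.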

The Fundamental Computer System

Computers are composed of three basic components: computing units, memory units, and connections. Each component has specific properties: computing units have processing speed, memory units have capacity and read/write speed, and connections have data transfer speed.

A standard workstation can be described at different levels of detail. At a basic level, it consists of a CPU, RAM, a hard drive, and a bus connecting them. At a more detailed level, the CPU itself has multiple cache levels (L1, L2, etc.), each with a smaller capacity but a faster speed than main memory. Modern architectures often introduce new configurations (e.g., Intel’s Skylake CPUs). Both descriptions typically neglect the network connection, which is a much slower link to potentially many other devices.

To better understand these intricacies, the following paragraphs will provide a brief description of the fundamental building blocks of a computer.

References
