Optimize your Python code

Updated on 14th Jun 2025

Because Python is one of the main prerequisites for learning machine learning, there has been a large influx of programmers and scientists from other backgrounds.

Writing code in Python is easy: a few lines of code and it works. However, Python is slower than many other languages. There are several ways to optimize code so that loops run a little faster and use less memory:

Code profiling (both memory and time)
Cython
Object-oriented programming with design patterns
Generators
Context managers
Multiprocessing
Using __slots__
Using Pythonic code
Preloading memory intensive operations
Dead code removal
Modularization
Disabling unnecessary print statements

Code profiling

To optimize a program, we first need to find its bottlenecks. Many libraries are available for profiling code; the ones we used are cProfile, line_profiler and pycallgraph. We also used the %%timeit magic command in Jupyter Notebook and memory_profiler, which helps keep RAM usage within limits.
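As a minimal sketch of time profiling with the standard-library cProfile module, the example below profiles a deliberately naive loop and prints the most expensive calls. The function names slow_sum_of_squares and profile_call are hypothetical and exist only for illustration.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop used as the profiling target (hypothetical)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Write the statistics to a string instead of stdout so they can be
    # logged or inspected later.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries
    return result, buffer.getvalue()


result, report = profile_call(slow_sum_of_squares, 100_000)
print(report)
```

Sorting by cumulative time tends to surface the functions worth optimizing first; line_profiler can then be used to drill into individual lines.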

Cython

Compiling Python files with Cython can make them run faster, especially code dominated by CPU-bound loops. [Reference]

Object-oriented programming with design patterns

Although Python can be used as a functional programming language, production-level code should follow design patterns to ease future support and enhancements. [Reference][Reference]

Dead code removal

Dead code, written for some operation but no longer used, increases debugging and profiling time. It should be removed first.

Generators

Python generators are one of the best tools for processing large data sets with limited memory usage at runtime, because they produce items one at a time instead of building the whole collection in memory. [Reference]
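The sketch below illustrates the idea under simple assumptions: a hypothetical chunked generator feeds a large range to a consumer in small pieces, and a generator expression is compared with the equivalent list. The helper names are made up for this example.

```python
import sys


def chunked(values, chunk_size):
    """Yield successive lists of at most chunk_size items from values."""
    chunk = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


def total_in_chunks(values, chunk_size):
    """Sum values while holding only one chunk in memory at a time."""
    return sum(sum(chunk) for chunk in chunked(values, chunk_size))


total = total_in_chunks(range(1_000_000), 10_000)

# A list stores a reference to every element up front; a generator
# expression only stores its current state.
squares_list = [n * n for n in range(100_000)]
squares_gen = (n * n for n in range(100_000))
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(total, list_size, gen_size)
```

Note that sys.getsizeof reports only the container itself, not the objects it refers to, so the real gap is larger than the printed numbers suggest.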

Context managers

Context managers let you allocate and release resources precisely when you want to. For example, a file opened in a with statement is closed automatically when the block exits, so there is no need to close it explicitly.
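A short sketch of both uses: the built-in file context manager, and a custom one written with contextlib.contextmanager. The timed helper and the file name are hypothetical and chosen only for the example.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record how long the with-block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings = {}
# The temporary directory is removed automatically when its block exits.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "example.txt")
    with timed("write", timings):
        with open(path, "w") as handle:
            handle.write("hello\n")
    # The file was closed as soon as the inner with-block ended.
    closed_after_block = handle.closed
    with open(path) as handle:
        content = handle.read()

print(closed_after_block, timings)
```

The try/finally inside timed is what guarantees the cleanup step runs even when the body of the with-block raises an exception.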

Multiprocessing

The multiprocessing module in Python can be used to run independent processes in parallel, which makes it possible to use all the CPU cores of the server.
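A minimal sketch, assuming a CPU-bound task that can be split into independent inputs. The count_primes function is a hypothetical stand-in for real work; the worker count of at most four is an arbitrary choice for the example.

```python
from multiprocessing import Pool, cpu_count


def count_primes(limit):
    """Count primes below limit with trial division (CPU-bound on purpose)."""
    count = 0
    for candidate in range(2, limit):
        is_prime = True
        divisor = 2
        while divisor * divisor <= candidate:
            if candidate % divisor == 0:
                is_prime = False
                break
            divisor += 1
        if is_prime:
            count += 1
    return count


if __name__ == "__main__":
    limits = [10_000, 20_000, 30_000, 40_000]
    # Each input is handled by a separate worker process, so the work is
    # not serialized by the global interpreter lock.
    with Pool(processes=min(4, cpu_count())) as pool:
        results = pool.map(count_primes, limits)
    print(dict(zip(limits, results)))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module; without it, each worker would try to create its own pool.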

Using __slots__ [Reference]

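By default, the attributes of an object are stored in a dictionary, its __dict__.
Dictionaries give fast, O(1) average-case access. However, a dict is a hash table that is resized before it fills up, so in most cases about a third of its allocated space sits empty.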

from sys import getsizeof as gs

# The sizes in the comments are those reported in the original run;
# exact values vary with the Python version and platform.

d_obj = {}
print(gs(d_obj))  # size of an empty dict
# 280

d_obj = {k: v for k, v in enumerate(range(5))}
print(gs(d_obj))  # the default size holds up to five items
# 280

d_obj = {k: v for k, v in enumerate(range(6))}
# the sixth item triggers a new, larger allocation, most of which is unused
print(gs(d_obj))
# 1048

Details of other data types can be found below: [Source]

Bytes  type        empty + scaling notes
24     int         NA
28     long        NA
37     str         + 1 byte per additional character
52     unicode     + 4 bytes per additional character
56     tuple       + 8 bytes per additional item
72     list        + 32 for first, 8 for each additional
232    set         sixth item increases to 744; 22nd, 2280; 86th, 8424
280    dict        sixth item increases to 1048; 22nd, 3352; 86th, 12568 *
64     class inst  has a __dict__ attr, same scaling as dict above
16     __slots__   class with slots has no dict, seems to store in
                   mutable tuple-like structure.
120    func def    doesn't include default args and other attrs
904    class def   has a proxy __dict__ structure for class attrs
104    old class   makes sense, less stuff, has real dict though.

__slots__ can be used to get rid of this unused memory. Once you define __slots__ on a class, its instances have no per-instance dict and you cannot add attributes beyond those listed. In this way, you can reduce the memory usage of the object(s).
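The following sketch contrasts a regular class with one that declares __slots__. The class names are hypothetical; the point is that the slotted instance has no __dict__ and rejects attributes that were not declared.

```python
class PointWithDict:
    """Regular class: attributes live in a per-instance __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointWithSlots:
    """Slotted class: only the listed attributes can exist."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


regular = PointWithDict(1, 2)
slotted = PointWithSlots(1, 2)

regular.z = 3  # allowed: stored in regular.__dict__

slots_error = None
try:
    slotted.z = 3  # not declared in __slots__
except AttributeError as error:
    slots_error = error

print(hasattr(regular, "__dict__"), hasattr(slotted, "__dict__"))
print(repr(slots_error))
```

The saving is per instance, so it matters most when a program creates very large numbers of small objects.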

Using Pythonic code

There are many cases where using Pythonic code improves performance. Some examples are:

Using list/generator expressions instead of for loops [Reference][Reference]
Using short-circuit operators to avoid additional if-else condition checks [Reference]
Using Python libraries rather than reinventing the wheel
Using local function variables instead of global variables [Reference][Reference]
Using decorators to measure the time taken at the function level, and disabling them in production runs
Converting numerically intensive operations to NumPy
Using enumerate instead of maintaining a separate index counter
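A few of the items above can be sketched together. All names here, including the ENABLE_TIMING flag, are hypothetical and only illustrate the patterns; this is not the code used in the project.

```python
import functools
import time

ENABLE_TIMING = True  # hypothetical flag; set to False for production runs


def timed(func):
    """Print the time taken by func, or return func untouched if disabled."""
    if not ENABLE_TIMING:
        return func  # no wrapper, so no overhead in production

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.6f}s")

    return wrapper


@timed
def squares_of_evens(numbers):
    # List comprehension instead of an explicit loop with append().
    return [n * n for n in numbers if n % 2 == 0]


def index_lines(lines):
    # enumerate() replaces a manually maintained counter.
    return [f"{i}: {line}" for i, line in enumerate(lines, start=1)]


def display_name(name):
    # `or` short-circuits: the default is used only when name is falsy.
    return name or "anonymous"


print(squares_of_evens(range(6)))
print(index_lines(["a", "b"]))
print(display_name(""))
```

Because the flag is checked when the decorator is applied, it has to be set before the decorated functions are defined.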

Pre-loading memory/time intensive operations

Some memory- and time-intensive operations can be done once, while the libraries are imported and before the server starts accepting queries. This reduces the query processing time.
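A minimal sketch of the idea, assuming the expensive step can be run at module import time. The build_lookup_table function is a hypothetical stand-in for loading a model or a large file.

```python
def build_lookup_table(size):
    """Stand-in for an expensive load (hypothetical)."""
    return {n: n * n for n in range(size)}


# Runs once, when the module is imported, before any query is served.
LOOKUP_TABLE = build_lookup_table(100_000)


def handle_query(n):
    """Per-query work is reduced to a dictionary lookup."""
    return LOOKUP_TABLE.get(n)


print(handle_query(12))
```

The trade-off is a slower startup and a higher baseline memory footprint in exchange for faster responses.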

Modularization

Modularization is always good. It is necessary for maintainable code that can be edited quickly as business requirements change. It also concentrates optimization effort: if a function is called more than 20 times, every second saved in it is saved that many times over.

Disabling print statements

A separate function is used to print the operations, so that printing can be disabled for the production run by changing a single flag. The program can then run as a service, with log files used for operation monitoring.
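One way this could look, as a sketch: a single trace function guarded by a flag, with the standard logging module used for monitoring. The DEBUG_PRINTS flag, the trace function and the "service" logger name are hypothetical, and an in-memory stream stands in for a real log file.

```python
import io
import logging

DEBUG_PRINTS = False  # hypothetical flag; True only during development

# An in-memory stream stands in for a log file here; a production
# service would more likely attach a logging.FileHandler.
log_stream = io.StringIO()
logger = logging.getLogger("service")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(log_stream))
logger.propagate = False  # keep messages out of the root logger


def trace(message):
    """Print only when the flag is set; always write to the log."""
    if DEBUG_PRINTS:
        print(message)
    logger.info(message)


trace("query processed")
```

Routing every message through one function means the production behavior changes in one place rather than at every call site.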

Technology stacks for performance

Nginx for load balancing between multiple servers
Redis as the cache manager
RabbitMQ or Apache Kafka for job queue management
CherryPy as the WSGI server on Windows
uWSGI on Linux
Flask for REST API development

Conclusion

After implementing the above methods, the number of concurrent users supported increased more than 10 times. By implementing all the methods described here, we aim to increase concurrency support by at least 100 times.