Optimize your Python code
How to optimize your Python code for better performance.

Because Python is one of the main prerequisites for learning machine learning, it attracts a large influx of programmers and scientists from other backgrounds. Writing code in Python is easy: a few lines of code and it works! However, Python is generally slower than compiled languages. There are several ways to make loops run a little faster and to use less memory:

- Code profiling (both memory and time)
- Cython
- Object-oriented programming with design patterns
- Generators
- Context managers
- Multiprocessing
- Using __slots__
- Using Pythonic code
- Preloading memory-intensive operations
- Dead code removal
- Modularization
- Disabling unnecessary print statements

To optimize a program, we first need to find its bottlenecks. Many libraries are available for profiling code. The ones we used are cProfile, line_profiler, and pycallgraph, along with the %%timeit cell magic in Jupyter Notebook and memory_profiler. We used memory_profiler to keep RAM usage under a set limit.

Compiling Python files with Cython can make them run faster. [Reference]

Although Python can be used in a functional style, production-level code should follow object-oriented design patterns so that it can be supported and extended later. [Reference][Reference]

Dead code, written for some earlier operation but no longer used, increases debugging and profiling time. It should be removed first.

Python generators are one of the best tools for processing large data sets with limited memory at runtime, because they produce items one at a time instead of holding the whole data set in memory. [Reference]

Context managers let you allocate and release resources precisely when you want to. For example, a file opened in a with block is closed automatically when the block ends, so there is no need to close it explicitly.

The multiprocessing module can run independent processes in parallel, which lets a program use all the CPU cores of the server.

By default, the attributes of a class instance are stored in a dict. Dictionaries are used because lookups are fast, O(1) on average. However, a dict always keeps spare capacity: CPython resizes the hash table before it becomes more than about two-thirds full, so at least a third of the allocated entries are empty. Declaring __slots__ on a class avoids the per-instance dict. Sizes for other data types can be found below. [Source]

In many cases, Pythonic code also performs better. Some examples:

- Using list or generator expressions instead of for loops [Reference][Reference]
- Using short-circuit operators to avoid additional if-else checks [Reference]
- Using existing Python libraries rather than reinventing the wheel
- Using local variables inside functions instead of global variables [Reference][Reference]
- Using decorators to measure the time taken per function, and disabling them in production runs
- Moving numerically intensive operations to NumPy
- Using enumerate instead of maintaining a separate index counter

Some memory-intensive and time-intensive operations can be done once, while the libraries are imported and before the server starts. This reduces the processing time of each query.

Modularization is always good. It keeps code maintainable and quick to edit as business requirements change. It also makes savings visible: if a function is called more than 20 times, each second saved in it is saved that many times over.

We use a separate function to print operation details, so all such output can be disabled in production by changing a single flag. The program can then run as a service, with log files used for operational monitoring.

For performance, we use the following technology stack:

- Nginx for load balancing between multiple servers
- Redis as the cache manager
- RabbitMQ or Apache Kafka for job queue management
- CherryPy as the WSGI server on Windows
- uWSGI on Linux
- Flask for REST API development

After implementing the above methods, the number of concurrent users we can support has increased more than 10 times. By implementing all the methods described here, we aim to increase concurrency support by at least 100 times.

Thank you for your time.
Code profiling
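As a minimal sketch of the bottleneck-finding step, the example below uses only the standard-library cProfile and pstats modules. The names slow_sum and profile_call are hypothetical and exist only for illustration; line_profiler, pycallgraph, and memory_profiler are third-party tools with their own interfaces and are not shown here.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive loop used as the profiling target (hypothetical)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Collect the report in memory instead of printing it directly.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
    return result, buffer.getvalue()


result, report = profile_call(slow_sum, 100_000)
print(report)
```

Sorting by cumulative time is one reasonable default, since it surfaces the functions whose whole call tree is expensive.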
Generators
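A small sketch of why generators help with large data sets: items are produced one at a time, so the full data set never sits in memory. The functions read_records and total_value are hypothetical stand-ins for a real data source and a real aggregation.

```python
import sys


def read_records(n):
    """Yield n hypothetical records one at a time instead of building a list."""
    for i in range(n):
        yield {"id": i, "value": i * 2}


def total_value(records):
    """Consume any iterable of records lazily."""
    return sum(record["value"] for record in records)


# The list holds every element at once; the generator holds only its state.
as_list = [i * 2 for i in range(100_000)]
as_gen = (i * 2 for i in range(100_000))

print(sys.getsizeof(as_list), sys.getsizeof(as_gen))
print(total_value(read_records(1000)))
```

Note that sys.getsizeof measures only the container itself, not the objects it refers to, so the real gap is larger than the two printed numbers suggest.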
Context managers
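The sketch below shows both sides of the idea: the built-in with open(...) form that closes a file automatically, and a custom context manager written with contextlib. The helper temporary_text_file is a hypothetical name used only for this illustration.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_text_file(text):
    """Create a temp file containing text, yield its path, delete it on exit."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        yield path
    finally:
        # Runs even if the body of the with block raises.
        os.remove(path)


with temporary_text_file("hello\nworld\n") as path:
    with open(path) as handle:
        lines = handle.read().splitlines()

# The file handle is closed and the temp file is gone, with no explicit close().
print(lines, handle.closed, os.path.exists(path))
```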
Multiprocessing
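A minimal sketch of running independent CPU-bound tasks in parallel with multiprocessing.Pool. The task count_primes and the wrapper count_primes_parallel are hypothetical examples; the worker count of 2 is an arbitrary assumption and would normally be tuned to the server's core count.

```python
from multiprocessing import Pool


def count_primes(limit):
    """CPU-bound toy task: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=2):
    """Run count_primes on each limit in a separate worker process."""
    with Pool(processes=workers) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    print(count_primes_parallel([10_000, 20_000, 30_000]))
```

Processes only pay off when each task does enough work to outweigh the cost of starting workers and passing data between them.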
Using __slots__ [Reference]
from sys import getsizeof

# Sizes below match 64-bit CPython 2.7; other versions report different numbers.
d_obj = {}
print(getsizeof(d_obj))  # size of an empty dict
# 280
d_obj = {k: v for k, v in enumerate(range(5))}
print(getsizeof(d_obj))  # the initial allocation still holds five items
# 280
d_obj = {k: v for k, v in enumerate(range(6))}
# The sixth item triggers a new allocation, most of which is unused.
print(getsizeof(d_obj))
# 1048
Bytes (empty)  Type         Scaling notes
24             int          NA
28             long         NA
37             str          + 1 byte per additional character
52             unicode      + 4 bytes per additional character
56             tuple        + 8 bytes per additional item
72             list         + 32 for first, 8 for each additional
232            set          sixth item increases to 744; 22nd, 2280; 86th, 8424
280            dict         sixth item increases to 1048; 22nd, 3352; 86th, 12568
64             class inst   has a __dict__ attr, same scaling as dict above
16             __slots__    class with slots has no dict, seems to store in mutable tuple-like structure
120            func def     doesn't include default args and other attrs
904            class def    has a proxy __dict__ structure for class attrs
104            old class    makes sense, less stuff, has real dict though
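The difference between the "class inst" and "__slots__" rows can be checked with a short sketch like the one below. PointDict and PointSlots are hypothetical classes; exact byte counts vary between Python versions, so the example prints them rather than quoting expected values.

```python
import sys


class PointDict:
    """Regular class: each instance carries its own __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    """Class with __slots__: attributes live in fixed slots, no __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = PointDict(1, 2)
q = PointSlots(1, 2)

print(hasattr(p, "__dict__"), hasattr(q, "__dict__"))

# A regular instance costs its own size plus the size of its attribute dict.
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__), sys.getsizeof(q))
```

The trade-off is flexibility: an instance of PointSlots cannot be given attributes that are not listed in __slots__.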
Using Pythonic code
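The sketch below combines several of the Pythonic habits mentioned in this article: a list comprehension, enumerate, a short-circuit condition, and a timing decorator that can be switched off. TIMING_ENABLED, timed, squares_of_even, and first_long_word are all hypothetical names chosen for this illustration.

```python
import functools
import time

TIMING_ENABLED = True  # hypothetical flag; set to False for production runs


def timed(func):
    """Record how long func takes, but only when TIMING_ENABLED is true."""
    if not TIMING_ENABLED:
        return func  # no wrapper at all, so no overhead in production

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.last_elapsed = time.perf_counter() - start

    wrapper.last_elapsed = None
    return wrapper


@timed
def squares_of_even(numbers):
    # List comprehension instead of an explicit loop with append().
    return [n * n for n in numbers if n % 2 == 0]


def first_long_word(words, min_length):
    # enumerate() supplies the index, so no separate counter is needed.
    for index, word in enumerate(words):
        # Short-circuit: len() is evaluated only when word is not None.
        if word is not None and len(word) >= min_length:
            return index, word
    return None


print(squares_of_even(range(10)), squares_of_even.last_elapsed)
print(first_long_word(["a", None, "generator", "slots"], 5))
```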
Pre-loading memory/time intensive operations
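One way to sketch pre-loading is to build the expensive object at module import time, so each query only reads from it. Here build_lookup_table is a hypothetical stand-in for whatever is costly in a real service (a model, an index, a large file), and handle_query is a hypothetical request handler.

```python
import time


def build_lookup_table(size):
    """Stand-in for an expensive load; hypothetical."""
    return {n: n * n for n in range(size)}


# Built once, at import time, before the server starts accepting queries.
LOOKUP_TABLE = build_lookup_table(200_000)


def handle_query(n):
    """Per-query work is now only a dictionary lookup."""
    return LOOKUP_TABLE.get(n)


start = time.perf_counter()
answer = handle_query(1234)
elapsed = time.perf_counter() - start
print(answer, f"{elapsed:.6f}s")
```

The cost does not disappear; it moves to startup, which is usually acceptable for a long-running service.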
Disabling print statements
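A minimal sketch of the single-flag approach: all console output goes through one function, and monitoring goes through the standard logging module. VERBOSE, debug_print, and process are hypothetical names; a real service would likely attach a logging.FileHandler, while this sketch writes to an in-memory stream so it stays self-contained.

```python
import io
import logging

VERBOSE = False  # hypothetical flag: True during development, False in production

# Assumption for this sketch: an in-memory stream stands in for a log file.
log_stream = io.StringIO()
logger = logging.getLogger("optimizer_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)


def debug_print(*args, **kwargs):
    """Print only when VERBOSE is enabled."""
    if VERBOSE:
        print(*args, **kwargs)


def process(items):
    debug_print("processing", len(items), "items")  # silent in production
    result = [item.upper() for item in items]
    logger.info("processed %d items", len(items))  # always recorded in the log
    return result


output = process(["a", "b"])
print(log_stream.getvalue())
```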