Best Python packages for MVP development
Thomas Dickson
I use Python quite a lot as part of my job as a data engineer/scientist/backend magician. This page describes various packages that I’ve found useful and what I’ve used them for. I’ve divided these packages up into several categories:
- Data wrangling - for working with data.
- Visualisation - for visualising the data, both wrangled and unwrangled.
- MVP development - for developing the first version of products, because that’s what we’re all about after all!
- Service development - for developing the services which support products.
- CI tools - for helping good-quality code reach production and attempting to keep bad code out.
Data wrangling
- Pandas is the ubiquitous Python data analysis package. It is very useful, performant given the ease of implementation and has a welcoming community for first-time contributors. It is also a good project for learning how a large Python codebase is managed.
- SQLAlchemy is the primary Python ORM for working with databases. I’d use this to actually interact with a database and leave Pandas for the data analysis work - Pandas doesn’t yet have the upsert ability, which is useful for storing data.
- SQL. SQL is the language you need to know to interact with most databases, though probably not the NoSQL ones. Designing Data-Intensive Applications is a good book which has a chapter discussing the differences between NoSQL and SQL databases.
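An upsert inserts a row or, if a row with the same key already exists, updates it instead. Here is a minimal sketch of the idea in plain SQL, run through the standard library's sqlite3 module rather than SQLAlchemy or Pandas so that it stays self-contained. The readings table, its columns and the upsert_reading helper are made up for illustration, and the ON CONFLICT clause assumes SQLite 3.24 or newer.

```python
import sqlite3


def upsert_reading(conn: sqlite3.Connection, sensor_id: str, value: float) -> None:
    """Insert a reading, or overwrite the value if sensor_id already exists."""
    conn.execute(
        """
        INSERT INTO readings (sensor_id, value)
        VALUES (?, ?)
        ON CONFLICT(sensor_id) DO UPDATE SET value = excluded.value
        """,
        (sensor_id, value),
    )
    conn.commit()


# Hypothetical in-memory table used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT PRIMARY KEY, value REAL)")

upsert_reading(conn, "a", 1.0)  # new key: inserted
upsert_reading(conn, "b", 2.0)  # new key: inserted
upsert_reading(conn, "a", 3.0)  # existing key: updated in place

rows = conn.execute(
    "SELECT sensor_id, value FROM readings ORDER BY sensor_id"
).fetchall()
print(rows)
```

The table ends up with two rows rather than three, because the second write for sensor "a" replaces the first one.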
Visualisation
I use three different libraries for visualisation:
- Matplotlib is a very customisable library for creating plots in Python. This does mean you have to write a bit of boilerplate code.
- Seaborn is based on matplotlib and is designed to remove the boilerplate code you have to write to plot nice statistical graphs.
- Plotly makes nice interactive graphs which can be embedded in Jupyter Notebooks or apps online.
MVP development
Developing an application that shows off interesting data science/modelling work is fun, but it can be difficult depending on how much customisation I want in my webapp. Several packages exist that abstract away some of the boring but necessary tasks of creating and running a web application.
- I’ve used Dash a lot to develop MVPs of applications that expose quantitative models/data science models to users. I think it’s got a good balance between customisation and off-the-shelf functionality, and there is a solid user base. This package allows Dash apps to be deployed using Django, which provides an excellent place to start for any new project.
- Streamlit lets you create an app from a single script and is targeted squarely at the data scientist turned software dev market. They’ve just received a lot of funding so I’m looking forward to seeing what they come up with. Here’s a repo of mine which tracks the reps from my workouts. The Streamlit app can be found here.
Service development
Here are some packages I’ve found useful for the purpose of developing backend services.
- FastAPI is an excellent framework for developing APIs in Python. It’s got great documentation, is performant and you can usually find deployment tutorials within the community. 100% recommend to all my friends.
- Typer is FastAPI’s sibling and is used to write CLIs. I use it when I’m containerising some server script and I’m dressing it up with some form of simple API.
- Jinja2 is mostly used within web development, but I use it a lot when I’m generating some form of configuration file or spitting out an HTML page with results. It’s here because I had to include it somewhere.
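As a rough illustration of the configuration-file use case, here is a minimal sketch that renders a small INI-style file from a Jinja2 template. It assumes the jinja2 package is installed. The template contents, the service name, the port and the host list are all invented for the example.

```python
from jinja2 import Template

# Hypothetical INI-style template; the keys are made up for illustration.
CONFIG_TEMPLATE = Template(
    "[service]\n"
    "name = {{ name }}\n"
    "port = {{ port }}\n"
    "{% for host in hosts %}"
    "allowed_host = {{ host }}\n"
    "{% endfor %}"
)

# Fill the placeholders; the loop emits one allowed_host line per entry.
config_text = CONFIG_TEMPLATE.render(
    name="reports",
    port=8000,
    hosts=["localhost", "example.org"],
)
print(config_text)
```

The same approach works for an HTML results page: swap the template text and pass the results in as render arguments.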
CI tools
Continuous integration is the attempt to make deploying performant code as smooth as possible. Part of this process involves running tools on code to spot errors that the humans missed. These tools don’t write your code for you, but they help to reduce the frequency and magnitude of errors that occur.
- Pylint helps with checking style compliance, suggesting refactorings and detecting errors. It returns a score after scanning your code and sometimes I get absorbed in making it increase! I then remember that I get paid for functionality rather than tidiness.
- MyPy is used to check static types in Python code. PEP 484 added type hints to Python. Type hints are useful for thinking about how different functions fit together. MyPy helps to detect bugs when using type hints.
- Black is a code formatter. It helps bring uniformity to rogue code bases and essentially helps everyone to read Python code the same way.
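To make the type hints point concrete, here is a small sketch of the kind of bug a static type checker is meant to catch. The function names and the workout data are hypothetical. The comment about what mypy reports is my expectation rather than captured output.

```python
from typing import Optional


def mean_reps(reps: list[int]) -> Optional[float]:
    """Return the average number of reps, or None for an empty workout."""
    if not reps:
        return None
    return sum(reps) / len(reps)


def summarise(reps: list[int]) -> str:
    average = mean_reps(reps)
    # mean_reps may return None. If this check were removed, a type checker
    # such as mypy would be expected to complain about passing an
    # Optional[float] to round() below.
    if average is None:
        return "no reps recorded"
    return f"average reps: {round(average, 1)}"


print(summarise([8, 10, 12]))
print(summarise([]))
```

The hints cost a few extra characters per function, but they make the "what if the list is empty?" case visible at the call site instead of at runtime.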