
Simplifying GPU Setup and Boosting Reproducibility with Radix's Poetry Cookiecutter

Author: Pierre Gerardi

Using AI technology is no longer just about staying in the game; it's about getting a step ahead of the competition. The evolution of AI, and more specifically of deep learning models, is taking an exciting turn, with advancements in sophisticated yet very approachable systems like Large Language Models. These models are changing how we find, interpret, and use data patterns. Equally important is their ability to translate these insights into tangible business results: improving operational efficiency, predicting market trends, personalizing customer experiences, or identifying potential risks, thus providing businesses with a strategic edge.

However, to train or finetune these models for your use cases and realize their full potential, robust infrastructure is essential. In particular, GPUs have proven to be dramatically faster and more efficient for these workloads than traditional CPUs alone. Yet integrating them into your projects is not as straightforward as one might expect.

This blog post will provide insights into the challenges of using GPUs in your projects and how to overcome them with the Poetry Cookiecutter developed by Radix. The Poetry Cookiecutter is designed to simplify and structure project setup for developers. You will learn how we have enhanced it to include GPU support.

This feature boosts the reproducibility of a project by allowing it to run on a GPU or a CPU without requiring any changes to the software code.

Setup struggles

The real challenge pops up during the setup process. Ensuring software compatibility, installing the right libraries, and configuring them correctly is not always straightforward. You need a compatible GPU and a GPU-accelerated build of a framework like TensorFlow or PyTorch, in addition to components such as the CUDA Toolkit, NVIDIA's platform for parallel computing. And let's not forget specialized deep learning libraries like cuDNN, which provide optimized primitives for faster model training and inference.

Unfortunately, making all these pieces work together smoothly can often turn into a tough task. It requires specific technical skills and consumes valuable developer time that could be better spent on delivering value.

That’s why Radix developed the Poetry Cookiecutter, a template built specifically for scaffolding Python packages, with a particular emphasis on eliminating manual project configuration. To provide a seamless project setup, it automatically builds a robust software project structure that incorporates best practices. Until now, it offered no GPU support, but we have extended the Poetry Cookiecutter to include it, with the aim of freeing developers from the complex process of setting up a GPU. All necessary installations and configurations are included right within the template. This keeps developers from getting stuck on setup procedures and lets them focus their attention on more important tasks.

While using prebuilt images like TensorFlow-GPU or PyTorch-GPU may seem like a way around these setup challenges, they fall short in the flexibility and extra features that our enhanced Poetry Cookiecutter offers. For one, these basic images provide no Python environment management.

In contrast, our Poetry Cookiecutter not only simplifies GPU setup but also adds features like Python environment management, making it a more comprehensive solution. Despite their convenience, standard images simply can't match the overall control and benefits of our tool, an essential consideration especially during the transition from CPU-based to GPU-based projects.

Enhancing the cookiecutter's reproducibility by adding GPU support

Besides facilitating the setup of a new project, one of the main objectives of our Poetry Cookiecutter is to create a robust machine learning environment. We want to transition swiftly from a proof-of-concept setup, usually executed on a local machine, to a production-ready structure. One way to make that transition easy is by employing development containers. In simple terms, dev containers are complete development environments enclosed within a Docker container.

A dev container offers a reproducible environment regardless of the machine on which it is deployed. It comes with everything it needs, avoiding the common 'it only works on my computer' issue. These containers let you edit your source code on your own machine while running it in the container. This makes deploying applications across different systems easier while keeping everything consistent.

Leveraging a containerized development environment is a huge step towards abstraction from the underlying infrastructure. However, up until now, this only applied to CPU-based applications. With the integration of GPU support in the Poetry Cookiecutter, we have broadened this abstraction to cover different processing units. The same code can run on your local CPU while you develop models or applications, and on a GPU when you train models or deploy the applications, without any changes to the underlying code.
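The payoff of this abstraction is that application code can stay device-agnostic. As a minimal sketch (it uses PyTorch as an example framework; the helper name is ours, not part of the cookiecutter template), model code can select whichever processing unit is available at runtime:

```python
def select_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when a CUDA-capable GPU is usable, else "cpu".

    Falls back gracefully, so the same code runs unchanged in
    CPU-only and GPU-enabled containers alike.
    """
    try:
        import torch  # present when the project depends on PyTorch
    except ImportError:
        return "cpu"
    if prefer_gpu and torch.cuda.is_available():
        return "cuda"
    return "cpu"


# The rest of the code never needs to know which container it runs in:
device = select_device()
```

A model would then simply be moved with `model.to(device)`; everything around it is identical on both processing units.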

How to use it

To activate GPU support with our upgraded Poetry Cookiecutter, you simply adapt the base Docker image variable when initiating a new Python project. If there is no need for CUDA support, use the 'python:$PYTHON_VERSION-slim' image. If CUDA is required, opt for 'radixai/python-gpu:$PYTHON_VERSION-cuda11.8' as your base image.

Upon selection, the appropriate line incorporating the base image is automatically integrated into your Dockerfile, ensuring the correct CUDA configuration is bundled into your base image. Keep in mind that we currently support CUDA Toolkit 11.8 paired with Python 3.8, 3.9, 3.10, and 3.11. We continuously monitor the latest developments in the CUDA ecosystem, and as newer versions are released, we aim to update our Docker images accordingly so you can always leverage the most recent advancements in your projects.
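Concretely, the choice comes down to a single base-image line in the generated Dockerfile. A rough sketch (the exact lines produced by the template may differ; the image names are the ones mentioned above):

```Dockerfile
# Sketch only — the generated Dockerfile's exact wording may differ.
ARG PYTHON_VERSION=3.11

# CPU-only base image:
# FROM python:${PYTHON_VERSION}-slim AS base

# CUDA-enabled base image (bundles CUDA Toolkit 11.8):
FROM radixai/python-gpu:${PYTHON_VERSION}-cuda11.8 AS base
```

Swapping between the two lines is the only change needed to move a project between CPU-only and GPU-enabled environments.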

Conclusion

Incorporating a GPU setup into our Poetry Cookiecutter brings several advantages, such as saved time and enhanced efficiency. Arguably, though, the most remarkable benefit lies in improved reproducibility: a developer can seamlessly work across diverse underlying infrastructures, like a local GPU or CPU, or even a remote cluster of GPUs or CPUs, without changing a single line of code. This significantly enhances the ability to reproduce the software.

This approach eliminates environment-specific complications, allowing for a smooth transition from the development stage to a full-scale production stage.

We hope this post has provided you with a clearer understanding of the benefits of utilizing Radix's Poetry Cookiecutter with GPU support. We continually enhance our tool to ensure it meets your needs and stays up to date with technological advancements. 

If you find this tool useful, or want to stay updated on future enhancements, please show your support by giving us a star on our GitHub repo. Your feedback is invaluable to us and contributes greatly to the continuous improvement of our tools. Thank you!

About The Author

Pierre Gerardi

Pierre is a Machine Learning Engineer at Radix. He first became interested in Artificial Intelligence during his Business Engineering-Data Analytics studies at Ghent University. Pursuing this interest, he took on an additional Master in Artificial Intelligence at KU Leuven. As he couldn't get enough, he joined Radix as a Machine Learning Engineer to work with AI on a daily basis. With his hands-on mindset, Pierre wants to assist clients in solving their complex problems and help them implement AI-based solutions.
