As a data scientist, programming is an important part of your day-to-day work. At the same time, you may have little formal training in software development. Are your valuable programs difficult to understand, modify, maintain, and share?
This one-day course is designed for data scientists and engineers who already use Python and want to take their skills to the next level. By the end of the course, students will know how to structure their Python programs for improved reuse, how to build and use automated tests for their code, and how to analyze program performance. The class will use Python 3.
We start by covering ideas and concepts for improving overall software design. We then explore how these ideas can be applied to a small but realistic Python project. Finally, we look at techniques and best practices for working on Python projects in groups.
• Software design principles
• Managing dependencies in software
• Isolating development environments
• Packaging code for reuse
• Documentation and style
• Automated testing
• Profiling programs
• Strategies and techniques for optimization
• Maintaining invariants and constraints
• Creating command-line interfaces
• Sharing code with package servers
• Creating isolated environments with venv or conda
• Applying core software design principles
• Following Python best practices
• Using a practical and flexible project structure
• Building packages from your code
• Documenting your code in a standard way
• Creating and running automated tests
• Using a profiler to find performance problems
• Optimizing your code based on profiling data
• Uploading packages to a package server
• Using your own packages from a package server
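To give a flavor of the automated testing and profiling skills listed above, here is a minimal sketch using only the standard library (unittest, cProfile, pstats). The function running_mean and its test cases are hypothetical examples invented for illustration; they are not taken from the course materials.

```python
import cProfile
import io
import pstats
import unittest


def running_mean(values, window):
    """Return the mean of each consecutive window-sized slice of values.

    Hypothetical example function, used only to have something to test
    and profile.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


class RunningMeanTest(unittest.TestCase):
    """Automated tests that pin down the expected behavior."""

    def test_basic(self):
        self.assertEqual(running_mean([1, 2, 3, 4], 2), [1.5, 2.5, 3.5])

    def test_rejects_bad_window(self):
        with self.assertRaises(ValueError):
            running_mean([1, 2, 3], 0)


def profile_running_mean():
    """Profile a single call and return the profiler report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    running_mean(list(range(20000)), 50)
    profiler.disable()

    # Write the report to a string instead of stdout so it can be inspected.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()
```

Saved as a module, the tests can be run with python -m unittest, and calling print(profile_running_mean()) shows where the time is spent. Measuring first and optimizing second is the point: the profile, not intuition, indicates which part of the code is worth rewriting.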
Some experience working in Python is assumed.
Basic knowledge of pandas is helpful but not critical.
Bring your own computer with Python 3.3+ and an editor installed.