Back in 2015, at the time of its publication, the Hidden Technical Debt in Machine Learning Systems paper was mostly overlooked, but recently it has resurfaced quite spectacularly, having been cited in over 25 papers since the start of the year.
Now, biologist and ML engineer Matthew McAteer reviews which elements of this paper have stood the test of time and which have not, suggesting newer methods to replace those that have dated. It’s a long and insightful post that we’d recommend to anyone using ML, but if you’re strapped for time, here are the key takeaways:
What is technical debt?
When software engineers prioritise speed of deployment over all other factors in development, the resulting build-up of ongoing costs is referred to as tech debt. The issues arising from fast builds can take an awful lot of work to fix down the road. However, tech debt is considerably worse for ML systems, explain Sculley et al. in their 2015 paper.
Why is ML tech debt so much worse?
McAteer uses a custom-made Spongebob Squarepants meme to illustrate this point, but the crux is that the amount of work needed to keep future development easy is multiplied in ML engineering compared with traditional software engineering.
Best practice #1: Use interpretability/explainability tools
In traditional software engineering, strong abstraction boundaries mean that code is relatively easy to maintain. In ML, these boundaries are difficult to enforce, as the real world rarely fits into tidy encapsulation. This so-called nebulous nature of ML, where we don’t hard-code the rules that convert data into specified outputs but instead expect the algorithms to derive rules from the input data, makes segregating and organising those rules all the more complicated.
Best practice #2: Use explainable model types if possible
For interpretable ML that overcomes the pitfalls of system entanglement, McAteer recommends Facebook’s high-dimensional visualisation tool. He explains that the ensembling methods and high-dimensional visualisation tools suggested in the 2015 paper would fall short when the data is very high-dimensional.
Best practice #3: Always re-train downstream models in order
To prevent the errors of correction cascades in nebulous ML models, McAteer recommends a variant of greedy unsupervised layer-wise pretraining (or GULP) which, although it cannot be explained mathematically, seems to work well.
Best practice #4: Set up access keys, directory permissions, and service-level-agreements
Undeclared consumers, i.e. the systems that you don’t realise depend on your ML model, are a huge and complex area of tech debt, particularly in toolkits such as JupyterLab. Even with experimental code, there needs to be communication between ML engineers and security engineers to ensure that changes to data sources don’t have a knock-on effect on other models.
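As a rough illustration of why access keys help here, the sketch below wraps a prediction function so that every caller has to be registered before it can use the model, which also leaves a record of who depends on it. This is a minimal sketch using only the Python standard library; the class `ModelEndpoint` and its methods are hypothetical names, and a real deployment would rely on the access controls of its own infrastructure.

```python
class ModelEndpoint:
    """Wraps a prediction function so that only registered consumers can call it.

    Hypothetical illustration only: not taken from McAteer's post or any library.
    """

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn
        self._consumers = {}   # access key -> consumer name
        self.call_counts = {}  # consumer name -> number of calls made

    def register(self, consumer_name, access_key):
        """Declare a consumer and the key it will use."""
        self._consumers[access_key] = consumer_name
        self.call_counts.setdefault(consumer_name, 0)

    def predict(self, access_key, features):
        """Run the model for a declared consumer; reject everyone else."""
        consumer = self._consumers.get(access_key)
        if consumer is None:
            raise PermissionError("unknown access key: consumer is not declared")
        self.call_counts[consumer] += 1
        return self._predict_fn(features)
```

Because every call goes through `predict`, `call_counts` doubles as a list of the systems that would be affected by a change to the model.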
Data dependencies go beyond the regular coding considerations of software engineering, as ML depends on larger and generally more unstable data sources.
Best practice #5: Use a data versioning tool
McAteer explains that, since the 2015 publication, updated tools for tracking data dependencies have been developed, such as Data Version Control (DVC), which supersede past approaches like Photon. Streamlit and Netflix’s Metaflow are two other great tools for data versioning in your modelling.
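The basic idea behind data versioning can be illustrated with a content hash: if a model records a fingerprint of the data it was trained on, any silent change to that data becomes detectable. The following is a minimal sketch using only the Python standard library. It is not the API of DVC or any other tool named above, and the function names `fingerprint` and `check_unchanged` are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_unchanged(path, recorded_digest):
    """Return True if the file still matches the digest stored with the model."""
    return fingerprint(path) == recorded_digest
```

In practice the recorded digest would be stored next to the trained model, so that a later training or evaluation run can refuse to proceed when the data no longer matches.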
Best practice #6: Crop unused files, extraneous correlated features, and maybe use a causal inference toolkit
Legacy or redundant data can cause just as many problems as data dependencies, so save yourself some money down the line by chucking it now. The author’s favourite packages for causal disentanglement are DeepMind’s work on Bayesian Causal Reasoning, Microsoft’s DoWhy, and QuantumBlack’s CausalNex. These tools again make the original paper’s ANCOVA recommendation seem pretty out of date.
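Cropping extraneous correlated features can be sketched as a simple filter that flags one feature from every highly correlated pair. The example below is a minimal, standard-library sketch of that idea only; it measures plain Pearson correlation, so it is not a substitute for the causal inference toolkits mentioned above. The function names and the 0.95 threshold are assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column; treat as 0
    return cov / sqrt(var_x * var_y)


def redundant_features(columns, threshold=0.95):
    """Return sorted names of features to drop.

    `columns` maps feature name -> list of values. For each pair whose
    absolute correlation reaches the threshold, the later feature is dropped.
    """
    to_drop = set()
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        if name_a in to_drop or name_b in to_drop:
            continue  # already handled by an earlier pair
        if abs(pearson(col_a, col_b)) >= threshold:
            to_drop.add(name_b)
    return sorted(to_drop)
```

For instance, a height recorded in both centimetres and inches would be flagged as a redundant pair, while an unrelated feature such as age would be kept.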
Best practice #7: Use DevOps tools to track data dependencies
Since the 2015 publication, there have been a host of DevOps tools developed to address the static analysis of data dependencies. McAteer recommends two in particular: Snorkel and Red Gate SQL dependency tracker.
Undefinable feedback loops
The tech debt authors continue with the issues arising from unchecked feedback loops, whether direct loops in semi-supervised or reinforcement learning, or indirect loops that result from engineers basing their designs on existing erroneous outputs. Algorithms designed to resist direct feedback loops tend not to work at scale, whereas for indirect loops the problem often originates outside the organisation itself, so checking your data assumptions is crucial.
Best practice #8: Check independence assumptions behind models and work closely with security engineers
Common no-no patterns in ML code
The vast majority of ML system coding can be considered “plumbing”, serving purely to maintain the model, while only a fraction is dedicated to learning or prediction. Such ML system anti-patterns confer high tech debt, as they make solving cell segmentation much more cumbersome.
Therefore, McAteer offers the following pointers, but with no suggestions of updated tools:
Best practice #9: Use regular code-reviews and/or auto code-sniffing tools
Best practice #10: Repackage general-purpose dependencies into specific APIs
Best practice #11: Get rid of pipeline jungles with top-down redesign/reimplementation
Best practice #12: Set regular checks and criteria for removing code, or put the code in a directory or on a disk far-removed from the business-critical stuff.
Best practice #13: Stay up-to-date on abstractions that are becoming more solidified with time
ML still lags years behind software engineering in terms of strong and widely accepted abstraction, with the forerunner still being Map-Reduce. According to McAteer, your best bet is to keep up to date with the literature on high-level abstractions coming out that don’t use PyTorch.
The *key* pointer: Don’t get attached to any single framework!
Many ML engineers fall into the trap of favouring one framework and then applying that framework to most problems. Disaster strikes when they try to apply this pet framework to a new context, or when a functionally indistinguishable framework is developed. This is particularly common with distributed ML. A truly skilled engineer, by contrast, will focus on building workflows that are framework-agnostic, recognising that most frameworks that enter this world don’t stick around for long.
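One way to picture a framework-agnostic workflow is to have the pipeline depend on a small interface rather than on any particular library, so that a backend can be swapped without touching the workflow itself. The sketch below assumes a hypothetical `Predictor` interface with `fit` and `predict` methods; `MeanBaseline` is a stand-in backend written for illustration, not a real framework.

```python
from typing import Protocol, Sequence


class Predictor(Protocol):
    """The only contract the workflow relies on (hypothetical interface)."""

    def fit(self, rows: Sequence[Sequence[float]], labels: Sequence[float]) -> None: ...

    def predict(self, rows: Sequence[Sequence[float]]) -> list: ...


class MeanBaseline:
    """Stand-in backend: predicts the mean training label for every row."""

    def __init__(self):
        self._mean = 0.0

    def fit(self, rows, labels):
        self._mean = sum(labels) / len(labels)

    def predict(self, rows):
        return [self._mean for _ in rows]


def run_workflow(model: Predictor, train_rows, train_labels, test_rows):
    """Train and predict using only the Predictor contract."""
    model.fit(train_rows, train_labels)
    return model.predict(test_rows)
```

A model built with any given framework could be wrapped in a thin adapter exposing the same two methods, leaving `run_workflow` unchanged when the framework underneath is replaced.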
If you’ve gotten this far in our flavour of McAteer’s analysis, we suggest that you now turn to his article for the “wordier” final sections covering configuration debt, real-world problems, additional areas of ML technical debt, and finally, how to measure tech debt; a litmus test if you will.
You can check out the full blog post here.