Computer Vision (CV) is a branch of Artificial Intelligence (AI) whose use cases keep growing across many industries. Biotech manufacturing is one of the latest to apply CV to diverse tasks such as quality inspection, safety, robotics and packaging. Hundreds of CV models and algorithms were proposed at the major research conferences of 2020 and 2021. However, the academic world often seems to lose sight of industrial reality, which makes it difficult to apply new theoretical ideas to real-life scenarios. In this article, I share some lessons learned in the field about successfully putting a CV system into production in a manufacturing context.
Domain Knowledge and Communication: The best approach to CV systems is to establish a continuous conversation with their stakeholders and recipients. Before embarking on any Proof of Concept or direct implementation, it is important to thoroughly understand the type of business they conduct and what their real needs and pain points are. Data Scientists and ML Engineers are always eager to start implementing proposals as soon as possible. Such proposals may be perfectly valid from a technical point of view and based on state-of-the-art AI algorithms, yet still be the wrong solution for the specific case. At this stage, it is important to include all team members in the conversation. Please remember that this does not happen only at the start of a project: it is an iterative process. Also, don't forget to use the right language for each audience.
Regulations: Some industries, such as manufacturing or pharmaceuticals, are subject to various regulations. These cannot be ignored when defining requirements for CV systems or when designing and implementing them.
Data Access: Aside from data protection laws and other industry-specific regulations, many multinationals also have internal policies that define who may access production data. You therefore need to be aware of these policies from the start. Find out whether data anonymization processes already exist or whether you will need to implement them for model training, validation and testing. Even where internal data protection rules have been in place for some time, don't take it for granted that lean, automated processes to apply them exist or are on the company roadmap.
Data Scarcity: For some phenomena, there is simply not enough data to start prototyping models or to set up their initial training. This situation is not uncommon for industrial CV systems, especially in visual inspection. Some types of defects are particularly critical and therefore cannot be neglected, yet they occur only rarely on a product line, even over long periods of time. It is therefore necessary to identify these cases in the initial stages of a project and prepare strategies to mitigate their effects (one-shot or few-shot learning, data augmentation, synthetic data generation, etc.).
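To make the data augmentation option a little more concrete, here is a minimal sketch that multiplies a rare defect image into up to eight variants using rotations and mirror flips. It assumes grayscale images stored as plain lists of pixel rows. It also assumes the defect's appearance does not depend on its orientation, which must be checked with the SMEs for each defect type. The function names are hypothetical and do not come from any particular library.

```python
from typing import List

Image = List[List[int]]  # grayscale image as a list of pixel rows (assumption)


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rot90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise (works for non-square images too)."""
    return [list(col) for col in zip(*img[::-1])]


def dihedral_augment(img: Image) -> List[Image]:
    """Return the distinct rotations and mirrored rotations of img (at most 8).

    The original image is always the first element. Duplicates are dropped,
    so a perfectly symmetric image yields fewer than 8 variants.
    """
    variants: List[Image] = []
    current = img
    for _ in range(4):  # 0, 90, 180 and 270 degrees
        for candidate in (current, hflip(current)):
            if candidate not in variants:
                variants.append(candidate)
        current = rot90(current)
    return variants
```

Geometric transforms like these only add orientation diversity. They cannot show the model defect appearances it has never seen, which is where few-shot learning and synthetic data generation come in.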
Data Quality: The vast majority of CV research papers use publicly available or ad hoc curated data sets. These contain high-resolution, often perfectly focused images or videos, with a balanced number of images per category, complete labels and related metadata. They often also come with a predefined split between training and test images, with a uniform distribution of the classes. This scenario is practically never encountered in the real world. In industrial settings, machine-mounted cameras and surveillance systems often have very low resolution, and the images also capture unnecessary content around the regions of interest. Complex pre-processing is therefore necessary before images or videos can be used to train and validate models, to reproduce what is described in the papers, and to verify its applicability to a specific case. To complicate matters further, several vendors provide no programmatic interface to the industrial machinery they produce, which makes the automatic acquisition of metadata difficult or, in some cases, impossible. Images from industrial production are, of course, unlabelled. You therefore need to set up smart labelling systems for SMEs (Subject Matter Experts) and work closely with them to give images and videos their context before moving towards supervised learning strategies. These tasks typically account for around 40 percent of the time devoted to a CV project.
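As a small illustration of one such pre-processing step, the sketch below crops a region of interest out of a full camera frame and min-max normalizes its intensities. It makes several assumptions. The camera is fixed-mounted, so the region of interest can be given as constant pixel coordinates. Frames are grayscale lists of pixel rows. The box format `(top, left, height, width)` and the function names are hypothetical. Real pipelines usually add further steps, such as denoising or resizing.

```python
from typing import List, Tuple

Image = List[List[float]]     # grayscale frame as a list of pixel rows (assumption)
Box = Tuple[int, int, int, int]  # hypothetical ROI format: (top, left, height, width)


def crop(img: Image, box: Box) -> Image:
    """Cut the region of interest out of a frame, rejecting boxes outside it."""
    top, left, height, width = box
    if height <= 0 or width <= 0:
        raise ValueError("ROI must have positive height and width")
    if top < 0 or left < 0 or top + height > len(img) or left + width > len(img[0]):
        raise ValueError("ROI lies outside the image")
    return [row[left:left + width] for row in img[top:top + height]]


def normalize(img: Image) -> Image:
    """Min-max scale pixel values to [0.0, 1.0]; a constant image maps to zeros."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in img]
    return [[(p - lo) / span for p in row] for row in img]


def preprocess(frame: Image, roi: Box) -> Image:
    """Crop first, so irrelevant background does not affect normalization."""
    return normalize(crop(frame, roi))
```

Cropping before normalizing is a deliberate choice here. Bright or dark background outside the region of interest would otherwise compress the intensity range of the part that actually matters.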
Skills Gap: You need to build multidisciplinary teams of people with the right skills, technical and otherwise, and this should be treated as an investment rather than a cost.
Tech Infrastructure: Many organizations, due to a lack of maturity, try to save time by starting the various phases of a project before the infrastructure has been defined, or while it is still being implemented. This inevitably causes delays, particularly for CV applications, or even the failure of the initiative. Before diving headlong into the execution of a CV project, ask yourself several questions about the technical infrastructure. Pay particular attention to experiment reproducibility, automation and monitoring.
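On the reproducibility point, two basic ingredients are a data split that can be regenerated exactly and a stable identifier for the configuration that produced a model. The sketch below illustrates both under stated assumptions. It assumes the run configuration is a JSON-serializable dictionary and that images are referenced by file name. The names `config_fingerprint` and `seeded_split` are hypothetical. This covers only a small piece of what a full experiment-tracking and automation setup would handle.

```python
import hashlib
import json
import random
from typing import Dict, List, Tuple


def config_fingerprint(config: Dict) -> str:
    """Short, stable hash of an experiment config; key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def seeded_split(
    items: List[str], seed: int, val_frac: float, test_frac: float
) -> Tuple[List[str], List[str], List[str]]:
    """Deterministic train/validation/test split.

    The items are sorted before shuffling, so the result depends only on the
    seed and the set of items, not on the order a directory listing returned.
    A private Random instance avoids interference with other code using random.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```

Storing the fingerprint alongside each trained model and its metrics makes it possible to tell, months later, exactly which configuration and split a production model came from.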
Business Value: What is the only sure way to know whether a given CV system is generating the expected business value? The answer is simple: put it into production. There is no other way. Demonstrating that you have found the right architecture and achieved very high performance for the model(s) involved is not enough. This is another reason to follow the best practices and suggestions above.
At this point, you may be wondering whether something is missing from this article, since it makes no explicit reference to the challenges specific to CV algorithms. The omission is intentional. As mentioned at the beginning, much of the progress in that direction continues to come from universities and research institutions. The job of good AI professionals is to translate those theoretical concepts into solid reality and generate business value.