Before any business can take a crucial step forward in its digital transformation, it must understand its data and derive more insight from it than ever before. A key route to such a transformation is advanced predictive and prescriptive analytics. Responding to the new demands of businesses undergoing this transformation, data scientists are deepening their proficiency with cutting-edge artificial intelligence (AI) and machine learning (ML) tools.
However, skilled data scientists are an expensive and scarce resource. To bridge the gap between demand and supply, a new role has emerged. The “citizen data scientist” complements skilled data scientists rather than directly replacing them. Citizen data scientists lack specific advanced data science expertise. However, they can generate models using state-of-the-art diagnostic and predictive analytics. This capability is partly due to the advent of accessible new technologies such as “automated machine learning” (AutoML) that now automate many of the tasks once performed by data scientists.
Algorithms and automation
According to a recent Harvard Business Review article, “Organisations have shifted towards amplifying predictive power by coupling big data with complex automated machine learning. AutoML, which uses machine learning to generate better machine learning, is advertised as affording opportunities to ‘democratise machine learning’ by allowing firms with limited data science expertise to develop analytical pipelines capable of solving sophisticated business problems.”
Comprising a set of algorithms that automate the writing of other ML algorithms, AutoML automates the end-to-end process of applying ML to real-world problems. By way of illustration, a standard ML pipeline is made up of the following: data pre-processing, feature extraction, feature selection, feature engineering, algorithm selection, and hyper-parameter tuning. But the considerable expertise and time it takes to implement these steps means there’s a high barrier to entry.
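To make the barrier to entry concrete, here is a minimal sketch of a hand-built pipeline covering a subset of those steps (pre-processing, feature selection and algorithm selection). It uses only the Python standard library; the data, the zero-variance rule and the choice of a nearest-centroid classifier are illustrative assumptions, not a recommended design.

```python
# Toy, hand-built ML pipeline using only the Python standard library.
# All data and design choices here are hypothetical, for illustration only.
from statistics import mean, pstdev

# Each row: (feature_1, feature_2, constant_feature) -> label
rows = [
    ((1.0, 20.0, 5.0), 0),
    ((1.2, 22.0, 5.0), 0),
    ((0.8, 19.0, 5.0), 0),
    ((3.0, 40.0, 5.0), 1),
    ((3.2, 42.0, 5.0), 1),
    ((2.9, 39.0, 5.0), 1),
]
X = [features for features, _ in rows]
y = [label for _, label in rows]

# Feature selection: drop columns that never vary, as they carry no signal.
columns = list(zip(*X))
kept = [i for i, col in enumerate(columns) if pstdev(col) > 0]

# Pre-processing: standardise each kept column to zero mean, unit variance.
stats = {i: (mean(columns[i]), pstdev(columns[i])) for i in kept}


def transform(features):
    return [(features[i] - stats[i][0]) / stats[i][1] for i in kept]


X_scaled = [transform(f) for f in X]

# Algorithm selection (chosen by hand): a nearest-centroid classifier.
centroids = {}
for label in set(y):
    members = [x for x, lab in zip(X_scaled, y) if lab == label]
    centroids[label] = [mean(col) for col in zip(*members)]


def predict(features):
    point = transform(features)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(point, centroid))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


print("kept feature indices:", kept)
print("prediction for (1.1, 21.0, 5.0):", predict((1.1, 21.0, 5.0)))
print("prediction for (3.1, 41.0, 5.0):", predict((3.1, 41.0, 5.0)))
```

Every step in this sketch encodes a decision a practitioner had to make by hand, which is where the expertise and time go.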
AutoML removes some of these constraints. Not only does it significantly reduce the time it would typically take to implement an ML process under human supervision, it can also often improve the accuracy of the model in comparison to hand-crafted models, trained and deployed by humans. In doing so, it offers organisations a gateway into ML, as well as freeing up the time of ML engineers and data practitioners, allowing them to focus on higher-order challenges.
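The core idea can be sketched as a search loop: try several candidate models and settings, score each on held-out data, and keep the best. The toy example below is an assumption-laden illustration in plain Python, not how any particular AutoML product works; the data, the candidate list and the helper names are all hypothetical.

```python
# Toy "AutoML" loop: automated algorithm selection and hyper-parameter tuning.
# Data, candidates and helper names are hypothetical, for illustration only.
from collections import Counter

# One-dimensional data as (value, label); (2.2, 1) is a deliberately noisy point.
train = [(1.0, 0), (1.5, 0), (2.0, 0), (2.2, 1), (2.5, 0),
         (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1)]
valid = [(1.8, 0), (2.3, 0), (6.2, 1), (7.2, 1)]


def knn_predict(k, x):
    """k-nearest-neighbours vote over the training set."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def majority_predict(_x):
    """Baseline: always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(predict):
    """Share of validation rows the candidate labels correctly."""
    return sum(predict(x) == label for x, label in valid) / len(valid)


# Search space: one baseline plus k-NN with several values of k.
candidates = {"majority": majority_predict}
for k in (1, 3, 9):
    candidates[f"knn(k={k})"] = lambda x, k=k: knn_predict(k, x)

# Score every candidate on held-out data and keep the best one.
scores = {name: accuracy(fn) for name, fn in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("selected:", best)
```

On this toy data the loop selects k-NN with k=3, which smooths over the noisy training point that misleads k=1. Real AutoML systems search far larger spaces with much smarter strategies, but the select-by-validation-score principle is the same.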
Overcoming scalability problems
The trend for combining ML with Big Data for advanced data analytics began back in 2012, when “deep learning” became the dominant approach to solving ML problems. This approach heralded a wealth of new software, tooling, and techniques that altered both the workload and the workflow associated with ML on a large scale. Entirely new ML toolsets, such as TensorFlow and PyTorch, were created, and practitioners increasingly turned to graphics processing units (GPUs) to accelerate their work.
Until this point, companies’ efforts had been hindered by the scalability problems associated with running ML algorithms on huge datasets. Now, though, they were able to overcome these issues. By quickly developing sophisticated internal tooling capable of building world-class AI applications, the BigTech powerhouses soon overtook their Fortune 500 peers when it came to realising the benefits of smarter data-driven decision-making and applications.
Insight, innovation and data-driven decisions
AutoML represents the next stage in ML’s evolution, promising to help non-tech companies access the capabilities they need to quickly and cheaply build ML applications.
In 2018, for example, Google launched its Cloud AutoML. Based on Neural Architecture Search (NAS) and transfer learning, it was described by Google executives as having the potential to “make AI experts even more productive, advance new fields in AI, and help less-skilled engineers build powerful AI systems they previously only dreamed of.”
The one downside to Google’s AutoML is that it’s proprietary. There are, however, a number of open-source alternatives, such as AutoKeras, an AutoML library developed by researchers at Texas A&M University that is itself built on NAS.
Technological breakthroughs such as these have given companies the capability to easily build production-ready models without the need for expensive human resources. By leveraging AI, ML, and deep learning capabilities, AutoML gives businesses across all industries the opportunity to benefit from data-driven applications powered by statistical models - even when advanced data science expertise is scarce.
As it is quite clear that businesses will become ever more reliant on citizen data scientists, anyone working in this industry will understand that 2020 is likely to be the year in which enterprise adoption of AutoML starts to become commonplace.
Because this technology is easily accessible, enterprises will open up the “black box” of ML and gain a better understanding of its processes and capabilities. AI and ML tools and practices will then become ingrained in everyday business operations, because the insight gained through these tools will drive better decision-making and innovation.
By Senthil Ravindran, EVP and global head of cloud transformation and digital innovation, Virtusa