I’ve just spent a couple of days at O’Reilly’s Strata Data Conference in London and got a much better idea of where the world of big data, machine learning (ML) and AI may be heading. These sectors have developed rapidly over the last five years, with new technologies, processes and applications changing the way organisations manage their data.
The Strata conference provides a good barometer of the state of the art in big data manipulation, as well as of the concerns of developers and users. Eight key points emerged for me from the event.
1. 5G will stimulate the growth of ML and result in new applications and services
I spoke with O’Reilly’s Chief Data Scientist and Strata organiser, Ben Lorica, about this, and he sees the increased bandwidth and flexibility of 5G, as well as the move to edge computing, as key enablers. He pointed out that China is a leading global force in this technology but that many firms are still working out the business models for all the 5G investments they are making.
2. Changing skillsets for data scientists
Cassie Kozyrkov, Google Cloud’s chief decision scientist, pointed out in her talk that as the UX for ML tools is improved, the skills required will become less technical and more focused on the ability of data scientists to work across silos and be more integrated into the business.
3. The online and offline worlds are merging
China’s Alibaba ecommerce group and Amazon are experimenting with physical store spaces while bricks-and-mortar stores are still adapting to the new online world. It feels to me that the offline moves by ecommerce groups are offensive while the online investments by physical retailers are defensive. There is still a long way to go before this fully plays out, but the expertise that companies like Amazon and Alibaba have in managing data at scale gives them a key advantage.
4. Internal data platforms are becoming essential for growth and innovation
Presentations from data scientists at Lyft and BMW showed how putting data platforms at the centre of new product development and business process management is driving innovation. While this may come naturally to digitally native companies like Lyft, it is also something that traditional industrial companies are having to engage with as data-generating sensors become embedded within products.
5. Open data needs to be taken as seriously as open source software
We all know that open source software is behind the rise of many big data and ML products and services. The commercial and technical case for open source was proven years ago. However, much less attention has been paid to the importance of open data for innovation. The outputs of algorithms are only as good as the quality of the data that goes into them.
Chris Taggart, co-founder and CEO of OpenCorporates, the biggest open database of companies in the world, highlighted the problems that companies run into when they rely on proprietary datasets where data provenance may be sketchy and metadata is not shared across products. Open data is more transparent and does not lock firms into expensive commercial contracts that they can find very difficult to wean themselves off.
6. Importance of capturing and managing real-time data
While real-time or near real-time data is not always required for AI and ML projects, the ability to build systems that can handle it can be a valuable source of competitive advantage. As data-driven decision-making becomes more embedded within organisations, the competitive edge will sometimes go to those that can respond more quickly to events. The scale and breadth of offerings from Amazon Web Services in this respect show how the tools to do this are becoming easier and cheaper to access.
7. Legal and ethical issues are starting to change how firms innovate
A talk by Dr Sandra Wachter of Oxford University highlighted an issue that, I suspect, will be discussed more over the coming year or two. She pointed out that many firms are now aware of their obligations to protect personal data as initiatives such as the GDPR have come into force. However, a less discussed issue, and one that regulators are still grappling with, is that of inference: the decisions that embedded algorithms make based on the data they are processing.
We have a right, in Europe at least, to see what data is being held on us and, to varying degrees, have it corrected or removed. However, we do not have the same redress with the assumptions that firms may be automatically making about us because of this data in areas such as credit checking and health insurance.
8. “To those that have shall be given”
As the conference came to a close, I started thinking about how smaller companies without access to the massive datasets of the internet giants or global FMCG firms will be able to compete in the age of big data and algorithmic decision-making. There is the danger, and perhaps we are already seeing it, that virtuous circles of innovation, fuelled by the network effects of online services, will cement the position of large companies.
However, as Shivnath Babu, co-founder and CTO of Unravel Data Systems, pointed out to me, the internet and app economy is still capable of allowing small firms to leverage data from their apps and online activities and make an impact on markets. Perhaps this and the rise of open data emanating from public data sources will provide the basis for a new generation of start-ups to change the world in the way that Google, Facebook and Amazon have over the last 20 years.