Plenty of cloud technologies aim to make Machine Learning algorithms easier to use and to deliver much more insight from the data an organization already holds.

The possibilities seem endless: Recommender Systems for retailers, Natural Language Processing (NLP) for sentiment analysis, Speech Synthesis that can be used at airports, or Image Classification for car license plate recognition. It is said that by 2022, 93% of companies will have implemented AI solutions, and there is natural pressure on companies in every domain to begin with AI now to gain a competitive advantage in a data-driven world.

https://www.talend.com/blog/2019/03/14/6-ways-to-start-utilizing-machine-learning-with-amazon-web-services-and-talend/

However, democratizing AI means that AI-based insights should be accessible to every organization, or even to every person within an organization. One way to achieve this is to connect Talend with Amazon Machine Learning, Microsoft Azure, Google Cloud AI, or any other cloud technology. But there are at least three major problems with this approach:

Problem #1: Some data is private

Organizations may not be willing to transmit their data to the cloud. Sometimes it is not even a matter of willingness but a legal constraint. In other words, cloud solutions are ruled out by default.

Problem #2: Clients need customized solutions

As I mentioned at the beginning, AI possibilities seem endless; however, the devil is in the details. It appears that NLP models trained on domain-specific texts deliver better results, which can be essential for reaching the required quality of sentiment analysis. Transmitting all video streams from, for example, a shopping mall is unaffordable because of the network bandwidth and the cloud cost of processing such a large amount of data with AI models. Last but not least, an organization’s needs can be so specific that no suitable cloud solution exists.

Problem #3: Lack of trust

Organizations may distrust cloud technologies because of well-publicized failures of cloud solutions, such as Google’s AI labeling some people as animals or Microsoft’s Hitler-loving robot. A company interested in AI technologies is quite likely to want more control and a dedicated corporate audit process, which requires something more than just a REST API.

What can we do?

The component presented here enables Talend to perform image classification on board, without any need for cloud technology. The scenario is very simple:

  1. There is a new image in our system.
  2. Our AI solution, just a Talend component, takes the image and automatically predicts the main thing shown in it.
  3. Predictions are saved.
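
The three steps above can be pictured with a minimal Python sketch. It is an illustration only, not the Talend component itself: the folder layout, the CSV output format, and every function name here are assumptions, and `placeholder_classifier` merely stands in for a real model that would run locally.

```python
import csv
from pathlib import Path
from typing import Callable, List, Set, Tuple

# Assumption: images arrive as ordinary files with one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

# A classifier maps raw image bytes to a (label, score) pair.
Classifier = Callable[[bytes], Tuple[str, float]]


def find_new_images(folder: Path, already_seen: Set[str]) -> List[Path]:
    """Step 1: list image files in `folder` that have not been processed yet."""
    return sorted(
        path
        for path in folder.iterdir()
        if path.is_file()
        and path.suffix.lower() in IMAGE_SUFFIXES
        and path.name not in already_seen
    )


def classify_and_save(images: List[Path], classify: Classifier, output_csv: Path) -> int:
    """Steps 2 and 3: predict a label per image and append it to a CSV file."""
    write_header = not output_csv.exists()
    with output_csv.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(["file", "label", "score"])
        for image in images:
            label, score = classify(image.read_bytes())
            writer.writerow([image.name, label, f"{score:.3f}"])
    return len(images)


def placeholder_classifier(image_bytes: bytes) -> Tuple[str, float]:
    """Hypothetical stand-in for a locally running model; always answers 'unknown'."""
    return ("unknown", 0.0)
```

Nothing in this flow leaves the machine: the image bytes are read from local storage, passed to a local function, and the result is written to a local file.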

 

Even for a cognitive function as simple as image classification, we can find real use cases. For example:

  1. There are thousands of pictures in the system and nobody knows what they show.
  2. We cannot store personal data in the system.
  3. If any picture is classified as a photographed document, delete it and notify the appropriate people.
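
The rule in the last point could look roughly like the sketch below. The label name `"document"`, the shape of the predictions, and the `notify` callback are all assumptions made for illustration; a real deployment would plug in whatever labels the model emits and whatever notification channel the organization uses.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# Assumption: the model reports photographed documents under this label.
DOCUMENT_LABEL = "document"


def purge_documents(
    predictions: Iterable[Tuple[Path, str]],
    notify: Callable[[str], None],
) -> List[Path]:
    """Delete every image predicted to be a photographed document and report it."""
    deleted = []
    for path, label in predictions:
        if label == DOCUMENT_LABEL:
            path.unlink()  # remove the file that may contain personal data
            notify(f"Deleted {path.name}: classified as a photographed document")
            deleted.append(path)
    return deleted
```

Passing `notify` as a callback keeps the rule independent of the channel: it can be `print` during testing and an e-mail or ticketing hook in production.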

A toy example of our solution:

Conclusion

Here we concentrate on identifying what an image mainly shows, but the same custom-component concept can be extended to any Computer Vision task, such as Object Detection or Instance Segmentation. Going further, we can easily imagine solutions for medical time-series problems or for predicting users’ purchasing behavior from the history of their past orders. The main point we want to make is that this is a new way of delivering AI solutions in Talend, one that does not require external solutions such as cloud technologies and is the next step in a real AI democratization process.

Grzegorz Gwardys  – Data Science Lead, responsible for the development of AI and ML projects at Promity. Graduate of WEiTI WUT and lecturer at postgraduate studies at the Warsaw University of Technology.