
Gen AI changing the role of data teams and experts

By Iiris Lahti
The Generative AI trend is booming and transforming our ability to utilize AI. Large Language Models (LLMs) have become freely available through products from various tech companies, including ChatGPT by OpenAI and Bard by Google. The possibilities for testing, experimenting and exploring are endless, and getting started requires only a browser, a curious mind and some creativity. Most recently, ChatGPT has gained a connection to real-time internet sources through Bing, which brings more transparency to the sources of information. Corporations can also start building their own OpenAI environments that allow more secure use of the company's own data for model training and prompting.

This has also been a transformative period for data teams and experts, as the focus has shifted from building to buying. Suddenly, an AI experimentation project can be carried out without a data scientist. Has this even become an identity crisis for some? We dare to say "yes, but..." and here is why:

1. The need for data scientists

"Data Scientists are not needed anymore as we can use the OpenAI and similar Generative AI models"
Yes, the need for data scientists changes. A data scientist, especially one who specializes in NLP (Natural Language Processing), needs to learn new tricks and how to apply the freely available LLMs. For example, they can help the business decide when to use Generative or Foundational models, how to finetune them and when a tailored model is needed. Data scientists are also needed to review and validate the output of the models, to explain the basic functionalities and possibilities of these models to business users and to help them innovate new use cases.

2. Self-service AI development

"Business can start using and developing the AI solutions independently"
Yes, independence grows in a similar way as in any self-service analytics project. If you have high-quality and trustworthy data available, and if you know what inputs and outputs the model needs in order to behave as expected, then go for it! But if the data used to train or prompt the model is not complete, accurate, up-to-date or fit for purpose, the end result might be worse than not using AI at all. We have seen a similar development with no-code and low-code analytics platforms that make it possible to build and utilize models without coding skills. With these self-service tools it is equally important to understand how the model works, what it can be used for and what is expected as an end result.
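
To make the data checks above a bit more concrete, here is a minimal Python sketch of a readiness report that measures how many records are complete and how many are reasonably fresh before they are used for training or prompting. The field names, the one-year freshness threshold and the sample records are all assumptions made up for illustration; a real project would define its own criteria, including accuracy and fitness for purpose, which this sketch does not cover.

```python
from datetime import date

# Hypothetical required fields; adjust to your own data model.
REQUIRED_FIELDS = ("customer_id", "segment", "updated")


def readiness_report(records, today, max_age_days=365):
    """Return the share of records that are complete and the share that are fresh."""
    total = len(records)
    if total == 0:
        return {"complete": 0.0, "fresh": 0.0}
    # A record is complete when every required field has a non-empty value.
    complete = sum(
        all(r.get(f) not in (None, "") for f in REQUIRED_FIELDS) for r in records
    )
    # A record is fresh when it was updated within the allowed number of days.
    fresh = sum(
        r.get("updated") is not None and (today - r["updated"]).days <= max_age_days
        for r in records
    )
    return {"complete": complete / total, "fresh": fresh / total}


# Made-up sample data for illustration only.
records = [
    {"customer_id": 1, "segment": "retail", "updated": date(2023, 9, 1)},
    {"customer_id": 2, "segment": "", "updated": date(2023, 8, 15)},
    {"customer_id": 3, "segment": "b2b", "updated": date(2019, 1, 10)},
    {"customer_id": 4, "segment": "retail", "updated": None},
]
report = readiness_report(records, today=date(2023, 10, 1))
print(report)
```

If either share falls below a level the business is comfortable with, that is a signal to fix the data first rather than to start prompting.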

3. Developing AI competence and skills

"Everybody can use ChatGPT today, we don't need any education how to use it"
Actually, prompt engineering and model finetuning have become important skills for any data scientist, analyst, software developer or business user. Additionally, learning how to apply AI in business in an ethical, sustainable and effective way, and how to generate value with it for the business, customers and employees, has become even more important. The use of AI should always have a clear purpose, the business needs to be aware of the risks and regulations, and users need some level of understanding of how AI works and how it can be applied. That is why AI education that covers these basics is now so important in companies.

4. Importance of transparency, ethics and AI governance

"With Generative AI, we can skip the boring data governance stuff and jump right into cool AI projects"
When applying AI, including Generative AI and Foundational AI models, the need remains to understand data lineage and how data is gathered, processed and allowed to be used. When a company's own documents, information and data sources are used to train the model or as model input, the business risks should always be carefully evaluated. With Generative models, it can be difficult to have transparency into model outputs. This requires continuous testing and monitoring of the outputs, even if the models have been finetuned to better fit the purpose.
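
As one possible illustration of the continuous testing mentioned above, the Python sketch below runs a fixed set of test prompts through a model and checks that each answer mentions the terms the business expects. Everything here is hypothetical: `fake_model` is a stand-in for whatever model call your environment provides, and the prompts and expected terms are invented. Keyword checks are only a crude first signal and do not replace human review.

```python
def evaluate_outputs(generate, cases):
    """Run each prompt through `generate` and record which expected terms are missing."""
    results = []
    for case in cases:
        output = generate(case["prompt"]).lower()
        missing = [t for t in case["must_mention"] if t.lower() not in output]
        results.append(
            {"prompt": case["prompt"], "passed": not missing, "missing": missing}
        )
    return results


def pass_rate(results):
    """Share of test cases whose output mentioned every expected term."""
    return sum(r["passed"] for r in results) / len(results) if results else 0.0


def fake_model(prompt):
    """Stand-in for a real model call (assumption: replace with your own client)."""
    canned = {
        "What is our refund policy?": "Refunds are available within 30 days of purchase.",
        "Who approves travel expenses?": "Expenses are approved by your colleague.",
    }
    return canned.get(prompt, "")


# Invented test cases for illustration only.
cases = [
    {"prompt": "What is our refund policy?", "must_mention": ["refund", "30 days"]},
    {"prompt": "Who approves travel expenses?", "must_mention": ["line manager"]},
]
results = evaluate_outputs(fake_model, cases)
print(pass_rate(results))
```

Running the same cases on a schedule, and after every finetuning round, gives a simple trend line that shows whether the outputs are drifting away from what is expected.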

5. AI boosting data development effectiveness

"AI can replace all coding work in the future"
A nice thing about Generative AI solutions is that they can also be used to make data engineering work more effective. Data engineers can start generating or validating code and test scripts with AI. AI can help detect data quality issues, automate data labeling or improve the classification of data. As in any AI project, the models should assist humans rather than aim for full automation, and one should always be able to question the model output and performance.

If your organization or industry has not yet figured out how to utilize AI, the best thing you can do is to start building AI innovation and utilization skills. An organization that has solid data governance practices and basic guidelines on how to use AI safely in the company context, and that has been able to get people excited to explore the possibilities of AI, is already quite far along. Let us help you figure out which areas to focus on to improve your AI readiness and your ability to get the most out of the AI boom!

Did the article spark some thoughts? We'd love to hear!