Internship topics

An eco-friendly AI

AI is booming, and we see two main trends: AI is being applied in every domain, and AI models keep getting bigger, consuming more and more energy and resources. At dataroots, we are aware of AI's energy consumption and would like to investigate techniques to reduce models' resource usage.

With this internship, we would like to benchmark different optimization techniques that have an impact on a model's energy consumption (such as hardware optimization, peak shaving, complexity reduction, alternative architectures, etc.).

Requirements : python, machine learning

Domain : AI

Few-shot learning

Few-shot learning is about building models from only a few data points. In real-world cases, we often have very little data to train a model, which reduces the model's accuracy, robustness, etc.

With this internship, we want to benchmark the newest few-shot learning techniques and evaluate their impact on robustness, bias, sensitivity, performance, etc. One technique that can be used to counter the problem is creating synthetic data; another is active learning.

Requirements : python, machine learning

Domain : AI

Sensor fleet management

In IoT use cases, we usually have more than one sensor to manage. We might also want to deploy different modules on each sensor, for example to do A/B testing. Furthermore, when using sensors, we don’t always have the luxury of having sensors in different environments (DEV, UAT and PROD).

With this internship, we want to simulate sensors in DEV and UAT (with cloud virtual machines), then create strong governance around IoT device management and CI/CD pipelines to facilitate the deployment of new modules and the management of a fleet of IoT devices.

Requirements : CI/CD, cloud, sensor

Domain : cloud engineering

Strava challenge analysis

Twice a year, dataroots organises a Strava challenge with the whole company. The participants are split into teams, and people can walk, run, bike or practise any other sport to win points individually and for their team. We even have a dashboard automatically fed by data from the participants' Strava accounts.

With this internship, we would like to create a visual dashboard displaying insights in real time for our next Strava challenge.

Requirements : python, dashboarding

Domain : data strategy

Generate cooking recipes from Instagram pictures

Did you ever get hungry just by looking at a picture? Many pictures of great dishes are posted on Instagram, but not everyone can reproduce such dishes. AI can now generate text, images and more.

With this project, we want to generate cooking recipes from nice pictures of dishes. The recipes should include the correct ingredients and the steps to cook the dish.

Requirements : python, computer vision, NLP

Domain : AI

AI-generated NFT art

NFTs are cryptographic tokens existing on a blockchain. They are proof of ownership, allowing traceability and making trading easier and safer. Assets represented by NFTs can be very varied, from tangible to intangible.

With this internship, we would like to generate AI digital art and use NFTs to assign ownership to the algorithm. The goal will be to create a fully traceable NFT art collection.

Requirements : python, blockchain

Domain : AI, data engineering

A trading bot for cryptocurrencies

Cryptocurrencies are increasingly present and very interesting. They are quite volatile, though, and a bubble could form.

With this internship, we would like to create an AI bot that analyses the cryptocurrency market and buys and sells currencies automatically, based on the news, market volumes and public sentiment.

Requirements : python, machine learning

Domain : AI, data engineering

Anonymising data with GANs

Usually, prod data is not available in the dev environment for security reasons. Because machine learning models need representative data, it’s not possible to train a model on scrambled data and expect good results.

With this project, we want to investigate whether GANs (or other ML techniques) can create representative synthetic data on which a model can be trained with similar performance. We also want to make sure that the original data cannot be reconstructed afterwards. The aim is to develop a good methodology to anonymise data while still being able to use it for model building.

Requirements : python, machine learning, GANs

Domain : AI, data engineering

MLOps benchmark

MLOps is everywhere nowadays: you need to version models, version data, record experiments, etc. Many tools and platforms have emerged and now offer some or all of the features of the MLOps philosophy. It is sometimes difficult to know which one to use in which case and for what.

With this internship, we want to benchmark existing tools and evaluate which tool best fits which problem or situation. We also want to better understand the governance framework around MLOps and how it fits within organisations. Finally, we want to know how MLOps principles will likely evolve.

Requirements : python, MLOps

Domain : AI, data engineering

Better data governance, better AI

There is no doubt that data is fundamental for Artificial Intelligence. Knowing what is happening to your data is therefore critical to understanding why your AI models behave the way they do. Data governance tools provide answers to questions about your data (Where does the data come from? What does your data actually represent? Does your data contain any ethical or legal issues which might lead to an inaccurate output?).

In this internship project, we want to investigate how data governance could help solve problems (e.g. overfitting) in AI models and potentially improve accuracy. We will use an open-source metadata hub tool to build a data governance solution for a specific scenario.

Requirements : python, data

Domain : data engineering

Ethical AI methodology

Ethical AI is a very interesting topic, but it’s not always very concrete. A lot of AI models are deployed in production without measuring how much they discriminate or behave differently for different sub-populations.

With this project, we first want to investigate which tests can determine whether an AI model is discriminating, and then create templates that can easily be applied to any use case.
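To make this more concrete, here is a minimal sketch of one possible test: comparing the share of positive predictions across sub-populations (often called demographic parity). The function names and the toy data are hypothetical and only serve as an illustration; the tests actually selected during the internship may differ.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Return the share of positive (== 1) predictions for each group."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and the lowest selection rate (0 = parity)."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


def disparate_impact_ratio(predictions, groups):
    """Lowest selection rate divided by the highest one (1 = parity)."""
    rates = selection_rates(predictions, groups)
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Hypothetical toy data: binary model outputs and a sensitive attribute.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(selection_rates(preds, groups))
print(demographic_parity_difference(preds, groups))
print(disparate_impact_ratio(preds, groups))
```

A commonly cited rule of thumb (the "four-fifths rule") flags a ratio below 0.8 for further review. A template could wrap checks like these so that they run on any model's predictions, given a column describing the sub-population.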

Requirements : python, statistics

Domain : AI, data strategy

Benchmarking data streams

Adding a new data stream to a streaming service can be tricky. What is important? How can I easily add a new data source to a streaming service? With data mesh becoming more than just a concept, serving data on demand is even more important.

With this internship, we want to create templates to add different data sources, including sensor data sources, to a streaming service and even benchmark multiple streaming services.

Requirements : streaming services, cloud, infrastructure as code

Domain : cloud engineering

Deep RL implementation for ml5.js

ml5.js aims to make machine learning easy to use for a broad audience (artists, creative coders, etc.). The library provides access to machine learning algorithms and models in the browser, building on top of TensorFlow.js. There is currently no implementation of a deep RL algorithm in this framework.

With this internship, we want to contribute to the open source community and open source a deep reinforcement learning algorithm.

Requirements : python, deep learning, reinforcement learning

Domain : AI

Explainable AI for the end user

Explainable AI is often still very technical and can be difficult to explain to business stakeholders and other less technical stakeholders.

With this internship, we want to create a dashboard that can be used to explain a model to non-technical stakeholders, combining statistical metrics and various XAI techniques.

Requirements : dashboarding

Domain : AI, data strategy

AI butler in meta

Meta, the new Facebook, is the symbol of the metaverse and web 3.0: a fully digital universe in which your avatar can lead another life, discover countries, play games and even work.

With this internship, we want to create an AI avatar that can solve simple tasks in Meta and be useful to other avatars. For this, we will use reinforcement learning techniques.

Requirements : python, reinforcement learning

Domain : AI

For a business to make data-driven decisions, having a well-designed data lake is not enough. The business needs a good overview of the data that is available to answer its business-related questions. For this purpose, data catalogs have become full-fledged tools to store, update and retrieve metadata. Still, finding the right data for a task remains a difficult problem, especially for people with no technical experience. Often, a data catalog stores very technical descriptions of tables and attributes, and provides no support for smart querying such as “show me data I can use to do churn prediction”.

This internship will approach data discovery as a “shopping cart problem”, where people can search for data and receive smart recommendations, based on (for example):
- their past selections,
- data that is often selected together,
- data that other people have selected for similar queries,
- semantics of the query (compared to doing a simple text-based search),
- ...
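As an illustration of the "data that is often selected together" idea, here is a minimal sketch that counts how often two datasets appear in the same past selection and recommends the most frequent companions. The dataset names, the `carts` structure and the function names are hypothetical; a real data catalog would supply its own selection history.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(carts):
    """Count how often each pair of datasets was selected together."""
    counts = Counter()
    for cart in carts:
        # sorted(set(...)) gives each unordered pair one canonical key.
        for a, b in combinations(sorted(set(cart)), 2):
            counts[(a, b)] += 1
    return counts


def recommend(selected, carts, top_n=3):
    """Suggest datasets most often selected alongside the current selection."""
    scores = Counter()
    for (a, b), n in build_cooccurrence(carts).items():
        if a in selected and b not in selected:
            scores[b] += n
        elif b in selected and a not in selected:
            scores[a] += n
    # Highest score first; ties are broken alphabetically to stay deterministic.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:top_n]


# Hypothetical selection history: each inner list is one past "shopping cart".
carts = [
    ["customers", "subscriptions", "payments"],
    ["customers", "subscriptions"],
    ["customers", "web_logs"],
    ["subscriptions", "payments"],
]

print(recommend({"customers"}, carts))
```

Past selections, similar queries and query semantics could be added as further scoring signals on top of this co-occurrence count.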

Requirements : SQL, database

Domain : data engineering

Blockchain contracts fraud

Blockchains are open source, distributed databases that record transactions between users, but also between users and user-defined contracts. They usually allow much safer and more secure transactions. Still, fraud can happen.

The goal of this internship is to investigate the transactions between users and contracts to find suspicious activities (dumping of large amounts of assets, buying of large amounts of assets, sandwich attacks, etc.).

Requirements : python, fraud, blockchain

Domain : AI, data engineering

Causal AI

There’s a lot going on around explainable AI. Causal AI goes a step further and aims to tell a user why a certain decision was made by an AI.

Causal AI identifies the underlying web of causes of a behavior or event and furnishes critical insights that predictive models fail to provide.

With this internship, we want to explore the concept of causal AI and build a demonstrator that clearly showcases how this can be applied in practice.

Requirements : python, statistics, machine learning

Domain : AI

Take action with PLC sensors

In Industry 4.0, we often work with sensors that have their own protocols to interact with the machines. It is not always easy to connect the sensors to the cloud and to send information back to the sensors so that an action is taken.

With this internship, we want to explore the connection between PLC sensors and the cloud and understand how to send ‘orders’ to the sensors. This should happen in an environment where the sensors already interact with each other and with the Industry 4.0 machines.

Requirements : PLC, python, cloud

Domain : cloud engineering

