In the context of a data lake, a data engineer's primary role is to import and clean the company's various data sources into one place. Once that data is available, the data scientist's goal is to combine those sources into a derived data layer that adds extra value. Data scientists own the business logic behind that derived layer, since their job is to translate business use cases into code. To enforce data engineering best practices, we built a small framework that lets data scientists quickly create and test derived data pipelines that fit directly into our data lake.
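To make the idea of a derived data layer concrete, here is a minimal sketch of what such a pipeline might look like: it combines two cleaned source datasets into one derived table. The table names, fields, and `build_derived_layer` function are all illustrative assumptions, not part of the actual framework.

```python
# Hypothetical derived-data pipeline: join two cleaned source
# datasets (customers and orders) into one derived table.

def build_derived_layer(customers, orders):
    """Combine source tables into a derived table keyed by customer id."""
    # Aggregate order amounts per customer.
    totals = {}
    for order in orders:
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0) + order["amount"]
    # Enrich each customer record with the aggregated value.
    return [
        {
            "customer_id": c["customer_id"],
            "name": c["name"],
            "total_spent": totals.get(c["customer_id"], 0),
        }
        for c in customers
    ]

# Example inputs, standing in for cleaned data-lake sources.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 5.0},
]

derived = build_derived_layer(customers, orders)
```

Because the derived layer is just a pure function of its source tables, it is easy to unit-test, which is exactly the kind of practice the framework is meant to encourage.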