By Paolo Léonard

A brief tutorial on Great Expectations, a Python tool providing batteries-included data validation. It includes tooling for testing, profiling and documenting your data, and integrates with many backends such as pandas dataframes, Apache Spark, SQL databases, data warehousing solutions such as Snowflake, and cloud storage offerings (S3, Azure Blob Storage, GCS). This tutorial covers the main concepts you'll need to know to use Great Expectations, gently walking you through writing and running your first expectation suite.
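
To give a feel for what an "expectation" is before opening the notebook, here is a minimal plain-Python sketch. It is not the Great Expectations API: the function borrows the name of one of the library's expectations, but the body, the simplified result dictionary, and the sample rows (including the column names) are made up for illustration.

```python
from typing import Any, Dict, List


def expect_column_values_to_not_be_null(
    rows: List[Dict[str, Any]], column: str
) -> Dict[str, Any]:
    """Plain-Python stand-in for an expectation (NOT the library implementation).

    Checks that every row has a non-None value for `column` and reports
    which rows fail, in a simplified, assumed result shape.
    """
    # Collect the positions of rows where the value is missing or None.
    unexpected = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {
        "success": not unexpected,
        "unexpected_count": len(unexpected),
        "unexpected_index_list": unexpected,
    }


if __name__ == "__main__":
    # Hypothetical rows; these are not taken from the tutorial's dataset.
    sample_rows = [
        {"region": "Albany", "average_price": 1.33},
        {"region": None, "average_price": 0.98},
    ]
    print(expect_column_values_to_not_be_null(sample_rows, "region"))
```

An expectation suite is, roughly, a named collection of such checks that is run against a batch of data; the notebook shows how the library itself does this.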

If anything is incomplete or unclear, don't hesitate to open an issue!

Reading online

If you'd just like to read along, open tutorial_great_expectations.ipynb in the repository and you're good to go! We made sure all important output is available online.

If you'd like to run the tutorial without running anything on your own machine, you can open it in Google Colab.

Run using docker

If you have docker installed, you can pull our container to run the tutorial:

docker pull dataroots/tutorial-great-expectations && docker run -it --rm -p 8888:8888 dataroots/tutorial-great-expectations

Alternatively, clone this repository and build the container yourself:

docker build . -t tutorial-great-expectations && docker run -it --rm -p 8888:8888 tutorial-great-expectations

Next, copy and paste the URL on the last line of the output into your favorite web browser, and navigate to the tutorial_great_expectations notebook. Enjoy the ride!

Run without docker

To run the tutorial on your own machine, we recommend using a virtual environment.

  1. Clone the repository
  2. Install the dependencies: pip install -r requirements.txt.
  3. Run jupyter notebook in the root directory; then navigate to the tutorial_great_expectations notebook.

If you see AttributeError: module 'great_expectations' has no attribute 'data_context', you probably do not have Great Expectations installed in the environment your kernel is using. Make sure that it is installed and restart your kernel to fix this.
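
As a quick way to diagnose this, the following sketch can be run in a notebook cell. It uses only the standard library; the helper name is_installed is our own and not part of Great Expectations. It reports which interpreter the kernel runs on and whether that interpreter can find the package.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate a top-level module."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # The kernel may use a different interpreter than the shell where pip ran.
    print("Interpreter:", sys.executable)
    print("great_expectations found:", is_installed("great_expectations"))
```

If the package is reported as missing, install the dependencies with the interpreter shown in the output, then restart the kernel.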

The code

GitHub repo: datarootsio/tutorial-great-expectations, a tutorial for the Great Expectations library.

Acknowledgements

Avocado dataset provided by the Hass Avocado Board: https://hassavocadoboard.com/volume-data-projections/
