Data-diff is a command-line tool and Python library to efficiently diff rows across two different databases.

⇄ Verifies data across many different databases (e.g. PostgreSQL -> Snowflake)!

πŸ” Outputs diff of rows in detail

🚨 Simple CLI/API to create monitoring and alerts

🔥 Verify 25M+ rows in <10s, and 1B+ rows in ~5min.

♾️ Works for tables with 10s of billions of rows

For more information, see our README.

How to install

Requires Python 3.7+ with pip.

pip install data-diff

Or, when you need extras such as mysql and postgresql:

pip install "data-diff[mysql,postgresql]"

How to use from Python

# Optional: Set logging to display the progress of the diff
import logging
logging.basicConfig(level=logging.INFO)

from data_diff import connect_to_table, diff_tables

table1 = connect_to_table("postgresql:///", "table_name", "id")
table2 = connect_to_table("mysql:///", "table_name", "id")

for sign, columns in diff_tables(table1, table2):
    print(sign, columns)

# Example output:
# + ('4775622148347', '2022-06-05 16:57:32.000000')
# - ('4775622312187', '2022-06-05 16:57:32.000000')
# - ('4777375432955', '2022-06-07 16:57:36.000000')
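
The (sign, columns) pairs yielded by diff_tables can be grouped by sign, for instance to count differing rows before raising an alert. The following is a minimal sketch, assuming each sign is the string '+' or '-' as in the example output. summarize_diff is a hypothetical helper written for illustration and is not part of data-diff. The sample rows are copied from the example output so the snippet runs without a database.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, ...]


def summarize_diff(diff: Iterable[Tuple[str, Row]]) -> Dict[str, List[Row]]:
    """Group diff rows by their sign.

    Hypothetical helper (not part of data-diff). Assumes each item is a
    (sign, columns) pair, as yielded in the usage example.
    """
    grouped: Dict[str, List[Row]] = {"+": [], "-": []}
    for sign, columns in diff:
        # setdefault keeps the helper tolerant of any unexpected sign value
        grouped.setdefault(sign, []).append(columns)
    return grouped


# Sample rows copied from the example output; with real tables,
# pass diff_tables(table1, table2) instead.
sample = [
    ("+", ("4775622148347", "2022-06-05 16:57:32.000000")),
    ("-", ("4775622312187", "2022-06-05 16:57:32.000000")),
    ("-", ("4777375432955", "2022-06-07 16:57:36.000000")),
]

summary = summarize_diff(sample)
print(len(summary["+"]), len(summary["-"]))  # prints: 1 2
```

Because the helper consumes the whole iterable and keeps every row in memory, it suits small diffs; for very large diffs, counting rows instead of storing them may be preferable.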