Deploy and use the RTR Data Compare validation tool
This page covers deploying and using the Data Compare tool to validate RTR output against classic ETL results.
On this page
- Overview
- Prerequisites
- Deploy the Data Compare API
- Deploy the Data Compare Processor
- Configure ingress
- Verify the deployment
- Use the Data Compare tool
Overview
The Data Compare tool is an optional RTR validation service that allows STLT users to compare data processed by RTR against the classic ETL pipeline and identify differences.
This service is optional. STLTs can choose to install it only if they require RTR validation capabilities.
The tool consists of two containerized services that communicate asynchronously through Kafka:
- Data Compare API - pulls and prepares data from designated tables, then uploads it to a cloud storage bucket
- Data Compare Processor - retrieves data from the cloud storage bucket and performs the comparison logic
Database changes are managed by Liquibase, integrated within the DataCompareAPI service. Schema changes are applied automatically during deployment. The database objects in the following directory are for reference only: NEDSS-DataCompare/DataCompareAPIs/…/db/data_internal
Prerequisites
Before deploying the Data Compare tool, verify the following:
- Access to a cloud storage bucket for data exchange between the API and Processor services. These steps currently use Amazon S3. If you are not using Amazon S3, consult your cloud administrator for equivalent storage configuration.
- Keycloak configured with the Data Compare API profile: NEDSS-Helm/charts/keycloak/extra

  If your Keycloak pod uses a name or namespace other than the default, update the authUri in your values.yaml:

  authUri: "http://keycloak.default.svc.cluster.local/auth/realms/NBS"
These steps require a Unix-compatible shell. On Windows, use Git Bash, WSL, or an equivalent terminal emulator.
Verify that you are connected to the correct Kubernetes cluster before proceeding. To confirm, run kubectl config current-context.
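The prerequisite checks above can be sketched as a small pre-flight script. The context and bucket names below are placeholders for your own values, and the script assumes kubectl and the AWS CLI are installed:

```shell
# Hypothetical pre-flight check. EXPECTED_CONTEXT and BUCKET are placeholders
# for your own cluster context and S3 bucket name.
EXPECTED_CONTEXT="my-nbs-cluster"
BUCKET="my-data-compare-bucket"

# Warn if kubectl points at a different cluster than expected.
if command -v kubectl >/dev/null 2>&1; then
  CURRENT_CONTEXT=$(kubectl config current-context 2>/dev/null)
  if [ "$CURRENT_CONTEXT" != "$EXPECTED_CONTEXT" ]; then
    echo "Warning: current context is '$CURRENT_CONTEXT', expected '$EXPECTED_CONTEXT'" >&2
  fi
fi

# Confirm the S3 bucket used for API/Processor data exchange is reachable.
if command -v aws >/dev/null 2>&1; then
  aws s3 ls "s3://$BUCKET" >/dev/null 2>&1 || echo "Warning: cannot list s3://$BUCKET" >&2
fi
```

If either warning fires, resolve it before installing the Helm charts below.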
Deploy the Data Compare API
Follow these steps to configure and deploy the Data Compare API Helm chart.
The Helm chart is located in charts/data-compare-api-service in the NEDSS-Helm repository.
- Configure values.yaml. Replace all placeholder values before installation:

  image:
    # Data Compare API image
    repository: "quay.io/us-cdcgov/cdc-nbs-modernization/data-compare-api-service"
    pullPolicy: IfNotPresent
    # Replace with the target release version tag, e.g. v1.0.1
    tag: <release-version-tag>

  ingressHost: "data.EXAMPLE_DOMAIN"

  jdbc:
    # SQL Server endpoint
    dbserver: "EXAMPLE_DB_ENDPOINT"
    username: "EXAMPLE_ODSE_DB_USER"
    password: "EXAMPLE_ODSE_DB_USER_PASSWORD"

  authUri: "http://keycloak.default.svc.cluster.local/auth/realms/NBS"

  s3:
    # AWS-specific: replace with your AWS Region and S3 bucket name
    region: "AWS REGION"
    bucketName: "S3 BucketName"

- Install the Helm chart:

  helm install data-compare-api-service -f ./data-compare-api-service/values.yaml data-compare-api-service

- Verify the pod is running:

  kubectl get pods

- Validate the service by opening the Swagger UI. Replace <data.EXAMPLE_DOMAIN> with your actual domain:

  https://<data.EXAMPLE_DOMAIN>/comparison/swagger-ui/index.html
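The Swagger check can also be scripted as a quick smoke test. The domain below is a placeholder for your actual ingressHost, and the curl flags report the HTTP status without failing the shell if the service is unreachable:

```shell
DOMAIN="data.example.gov"  # placeholder: use your actual ingressHost value
SWAGGER_URL="https://${DOMAIN}/comparison/swagger-ui/index.html"

# Report the HTTP status; 200 means the API and ingress are wired up correctly.
if command -v curl >/dev/null 2>&1; then
  STATUS=$(curl -sS -o /dev/null -w '%{http_code}' --connect-timeout 5 "$SWAGGER_URL" 2>/dev/null || true)
  echo "Swagger UI returned HTTP status: $STATUS"
fi
```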
Deploy the Data Compare Processor
Follow these steps to configure and deploy the Data Compare Processor Helm chart.
The Helm chart is located in charts/data-compare-processor-service in the NEDSS-Helm repository.
The Processor is a Kafka consumer microservice and does not expose any API endpoints.
- Configure values.yaml. Replace all placeholder values before installation:

  image:
    # Data Compare Processor image
    repository: "quay.io/us-cdcgov/cdc-nbs-modernization/data-compare-processor-service"
    pullPolicy: IfNotPresent
    # Replace with the target release version tag, e.g. v1.0.1
    tag: <release-version-tag>

  ingressHost: "data.EXAMPLE_DOMAIN"

  jdbc:
    # SQL Server endpoint
    dbserver: "EXAMPLE_DB_ENDPOINT"
    username: "EXAMPLE_ODSE_DB_USER"
    password: "EXAMPLE_ODSE_DB_USER_PASSWORD"

  authUri: "http://keycloak.default.svc.cluster.local/auth/realms/NBS"

  s3:
    # AWS-specific: replace with your AWS Region and S3 bucket name
    region: "AWS REGION"
    bucketName: "S3 BucketName"

- Install the Helm chart:

  helm install data-compare-processor-service -f ./data-compare-processor-service/values.yaml data-compare-processor-service

- Verify the pod is running:

  kubectl get pods
Configure ingress
The Data Compare API uses the same ingress as the data ingestion service. Reuse the ingress configuration as needed: dataingestion-service/templates/ingress.yaml
Verify the deployment
Confirm both services are running without errors:
kubectl get pods
kubectl logs <pod-name>
Both services are ready when they are healthy and the Processor begins consuming from Kafka.
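The verification steps can be combined into one loop. The deployment names here assume the Helm release names used above produce matching Deployment objects; adjust them if your charts name resources differently:

```shell
# Hypothetical verification loop; deployment names are assumptions based on
# the Helm release names used in the installation steps.
for DEPLOY in data-compare-api-service data-compare-processor-service; do
  if command -v kubectl >/dev/null 2>&1; then
    kubectl rollout status "deployment/$DEPLOY" --timeout=60s || true
    # Scan recent logs for errors and for the Processor's Kafka consumer startup.
    kubectl logs "deployment/$DEPLOY" --tail=50 2>/dev/null | grep -iE 'error|kafka' || true
  fi
done
```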
Use the Data Compare tool
The comparison process relies on the Data_Compare_Config table, which Liquibase creates and populates when the Data Compare API deploys. The table comes preloaded with records containing table names and queries that determine what data to compare.
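To see which comparisons are preloaded, you can query the config table directly. This is a sketch only: the server, username, and password are placeholders, and sqlcmd availability is an assumption about your tooling:

```shell
# Hypothetical inspection of the preloaded comparison config. DBSERVER,
# DBUSER, and DBPASS are placeholders for your own SQL Server connection.
DBSERVER="EXAMPLE_DB_ENDPOINT"
DBUSER="EXAMPLE_ODSE_DB_USER"
DBPASS="EXAMPLE_ODSE_DB_USER_PASSWORD"
SQL_QUERY="SELECT * FROM Data_Compare_Config;"

# -l 5 bounds the login timeout so a bad endpoint fails fast.
if command -v sqlcmd >/dev/null 2>&1; then
  sqlcmd -S "$DBSERVER" -U "$DBUSER" -P "$DBPASS" -l 5 -Q "$SQL_QUERY" || true
fi
```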
To start a comparison, call:
POST /comparison/api/data-compare
Pass the runNowMode header to control scope:
- true - runs only on records in the config table where runNow = true; resets runNow to false when complete
- false - runs on all records in the config table
This is an asynchronous endpoint. If authentication passes and there are no logical errors, it returns a success response immediately. The comparison runs in the background.
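A comparison run can be triggered from the command line as sketched below. The domain and bearer token are placeholders, and obtaining a token from Keycloak is assumed to follow your existing NBS auth setup:

```shell
DOMAIN="data.example.gov"   # placeholder: your ingressHost
TOKEN="<access-token>"      # placeholder: bearer token issued by Keycloak

# Trigger a comparison limited to config rows flagged runNow = true.
# The endpoint returns immediately; the comparison runs in the background.
curl -sS --connect-timeout 5 -X POST "https://${DOMAIN}/comparison/api/data-compare" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "runNowMode: true" || true
```

Set the runNowMode header to false instead to run against every record in the config table.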
The following diagram shows the end-to-end data flow:
API → Pull data from SQL table → Upload to S3 → Kafka → Processor → Pull from S3 → Perform comparison → Upload results to S3
This data flow uses Amazon S3 as the storage provider. If you are not using Amazon S3, the upload and retrieval steps differ based on your cloud provider.