Truth in IT

Implementing Vault for Databricks Secret Management

HashiCorp
04/09/2026

TL;DR

  • MiQ extended HashiCorp Vault from microservices to Databricks data pipelines to eliminate hard-coded secrets across 3,000+ daily jobs processing 30+ TB of data
  • Implementation included automated secret scanning with TruffleHog/GitLeaks, user-specific Vault folder structures, and a Python utility library for seamless authentication and secret retrieval
  • Custom integration with Studio (MiQ's low-code pipeline tool) provides inline secret detection and UI-based secret management, eliminating developer friction during migration
  • Weekly automated monitoring scans all Databricks workspaces and reports violations to owners and leads, preventing regression to insecure practices
  • Solution prioritized user experience by embedding Vault functionality directly into existing development workflows rather than requiring separate tools or processes

The Hard-Coded Secrets Challenge in Data Pipelines

MiQ, a programmatic media agency processing 30+ terabytes of data daily across 3,000+ Databricks jobs, faced a critical security challenge with hard-coded secrets scattered throughout their data pipelines. The company needed a platform-agnostic secret management solution that could scale across their Databricks workspaces while maintaining developer productivity. Rather than relying on Databricks' native secret engine, MiQ extended their existing HashiCorp Vault implementation from microservices to data pipelines, creating a unified secret management approach across their infrastructure.

Four-Phase Implementation Strategy

MiQ's solution involved capturing existing secrets using TruffleHog and GitLeaks scanners, building automated remediation workflows, developing a Python utility library for seamless Vault integration, and establishing continuous monitoring. The team created user-specific folder structures in Vault, automated the migration of hard-coded secrets, and built custom tooling to minimize disruption to data engineering workflows. Their approach prioritized user experience by integrating Vault directly into Studio, their in-house low-code/no-code pipeline development platform, eliminating the need for developers to context-switch between applications.
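The capture and monitoring steps above can be illustrated with a small sketch. It is not MiQ's tooling and not how TruffleHog or GitLeaks work internally: it is a deliberately simplified regex scanner, assuming two illustrative rules and a hypothetical `users/<owner>/<secret_name>` folder layout in Vault. The names `Finding`, `scan_source`, `vault_path_for`, and `group_by_owner` are invented for this example.

```python
import re
from dataclasses import dataclass

# Illustrative rules only (an assumption for this sketch); real scanners
# ship much larger, tuned rule sets with entropy checks and verification.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "assigned_secret": re.compile(
        r"(?i)\b(password|passwd|secret|token|api_key)\b\s*=\s*['\"]([^'\"]{8,})['\"]"
    ),
}


@dataclass(frozen=True)
class Finding:
    owner: str      # who is responsible for the notebook
    notebook: str   # notebook or file path
    line_no: int    # 1-based line number of the match
    rule: str       # name of the rule that fired


def scan_source(owner, notebook, source):
    """Return one Finding per (line, rule) pair that matches."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append(Finding(owner, notebook, line_no, rule))
    return findings


def vault_path_for(owner, secret_name):
    """Hypothetical per-user folder layout used when migrating a secret."""
    return f"users/{owner}/{secret_name}"


def group_by_owner(findings):
    """Group findings so each owner can receive their own report."""
    report = {}
    for finding in findings:
        report.setdefault(finding.owner, []).append(finding)
    return report
```

The same `scan_source` check could back both a save-time validation in an editor (refuse to save while the list is non-empty) and a scheduled report built with `group_by_owner`, which is the shape of workflow the summary describes.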

Automated Secret Management and Governance

The implementation includes a Python utility library that handles JWT token authentication, retrieves user-specific secrets from Vault, and integrates with AWS Secrets Manager for private key storage. MiQ enhanced their Studio platform with inline secret detection, preventing developers from saving code containing hard-coded credentials, and providing a UI-based workflow for moving secrets to Vault without writing additional code. Weekly automated scans across all Databricks workspaces generate reports identifying any new hard-coded secrets, with notifications sent to repository owners and team leads to maintain ongoing compliance and prevent regression to previous insecure practices.
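As a rough illustration of what such a utility library might do, the sketch below logs in to Vault with a JWT and reads a user-specific secret over Vault's HTTP API. It is a minimal sketch under stated assumptions, not MiQ's library: it assumes the JWT auth method is mounted at `auth/jwt`, a KV version 2 engine is mounted at `secret`, and secrets live under a hypothetical `users/<user>/<name>` path. The class name `VaultSecrets` and the injectable `transport` are invented for the example. How the JWT is minted (the source mentions private keys held in AWS Secrets Manager) is out of scope here.

```python
import json
import urllib.request


def urllib_transport(method, url, headers, payload):
    """Default transport: send one JSON request and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


class VaultSecrets:
    """Tiny Vault client: JWT login, then KV v2 reads for one user."""

    def __init__(self, addr, transport=urllib_transport, mount="secret"):
        self.addr = addr.rstrip("/")
        self.transport = transport  # swappable for tests
        self.mount = mount          # assumed KV v2 mount point
        self.token = None

    def login_jwt(self, role, jwt):
        # JWT auth method login; assumes the method is mounted at auth/jwt.
        body = self.transport(
            "POST",
            f"{self.addr}/v1/auth/jwt/login",
            {"Content-Type": "application/json"},
            {"role": role, "jwt": jwt},
        )
        self.token = body["auth"]["client_token"]

    def read_user_secret(self, user, name):
        if self.token is None:
            raise RuntimeError("call login_jwt() first")
        # KV v2 reads go through <mount>/data/<path>; the users/<user>/<name>
        # layout is a hypothetical per-user folder structure.
        url = f"{self.addr}/v1/{self.mount}/data/users/{user}/{name}"
        body = self.transport("GET", url, {"X-Vault-Token": self.token}, None)
        return body["data"]["data"]
```

In a notebook, a call such as `VaultSecrets(addr).read_user_secret("alice", "warehouse")` after `login_jwt(...)` would replace a hard-coded credential. Keeping the transport injectable lets the library be unit-tested without a live Vault server.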

Chapters

0:00 - Introduction and Speakers
1:05 - About MiQ and Programmatic Media
2:32 - How Programmatic Advertising Works
3:57 - Data Scale and Processing Stats
4:34 - Problem Statement: Hard-Coded Secrets
5:42 - Four-Phase Solution Approach
5:56 - Phase 1: Capturing Secret Statistics
6:40 - Phase 2: Fixing and Migration
7:41 - Phase 3: User Experience and Studio Integration
8:44 - Phase 4: Ongoing Monitoring
9:39 - Python Utility Library Architecture
11:05 - Studio UI Features and Inline Detection
12:45 - Weekly Monitoring Reports
14:01 - Q&A

Key Quotes

4:37 "Hard-coding the secrets directly into the data pipeline is a common but risky practice, which poses several security and operational challenges, and which requires a secret manager to be used."
6:06 "We had secrets lying around across all our repositories. And we were coming across it, but we did not have any consolidated report on how many secrets are we talking about, what kind of secrets are we talking about, where is it lying, who owns it."
7:46 "This would have been quite disruptive if we had just asked user to move all their secrets to the new secret manager. And going forward also, asking them to change the way they have been doing their development by going to a new application, a new UI, making changes in their code."
12:10 "While you're typing your code at that time itself, inline, you'll get to know what all secrets you have added. And the validation will not allow you to save this unless we have moved this secret out of this code editor."
12:57 "That doesn't ensure that nobody's going to add, nobody's going to not add any secrets in their notebook. So we have this report, which is scheduled at a weekly basis, which scans all the notebooks across the MiQ and generates a report and share it with the respective owner."

Categories:
  • » Data Protection » Backup & Recovery
  • » Cybersecurity » Application Security
  • » Data Protection
Tags:
  • Data Protection
  • DevSecOps
  • Compliance & Governance
  • Technical Deep Dive
  • Customer Story
  • Best Practices
  • Secret Management
  • HashiCorp Vault
  • Databricks Security
  • Data Pipeline Governance
  • DevSecOps Automation
  • Secret Scanning
  • JWT Authentication
