Tagging is Metadata’s Metadata: Why Good Tags Matter

What Are Tags, Really?

Tags are often dismissed as technical conveniences — tiny labels stuck to columns or datasets like digital post-it notes. But to dismiss them is to miss out on their true purpose: flexibility.

In essence, a tag is a name - a declaration of identity or purpose. To tag something is to say: this matters in this way. It’s a semantic fingerprint, applied to a data object in a way that transforms ambiguity into meaning, and meaning into action.

Much like how language gives us categories for thought, tags give data structure for use.

In a data context, tags are lightweight labels applied to objects in a data estate – like data products, tables, columns, dashboards, jobs, or infrastructure.

They come in two varieties:

  • Simple (Flat Labels)
    "tags": ["pii", "curated", "dataops@company.com"]
  • Key-Value (Structured Tags)
    "tags": {
      "sensitivity": "pii",
      "lifecycle": "curated",
      "owner": "dataops@company.com"
    }

Unlike strict schemas, tags are fast to apply, easy to read, and adaptable. They don’t just describe data; they help organize, govern, and automate it.
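To make the difference between the two varieties concrete, here is a minimal Python sketch. The tag values are the ones shown above; the helper names (has_flat_tag, has_structured_tag, flatten) are hypothetical and only illustrate the idea that a flat label can be checked for presence, while a key-value tag also records what the label means.

```python
from __future__ import annotations

# Same tag values as in the examples above.
flat_tags = ["pii", "curated", "dataops@company.com"]

structured_tags = {
    "sensitivity": "pii",
    "lifecycle": "curated",
    "owner": "dataops@company.com",
}


def has_flat_tag(tags: list[str], label: str) -> bool:
    """A flat label can only be tested for presence."""
    return label in tags


def has_structured_tag(tags: dict[str, str], key: str, value: str) -> bool:
    """A key-value tag can be tested for the role a value plays."""
    return tags.get(key) == value


def flatten(tags: dict[str, str]) -> list[str]:
    """Collapse key-value tags to flat labels; the keys are lost."""
    return list(tags.values())
```

Flattening is a one-way trip: once the keys are dropped, nothing says whether "pii" described sensitivity or something else, which is why structured tags tend to be easier to govern.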

Why Tag at All?

In a modern data platform, tags serve as the connective tissue, bridging data discovery, governance, quality, automation, and value attribution. Tags help us:

See Clearly

Without tags, datasets blur together into undifferentiated sprawl. Tags make visible what is valuable, sensitive, ready, or raw.

Control Responsibly

In a world of rising regulatory and ethical expectations, a tag like pii is not mere decoration; it is a boundary in metadata that systems and stewards must not cross.

Automate Intelligently

Tags give systems something to act upon. If data is tagged deprecated, a pipeline can skip it. If it’s gold, it can be promoted. If it's experimental, it can be flagged for review. The tag becomes an instruction to the machine.
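As a rough illustration of a tag acting as an instruction, the sketch below maps a dataset's tags to a pipeline action. The dataset names, the action names, and the precedence order (deprecated wins over everything, experimental is checked before gold) are all assumptions made for this example, not rules from any particular platform.

```python
from __future__ import annotations


def plan_action(tags: set[str]) -> str:
    """Map a dataset's tags to a pipeline action (hypothetical action names)."""
    # Assumed precedence: a deprecated dataset is skipped regardless of other tags.
    if "deprecated" in tags:
        return "skip"
    if "experimental" in tags:
        return "flag_for_review"
    if "gold" in tags:
        return "promote"
    return "process"


# Hypothetical datasets and their flat tags.
datasets = {
    "sales_2019_legacy": {"deprecated"},
    "sales_daily": {"gold"},
    "churn_scores_v0": {"experimental"},
    "web_events_raw": set(),
}

plan = {name: plan_action(tags) for name, tags in datasets.items()}
```

The useful property is that the pipeline code never needs to know dataset names; changing a tag changes the behaviour.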

What Makes a Good Tag?

Here’s where philosophy meets practice. A good tag, like a good concept, should be:

  • Human-readable: Instantly understood without a glossary
  • Standardised: Consistent naming with pre-approved values
  • Scoped: Applied at the right level (column, table, job, and so on)
  • Purposeful: Every tag exists for a reason; no vanity tags
  • Governed: Validated and, where needed, curated centrally
  • Composable: Tags can be combined like filters
  • Machine-friendly: Integrates with systems for automation and filtering
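Several of these properties can be checked mechanically. The sketch below assumes a small, hypothetical controlled vocabulary (the APPROVED table, its keys, values, and scopes are invented for illustration) and shows one way to validate tags against it and to combine tags as filters.

```python
from __future__ import annotations

# Hypothetical controlled vocabulary: key -> (approved values, allowed scopes).
APPROVED = {
    "sensitivity": ({"pii", "internal", "public"}, {"column", "table"}),
    "lifecycle": ({"raw", "curated", "deprecated"}, {"table", "job"}),
}


def validate(tags: dict[str, str], scope: str) -> list[str]:
    """Return a list of problems; an empty list means the tags pass."""
    problems = []
    for key, value in tags.items():
        if key not in APPROVED:
            problems.append(f"unknown tag key: {key}")
            continue
        values, scopes = APPROVED[key]
        if value not in values:
            problems.append(f"value not approved for {key}: {value}")
        if scope not in scopes:
            problems.append(f"{key} cannot be applied at {scope} level")
    return problems


def matches(tags: dict[str, str], **wanted: str) -> bool:
    """Composable filter: every requested key must carry the requested value."""
    return all(tags.get(k) == v for k, v in wanted.items())
```

For example, matches(tags, sensitivity="pii", lifecycle="curated") selects only objects that carry both tags, which is the "combine tags like filters" behaviour described above.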

On the Metadata of Metadata: Why Tag Provenance Matters

Tags are metadata about your data, but we often forget that tags themselves have metadata. If you don't track where a tag came from, who owns it, and how it's supposed to behave, you risk chaos.

We must ask:

  • Who authored this tag?
  • Why was it assigned?
  • Is it still valid?
  • Should it flow downstream to all derivative data, or stay where it was born?

This is where we move from tagging to meta-tagging — from the act of naming to the governance of names.

Tags Should Be Treated Like Data Themselves

Each tag should carry:

  • System of Record: Where it originated (e.g., Databricks, DataHub, Collibra)
  • Scope: What the tag applies to (column, dataset, job, asset)
  • Immutability: Can it change? Should it ever?
  • Propagation Rule: Should it flow downstream through lineage? Or stay local?
  • Purpose: Governance? Discovery? Automation?

This may look like overhead, but it is not; it is ontological hygiene. It ensures that tags do not become corrupted abstractions but remain faithful representations of truth in context.

To tag well is to impose just enough structure to liberate potential without choking possibility.

Tag with Integrity

As data ecosystems grow more autonomous (with data products, self-service platforms, and AI-driven pipelines), tags become the language of decision-making. They are the way we imbue digital systems with ethics, context, and memory.

If we get tagging wrong, we lose trust, transparency, and control.

If we get tagging right, we gain insight, action, and meaning. So:

  • Treat tags as first-class metadata.
  • Define who owns each tag, where it flows, and what it means.
  • Build tools that validate, propagate, and preserve them.

And never forget: to tag something well is to understand it with precision and care. In the end, every good tag is a question answered, and every bad tag is a question deferred.

Ready to build your tagging framework?

This post covers the why, but the how is just as critical. Contact us to understand how we can help create a tagging system that drives clarity, governance, and automation.

Let our team be your guide.

Author

Ust Oldfield