Taxonomy authoring

Your harms taxonomy tells Backstop what to alert on. It is a small YAML document that Backstop turns into a system prompt for the vision-capable LLM.

Start with a preset

The parent web app ships with three presets:

  • Standard 6-8 — conservative rules for younger children.
  • Standard 8-12 — the default.
  • Standard 13-17 — more nuanced; treats more content as “context, not alert.”

Pick one, then edit.

Structure

categories:
  - id: bullying
    label: Bullying or harassment
    severity_bucket: 1
    description: |
      Messages that appear to threaten, demean, or exclude a specific person
      by name or handle, sent to or received by my child.
    examples_positive:
      - 'message calling someone a slur'
      - 'coordinating exclusion in a group chat'
    examples_negative:
      - 'sarcastic banter between mutual friends'

  - id: self_harm
    label: Self-harm ideation
    severity_bucket: 1
    description: |
      Content that suggests my child or someone they're talking to is thinking
      about hurting themselves.

  - id: adult_content
    label: Sexual content
    severity_bucket: 1
    description: |
      Explicit sexual imagery or text on screen.

Fields

  • id — slug used in the alert record. Stable; don’t rename after alerts have fired.
  • label — the human-readable name that appears in your alerts.
  • severity_bucket — 0 for “notify eventually” or 1 for “notify now.” See Severity buckets below.
  • description — the rule. Write it as a plain-English sentence. This is what the LLM matches against.
  • examples_positive / examples_negative — optional. Anchor the LLM’s judgment on your context.

Severity buckets

Backstop deliberately supports only two buckets so the control plane can route alerts without seeing content:

  • 0 — delivered to your default channel (usually the parent web app) and included in daily digests.
  • 1 — delivered immediately to your urgent channel (push, SMS).
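
As an illustration of the lower bucket, a category routed to the default channel might look like the sketch below. The id, label, and wording are hypothetical and do not come from any preset; only the field names and the meaning of 0 are taken from this page.

```yaml
categories:
  # Hypothetical category, not taken from any preset.
  - id: mild_profanity
    label: Mild profanity
    severity_bucket: 0   # "notify eventually": default channel, daily digest
    description: |
      Casual swearing in chats or videos that is not aimed at a specific person.
```

Changing severity_bucket to 1 is the only edit needed to move the same rule to the urgent channel.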

Writing good descriptions

  • Be concrete about your family. “My child is 11 and plays Minecraft” is real context; the LLM will use it.
  • Use examples if the category is fuzzy. Two examples_positive and two examples_negative beat a paragraph of hedging.
  • Prefer inclusion over exclusion. “Flag anything that looks like X” works better than “flag everything except Y.”
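
Applied together, the three tips above could produce a category like the following sketch. The category, its id, and every example string are hypothetical; adapt them to your own family rather than copying them.

```yaml
categories:
  # Hypothetical category and wording, shown only to illustrate the tips above.
  - id: stranger_contact
    label: Contact from strangers
    severity_bucket: 1
    description: |
      My child is 11 and plays Minecraft. Flag anything that looks like
      someone they do not know in person asking for personal details or
      asking to move the chat to another app.
    examples_positive:
      - 'player asking which school my child goes to'
      - 'message asking to continue the chat on a different app'
    examples_negative:
      - 'classmate sharing a server invite'
      - 'in-game chat about trading items'
```

The description names the family context, states what to flag rather than what to skip, and leaves the fuzzy edges to two examples on each side.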

Testing

The parent app has a Taxonomy tester. Paste text or upload a sample screenshot, and it runs your current taxonomy locally through your bring-your-own-key (BYOK) LLM. Iterate here before publishing to your endpoints.

Publishing

Save changes in the parent app. The app encrypts the taxonomy under your family key and sends the ciphertext to the control plane, which relays it to your enrolled endpoints on their next config sync (within a few minutes).