I Built a Personal Developer Gem — Here's What It Actually Does

Keywords: developer tools, AI for developers, Gemini Gem, Python scripting, code review, infrastructure automation, peer programming

Reading time: ~8 minutes


My primary work is network infrastructure, not software development. But I write scripts regularly — automation, monitoring tools, configuration management. I'm competent in Python and shell scripting, less so in everything else. When I run into a problem outside my comfort zone, I want an answer that fits my context, not a tutorial written for a junior developer starting from zero.

My Developer Gem knows that about me. It's calibrated to a network engineer who scripts as a secondary skill, working in a specific environment, with specific tooling and conventions. I don't have to re-explain my stack every time I ask a question.

The Problem I Was Trying to Solve

Three recurring problems pushed me to build this:

First, context mismatches. I'd ask a generic AI a Python question and get an answer that assumed a completely different environment than mine: a web framework I don't use, a database I don't have, a packaging system I've never touched. The answer was technically correct, but adapting it to my setup was roughly 80% of the work, and most of that work had nothing to do with my actual problem.

Second, expertise calibration issues. Some questions needed a beginner-level explanation; others needed a senior-level answer. Generic AI defaulted to a middle level that was rarely what I wanted. For "how do I parse a YAML file in Python" I want a senior answer: give me the library call (PyYAML's safe_load, since YAML isn't in the standard library) and don't walk me through what YAML is. For "how do I make this script run as a service on this specific platform" I might need more hand-holding because that's not something I do often.

Third, style mismatch. The code I write has conventions — error handling patterns, logging style, function structure. Generic AI would produce code that worked but didn't match my style, which meant rewriting it anyway. If the conventions were loaded into the Gem upfront, the output would be closer to what I'd write.

I wanted a tool that already knew my context — my stack, my skill level, my style — and could give me direct, calibrated answers.

What I Tried First (and Why It Wasn't Enough)

My first approach was keeping a personal coding cheatsheet — common patterns, library calls I forget, environment-specific quirks. Useful for routine work but didn't help when I needed to think through a novel problem.

Second was using a generic AI with a long prompt explaining my role and environment. This worked but the context had to be re-pasted each session, and the AI would sometimes give me code that worked but required significant rework to match my style.

Third was asking a developer colleague. Most effective for novel problems but most expensive in time — colleagues are busy, and asking for help on a simple scripting task felt wasteful when I just needed a quick syntax reminder or pattern reference.

What I wanted was a peer-level assistant for network engineers who script — direct answers, calibrated to my actual context, with code that fits my style.

The Developer Gem did exactly that.

The Gem I Use

Here's the core of the instructions I saved into the Gem:

You are my peer-level developer assistant. I am a network infrastructure engineer who writes scripts as a secondary skill. Primary languages: Python 3, Bash. Secondary: occasional Go, occasional PowerShell.

When I ask a development question:

  • Assume I know programming fundamentals — skip explanations of variables, loops, functions
  • Calibrate to my actual environment: Linux (Debian/Ubuntu primary), macOS secondary, occasional Windows
  • For Python questions: use the standard library when possible; reach for third-party packages only when they significantly simplify the task
  • For Bash questions: assume Bash 4+; avoid Bash-isms that break on other shells unless I ask for them
  • Include error handling in code examples unless I specifically ask for a quick prototype
  • Match my style: type hints in Python, structured logging, clear function names, minimal comments
  • When you're uncertain about a library or platform behavior, say so rather than guessing

My context: scripts typically run in cron, systemd, or scheduled tasks. They read configs, talk to APIs, process logs, and generate reports. Production reliability matters more than cleverness.

The "production reliability matters more than cleverness" line in the context paragraph turned out to matter more than I expected. It pushed the Gem away from clever one-liners toward boring, reliable code — which is what I actually want for production scripts.
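
To make the style bullet concrete, here is a minimal sketch of the kind of function those instructions steer toward: type hints, key=value log lines, explicit error handling, few comments. It is an illustration rather than actual Gem output, and `load_config`, `ConfigError`, and the `api_url` key are hypothetical names invented for this example.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Raised when the config file is missing, invalid, or incomplete."""


def load_config(path: Path, required_keys: tuple[str, ...] = ("api_url",)) -> dict:
    """Load a JSON config file and verify that the required keys are present."""
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError as exc:
        raise ConfigError(f"config file not found: {path}") from exc
    except json.JSONDecodeError as exc:
        raise ConfigError(f"config file is not valid JSON: {path}") from exc

    if not isinstance(config, dict):
        raise ConfigError(f"config root must be a JSON object: {path}")

    missing = [key for key in required_keys if key not in config]
    if missing:
        raise ConfigError(f"config is missing keys: {missing}")

    logger.info("config_loaded path=%s keys=%d", path, len(config))
    return config
```

Nothing here is clever, which is the point: every failure path raises one exception type with a message that names the file, so a cron or systemd log shows what went wrong without a debugger.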

Two Real Examples — Before and After

These are based on actual scripts I've written. System details are changed but the patterns are real.

Example 1: Log Parsing Script

Situation. I needed to parse a vendor's log format that wasn't standard: fields were space-separated, but the message field could contain spaces. Generic AI tends to reach for a plain whitespace split or an elaborate regex here, and a plain split breaks multi-word messages into separate fields.

What I asked the Gem:

"Parse this log format: timestamp level component message. Fields are space-separated but 'message' can contain spaces. Last field. Write a Python function."

Gem's output:

from datetime import datetime
import logging
from typing import Iterator

logger = logging.getLogger(__name__)

def parse_log_line(line: str) -> dict | None:
    """Parse a single log line. Returns None if malformed."""
    line = line.rstrip("\n")
    if not line:
        return None

    parts = line.split(" ", 3)
    if len(parts) < 4:
        logger.warning("Malformed log line: %r", line)
        return None

    return {
        "timestamp": parts[0],
        "level": parts[1],
        "component": parts[2],
        "message": parts[3],
    }

def parse_log_file(path: str) -> Iterator[dict]:
    """Yield parsed entries from a log file."""
    with open(path) as f:
        for line in f:
            entry = parse_log_line(line)
            if entry is not None:
                yield entry

What the Gem got right. The "split with maxsplit=3" technique is exactly right for last-field-can-contain-spaces. The type hints match my style. The error handling returns None and logs — production-friendly. The two-function split (parse line vs. parse file) is how I'd structure it. The code was usable as-is with minor variable renaming.
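
A quick way to see why the maxsplit argument matters, using a made-up log line (the sample line and its field values are invented for illustration, not taken from the vendor's format):

```python
line = "2024-05-01T12:00:00 ERROR bgp neighbor 10.0.0.1 went down"

# A plain split breaks the message into separate words.
naive = line.split(" ")

# maxsplit=3 stops after three splits, so the remainder stays in one piece.
bounded = line.split(" ", 3)

print(len(naive))   # 7
print(bounded[3])   # neighbor 10.0.0.1 went down
```

The bounded split always yields at most four parts, which is why the length check in parse_log_line is enough to detect a malformed line.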

What I changed before using it. About 5%: I added a counter for malformed lines and changed the timestamp handling so the field is parsed into a datetime object instead of being kept as a string. The core parsing logic was the Gem's.

Example 2: API Polling Script with Error Handling

Situation. I needed a script that polled a vendor API every 5 minutes and alerted on certain conditions. The script would run as a systemd service. Reliability was critical — silent failure was unacceptable.

What I asked the Gem:

"Write a Python script that polls a vendor API every 5 minutes, checks a condition, and alerts on match. Will run as systemd service. Reliability matters — silent failure is unacceptable."

Gem's output (excerpt):

Key reliability considerations for a polling service like this:

  1. Distinguish between "API is down" (don't alert on the condition, alert on the data being stale) and "condition is met" (alert as designed).
  2. Don't crash on transient network errors — back off and retry.
  3. Log every poll attempt and outcome — silent failures are debugging nightmares.
  4. Track consecutive failures and alert once there are 3 in a row.

Suggested structure:

import logging
import time

import requests
from requests.exceptions import RequestException

logger = logging.getLogger(__name__)

# API_URL and ALERT_WEBHOOK are not shown in this excerpt; define them as
# module-level constants or load them from config before calling main().
POLL_INTERVAL = 300  # seconds (5 minutes)
MAX_CONSECUTIVE_FAILURES = 3

def check_condition(data: dict) -> bool:
    return data.get("status") == "alert"

def alert(data: dict) -> None:
    requests.post(ALERT_WEBHOOK, json=data, timeout=10)

def main() -> None:
    consecutive_failures = 0
    while True:
        try:
            response = requests.get(API_URL, timeout=10)
            response.raise_for_status()  # treat HTTP 4xx/5xx as a failed poll
            data = response.json()
            consecutive_failures = 0
            if check_condition(data):
                alert(data)
        except (RequestException, ValueError):  # ValueError covers a non-JSON body
            consecutive_failures += 1
            logger.exception("Poll failed (%d consecutive)", consecutive_failures)
            if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                alert({"alert": "polling_failing", "count": consecutive_failures})
        time.sleep(POLL_INTERVAL)

What the Gem got right. The reliability considerations at the top were the actual points I'd think through. The structure distinguishes between "condition met" and "polling failing" — a common bug in scripts like this is alerting on the wrong condition. The consecutive failure counter is exactly the pattern I'd use. The code is production-ready with minor additions.

What I changed before deploying. About 15% — I added signal handlers for clean shutdown, added a startup check to verify the alert webhook is reachable, and adjusted the retry/backoff strategy. The core polling loop and the failure handling were the Gem's.
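
For the clean-shutdown and backoff pieces, one possible shape is sketched below. The text above doesn't spell out the actual strategy, so treat the capped exponential backoff, the 5-second base, and the names `stop_event`, `next_delay`, and `run_loop` as assumptions made for illustration.

```python
import signal
import threading
from typing import Callable

stop_event = threading.Event()


def _request_stop(signum: int, frame: object) -> None:
    """Signal handler: ask the loop to exit after the current poll."""
    stop_event.set()


def install_signal_handlers() -> None:
    """Route SIGTERM (systemd's default stop signal) and SIGINT to a clean exit."""
    signal.signal(signal.SIGTERM, _request_stop)
    signal.signal(signal.SIGINT, _request_stop)


def next_delay(consecutive_failures: int, poll_interval: float = 300.0, base: float = 5.0) -> float:
    """Full interval after a success; capped exponential backoff after failures."""
    if consecutive_failures <= 0:
        return poll_interval
    return min(poll_interval, base * 2 ** (consecutive_failures - 1))


def run_loop(poll_once: Callable[[], bool]) -> None:
    """Call poll_once until a stop is requested. poll_once returns True on success."""
    failures = 0
    while not stop_event.is_set():
        failures = 0 if poll_once() else failures + 1
        # Event.wait returns as soon as the event is set, so a stop request
        # does not have to wait out the rest of the poll interval.
        stop_event.wait(next_delay(failures))
```

In a real script, install_signal_handlers() would be called once at startup from the main thread, and poll_once would wrap the request-and-check logic from the excerpt above. Retrying sooner after a failure keeps a brief API outage from costing a full five-minute gap, while the cap keeps a long outage from turning into a tight loop.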

Where This Works and Where It Doesn't

After about 18 months of use, here's an honest assessment.

It works well for:

  • Standard library and common third-party patterns. requests, json, logging, pathlib, subprocess, etc.
  • Code that follows my style conventions. The output needs minimal reformatting.
  • Patterns I've used before but need a reminder on. The Gem gives me the right call without the tutorial.
  • Reliability patterns. Error handling, retries, logging — the kind of code that doesn't differ much between projects.

It doesn't work well for:

  • Cutting-edge libraries or version-specific features. The underlying model's training data may predate the current API, so it can suggest calls that have since changed.
  • Architecture decisions for larger applications. Script-level work is where I use this; for a real application, I'd want a different conversation.
  • Performance optimization beyond basic patterns. Profiling and optimization require actual measurement on the actual system.
  • Code that requires domain-specific libraries I haven't told it about. Without context, the Gem defaults to standard choices.

A Note on Code Review

The Gem doesn't replace code review — either self-review or peer review. For production scripts, I always read the generated code line by line before deploying, especially the error handling and the edge cases. The Gem gets me 80% of the way there; the last 20% is judgment about my specific environment.

For scripts where the failure mode matters (anything running unattended, anything that touches production, anything that touches customer data), I add extra review: trace through the code mentally with various failure inputs, run it in a test environment first, and start with a dry-run mode if possible.
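
A dry-run mode can be as small as one flag and one wrapper. The sketch below is a generic pattern, not taken from a specific script; `build_parser` and `apply_change` are hypothetical names.

```python
import argparse
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def build_parser() -> argparse.ArgumentParser:
    """Command-line parser with a single --dry-run switch."""
    parser = argparse.ArgumentParser(description="Example script with a dry-run switch")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="log intended actions without performing them",
    )
    return parser


def apply_change(description: str, action: Callable[[], None], dry_run: bool) -> bool:
    """Run action() unless dry_run is set. Returns True if the action ran."""
    if dry_run:
        logger.info("dry_run=true skipped=%s", description)
        return False
    action()
    logger.info("dry_run=false applied=%s", description)
    return True
```

Routing every side effect (file writes, API calls, alerts) through a wrapper like apply_change means the first production run can be done with --dry-run and compared against the log before anything is actually changed.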

The Gem accelerates the draft. It doesn't replace the verification. For infrastructure work where reliability matters, that's the right balance.


— Justin

📅 First published: 2026-04-29 | 🔄 Last updated: 2026-06-21