In the engineering team’s previous post, we discussed how to think about AI as a tool. This naturally leads to a more operational question: how do you integrate that tool into real engineering workflows?
After some time working with agentic tools, one pattern becomes hard to ignore. The problem is rarely just the prompt. The same setup steps, build commands, and coding expectations get repeated again and again.
At that point, it stops being a prompting issue. It becomes a structure problem. Those repeated instructions should not live in chat. They should live in the repository.
A surprising amount of agent failure has very little to do with reasoning. It’s operational.
Environment setup, runtime configuration, and build execution are some of the most common sources of instability. If those steps are implicit, the agent will guess — and those guesses are often wrong.
You also start noticing something else: the same instructions keep coming back.
If you repeat something more than a few times, it probably shouldn’t stay as a prompt. It should become a reusable skill. Making these steps explicit tends to improve reliability immediately.
For example, instead of repeatedly explaining how to switch Java versions or run the build, you can capture it as a reusable skill:
---
name: java-runtime
description: Set or switch Java version for development and agent execution.
---

# Java Runtime Setup

When setting or switching Java version:

```bash
source "$HOME/.sdkman/bin/sdkman-init.sh" && sdk use java <version>
```

To list available versions:

```bash
source "$HOME/.sdkman/bin/sdkman-init.sh" && sdk list java
```

Both commands must run in the same shell invocation.
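This might look simple, but it encodes details that are easy to miss in a prompt, like the requirement to run `source` and `sdk` in the same shell invocation. `sdk` is a shell function loaded by `sdkman-init.sh`, and `sdk use` only affects the current shell, so an agent that runs each step in a fresh shell would either fail to find `sdk` or silently fall back to the default Java version.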
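As a complement to the skill, the agent can confirm which JVM the shell actually resolved after switching. The sketch below is a hypothetical helper (`RuntimeCheck` is not part of SDKMAN or the original skill); it relies only on `Runtime.version()` (Java 10+) and the standard `java.home` system property.

```java
// Hypothetical helper: confirms which JVM the current shell actually resolved.
public final class RuntimeCheck {

    private RuntimeCheck() {}

    /** Feature release of the running JVM, for example 17 or 21. */
    static int featureVersion() {
        return Runtime.version().feature();
    }

    /** Installation directory of the running JVM. */
    static String javaHome() {
        return System.getProperty("java.home");
    }

    public static void main(String[] args) {
        int actual = featureVersion();
        System.out.println("Java " + actual + " at " + javaHome());

        // Optional argument: the feature release the caller expects, e.g. "21".
        if (args.length > 0) {
            int expected = Integer.parseInt(args[0]);
            if (expected != actual) {
                System.err.println("Expected Java " + expected + " but found " + actual);
                System.exit(1);
            }
        }
    }
}
```

Run it in the same invocation as the switch, for example `source "$HOME/.sdkman/bin/sdkman-init.sh" && sdk use java <version> && java RuntimeCheck.java 21`, where `21` is only an illustrative expected release. A non-zero exit code gives the agent an unambiguous signal that the switch did not take effect. Launching a single source file directly with `java` requires Java 11 or later.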
Once these instructions are part of the repository, the agent no longer needs to infer how to operate the project. It follows a defined path.
In practice, this is where “agentifying” the repository starts to pay off. The more recurring operational steps you make explicit, the more stable the agent’s behavior becomes.
Operational skills make the agent usable. Rules make the output predictable.
Without explicit coding expectations, agents tend to drift. The code may compile, but it won’t consistently match how the team thinks about structure, safety, or design. That creates extra review overhead and, over time, inconsistency in the codebase.
In practice, rules work best when they capture decisions that are already stable within the team: things you would otherwise repeat during code reviews.
For example, DTO modeling is often a source of subtle inconsistency:
# DTO Modeling

Use when designing API contracts or transport objects.

- Prefer immutable `record` DTOs
- Keep DTOs separate from domain models
- Do not serialize domain objects directly
- Avoid weakly typed identifiers (`String`, `UUID`, etc.)
- Use explicit value types where meaning matters
- Validate invariants close to construction
This kind of rule does more than guide generation. It encodes design intent: separation of concerns, type safety, and predictable data flow. Without it, the agent will often default to simpler but less maintainable patterns, for example generic `Map<String, Object>` structures for JSON handling or hand-written JSON marshalling logic. These approaches might work at first, but they become harder to maintain and add overhead as the system evolves.
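A minimal sketch of what code following the DTO rule might look like, assuming Java 16+ for records. `OrderId`, `Order`, and `OrderResponse` are hypothetical names for illustration; how the `OrderId` value type is written to JSON depends on the serialization library and is not shown here.

```java
import java.math.BigDecimal;
import java.util.Objects;
import java.util.UUID;

// Hypothetical types illustrating the DTO Modeling rule; not taken from a real codebase.
public final class DtoExample {

    private DtoExample() {}

    // Explicit value type: an order identifier cannot be confused with any other UUID.
    record OrderId(UUID value) {
        OrderId {
            Objects.requireNonNull(value, "value");
        }

        static OrderId parse(String raw) {
            return new OrderId(UUID.fromString(raw));
        }
    }

    // Domain model: owns behaviour and is never serialized directly.
    static final class Order {
        private final OrderId id;
        private BigDecimal total;

        Order(OrderId id, BigDecimal total) {
            this.id = Objects.requireNonNull(id, "id");
            this.total = Objects.requireNonNull(total, "total");
        }

        OrderId id() { return id; }

        BigDecimal total() { return total; }

        void applyDiscount(BigDecimal amount) {
            total = total.subtract(Objects.requireNonNull(amount, "amount"));
        }
    }

    // Transport object: an immutable record whose invariants are checked at construction.
    record OrderResponse(OrderId id, BigDecimal total) {
        OrderResponse {
            Objects.requireNonNull(id, "id");
            Objects.requireNonNull(total, "total");
            if (total.signum() < 0) {
                throw new IllegalArgumentException("total must not be negative");
            }
        }

        // Explicit mapping keeps the API contract independent of the domain class.
        static OrderResponse from(Order order) {
            return new OrderResponse(order.id(), order.total());
        }
    }
}
```

The mapping method is the only place where the domain model and the API contract meet, so either side can change without silently changing the other.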
Another example is null handling:
# Null Safety

Use when designing method signatures and handling input.

- Never return `null`
- Never accept `null` as valid input
- Model absence explicitly
- Fail fast on unexpected `null`
- Never return `null` collections
These rules reduce ambiguity. Instead of guessing how to handle missing data, the agent follows a consistent model. That makes the generated code easier to reason about and less error-prone in edge cases.
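A minimal sketch of code that follows the Null Safety rule: `Objects.requireNonNull` to fail fast, `Optional` to model absence, and empty collections instead of `null`. `Customer` and `CustomerDirectory` are hypothetical names, and `Optional` is one common way to model absence in Java rather than the only one.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical types illustrating the Null Safety rule; not taken from a real codebase.
public final class NullSafetyExample {

    private NullSafetyExample() {}

    record Customer(String id, String email) {
        Customer {
            // Fail fast: null is never accepted as a valid component.
            Objects.requireNonNull(id, "id");
            Objects.requireNonNull(email, "email");
        }
    }

    static final class CustomerDirectory {
        private final Map<String, Customer> customersById;
        private final Map<String, List<String>> tagsById;

        CustomerDirectory(Map<String, Customer> customersById, Map<String, List<String>> tagsById) {
            // Map.copyOf makes unmodifiable copies and throws on null keys or values.
            this.customersById = Map.copyOf(customersById);
            this.tagsById = Map.copyOf(tagsById);
        }

        // Absence is modelled explicitly instead of returning null.
        Optional<Customer> findById(String id) {
            Objects.requireNonNull(id, "id");
            return Optional.ofNullable(customersById.get(id));
        }

        // Collections are never null: an unknown id yields an empty list.
        List<String> tagsFor(String id) {
            Objects.requireNonNull(id, "id");
            return tagsById.getOrDefault(id, List.of());
        }
    }
}
```

Using `Optional` only as a return type, while fields and parameters stay non-null, matches the `Optional` Javadoc, which describes it as primarily intended for method return types.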
In our experience across multiple services, adding rules and skills to the repository made a clear difference. The biggest change was in code reviews: style-related comments, which used to take a lot of time, almost disappeared. Reviews became much faster, with fewer rounds of back-and-forth changes. Because the expectations were already defined in the repo, agents also needed fewer follow-up prompts, which sped up work by roughly 30–40%. The main benefit wasn’t only speed, but less rework and more consistent output from the start.
These are just a few examples. Teams typically accumulate many focused rules over time: streams, reactive patterns, functional style, and framework-specific conventions. The goal is not to define everything upfront, but to capture the patterns that repeatedly matter.
Once skills and rules start to accumulate, another issue appears quickly: scope.
If everything lives in one place, agents get too much context. They may pick the wrong guidance, mix unrelated rules, or ignore parts entirely. This becomes more noticeable in multi-module or monorepo setups. It helps to structure guidance the same way the codebase is structured:
Start with root-level guidance — things that apply everywhere:
repo/
  .ai/
    rules/
      gradle-execution.mdc
      project-guidelines.mdc
      submodule-rules.mdc
    skills/
      dto/SKILL.md
      null-safety/SKILL.md
      streams/SKILL.md
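This layer defines the baseline for the whole repository: build execution, project-wide guidelines, conventions for submodules, and shared coding skills such as DTO modeling, null safety, and streams.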
Then narrow the scope at the module level:
module-a/
  .ai/
    rules/
      guidelines.mdc
    skills/
      design/SKILL.md
      testing/SKILL.md
      state-management/SKILL.md
Here, guidance becomes specific to the module: its own guidelines plus skills for design, testing, and state management.
This separation tends to reduce noise. When working inside a module, the agent is more likely to use relevant guidance instead of pulling unrelated rules from other parts of the repository.
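The important part is not the exact layout, but the idea behind it: general guidance lives at the root, and more specific guidance lives next to the code it applies to.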
In real work, this makes agent behavior more predictable. It increases the chance that the agent applies the right rules in the right place, instead of trying to combine everything at once.
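One way to picture the scoping idea is as a lookup that starts at the file being edited and walks up to the repository root, collecting `.ai/` directories from most to least specific. This is a hypothetical sketch that assumes the `.ai/` layout described in this post and Java 16+ (for `Stream.toList()`); it is not how any particular agentic tool discovers rules, since each tool has its own discovery logic.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical resolver: which .ai/ directories are "in scope" for a given file.
public final class GuidanceScope {

    private GuidanceScope() {}

    /**
     * Candidate guidance directories for a file, most specific first.
     * Walks from the file's directory up to and including the repository root.
     */
    static List<Path> candidateDirs(Path repoRoot, Path file) {
        Path root = repoRoot.toAbsolutePath().normalize();
        Path current = file.toAbsolutePath().normalize().getParent();
        if (current == null || !current.startsWith(root)) {
            throw new IllegalArgumentException(file + " is not inside " + root);
        }
        List<Path> result = new ArrayList<>();
        // Path.startsWith compares whole name elements, so /repo-other is not inside /repo.
        while (current != null && current.startsWith(root)) {
            result.add(current.resolve(".ai"));
            current = current.getParent();
        }
        return result;
    }

    /** Keeps only the candidates that actually exist on disk. */
    static List<Path> existingDirs(Path repoRoot, Path file) {
        return candidateDirs(repoRoot, file).stream().filter(Files::isDirectory).toList();
    }
}
```

For `module-a/src/Main.java`, the candidates are `module-a/src/.ai`, `module-a/.ai`, and the root `.ai`; `existingDirs` drops the first because that directory does not exist in the layout above, leaving module guidance ahead of the repository baseline.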
Once you start using more than one agentic tool, another pattern becomes clear: they don’t consume structure in the same way.
On the surface, it’s tempting to standardize everything — a shared directory, a single format for skills and rules, one place for all agent guidance. This is especially appealing in teams where engineers use different IDEs and tools.
A common approach is to introduce a tool-agnostic layer like `agents.md`, which is supported (or can be integrated) across many popular agentic tools. As a shared entry point, this works well: it creates a consistent, discoverable place for guidance and reduces fragmentation across the team.
However, this approach has some practical limitations.
First, `agents.md` often requires explicit configuration to be included in an agent’s context. If a developer hasn’t enabled that setting, the file may be silently ignored. In day-to-day work, this leads to inconsistent behavior across contributors.
Second, it acts as a lowest-common-denominator format. It’s useful for general guidance, but it does not fully capture how different tools model skills, rules, or executable workflows. Most agents expect structure in specific locations and formats, and they tend to perform better when those expectations are met.
As a result, a purely generic abstraction can degrade in effectiveness as the repository grows. What looks like a shared system often behaves like partial or inconsistently applied context.
A more practical approach is to combine both layers. Start with shared, tool-agnostic standards such as `agents.md`. They reduce duplication, improve portability, and provide a shared layer of guidance across tools.
Use this layer for common context, commands, and conventions that should apply everywhere.
When higher precision is needed, extend with agent-specific formats. Many tools support richer structures for rules or skills, and using them can improve reliability and output quality.
In practice, a layered approach works best: shared standards as the foundation, agent-specific structure where it adds value. Just make sure these shared files are explicitly configured in each tool — otherwise, they may be ignored.
By this point, the pattern is fairly consistent.
Repeated prompts turn into skills.
Coding expectations turn into rules.
Repository structure helps scope that guidance.
Tool-specific formats make it more reliable in practice.
All this improves how the agent executes tasks. It reduces guesswork and makes behavior more consistent across the codebase. But it does not solve the full problem.
Even with well-defined skills and rules, the agent can still produce code that compiles but is incorrect, incomplete, or subtly wrong. Structure improves execution — not evaluation.