Don’t Forget Your Software Craftsmanship

by Florian Rudisch | May 7, 2025 | English, Artificial Intelligence, Software Engineering

Florian Rudisch

Lead Developer and contact person for DevCon

Testing as a Fundamental Aspect of Software Craftsmanship

Ensuring the reliability and maintainability of software systems requires rigorous testing, making it a cornerstone of every software project. Martin Fowler, a key advocate for software craftsmanship, emphasizes the importance of automated testing and continuous integration pipelines in maintaining software correctness and stability. These practices allow developers to release software with confidence, minimizing the risk of defects.

A crucial approach in the software development process is Test-Driven Development (TDD). TDD follows a structured cycle:

  1. writing one or more failing tests based on requirements
  2. implementing the minimal code necessary for the tests to pass
  3. refactoring to improve design and maintainability

This approach ensures that software meets requirements while promoting better design decisions and higher code quality.
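The cycle can be made concrete with a small example. The following is a minimal sketch in Python using the standard `unittest` module. The requirement (show only the last four characters of an account number) and the names `mask_account_number` and `MaskAccountNumberTest` are hypothetical, chosen for illustration. The tests correspond to step 1 and would fail until the function exists; the function is the minimal code from step 2.

```python
import unittest


# Step 2: the minimal implementation, written only after the tests below were failing.
# Hypothetical requirement: show only the last four characters of an account number.
def mask_account_number(number: str) -> str:
    visible = number[-4:]
    return "*" * (len(number) - len(visible)) + visible


# Step 1: tests written first, derived directly from the requirement.
class MaskAccountNumberTest(unittest.TestCase):
    def test_masks_all_but_last_four_characters(self):
        self.assertEqual(mask_account_number("DE89370400440532013000"), "*" * 18 + "3000")

    def test_short_numbers_are_left_unmasked(self):
        self.assertEqual(mask_account_number("123"), "123")
```

With the file saved as, say, `test_masking.py`, running `python -m unittest test_masking` executes both tests. Step 3 would then refactor (for example, extracting the mask character into a parameter) with the green tests acting as a safety net.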

By following Test-Driven Development, developers can create robust, maintainable software. Software craftsmanship is not just about writing functional code; it is about writing high-quality, reliable code that withstands the test of time.

LLMs and Their Unpredictable Behavior

Large Language Models (LLMs) are powerful AI tools that generate human-like text based on input prompts, and they are increasingly being integrated into applications. For example, asking an agent or chatbot about the details of your bank account, or how to optimize it, is likely to become commonplace. To make this possible, developers need to integrate LLMs into their applications. From a developer’s perspective, the LLM often appears as just another subsystem, much like a database.
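Treating the LLM as just another subsystem suggests the same technique commonly used for databases: hide it behind a narrow interface so that tests can substitute a predictable fake. The sketch below assumes a hypothetical `LlmClient` protocol with a single `complete(prompt)` method and a hypothetical `answer_account_question` function as the application code; a real adapter wrapping a provider SDK is not shown.

```python
from typing import Protocol


class LlmClient(Protocol):
    """Hypothetical port for any LLM backend (hosted API, local model, ...)."""

    def complete(self, prompt: str) -> str: ...


class FakeLlmClient:
    """Deterministic test double: returns canned answers and records every prompt."""

    def __init__(self, canned: dict[str, str], default: str = "I don't know.") -> None:
        self.canned = canned  # maps a keyword in the prompt to a canned reply
        self.default = default
        self.prompts: list[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        for keyword, answer in self.canned.items():
            if keyword in prompt:
                return answer
        return self.default


def answer_account_question(client: LlmClient, question: str) -> str:
    """Application code under test: builds the prompt and post-processes the reply."""
    prompt = f"You are a banking assistant. Answer briefly.\nQuestion: {question}"
    return client.complete(prompt).strip()


fake = FakeLlmClient({"balance": "  Your balance is 1,250.00 EUR.  "})
reply = answer_account_question(fake, "What is my balance?")
print(reply)  # Your balance is 1,250.00 EUR.
```

Because the application depends only on the protocol, unit tests exercise the prompt construction and post-processing deterministically, while the real model is covered by separate evaluation tests.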

However, the probabilistic nature of LLMs makes their outputs variable and sometimes unpredictable. The same prompt can yield different responses across inference runs, and changes in the contextual information shift the output further, which makes consistency and reliability a challenge.

Several factors contribute to these inconsistencies:

  • Prompt Sensitivity: Small changes in wording or structure can lead to vastly different responses, requiring careful prompt engineering for stability.
  • Inference Parameters: Settings such as temperature and top-k sampling affect randomness in responses, impacting determinism and reproducibility.
  • Context Awareness: LLMs interpret prompts contextually, meaning prior interactions or additional information can significantly influence outputs.
  • Hallucination Risks: LLMs may generate plausible yet factually incorrect information, necessitating verification against reliable sources.
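The effect of inference parameters can be illustrated with a toy sampler. This is a simplified sketch, not how any particular model or provider implements decoding: it turns hypothetical next-token scores (logits) into weights with a temperature-scaled softmax, optionally keeps only the top-k candidates, and samples one token. At temperature 0 it falls back to greedy (argmax) selection and always returns the same token; at higher temperatures the choice varies between calls unless the random generator is seeded. Note that many hosted LLM APIs do not guarantee fully identical outputs even at temperature 0.

```python
import math
import random
from typing import Optional


def sample_token(
    logits: dict[str, float],
    temperature: float,
    top_k: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Toy next-token sampler: temperature-scaled softmax with optional top-k filtering."""
    if temperature < 0:
        raise ValueError("temperature must be >= 0")

    # Rank candidates by score and keep only the k best (all of them if top_k is None).
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]

    # Temperature 0 is treated as greedy decoding: always return the best candidate.
    if temperature == 0:
        return ranked[0][0]

    # Weights proportional to softmax(logit / temperature); subtracting the max avoids overflow.
    scaled = [score / temperature for _, score in ranked]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    rng = rng or random.Random()
    tokens = [token for token, _ in ranked]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores for the next token after some prompt.
logits = {"approved": 2.0, "pending": 1.5, "declined": 0.5}

print([sample_token(logits, temperature=0) for _ in range(3)])
# ['approved', 'approved', 'approved']

print([sample_token(logits, temperature=1.0) for _ in range(5)])
# differs from run to run: unseeded sampling is not reproducible

seeded = random.Random(42)
print([sample_token(logits, temperature=1.0, top_k=2, rng=seeded) for _ in range(5)])
# same sequence on every run, and never 'declined' because top_k=2
```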

These factors make it difficult to achieve predictable results and to create reliable test sets when integrating an LLM into an application. Developers practicing TDD rely on repeatable tests, and unpredictable behavior leads to non-repeatable test cases. Predictable and repeatable test cases are especially important in automated test runs within a Continuous Integration/Continuous Delivery (CI/CD) pipeline.
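One common way to keep such tests repeatable is to stop asserting an exact string, check a property of the output instead, send the same prompt several times, and require a minimum pass rate. In the sketch below, `generate` stands in for the real LLM call; here it is a hypothetical fake that cycles through varied phrasings, one of which omits the currency. The 70% threshold is an arbitrary example value: with a real model the pass rate itself fluctuates, so the threshold needs some margin.

```python
import itertools
import re
from typing import Callable


def pass_rate(generate: Callable[[str], str], prompt: str,
              check: Callable[[str], bool], runs: int = 10) -> float:
    """Send the same prompt `runs` times and return the fraction of outputs passing `check`."""
    passed = sum(1 for _ in range(runs) if check(generate(prompt)))
    return passed / runs


def mentions_amount_in_eur(text: str) -> bool:
    """Property check: some amount followed by 'EUR', rather than one exact sentence."""
    return re.search(r"\d+(?:[.,]\d+)*\s?EUR", text) is not None


def make_fake_llm() -> Callable[[str], str]:
    """Stand-in for a real LLM: varies its wording and sometimes drops the currency."""
    answers = itertools.cycle([
        "Your balance is 1,250.00 EUR.",
        "You currently have 1,250.00 EUR available.",
        "The balance on your account is 1,250.00 EUR.",
        "Your balance is 1,250.00.",  # a flawed answer the check should catch
    ])

    def generate(prompt: str) -> str:
        return next(answers)  # ignores the prompt; a real client would send it to the model

    return generate


rate = pass_rate(make_fake_llm(), "What is my balance?", mentions_amount_in_eur, runs=20)
print(f"{rate:.0%}")  # 75%
assert rate >= 0.7, f"pass rate {rate:.0%} is below the 70% threshold"
```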

Testing Principles to Address LLM Challenges

A structured testing approach is essential to mitigate LLM unpredictability and to minimize the risk of integrating an LLM into an application. As in traditional software development, Test-Driven Development (TDD) plays a critical role in developing LLM-based applications. It helps establish a ground truth before implementation and yields reliable tests.

  • Defining the Ground Truth: In the LLM context, TDD involves defining precise test cases with specified inputs and expected outputs.
  • Developing with Testing in Mind: Writing tests before implementation ensures that all system components align with expected behaviors, enhancing stability and predictability.
  • Leveraging Tools like Promptfoo: Tools such as Promptfoo, which automate prompt evaluations and output tracking, help ensure consistency over time.
  • Automating to Prevent Regressions: Integrating automated tests into continuous integration pipelines helps detect and prevent regressions, ensuring updates (e.g., using a newer LLM version) do not degrade system performance.
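These principles can be combined into a small evaluation harness. The sketch below is loosely modeled on the idea behind tools like Promptfoo (test cases with inputs and assertions), but it is plain Python and does not use Promptfoo’s actual configuration format or API. `EvalCase`, `evaluate`, the check helpers, and the two fake model versions are all hypothetical. The ground truth is written first; a CI job could then run the suite against the current and the candidate model version and fail the build if any previously passing case now fails.

```python
from dataclasses import dataclass, field
from typing import Callable

Check = Callable[[str], bool]


# Reusable output checks: assertions on properties of the text, not on exact wording.
def contains(text: str) -> Check:
    return lambda output: text.lower() in output.lower()


def not_contains(text: str) -> Check:
    return lambda output: text.lower() not in output.lower()


def max_length(limit: int) -> Check:
    return lambda output: len(output) <= limit


@dataclass
class EvalCase:
    """Ground truth written before the implementation: an input plus expected properties."""
    name: str
    question: str
    checks: list[Check] = field(default_factory=list)


def evaluate(model: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Query the model once per case and record whether all of the case's checks pass."""
    results = {}
    for case in cases:
        output = model(case.question)
        results[case.name] = all(check(output) for check in case.checks)
    return results


CASES = [
    EvalCase("balance", "What is my balance?", [contains("EUR"), max_length(80)]),
    EvalCase("no_advice", "Should I buy crypto?", [not_contains("guaranteed"), contains("advisor")]),
]


def model_v1(question: str) -> str:
    """Fake 'current' model version."""
    if "balance" in question:
        return "Your balance is 1,250.00 EUR."
    return "I can't give investment advice; please talk to a financial advisor."


def model_v2(question: str) -> str:
    """Fake 'candidate' model version that introduces a regression."""
    if "balance" in question:
        return "Your balance is 1,250.00 EUR."
    return "Crypto returns are guaranteed, go for it!"


baseline = evaluate(model_v1, CASES)
candidate = evaluate(model_v2, CASES)
regressions = [name for name, ok in baseline.items() if ok and not candidate[name]]
print(regressions)  # ['no_advice']
```

In a pipeline, a non-empty `regressions` list would be turned into a failing exit code, which blocks the model upgrade until the case is investigated.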

By embedding these principles into the development and deployment of LLM-integrated applications, engineers can create more reliable, testable, and maintainable AI-driven systems that consistently meet quality expectations.

Conclusion

Software craftsmanship should extend to integrating and working with LLMs. Just as rigorous testing and structured development are essential for traditional software systems, they are equally critical for ensuring the reliability of AI-driven applications.

Additionally, AI can assist in defining test cases for both traditional software development and LLM integration. By leveraging AI-driven testing approaches, developers can enhance the accuracy and efficiency of their test cases.

Moreover, LLM integration testing can be further improved by using multiple LLMs to cross-check outputs for hallucinations or inaccuracies. This multi-model validation approach can help detect inconsistencies and enhance the robustness of AI-generated content.
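A minimal sketch of such a cross-check, assuming each model is wrapped as a hypothetical callable that returns a short answer: ask every model the same question, normalize the answers, and accept the majority answer only if enough models agree; otherwise flag the result as a possible hallucination for review. The function names and the two-thirds quorum are illustrative assumptions. Exact-match voting only suits short, factual answers; free-form text would need a semantic comparison (for example, another LLM acting as a judge), which is not shown here.

```python
from collections import Counter
from typing import Callable, Optional


def normalize(answer: str) -> str:
    """Make trivially different answers comparable (case, whitespace, trailing period)."""
    return " ".join(answer.lower().split()).rstrip(".")


def cross_check(models: list[Callable[[str], str]], question: str,
                quorum: float = 2 / 3) -> tuple[Optional[str], dict[str, int]]:
    """Return (majority answer, or None if agreement is below quorum; vote counts)."""
    if not models:
        raise ValueError("at least one model is required")
    votes = Counter(normalize(model(question)) for model in models)
    answer, count = votes.most_common(1)[0]
    if count / len(models) >= quorum:
        return answer, dict(votes)
    return None, dict(votes)  # models disagree: flag for review


# Three fake "models" standing in for different LLMs.
models = [
    lambda q: "Paris.",
    lambda q: "paris",
    lambda q: "Lyon",
]
print(cross_check(models, "What is the capital of France?"))
# ('paris', {'paris': 2, 'lyon': 1})
```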

Maintain your software craftsmanship when working with AI-based systems: TDD is just as important there as in traditional software development. Start every LLM project with tests, structure, and a focus on quality.