Complexity Assessment and Cognitive Load Metrics
Software complexity is the silent enemy of maintainability. AI-generated code tends to be more complex than necessary: long functions, deep nesting, and excessive coupling between components. Measuring and controlling this complexity is essential for keeping a codebase healthy over the long term.
In this article we explore advanced complexity metrics, from cognitive load to Halstead metrics and architecture fitness functions, together with practical tools for evaluating and reducing the complexity of AI-generated code.
What You Will Learn
- Cognitive complexity and why it is more relevant than cyclomatic complexity for AI code
- Halstead metrics: volume, difficulty and computational effort
- Architecture fitness functions to evaluate AI's architectural impact
- AI-specific tendencies in generating complex code
- Refactoring strategies guided by complexity metrics
- How to set thresholds for AI code complexity
Cognitive Complexity: Measuring Comprehension Difficulty
Cognitive complexity measures how difficult code is for a human being to understand. Unlike cyclomatic complexity, which counts execution paths, cognitive complexity evaluates the mental effort needed to follow the code's logic, particularly penalizing deep nesting and breaks in linear flow.
For AI-generated code, cognitive complexity is a particularly relevant metric because AI produces code that often appears logically correct but is unnecessarily complex to read and maintain, with nesting levels that an experienced developer would not produce.
```python
# Cognitive complexity calculation
import ast


class CognitiveComplexityCalculator:
    """Calculates cognitive complexity following the SonarSource model."""

    def __init__(self):
        self.complexity = 0
        self.nesting_level = 0

    def calculate(self, ast_node):
        """Calculates the cognitive complexity of a function."""
        self.complexity = 0
        self.nesting_level = 0
        self._visit(ast_node)
        return self.complexity

    def _visit(self, node):
        """Recursive AST visit with incremental calculation."""
        # Structural increments: +1 for each break in linear flow
        # (if, loops, except handlers), plus the current nesting level;
        # the children of these nodes are visited one level deeper.
        if isinstance(node, (ast.If, ast.For, ast.While, ast.ExceptHandler)):
            self.complexity += 1                    # base increment
            self.complexity += self.nesting_level   # nesting increment
            self.nesting_level += 1
            for child in ast.iter_child_nodes(node):
                self._visit(child)
            self.nesting_level -= 1
            return
        # Each run of and/or operators counts +1. Python's ast already
        # groups a run of the same operator into a single BoolOp node.
        if isinstance(node, ast.BoolOp):
            self.complexity += 1
        # Nested lambdas: +1 plus the nesting increment
        if isinstance(node, ast.Lambda):
            self.complexity += 1 + self.nesting_level
        # Note: else/elif branches (+1 each) and recursion (+1) are part of
        # the SonarSource model but are omitted in this sketch for brevity.
        for child in ast.iter_child_nodes(node):
            self._visit(child)
```
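Alongside full cognitive-complexity scoring, a much cheaper proxy catches most of the same AI patterns: maximum nesting depth. Below is a minimal, self-contained sketch using Python's `ast` module; the function name and the set of nodes treated as nesting structures are illustrative choices, not a standard.

```python
import ast

def max_nesting_depth(source: str) -> int:
    """Return the deepest nesting of control structures in the source."""
    NESTING = (ast.If, ast.For, ast.While, ast.Try, ast.With)

    def depth(node, current=0):
        best = current
        for child in ast.iter_child_nodes(node):
            bump = 1 if isinstance(child, NESTING) else 0
            best = max(best, depth(child, current + bump))
        return best

    return depth(ast.parse(source))

code = """
def f(x):
    if x:
        for i in x:
            if i > 0:
                print(i)
"""
print(max_nesting_depth(code))  # → 3
```

A depth above 3 is a quick signal that the generated function needs the guard-clause treatment shown below.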
```python
# Example of AI code with high cognitive complexity
# vs refactored version

# HIGH: Cognitive Complexity = 29
def process_order_ai(order):
    if order:                                     # +1
        if order.status == "pending":             # +2 (nesting=1)
            for item in order.items:              # +3 (nesting=2)
                if item.quantity > 0:             # +4 (nesting=3)
                    if item.in_stock:             # +5 (nesting=4)
                        try:
                            charge(item)
                        except PaymentError:      # +6 (nesting=5)
                            if item.retry_count < 3:  # +7 (nesting=6)
                                retry(item)
                            else:                 # +1
                                cancel(item)
    return None

# LOW: Cognitive Complexity = 4 at most per function (9 in total)
def process_order_refactored(order):
    if not order or order.status != "pending":    # +1 (+1 for the `or`)
        return None
    for item in order.items:                      # +1
        process_single_item(item)                 # extracted function

def process_single_item(item):
    if item.quantity <= 0 or not item.in_stock:   # +1 (+1 for the `or`)
        return
    try_charge_item(item)                         # extracted function

def try_charge_item(item):
    try:
        charge(item)
    except PaymentError:                          # +1
        if item.retry_count < 3:                  # +2 (nesting=1)
            retry(item)
        else:                                     # +1
            cancel(item)
```
Halstead Metrics: Volume, Difficulty and Effort
Halstead metrics, developed by Maurice Halstead in 1977, provide a quantitative measure of software complexity based on counting operators and operands in source code. For AI-generated code, these metrics reveal interesting patterns: AI tends to produce code with a high number of unique operators but limited operand variety.
```python
import math


class HalsteadMetrics:
    """Calculates Halstead metrics for source code"""

    def __init__(self, operators, operands):
        # n1 = number of distinct operators
        # n2 = number of distinct operands
        # N1 = total number of operators
        # N2 = total number of operands
        self.n1 = len(set(operators))
        self.n2 = len(set(operands))
        self.N1 = len(operators)
        self.N2 = len(operands)

    def vocabulary(self):
        """n = n1 + n2: program vocabulary"""
        return self.n1 + self.n2

    def length(self):
        """N = N1 + N2: program length"""
        return self.N1 + self.N2

    def volume(self):
        """V = N * log2(n): program volume"""
        n = self.vocabulary()
        return self.length() * math.log2(n) if n > 0 else 0

    def difficulty(self):
        """D = (n1/2) * (N2/n2): program difficulty"""
        if self.n2 == 0:
            return 0
        return (self.n1 / 2) * (self.N2 / self.n2)

    def effort(self):
        """E = D * V: implementation effort"""
        return self.difficulty() * self.volume()

    def time_to_program(self):
        """T = E / 18: estimated time in seconds (18 = Stroud number)"""
        return self.effort() / 18

    def bugs_estimate(self):
        """B = V / 3000: empirical delivered-bug estimate"""
        return self.volume() / 3000

    def summary(self):
        """Complete metrics summary"""
        return {
            "vocabulary": self.vocabulary(),
            "length": self.length(),
            "volume": round(self.volume(), 2),
            "difficulty": round(self.difficulty(), 2),
            "effort": round(self.effort(), 2),
            "time_seconds": round(self.time_to_program(), 2),
            "estimated_bugs": round(self.bugs_estimate(), 3),
        }
```
Halstead Metrics: Thresholds for AI Code
| Metric | Good | Acceptable | Critical |
|---|---|---|---|
| Volume (V) | <100 | 100-1000 | >1000 |
| Difficulty (D) | <10 | 10-30 | >30 |
| Effort (E) | <1000 | 1000-10000 | >10000 |
| Estimated Bugs (B) | <0.1 | 0.1-0.5 | >0.5 |
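To make the formulas concrete, here is the calculation worked through for a single toy statement. The token tallies are done by hand (in practice a tool such as `radon` would extract them), and the statement itself is purely illustrative:

```python
import math

# Hand-tallied tokens for the toy statement: d = (a * b) + (a * c)
operators = ["=", "*", "+", "*"]        # N1 = 4, n1 = 3 distinct
operands = ["d", "a", "b", "a", "c"]    # N2 = 5, n2 = 4 distinct

n1, n2 = len(set(operators)), len(set(operands))
N1, N2 = len(operators), len(operands)

volume = (N1 + N2) * math.log2(n1 + n2)   # V = N * log2(n)
difficulty = (n1 / 2) * (N2 / n2)         # D = (n1/2) * (N2/n2)
effort = volume * difficulty              # E = D * V

print(round(volume, 2), round(difficulty, 2), round(effort, 2))
# → 25.27 1.88 47.37
```

All three values land comfortably in the "Good" column of the table above, as expected for a one-line expression.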
Architecture Fitness Functions
Architecture fitness functions are metrics that evaluate how well the code respects the architectural constraints defined by the team. For AI-generated code, these functions are critical because AI does not know the project architecture and could introduce structural problems such as circular dependencies, excessive coupling, or layer violations.
```python
# Architecture fitness functions for AI code
class ArchitectureFitness:
    """Evaluates architectural adherence of AI code"""

    def __init__(self, project_structure, architecture_rules):
        self.structure = project_structure
        self.rules = architecture_rules

    def evaluate_all(self):
        """Executes all fitness functions"""
        return {
            "coupling": self._check_coupling(),
            "cohesion": self._check_cohesion(),
            "layer_violations": self._check_layer_violations(),
            "circular_dependencies": self._check_circular_deps(),
            "component_size": self._check_component_size(),
            "overall_fitness": self._calculate_overall_score(),
        }

    def _check_coupling(self):
        """Measures coupling between modules"""
        # Afferent coupling (Ca): how many modules depend on me
        # Efferent coupling (Ce): how many modules I depend on
        # Instability I = Ce / (Ca + Ce)
        results = []
        for module in self.structure.modules:
            ca = len(module.dependents)
            ce = len(module.dependencies)
            instability = ce / (ca + ce) if (ca + ce) > 0 else 0
            results.append({
                "module": module.name,
                "afferent": ca,
                "efferent": ce,
                "instability": round(instability, 2),
                "status": "OK" if instability < 0.7 else "WARNING",
            })
        return results

    def _check_layer_violations(self):
        """Verifies that architectural layers are respected"""
        # E.g.: the presentation layer must not import from the data layer
        violations = []
        for rule in self.rules.get("layer_rules", []):
            source = rule["from_layer"]
            forbidden = rule["cannot_import"]  # list of forbidden layers
            for imp in self.structure.get_imports(source):
                if any(f in imp for f in forbidden):
                    violations.append({
                        "from": source,
                        "imports": imp,
                        "forbidden_layer": forbidden,
                        "severity": "HIGH",
                    })
        return violations
```
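The `_check_circular_deps` method is listed above but not shown. A minimal standalone sketch of cycle detection over a plain dict-based dependency graph could look like the following (module names and the graph shape are illustrative):

```python
def find_cycle(deps):
    """Return one import cycle as a list of modules, or None.

    deps maps each module name to the list of modules it imports.
    """
    visited = set()

    def visit(node, path):
        if node in path:                    # back-edge: a cycle closes here
            return path[path.index(node):] + [node]
        if node in visited:                 # already explored, no cycle found
            return None
        visited.add(node)
        for dep in deps.get(node, []):
            cycle = visit(dep, path + [node])
            if cycle:
                return cycle
        return None

    for module in deps:
        cycle = visit(module, [])
        if cycle:
            return cycle
    return None

graph = {
    "orders": ["payments"],
    "payments": ["notifications"],
    "notifications": ["orders"],   # closes the cycle
    "utils": [],
}
print(find_cycle(graph))  # → ['orders', 'payments', 'notifications', 'orders']
```

In a real fitness function the graph would be built from actual import statements, and any non-`None` result would fail the build.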
AI Tendencies in Generating Complex Code
Reviewing large volumes of AI output reveals recurring patterns of unnecessary complexity. Understanding these tendencies helps teams configure the most appropriate controls and craft prompts that steer the model toward simpler code.
- Over-engineering: AI adds patterns (factory, strategy, observer) where a simple solution would suffice
- Deep nesting: AI frequently generates 4-5 nesting levels where 1-2 would be sufficient
- God functions: overly long functions that do too many things, with hundreds of lines
- Premature abstraction: interfaces and abstract classes for code that does not need extensibility
- Copy-paste evolution: AI duplicates and slightly modifies rather than generalizing
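The over-engineering tendency is the easiest to illustrate. Below is a deliberately contrived sketch of the pattern (all names are hypothetical): a factory class wrapped around a single concrete type, where direct instantiation would do.

```python
class Greeter:
    def greet(self, name):
        return f"Hello, {name}"

# Over-engineered (a typical AI reflex): a factory for one concrete class,
# with no second implementation in sight
class GreeterFactory:
    def create(self, kind="default"):
        if kind != "default":
            raise ValueError(kind)
        return Greeter()

# Sufficient: direct instantiation, no indirection to maintain
print(Greeter().greet("Ada"))  # → Hello, Ada
```

The factory adds a class, a branch, and an error path while enabling nothing; patterns earn their complexity only when a second implementation actually exists.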
Recommended Complexity Thresholds for AI Code
- Cognitive complexity per function: max 15 (vs 25 for human code)
- Maximum nesting depth: 3 levels (vs 4 for human code)
- Lines per function: max 40 (vs 60 for human code)
- Parameters per function: max 4 (vs 5 for human code)
- Dependencies per module: max 8 external imports (vs 10 for human code)
- Halstead Difficulty: max 25 (signal of overly dense code)
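Thresholds only help if they are enforced mechanically, typically in CI. A minimal sketch of such a gate, with the limits above encoded as a dict (the metric keys and the function name are illustrative):

```python
# Limits for AI-generated code, following the list above
AI_CODE_LIMITS = {
    "cognitive_complexity": 15,
    "nesting_depth": 3,
    "lines_per_function": 40,
    "parameters": 4,
    "external_imports": 8,
    "halstead_difficulty": 25,
}

def check_thresholds(metrics, limits=AI_CODE_LIMITS):
    """Return the list of metrics that exceed their limit."""
    return [
        {"metric": name, "value": value, "limit": limits[name]}
        for name, value in metrics.items()
        if name in limits and value > limits[name]
    ]

violations = check_thresholds({"cognitive_complexity": 22, "nesting_depth": 2})
print(violations)
# → [{'metric': 'cognitive_complexity', 'value': 22, 'limit': 15}]
```

A CI step would fail the build whenever the returned list is non-empty, pointing the author at the exact metric to refactor.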
Metric-Guided Refactoring Strategies
Complexity metrics serve not only to block code but also to guide refactoring. Each high metric points to a specific simplification strategy that can be systematically applied.
Metric-to-Refactoring Map
| Problem Detected | Metric | Refactoring Strategy |
|---|---|---|
| Deep nesting | High cognitive complexity | Early return, guard clauses, extract method |
| Function too long | High Halstead volume | Extract method, single responsibility |
| Too many parameters | High Halstead difficulty | Parameter object, builder pattern |
| Excessive dependencies | High efferent coupling | Dependency injection, interface segregation |
| Class too large | LOC + instability | Extract class, decompose by responsibility |
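Guard clauses and extract method are already shown in the cognitive-complexity example above; the "parameter object" row can be sketched like this (the shipment domain and all names are made up for illustration):

```python
from dataclasses import dataclass

# Before: a flat parameter list over the threshold of 4
def create_shipment_v1(name, street, city, zip_code, country, express, insured):
    ...

# After: a parameter object groups the related values
@dataclass
class Address:
    street: str
    city: str
    zip_code: str
    country: str

def create_shipment(name, address, express=False, insured=False):
    label = f"{name} -> {address.city}, {address.country}"
    return label + " [express]" if express else label

print(create_shipment("order-42",
                      Address("Via Roma 1", "Milan", "20100", "IT"),
                      express=True))
# → order-42 -> Milan, IT [express]
```

Besides shrinking the signature, the dataclass gives the address fields a single definition point, so adding a field no longer touches every call site.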
Monitoring Complexity Over Time
Codebase complexity must be monitored as a trend over time, not as a single point measurement. A healthy codebase maintains stable or decreasing complexity. A constant increase in complexity, especially correlated with AI coding tools adoption, is a warning signal requiring immediate intervention.
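One simple way to turn snapshots into a trend signal is a least-squares slope over successive measurements. A sketch, with invented weekly averages (a positive slope means complexity is growing):

```python
def complexity_trend(samples):
    """Least-squares slope of complexity across successive snapshots."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Weekly average cognitive complexity per function (illustrative numbers)
weekly = [8.1, 8.4, 9.0, 9.8, 10.9]
print(round(complexity_trend(weekly), 2))  # → 0.7
```

Here complexity grows by roughly 0.7 points per week, and the increase is accelerating; a dashboard alert on a sustained positive slope catches this long before any single function trips a threshold.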
Conclusions
Complexity is the most insidious enemy of software quality, and AI-generated code tends to amplify it. Cognitive complexity, Halstead metrics and architecture fitness functions provide the tools to measure, monitor and control this complexity systematically.
In the next article we will address productivity metrics: how to measure AI's impact on development speed, the productivity paradox and the delicate balance between speed and quality in the context of AI-assisted development.
Simplicity is the ultimate sophistication. And for AI-generated code, achieving simplicity requires metrics, discipline and constant refactoring.