Agentic AI in 5G RAN
- Venkateshu Kamarthi
- Dec 19, 2025
- 10 min read
1. Introduction
Modern 5G radio access networks have reached a level of complexity where traditional automation approaches are no longer sufficient. While machine learning has been widely adopted in telecom analytics, most deployed solutions still behave as passive systems: they ingest data, run inference, and output a score, label, or alert.
In practice, experienced RAN engineers do not work this way. They observe symptoms, form hypotheses, validate those hypotheses by checking additional signals, discard weak explanations, and iterate until they reach a conclusion they can defend. Agentic AI attempts to encode this troubleshooting behaviour into software. Rather than producing a single-shot prediction, an agent actively reasons over data, decides what information to inspect next, and converges toward a root cause with supporting evidence.
For example, a single user's throughput issue may involve interactions between PHY-layer radio conditions, MAC scheduling behaviour, mobility procedures, transport congestion, and configuration changes made hours earlier.
What Makes AI “Agentic”?
Agentic AI is not defined by a specific algorithm or model. Instead, it is defined by behaviour. An agent is an entity that operates in an environment with a goal, maintains internal state, and can take actions that influence its future observations. In the context of software systems, those actions usually involve querying data sources, invoking analytical tools, or triggering downstream workflows.
Agentic systems decompose intelligence into a cognitive loop: Observe (perceive via sensors/logs), Orient (contextualize with memory/knowledge), Decide (reason/plan via LLM), Act (execute tools/APIs), Reflect (evaluate/critique outcomes). Unlike traditional ML (supervised classification), agents handle open-ended tasks through ReAct prompting ("Reason + Act") or hierarchical planning (sub-agents for subtasks).

The critical distinction between an agent and a conventional ML model is control flow. In a traditional ML pipeline, the sequence of operations is fixed by the developer. In an agentic system, the AI itself decides the sequence. For example, when analysing a throughput drop, an agent might first look at SINR trends. If radio conditions appear healthy, it may then choose to inspect MAC retransmissions or scheduler fairness. If those signals are inconclusive, it might expand the scope to neighbouring cells or recent configuration changes.
This ability to decide “what to check next” is what gives agentic systems their power in complex domains like telecom.
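A minimal sketch of this agent-directed control flow, assuming hypothetical threshold values and stubbed signal lookups in place of real KPI queries:

```python
# Hypothetical signal values; in practice these come from PM-counter queries.
signals = {"sinr_db": 19.0, "bler_pct": 27.0, "neighbor_degraded": False}

def next_check(state):
    """The agent decides what to inspect next based on what it has already seen."""
    if "sinr" not in state:
        return "sinr"
    if state["sinr"] == "healthy" and "mac" not in state:
        return "mac"          # radio looks fine -> inspect MAC retransmissions
    if state.get("mac") == "elevated" and "neighbors" not in state:
        return "neighbors"    # widen the scope: is the problem local?
    return None               # enough evidence gathered

state = {}
while (check := next_check(state)) is not None:
    if check == "sinr":
        state["sinr"] = "healthy" if signals["sinr_db"] > 10 else "degraded"
    elif check == "mac":
        state["mac"] = "elevated" if signals["bler_pct"] > 10 else "normal"
    elif check == "neighbors":
        state["neighbors"] = "degraded" if signals["neighbor_degraded"] else "normal"
# `state` now records what was checked, in an order the agent chose at run time.
```

The developer supplies the checks; the agent chooses their order, which is exactly the distinction from a fixed ML pipeline.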
Traditional machine learning systems are fundamentally reactive. They take an input vector, apply a learned function, and produce an output. Once the output is produced, the system is done. Any further investigation or decision-making must be orchestrated externally by humans or additional pipelines.
Agentic AI systems, by contrast, are interactive and goal-driven. They do not stop after producing a single output. Instead, they observe the environment, reason about what they are seeing, decide what information they need next, and continue operating until they reach a satisfactory conclusion.
In other words, traditional ML answers a question.
Agentic AI figures out what the right question is and then answers it.
| Dimension | Traditional ML | Agentic AI |
| --- | --- | --- |
| Execution style | One-shot inference | Iterative reasoning loop |
| Control flow | Developer-defined | Agent-defined |
| Adaptability | Low | High |
| Memory | None or implicit (model weights) | Explicit short- and long-term memory |
| Tool usage | Hard-coded | Dynamic, agent-selected |
| Explainability | Post-hoc, limited | Native, reasoning-based |
| Human trust | Often low | Significantly higher |
| Best suited for | Well-defined tasks | Complex, ambiguous problems |
2. Core Architecture of an Agentic AI System
At a high level, an agentic AI system for 5G RAN sits between the network and the operations layer. It continuously observes telemetry from the network, reasons over that data using domain knowledge and learned patterns, and produces structured conclusions.

The first component is the perception layer. This layer ingests raw telemetry such as gNB logs, performance management counters, event notifications, and traces. Its role is not to make decisions but to normalize and contextualize data. A log line indicating repeated UL HARQ failures is transformed into a structured signal associated with a specific UE, cell, time window, and protocol layer.
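As an illustration, a perception-layer normalizer might turn such a log line into a structured signal. The log format and `Signal` fields below are invented for the sketch; real gNB log formats are vendor-specific:

```python
import re
from dataclasses import dataclass

@dataclass
class Signal:
    """Normalized telemetry signal produced by the perception layer."""
    ue_id: str
    cell_id: str
    layer: str
    event: str
    count: int

# Hypothetical raw gNB log line (format invented for this example).
raw = "2025-12-19T10:02:11Z cell=C17 ue=UE-0042 MAC UL_HARQ_FAIL count=12"

def normalize(line: str) -> Signal:
    # Extract UE, cell, protocol layer, event, and count from the line.
    m = re.search(r"cell=(\S+) ue=(\S+) (\w+) (\w+) count=(\d+)", line)
    return Signal(ue_id=m.group(2), cell_id=m.group(1),
                  layer=m.group(3), event=m.group(4), count=int(m.group(5)))

sig = normalize(raw)
```

Downstream, the agent core reasons over `Signal` objects rather than raw text, which keeps vendor quirks out of the reasoning layer.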
Once data is structured, it becomes input to the agent core. The agent core maintains the current investigative context: what problem is being analysed, what evidence has already been gathered, and which hypotheses are still plausible. This context persists across multiple reasoning steps, which is essential for complex RCA.
The agent then interacts with tools. In a telecom environment, tools might include KPI query services, topology graphs, historical data stores, or even simulators. Importantly, the agent decides when and how to use these tools. This is a major departure from static analytics pipelines.
Finally, the agent produces an output that includes both a conclusion and a narrative explanation. The explanation is not an afterthought; it is a first-class output that determines whether the system will be trusted by network engineers.
3. Reasoning and Memory in Telecom Agents
Memory plays a crucial role in agentic AI. Without memory, an agent cannot build on previous observations or learn from past incidents. In RAN analytics, memory typically takes three forms.
Short-term memory holds the current investigative state. For example, it may store that SINR has already been checked and found to be stable, so there is no need to revisit that hypothesis.
Long-term memory stores known failure patterns, such as the observation that high BLER combined with high SINR often points to scheduling issues rather than radio problems.
Episodic memory captures past incidents and their confirmed root causes, allowing the agent to recognize recurring patterns.
Reasoning operates over this memory using a combination of rule-based logic and learned representations. Domain rules are particularly important in telecom because they constrain the solution space. For instance, if downlink BLER is high but uplink BLER is normal, certain RF issues can be ruled out early in the investigation.
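That pruning rule can be expressed directly in code. The KPI names, thresholds, and hypothesis labels below are illustrative, not taken from any standard:

```python
# Hypothetical KPI snapshot; values would come from PM counters in practice.
kpis = {"dl_bler_pct": 28.0, "ul_bler_pct": 1.2, "sinr_db": 19.0}

hypotheses = {"uplink_interference", "scheduler_issue", "dl_rf_fault"}

def prune(hypotheses, kpis):
    """Apply domain rules to rule out hypotheses early in the investigation."""
    h = set(hypotheses)
    # High DL BLER with normal UL BLER: uplink-side causes are unlikely.
    if kpis["dl_bler_pct"] > 10 and kpis["ul_bler_pct"] < 5:
        h.discard("uplink_interference")
    return h

remaining = prune(hypotheses, kpis)
```

Each discarded hypothesis is one fewer branch the (more expensive) learned reasoning has to explore.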
4. Applying Agentic AI to 5G RAN Root Cause Analysis
To make these ideas concrete, consider a common operational problem: a sudden drop in throughput affecting a subset of users in a cell.
The agent begins by observing the symptom through PM counters. It notices that throughput has dropped by 40% compared to the baseline. Rather than immediately labelling the issue, the agent formulates initial hypotheses. These might include poor radio conditions, uplink interference, scheduler misconfiguration, or transport congestion.
The agent then evaluates these hypotheses sequentially. It checks SINR and RSRP trends and finds them to be stable. This weakens the radio degradation hypothesis. Next, it inspects BLER and HARQ retransmission counts, both of which are elevated. At this point, the agent reasons that if SINR is good but BLER is high, the problem is unlikely to be purely physical-layer interference.
The agent then queries MAC-layer statistics and discovers that scheduler queue lengths increased sharply after a recent configuration change. It correlates the timing of the throughput drop with this change and checks whether neighbouring cells exhibit similar behavior. They do not, which further strengthens the hypothesis that the issue is local and configuration related.
At this stage, the agent has enough evidence to converge on a root cause: a scheduler parameter change leading to excessive retransmissions and reduced effective throughput. The final output includes not only this conclusion but also the reasoning chain and supporting metrics.
Problem Statement
Given gNB logs and KPIs, automatically identify the root cause of a throughput drop.
Step 1: Data Ingestion
Sources:
gNB log stream
PM counters (15-min / 5-min)
UE traces
Pipeline:
Kafka → Stream Processor → Feature Normalizer
Step 2: Define Agent Goal
Goal: Identify root cause of RAN performance degradation
Step 3: Define Hypothesis Space
Examples:
Interference
Scheduler overload
Hardware fault
Backhaul congestion
Mobility misconfiguration
Step 4: Reasoning Logic
Pseudo-code:
```
if SINR > threshold and BLER high:
    suspect MAC or scheduler
elif SINR low and RSRP fluctuating:
    suspect interference
elif handover failures high:
    suspect mobility config
```
Step 5: Evidence Collection
Agent queries:
Neighbor cell KPIs
Scheduler stats
UE distribution
Time correlation
Step 6: Root Cause Identification
Example output:
```
Root Cause:
  Scheduler misconfiguration causing excessive HARQ retransmissions
Evidence:
  - SINR stable at 18–20 dB
  - BLER > 25%
  - Retransmission count > 3x baseline
  - Issue started after config change
```
Step 7: Explanation Generation
Agent produces engineer-readable RCA, not just a label.
5. Building an Agentic AI Application
Building a minimal agentic system for 5G RAN does not require exotic infrastructure. The starting point is a streaming data pipeline that ingests logs and counters from the gNB. This pipeline can be built using standard components such as Kafka and a stream processing framework.
On top of this pipeline sits the agent. The agent’s goal is explicitly defined: identify the most likely root cause of observed performance degradation. A small hypothesis library is created, capturing common RAN failure modes. Each hypothesis is associated with signals that support or contradict it.
The agent operates in a loop. It observes incoming data, updates its internal state, decides which hypothesis to evaluate next, and queries the necessary data. This continues until one hypothesis is sufficiently supported or all hypotheses are exhausted.
Even a relatively simple reasoning loop can outperform static ML models because it adapts its investigation based on intermediate findings rather than treating all cases identically.
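The loop described above can be sketched as a hypothesis table plus an investigation loop. The hypothesis predicates and KPI names are illustrative stand-ins for real KPI and log queries:

```python
# Each hypothesis maps to a predicate over the evidence gathered so far.
HYPOTHESES = {
    "radio_degradation":    lambda d: d["sinr_db"] < 10,
    "scheduler_issue":      lambda d: d["sinr_db"] >= 10 and d["bler_pct"] > 10,
    "transport_congestion": lambda d: d["backhaul_util_pct"] > 90,
}

def investigate(data):
    """Evaluate hypotheses until one is supported or all are exhausted."""
    state = {"checked": [], "root_cause": None}
    for name, supported in HYPOTHESES.items():   # agent picks the next hypothesis
        state["checked"].append(name)
        if supported(data):
            state["root_cause"] = name
            break                                # sufficiently supported: stop early
    return state

result = investigate({"sinr_db": 19.0, "bler_pct": 27.0, "backhaul_util_pct": 40.0})
```

Note that the loop stops as soon as one hypothesis is supported, so cheap-to-check hypotheses can be ordered first; a fuller agent would also re-rank hypotheses as evidence accumulates.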
Agentic AI application for 5G RAN
Building an Agentic AI application for 5G RAN follows a structured 12-step procedure, evolving from prototype to production RIC xApp deployment. Each step addresses specific challenges in telecom environments (low latency, multi-vendor integration, carrier-grade reliability) with precise infrastructure sizing and validation criteria.
Phase 1: Foundation
Step 1: Requirements Definition
Objective: Define agent scope (e.g., "HO failure RCA <30s") and success metrics.
1. Document use case: Input=gNB logs → Output=Root cause + remediation
2. Define KPIs: Accuracy>90%, Latency<5s, MTTR reduction>50%
3. Guardrails: No config changes without human approval (confidence<85%)
4. Data sources: QXDM logs, E2SM-KPM, PM counters, MDT traces
Infra Needs: Local dev machine (16GB RAM, Python 3.11)
Step 2: Framework Selection
Compare & Choose:
LangGraph: Stateful graphs, telecom-grade persistence
CrewAI: Multi-agent orchestration, simpler API
AutoGen: Microsoft-backed, E2 integration focus
LlamaIndex: RAG-focused (logs → knowledge base)
Code:
```shell
pip install langgraph langchain-openai chromadb fastapi uvicorn
```
Phase 2: Data Pipeline
Step 3: Log Ingestion & Parsing
Challenge: QXDM unstructured logs → structured events
```python
import pandas as pd

def parse_ran_logs(file_path: str) -> dict:
    """Extracts RRC/MAC events from a QXDM CSV export."""
    df = pd.read_csv(file_path)
    events = {
        'ho_failures': len(df[df['msg_type'] == 'RRCReconfigFail']),
        'rsrp_drops': len(df[df['L1_RSRP'] < -100]),
        'pci_list': df['PCI'].unique().tolist(),
    }
    return events
```
Infra: ELK stack (Elasticsearch, Logstash, Kibana; 8GB Elasticsearch heap) for log retention
Step 4: Vector Database Setup
Purpose: Store historical RCA episodes for few-shot learning
```shell
docker run -d -p 8000:8000 --name chroma chromadb/chroma
```
Schema: {timestamp, cell_id, symptoms, diagnosis, remediation, outcome}
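To illustrate the retrieval step without assuming a running ChromaDB instance, here is a stdlib-only stand-in: the episodes follow the schema above (entries are invented for the sketch), and `difflib` approximates the semantic search a vector DB would perform:

```python
import difflib

# Episodic store following the schema above; entries are illustrative, not real data.
episodes = [
    {"cell_id": "C17", "symptoms": "high BLER stable SINR queue growth",
     "diagnosis": "scheduler misconfiguration", "remediation": "revert config"},
    {"cell_id": "C09", "symptoms": "low RSRP fluctuating SINR handover failures",
     "diagnosis": "PCI collision", "remediation": "XnAP PCI remap"},
]

def most_similar(symptoms, store):
    """Naive text similarity as a stand-in for a vector-DB semantic query."""
    return max(store, key=lambda e: difflib.SequenceMatcher(
        None, symptoms, e["symptoms"]).ratio())

hit = most_similar("high BLER with stable SINR", episodes)
```

In production, the embedding model does this matching in a learned vector space; the retrieved episode is then injected into the agent's prompt as a few-shot example.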
Step 5: Tool Development
Critical Tools for RAN:
1. E2_KPM_Query: RIC PM counters (PRB, HO rates)
2. 3GPP_KB: TS 38.331 parameters lookup
3. O1_Config: Antenna tilt/beam TCI changes
4. Xn_PCI: Neighbor optimization
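One way to expose such tools to the agent is a simple registry it can select from by name at run time. The tool implementations below are stubs, not real RIC/O1 clients:

```python
# Hypothetical tool registry; real tools would wrap RIC/O1/Xn APIs.
TOOLS = {}

def tool(name):
    """Decorator that registers a function as an agent-callable tool."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("E2_KPM_Query")
def e2_kpm_query(cell_id: str) -> dict:
    # Stub: would subscribe to E2SM-KPM reports for this cell.
    return {"cell_id": cell_id, "prb_util_pct": 78.0, "ho_success_rate": 91.0}

@tool("3GPP_KB")
def ts38331_lookup(parameter: str) -> str:
    # Stub: would query a knowledge base built from TS 38.331.
    return f"TS 38.331 entry for {parameter} (stub)"

# The agent selects tools by name at run time rather than via a fixed pipeline.
kpis = TOOLS["E2_KPM_Query"]("C17")
```

Frameworks like LangGraph provide their own tool-binding mechanisms; the registry here just makes the "agent-selected, not hard-coded" idea concrete.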
Phase 3: Agent Architecture
Step 6: Single Agent Prototype
```python
from typing import List, TypedDict
from langgraph.graph import StateGraph

class RANState(TypedDict):
    symptoms: str
    diagnosis: str
    confidence: float
    actions: List[str]

# Supervisor agent routes each case to a specialist sub-agent
def supervisor_agent(state: RANState):
    return llm.invoke(
        f"Symptoms: {state['symptoms']}. "
        "Route to: LogAgent/KPIAgent/RemediationAgent"
    )
```
Step 7: Multi-Agent Hierarchy
Key Design Principles:
Single Responsibility: Each agent focuses on one competency (parsing, KPIs, RCA, etc.)
State Persistence: Shared RANState object passes data between agents
Confidence Gating: Routes to human if diagnosis confidence <85%
Reflection Loop: Post-action critique enables learning
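Confidence gating and the reflection loop can be sketched in a few lines; the 0.85 threshold mirrors the guardrail above, and the KPI fields are illustrative:

```python
CONFIDENCE_THRESHOLD = 0.85  # below this, escalate to a human operator

def route(diagnosis: str, confidence: float) -> str:
    """Confidence gate: auto-remediate only when the agent is sure enough."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return "RemediationAgent"
    return "HumanReview"

def reflect(outcome: dict) -> str:
    """Post-action critique: did the remediation improve the target KPI?"""
    improved = outcome["kpi_after"] > outcome["kpi_before"]
    return "store_as_positive_episode" if improved else "flag_for_review"
```

Both functions sit outside the LLM: the gate and the critique are deterministic code, which keeps the safety behavior auditable even when the reasoning inside the agents is not.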

Infra: GPU pod (A10G 24GB VRAM) for parallel inference
Step 8: Memory Integration
Short-term: Conversation buffer (past 10 interactions)
Long-term: ChromaDB semantic search
Episodic: Past RCA episodes (vectorized)
Prompt Template:
"Similar past incident: Cell123 HO failure → PCI collision → fixed by XnAP remap"
Phase 4: RIC Integration
Step 9: E2/xApp Interface
O-RAN RIC Deployment:
```yaml
# Helm values for Near-RT RIC
ric_platform:
  e2term: enabled
  xapp: ran-agentic-ai
  E2:
    functions: ["KPMv2", "RC"]
    period: 1s
```
E2SM-KPM Report Handler:
```python
@ric_xapp_handler
def kpm_report(msg: E2SmKpmReport):
    ho_rate = msg.metrics['ho_success_rate']
    if ho_rate < 80:
        trigger_agent_analysis(cell_id=msg.cell_id)
```
Step 10: A1 Policy Enforcement
Policy Example:
```json
{
  "name": "ho_failure_policy",
  "logic": "if ho_failure_rate > 20% then activate_agent",
  "targets": ["gNB1", "gNB2"]
}
```
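A hypothetical local evaluator for such a policy makes the activation logic concrete; a real consumer would receive policies over the A1 interface rather than as local dicts:

```python
# Hypothetical in-process representation of the A1 policy above.
policy = {
    "name": "ho_failure_policy",
    "threshold_pct": 20.0,
    "targets": ["gNB1", "gNB2"],
}

def should_activate(policy: dict, gnb_id: str, ho_failure_rate_pct: float) -> bool:
    """Activate the agent only for targeted gNBs whose failure rate breaches the policy."""
    return (gnb_id in policy["targets"]
            and ho_failure_rate_pct > policy["threshold_pct"])

trigger = should_activate(policy, "gNB1", 27.5)
```

Keeping the activation condition in declarative policy (rather than inside the agent) lets the non-RT RIC tune thresholds without redeploying the xApp.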
Phase 5: Productionization
Step 11: Containerization & Orchestration
Dockerfile:
```dockerfile
FROM nvidia/cuda:12.1-runtime-ubuntu22.04
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 8080
CMD ["uvicorn", "agent:app", "--host", "0.0.0.0"]
```
K8s Deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: ran-agent
        resources:
          limits:
            nvidia.com/gpu: 1   # A10G/T4
            memory: "16Gi"
          requests:
            cpu: "4"
            memory: "12Gi"
---
# HorizontalPodAutoscaler (abridged)
spec:
  scaleTargetRef: deployment/ran-agent
  targetCPUUtilizationPercentage: 70
```
Step 12: Monitoring & Guardrails
An observability stack takes Agentic AI from experiment to production, giving operators real-time proof of value (MTTR dashboards), debuggability (reasoning traces), and the safety mechanisms (human governance) required to scale across deployments of 10,000+ cells.

Infrastructure Requirements
| Component | Dev | Test | Production |
| --- | --- | --- | --- |
| Compute | 16C/64GB | 4x A10G (96GB) | 8x A100/H100 (320GB total) |
| Storage | 500GB SSD | 2TB NVMe (ChromaDB) | 10TB Ceph (logs + embeddings) |
| Network | 1Gbps | 10Gbps RIC fabric | 25Gbps E2/O1 interfaces |
| RIC platform | gNB vendor SW | RIC platform | O-RAN SC Near-RT RIC |
Production Scale (1000 cells):
- Pods: 12 (4/node x 3 nodes)
- GPU Memory: 288GB total (Llama3.1 70B quantized)
- Throughput: 50 RCA/minute
- Latency: p99 <3s
Validation MOP
Lab Validation
1. Generate synthetic anomalies:
iperf3 → traffic → force HO failures
2. Agent execution:
curl -X POST ran-agent/analyze -d '{"logs":"ho_fail.qxdm"}'
3. Verify:
- Diagnosis accuracy: 92% vs manual
- Remediation: PCI change → HO success +25%
Field Trial Checklist
E2 subscription established (KPMv2 1s period)
A1 policy activated (thresholds validated)
Multi-vendor test: Vendor1+Vendor2 gNBs
Failover: Agent pod restart <30s
Rollback: Human override mechanism
6. Real-Time Deployment in 5G RAN Infrastructure
Deploying agentic AI in a live 5G network requires careful consideration of latency, safety, and integration points. The most natural deployment location is the Near-Real-Time RIC in an O-RAN architecture. The Near-RT RIC already aggregates near-real-time telemetry via the E2 interface and hosts xApps designed for control and optimization.
An agentic RCA xApp running in the Near-RT RIC can analyze data at sub-second to second-level timescales without interfering with critical control-plane functions. In this setup, the agent operates primarily in an advisory mode, producing RCA reports and recommendations rather than directly enforcing changes.
For longer-term analysis and learning, the same agent logic can be deployed in the non-RT RIC, where it has access to richer historical data and policy frameworks. In some cases, lightweight agent components may even be embedded within the gNB for ultra-fast detection, though this is typically limited to narrow use cases due to resource constraints.
7. Operational Challenges and Mitigations
Agentic AI introduces new challenges alongside its benefits. One concern is hallucination or overconfident reasoning based on incomplete data. In telecom systems, this risk is mitigated by grounding the agent’s reasoning in hard metrics and domain constraints. Another challenge is scalability. A national network may contain tens of thousands of cells, requiring careful design of agent orchestration and load management.
Explainability remains both a challenge and a strength. While agents can produce detailed reasoning chains, those explanations must be carefully structured to be useful to human operators. Free-form narratives are less effective than explanations that clearly link symptoms to evidence and conclusions.
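One way to enforce that structure is to make the report a typed object whose renderer always links symptom, evidence, and conclusion; the field values below echo the Step 6 example output:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RCAReport:
    """Structured explanation linking symptom -> evidence -> conclusion."""
    symptom: str
    evidence: List[str] = field(default_factory=list)
    conclusion: str = ""

    def render(self) -> str:
        # Fixed layout: symptom first, then evidence bullets, then conclusion.
        lines = [f"Symptom: {self.symptom}"]
        lines += [f"  - {e}" for e in self.evidence]
        lines.append(f"Conclusion: {self.conclusion}")
        return "\n".join(lines)

report = RCAReport(
    symptom="DL throughput down 40% vs baseline",
    evidence=["SINR stable at 18-20 dB", "BLER > 25%",
              "Retransmission count 3x baseline after config change"],
    conclusion="Scheduler parameter change causing excessive HARQ retransmissions",
)
text = report.render()
```

Because the schema is fixed, an engineer can scan any report the same way, and downstream tooling (ticketing, dashboards) can parse it without free-text heuristics.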
8. Conclusion
Agentic AI moves 5G RAN operations from reactive monitoring toward proactive intelligence, and O-RAN's disaggregated architecture offers a natural platform for scalable deployment. Native AI hooks anticipated in 3GPP Release 19 may eventually push agent components even closer to the gNB itself.
As networks evolve toward 6G and beyond, the trend toward autonomy will accelerate. Agentic AI will not replace human engineers, but it will increasingly act as a first-line analyst, handling routine investigations and surfacing well-reasoned conclusions for review.