Summary
This post builds a locally hosted voice assistant for security operations from open-source components. The model reasons over a set of defined tools and invokes them on demand; no training or fine-tuning is required.
| Component | Tool | Role |
|---|---|---|
| Speech-to-text | faster-whisper | Transcribe voice input locally |
| LLM | Ollama (Llama 3.1 / Phi-4 Mini) | Tool selection and response generation |
| Text-to-speech | Kokoro | Speak results back |
| Browser control | Playwright | Read authenticated dashboards |
| Ticket search | Jira REST API | Query tickets by keyword/person |
| Display control | PowerShell via subprocess | Toggle scheduled power tasks |
Out of scope for this post: multi-turn memory, authentication from scratch, any UI.
1. Architecture
The assistant runs as a background process and follows this loop on every interaction:
Microphone input
│
▼
Wake word detection ── keyword match on transcription (or Porcupine)
│
▼
Speech-to-text ── faster-whisper
│
▼
LLM with tool calling ── Ollama (Llama 3.1 or Phi-4 Mini)
│
├──► Tool: Display Control ── PowerShell via subprocess
├──► Tool: Jira Search ── Jira REST API
├──► Tool: Browser Read ── Playwright
└──► Tool: Web Fetch ── httpx + BeautifulSoup
│
▼
Text-to-speech ── Kokoro
│
▼
Speaker output
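The LLM does not answer operational questions from its own knowledge. It selects a tool and arguments, Python executes the tool and passes the result back, and the model turns that result into a spoken response. If the model returns no tool call, its text reply is spoken as-is.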
Model Selection
Criteria: 8GB VRAM target, native tool/function calling, under 3s response time.
- Llama 3.1 8B Q4_K_M — best general-purpose option, strong tool calling
- Phi-4 Mini — faster, smaller, good for CPU-only deployments
ollama pull llama3.1
# or
ollama pull phi4-mini
2. Implementation
2.1 Dependencies
pip install ollama faster-whisper pyaudio kokoro playwright httpx \
beautifulsoup4 requests soundfile sounddevice pvporcupine
playwright install chromium
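pyaudio depends on PortAudio. Recent PyAudio releases publish Windows wheels on PyPI with PortAudio bundled, so the pip install above is normally enough. Where no matching wheel exists for your Python version, the older workaround was pip install pipwin && pipwin install pyaudio or a prebuilt wheel from Christoph Gohlke's repository, but that repository is no longer updated.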
2.2 Speech-to-Text
from faster_whisper import WhisperModel

model = WhisperModel("base", device="cuda", compute_type="float16")
# CPU: WhisperModel("base", device="cpu", compute_type="int8")

def transcribe(audio_path: str) -> str:
    segments, _ = model.transcribe(audio_path, beam_size=5)
    return " ".join(s.text.strip() for s in segments)
2.3 Audio Capture
Energy-based VAD — records until silence is detected:
import pyaudio, wave
import numpy as np

RATE = 16000      # sample rate in Hz
CHUNK = 1024      # samples per read
SILENCE_DB = 40   # chunks below this level (dB relative to one int16 unit) count as silence
SILENCE_S = 1.5   # seconds of continuous silence that end a recording

def record_command(output_path="command.wav") -> str:
    pa = pyaudio.PyAudio()
    sample_width = pa.get_sample_size(pyaudio.paInt16)  # read before terminate()
    stream = pa.open(format=pyaudio.paInt16, channels=1,
                     rate=RATE, input=True, frames_per_buffer=CHUNK)
    frames, silent_chunks = [], 0
    silence_limit = int(RATE / CHUNK * SILENCE_S)  # 23 chunks at these settings
    while True:
        data = stream.read(CHUNK)
        frames.append(data)
        rms = np.sqrt(np.mean(np.frombuffer(data, dtype=np.int16).astype(np.float32) ** 2))
        db = 20 * np.log10(rms + 1e-6)
        silent_chunks = silent_chunks + 1 if db < SILENCE_DB else 0
        if silent_chunks >= silence_limit and len(frames) > silence_limit:
            break
    stream.stop_stream(); stream.close(); pa.terminate()
    with wave.open(output_path, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(sample_width)
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
    return output_path
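To make the threshold concrete, the level calculation can be reproduced with the standard library alone. This is a minimal sketch, not part of the assistant: chunk_db and silence_limit are hypothetical helper names, and the chunk is assumed to hold little-endian 16-bit mono samples, which is what paInt16 yields on typical x86 hardware.

```python
import math
import struct

RATE = 16000
CHUNK = 1024
SILENCE_DB = 40
SILENCE_S = 1.5

def chunk_db(data: bytes) -> float:
    """Level of a 16-bit little-endian mono chunk in dB relative to one int16 unit."""
    count = len(data) // 2
    samples = struct.unpack(f"<{count}h", data[:count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / count) if count else 0.0
    return 20 * math.log10(rms + 1e-6)

def silence_limit(rate: int = RATE, chunk: int = CHUNK, seconds: float = SILENCE_S) -> int:
    """Number of consecutive quiet chunks that ends a recording."""
    return int(rate / chunk * seconds)

# Synthetic chunks: an RMS of 10 is about 20 dB, an RMS of 1000 is about 60 dB.
quiet = struct.pack("<4h", 10, -10, 10, -10)
loud = struct.pack("<4h", 1000, -1000, 1000, -1000)

print(round(chunk_db(quiet)), round(chunk_db(loud)), silence_limit())
```

On this scale the 40 dB threshold corresponds to an RMS amplitude of 100 out of a possible 32767, so it will likely need tuning for a noisy SOC floor.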
2.4 Wake Word Detection
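import string

WAKE_WORD = "jarvis"

def contains_wake_word(text: str) -> bool:
    return text.lower().strip().startswith(WAKE_WORD)

def strip_wake_word(text: str) -> str:
    words = text.strip().split()
    # Whisper usually punctuates ("Jarvis, turn off the monitors"), so compare
    # the first word with surrounding punctuation removed.
    if words and words[0].lower().strip(string.punctuation) == WAKE_WORD:
        return " ".join(words[1:]).strip()
    return text.strip()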
Porcupine by Picovoice detects the wake word before Whisper is invoked — lower latency, lower CPU. Free tier supports one custom wake word.
2.5 Tool Definitions
Tools follow the OpenAI function-calling schema that Ollama uses:
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "control_displays",
            "description": (
                "Turn the SOC display monitors on or off. "
                "Use when the user asks to start, wake, turn on, shut down, or turn off monitors."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "action": {
                        "type": "string",
                        "enum": ["on", "off"],
                        "description": "Whether to turn displays on or off."
                    }
                },
                "required": ["action"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_jira",
            "description": (
                "Search Jira tickets by keyword, topic, or person. "
                "Use when the user asks about a ticket or what someone said about a topic."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query, e.g. 'blocking traffic on weekends Andrew'"
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_dashboard",
            "description": (
                "Read a metric from an authenticated security dashboard. "
                "Use for asset counts, alert counts, or any metric on the SOC displays."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "dashboard": {
                        "type": "string",
                        "enum": ["sentinelone", "jira", "darktrace"],
                        "description": "Which dashboard to read from."
                    },
                    "metric": {
                        "type": "string",
                        "description": "What to look for, e.g. 'total enrolled assets'"
                    }
                },
                "required": ["dashboard", "metric"]
            }
        }
    }
]
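The schema only describes what the model should send; nothing forces the model to respect it, and a small model can return a value outside an enum or omit a required field. One option is to check the arguments against the same schema before running a tool. The sketch below is an illustration under assumptions: validate_arguments is a hypothetical helper, it covers only the string, enum, and required constraints used in this post, and DISPLAY_TOOL repeats the control_displays entry with shortened descriptions so the block runs on its own.

```python
def validate_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call matches the schema."""
    params = tool["function"]["parameters"]
    props = params.get("properties", {})
    problems = [f"missing required argument: {name}"
                for name in params.get("required", []) if name not in arguments]
    for name, value in arguments.items():
        spec = props.get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems

# Copy of the control_displays entry, descriptions shortened for the example.
DISPLAY_TOOL = {
    "type": "function",
    "function": {
        "name": "control_displays",
        "description": "Turn the SOC display monitors on or off.",
        "parameters": {
            "type": "object",
            "properties": {
                "action": {"type": "string", "enum": ["on", "off"]}
            },
            "required": ["action"],
        },
    },
}

print(validate_arguments(DISPLAY_TOOL, {"action": "on"}))
print(validate_arguments(DISPLAY_TOOL, {"action": "dim"}))
```

Returning the problems as text, rather than raising, lets the caller pass them back to the model as the tool result so it can retry with corrected arguments.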
2.6 Tool Implementations
Display control
import subprocess

def control_displays(action: str) -> str:
    if action == "off":
        # shutdown /s /t 60 powers off this machine after 60 seconds.
        subprocess.run(["powershell", "-Command",
            "Disable-ScheduledTask -TaskName 'SOC-Display-Shutdown'; shutdown /s /t 60"])
        return "Displays shutting down in 60 seconds."
    elif action == "on":
        subprocess.run(["powershell", "-Command",
            "Enable-ScheduledTask -TaskName 'SOC-Display-Launch'"])
        return "Display launch task re-enabled."
    return "Unknown action."
Jira search
import requests, os

JIRA_BASE = os.getenv("JIRA_BASE_URL")
JIRA_USER = os.getenv("JIRA_USER")
JIRA_TOKEN = os.getenv("JIRA_API_TOKEN")

def search_jira(query: str) -> str:
    # Escape backslashes and quotes so the query cannot break out of the JQL string.
    safe = query.replace("\\", "\\\\").replace('"', '\\"')
    jql = f'text ~ "{safe}" ORDER BY updated DESC'
    resp = requests.get(
        f"{JIRA_BASE}/rest/api/3/search",
        params={"jql": jql, "maxResults": 3, "fields": "summary,description,comment"},
        auth=(JIRA_USER, JIRA_TOKEN),
        timeout=10
    )
    if resp.status_code != 200:
        return f"Jira search failed: {resp.status_code}"
    issues = resp.json().get("issues", [])
    if not issues:
        return "No matching tickets found."
    results = []
    for issue in issues:
        comments = (issue["fields"].get("comment") or {}).get("comments", [])
        # API v3 returns comment bodies as Atlassian Document Format (nested JSON),
        # so `last` is a dict rather than plain text unless it is flattened first.
        last = comments[-1]["body"] if comments else "No comments."
        results.append(f"{issue['key']}: {issue['fields']['summary']}\nLatest comment: {last}")
    return "\n\n".join(results)
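Jira Cloud's REST API v3 returns comment bodies in Atlassian Document Format, a nested JSON tree, so interpolating a body into a string produces a dict dump instead of readable text. A small recursive flattener is one way to get plain text for the model. This is a sketch under assumptions: adf_to_text is a hypothetical helper, it keeps only text nodes, joins top-level blocks with newlines, and ignores mentions, emoji, and list markers.

```python
def adf_to_text(node) -> str:
    """Collect the text of an Atlassian Document Format tree, depth first."""
    if isinstance(node, str):        # already plain text (e.g. API v2 responses)
        return node
    if not isinstance(node, dict):
        return ""
    if node.get("type") == "text":
        return node.get("text", "")
    parts = [adf_to_text(child) for child in node.get("content", [])]
    # Top-level blocks (paragraphs, headings) go on separate lines.
    separator = "\n" if node.get("type") == "doc" else ""
    return separator.join(part for part in parts if part)

# Hand-written sample in the shape of an ADF comment body.
sample_body = {
    "type": "doc",
    "version": 1,
    "content": [
        {"type": "paragraph", "content": [
            {"type": "text", "text": "Blocked "},
            {"type": "text", "text": "on weekends.", "marks": [{"type": "strong"}]},
        ]},
        {"type": "paragraph", "content": [
            {"type": "text", "text": "Approved by Andrew."},
        ]},
    ],
}

print(adf_to_text(sample_body))
```

Applied to the search code, the last comment would be passed through adf_to_text before it is added to the result string.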
Browser read
from playwright.sync_api import sync_playwright
import os

PROFILE = os.path.expandvars(r'%LOCALAPPDATA%\Microsoft\Edge\User Data')
DASHBOARD_URLS = {
    "sentinelone": "https://your-s1-console.com",
    "jira": "https://yourorg.atlassian.net",
    "darktrace": "https://your-darktrace.com",
}

def read_dashboard(dashboard: str, metric: str) -> str:
    url = DASHBOARD_URLS.get(dashboard)
    if not url:
        return f"Unknown dashboard: {dashboard}"
    with sync_playwright() as p:
        ctx = p.chromium.launch_persistent_context(
            user_data_dir=PROFILE, channel='msedge', headless=True)
        try:
            page = ctx.new_page()
            page.goto(url, wait_until="networkidle")
            text = page.inner_text("body")
        finally:
            ctx.close()  # release the profile even if navigation fails
    # `metric` is not used to filter here; the model picks the value out of the text.
    return text[:3000]
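headless=True is fine for reads. Flip to False and run in a separate process if the session requires a visible browser to stay authenticated. Chromium-based browsers lock a user data directory while it is in use, so if the display windows already run from this same Edge profile the launch may fail; pointing PROFILE at a dedicated, already signed-in profile directory avoids the conflict.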
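Returning the first 3000 characters of a page works when the metric sits near the top, but dashboards often lead with navigation text. One way to use the metric argument is to keep only the lines that mention it, plus a little surrounding context. The sketch below is an assumption-laden illustration: extract_metric_lines is a hypothetical helper, matching is by simple keyword overlap, and the sample page text is invented.

```python
import re

def extract_metric_lines(page_text: str, metric: str,
                         context: int = 1, limit: int = 3000) -> str:
    """Return lines mentioning any word of `metric`, plus `context` lines around each hit."""
    lines = [ln.strip() for ln in page_text.splitlines() if ln.strip()]
    keywords = [w for w in re.findall(r"[a-z0-9]+", metric.lower()) if len(w) > 2]
    keep = set()
    for i, line in enumerate(lines):
        lowered = line.lower()
        if any(k in lowered for k in keywords):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    if not keep:
        # Nothing matched: fall back to the head of the page, as before.
        return page_text[:limit]
    return "\n".join(lines[i] for i in sorted(keep))[:limit]

# Invented page text standing in for page.inner_text("body").
page = "Overview\nTotal enrolled assets\n1,482\nOpen alerts\n17\nFooter"
print(extract_metric_lines(page, "enrolled assets"))
```

The fallback keeps the original behaviour when no keyword matches, so the model still receives something to work with.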
2.7 LLM Loop
import ollama

MODEL = "llama3.1"
TOOL_FUNCTIONS = {
    "control_displays": control_displays,
    "search_jira": search_jira,
    "read_dashboard": read_dashboard,
}
SYSTEM_PROMPT = (
    "You are a concise voice assistant for a security operations center. "
    "Answer in one or two sentences. Summarize tool results clearly and briefly. "
    "Do not repeat the user's question."
)

def run_agent(command: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": command},
    ]
    response = ollama.chat(model=MODEL, messages=messages, tools=TOOLS)
    msg = response["message"]
    if not msg.get("tool_calls"):
        return msg["content"]
    # Record the assistant turn once, then add one tool message per call.
    messages.append(msg)
    for call in msg["tool_calls"]:
        fn = TOOL_FUNCTIONS.get(call["function"]["name"])
        try:
            result = fn(**call["function"]["arguments"]) if fn else "Tool not found."
        except Exception as e:
            # Bad arguments or a failing tool should not end the conversation.
            result = f"Tool error: {e}"
        messages.append({"role": "tool", "content": str(result)})
    return ollama.chat(model=MODEL, messages=messages)["message"]["content"]
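The round trip is easier to reason about, and to test without a GPU, when the model call is injected rather than hard-coded. The following sketch mirrors the loop above with a stand-in for the model. Everything here is hypothetical scaffolding: run_turn, fake_chat, and fake_displays are illustrative names, and fake_chat only imitates the message shape used in this post.

```python
from typing import Callable

def run_turn(chat: Callable[[list], dict], tool_functions: dict,
             command: str) -> tuple[str, list]:
    """One round trip: the model picks tools, Python runs them, the model summarizes."""
    messages = [{"role": "user", "content": command}]
    msg = chat(messages)
    if not msg.get("tool_calls"):
        return msg["content"], messages
    messages.append(msg)
    for call in msg["tool_calls"]:
        name = call["function"]["name"]
        fn = tool_functions.get(name)
        try:
            result = fn(**call["function"]["arguments"]) if fn else f"Tool not found: {name}"
        except Exception as exc:
            result = f"Tool error: {exc}"
        messages.append({"role": "tool", "content": str(result)})
    return chat(messages)["content"], messages

def fake_chat(messages: list) -> dict:
    """Stand-in for the model: request a tool first, then summarize its result."""
    if messages[-1]["role"] == "tool":
        return {"role": "assistant", "content": f"Done: {messages[-1]['content']}"}
    return {"role": "assistant", "content": "", "tool_calls": [
        {"function": {"name": "control_displays", "arguments": {"action": "off"}}}]}

def fake_displays(action: str) -> str:
    return f"displays {action}"

answer, transcript = run_turn(fake_chat, {"control_displays": fake_displays},
                              "turn off the monitors")
print(answer)
print([m["role"] for m in transcript])
```

The transcript shows the order the model sees on its second call: the user request, its own tool request, then the tool result.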
2.8 Text-to-Speech (Kokoro)
import sounddevice as sd
import soundfile as sf
import tempfile, os
from kokoro import KPipeline

pipeline = KPipeline(lang_code="a")  # "a" = American English

def speak(text: str):
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        out_path = f.name
    try:
        # Kokoro yields one audio chunk per text segment; play them all in order
        # rather than stopping after the first.
        for _, _, audio in pipeline(text, voice="af_heart", speed=1.0):
            sf.write(out_path, audio, samplerate=24000)
            data, sr = sf.read(out_path)
            sd.play(data, sr); sd.wait()
    finally:
        os.unlink(out_path)
Kokoro uses CUDA when available, falls back to CPU. On 8GB VRAM, short responses generate in under a second. For CPU-only deployments, Piper is faster.
2.9 Main Loop
import time

def main():
    while True:
        try:
            text = transcribe(record_command())
            if not contains_wake_word(text):
                continue
            command = strip_wake_word(text)
            if not command:
                speak("Yes?")
                command = transcribe(record_command())
            speak(run_agent(command))
        except KeyboardInterrupt:
            break
        except Exception as e:
            print(f"Error: {e}")
            time.sleep(1)

if __name__ == "__main__":
    main()
2.10 Task Scheduler Registration
$action = New-ScheduledTaskAction `
-Execute "python.exe" `
-Argument "C:\SOC\assistant.py"
$trigger = New-ScheduledTaskTrigger -AtLogOn
Register-ScheduledTask `
-TaskName "SOC-AI-Assistant" `
-Action $action `
-Trigger $trigger `
-RunLevel Highest `
-Force
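Add time.sleep(10) at the top of assistant.py so the Ollama service has time to start
before the first request. Ollama loads the model itself on the first request, so the first response after boot is slower than later ones.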
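A fixed delay can be too short on a slow boot and needlessly long on a fast one. An alternative is to poll the Ollama HTTP endpoint until it answers. The sketch below assumes Ollama listens on its default address, http://localhost:11434, and uses the /api/tags endpoint as a liveness check; ollama_is_up and wait_until_ready are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request

def ollama_is_up(base_url: str = "http://localhost:11434") -> bool:
    """True if the Ollama server answers its model-list endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def wait_until_ready(probe=ollama_is_up, timeout_s: float = 60.0,
                     interval_s: float = 1.0) -> bool:
    """Call `probe` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_until_ready() before entering the main loop would replace the fixed sleep, and passing a different probe makes the helper testable without a running server. A True result only means the server is reachable, not that the model is already in memory.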
3. Future Extensions
- Persistent context — carry last N turns for natural follow-up questions
- Porcupine wake word — replace always-transcribing loop with on-device keyword detection
- More tools — CVE lookups, firewall rule queries, alert triage
- Scheduled briefings — summarize overnight alerts and open tickets at shift start
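As a starting point for the persistent-context item, a bounded queue is enough to carry the last N exchanges into the next request. This is a sketch only: TurnHistory is a hypothetical class, it stores just the user text and the final spoken answer (not intermediate tool messages), and it keeps nothing across restarts.

```python
from collections import deque

class TurnHistory:
    """Keeps the most recent user/assistant exchanges for follow-up questions."""

    def __init__(self, max_turns: int = 3):
        self._turns = deque(maxlen=max_turns)  # oldest exchange drops off automatically

    def add(self, user_text: str, assistant_text: str) -> None:
        self._turns.append((user_text, assistant_text))

    def build_messages(self, system_prompt: str, command: str) -> list[dict]:
        """System prompt, then remembered exchanges, then the new command."""
        messages = [{"role": "system", "content": system_prompt}]
        for user_text, assistant_text in self._turns:
            messages.append({"role": "user", "content": user_text})
            messages.append({"role": "assistant", "content": assistant_text})
        messages.append({"role": "user", "content": command})
        return messages

history = TurnHistory(max_turns=2)
history.add("q1", "a1")
history.add("q2", "a2")
history.add("q3", "a3")  # pushes out the q1/a1 exchange
messages = history.build_messages("You are concise.", "and yesterday?")
print([m["content"] for m in messages])
```

In the agent, build_messages would replace the two-element messages list, and add would be called after each spoken answer. Keeping N small limits prompt growth, which matters for the response-time target on an 8GB card.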