Multimodal Inputs

Knobase agents support multimodal input, so users can interact using more than just text. Students, teachers, and staff can submit images, PDFs, and visual aids, and the agent responds with context-aware feedback tailored to the content.

Written By Christopher Lee

Last updated 11 months ago

Why Multimodal Matters

✅ Supports visual learners and creative tasks
✅ Enables real-world classroom scenarios like worksheet review or diagram analysis
✅ Makes agents more versatile and inclusive
✅ Reduces barriers for younger students or those with language challenges

Supported Input Types

Input Type	Examples	Agent Capabilities
📝 Text Queries	“What’s the homework for Friday?”	Retrieves information from documents and responds in an appropriate tone
🖼️ Uploaded Images	Worksheets, whiteboard drawings, student sketches	Analyzes visual content, identifies questions, gives feedback
📄 PDFs & Visual Aids	Lesson plans, rubrics, scanned notes	Extracts relevant sections, summarizes, or answers based on content
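
For readers who want a concrete picture of the three input types, here is a minimal Python sketch of how a submission might be sorted into the categories in the table. It is illustrative only and does not describe how Knobase works internally: the function name `classify_input` and the category labels are hypothetical, and the sketch assumes the file type can be guessed from the file name alone.

```python
import mimetypes

# Hypothetical category labels mirroring the three rows of the table above.
TEXT_QUERY = "text_query"
UPLOADED_IMAGE = "uploaded_image"
PDF_OR_VISUAL_AID = "pdf_or_visual_aid"
UNSUPPORTED = "unsupported"


def classify_input(filename=None):
    """Map a submission to one of the input types in the table.

    Assumption: a message with no attachment is a plain text query.
    """
    if filename is None:
        return TEXT_QUERY

    # Guess the MIME type from the file extension (standard library only).
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        return UNSUPPORTED
    if mime_type.startswith("image/"):
        return UPLOADED_IMAGE
    if mime_type == "application/pdf":
        return PDF_OR_VISUAL_AID
    return UNSUPPORTED


if __name__ == "__main__":
    for name in (None, "worksheet.png", "rubric.pdf"):
        print(name, "->", classify_input(name))
```

In this sketch, a question typed with no attachment is treated as a text query, a photo of a worksheet as an uploaded image, and a scanned rubric saved as a PDF as a PDF or visual aid.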

💡 Example Interaction

Student: Uploads a worksheet image and asks,
“Can you help me with question 3?”
Agent: Analyzes the image, identifies question 3, and replies:
“Sure! Question 3 asks you to compare two ecosystems. Try listing their differences in climate, biodiversity, and human impact. You can refer to the ‘Science Term 2’ guide for examples.”