Build a local web app or Chrome Extension powered by Gemini Nano, using built-in AI APIs such as Prompt, Proofreader, Summarizer, Translator, Writer, or Rewriter.
Learning without disruption
Hover-based interaction lets users access guidance without leaving the page or breaking focus.
Learn in context
Users can practice directly while reading, turning everyday browsing into an interactive experience.
Accessible across formats
Works on webpages, PDFs, and online publications.
Most existing Chrome extensions focus on grammar and translation, while pronunciation support is limited.
Yet for language learners, speaking accurately and confidently is often the hardest skill to master.

Speaking Anxiety
Over 60% of learners struggle with speaking due to pronunciation anxiety and limited feedback.
Practice Needs Feedback
Improvement requires real-time feedback to guide accurate practice.
Practice Anytime, Anywhere
People want to learn without tutors or lessons.
Speech Analysis
Adaptive Feedback
Contextual Learning
This gap inspired me to design an AI-powered pronunciation tool that makes language practice intuitive, contextual, and accessible within everyday browsing.
The Gemini Nano model required parameter adjustments to improve consistency and ensure reliable performance during experimentation.
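The kind of parameter adjustment described above could look like the following minimal sketch. It assumes Chrome's Prompt API, where `LanguageModel.params()` reports the supported limits and `LanguageModel.create()` accepts `temperature` and `topK`. The helper names and the specific values (0.2 and 3) are illustrative assumptions, not the settings the team actually used. The `languageModel` object is passed in as an argument so the function can also be exercised outside Chrome.

```javascript
// Sketch only: the default temperature and topK below are assumptions
// chosen for illustration, not the project's real configuration.
function clampSamplingParams(wanted, limits) {
  // Keep temperature within [0, maxTemperature].
  const temperature = Math.min(Math.max(wanted.temperature, 0), limits.maxTemperature);
  // Keep topK an integer within [1, maxTopK].
  const topK = Math.min(Math.max(Math.round(wanted.topK), 1), limits.maxTopK);
  return { temperature, topK };
}

async function createConsistentSession(languageModel, wanted = { temperature: 0.2, topK: 3 }) {
  // params() resolves to the defaults and upper bounds the model supports.
  const limits = await languageModel.params();
  // temperature and topK are passed together when creating the session.
  return languageModel.create(clampSamplingParams(wanted, limits));
}
```

In the extension this would be called as `createConsistentSession(LanguageModel)`. Lower values of both parameters generally make the output less varied between runs.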
Selected the Longest Common Subsequence (LCS) algorithm to compare phonetic differences efficiently. Its simplicity made it easy to implement, debug, and interpret, ensuring clear, reliable feedback for users.
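As an illustration of how a longest common subsequence comparison can surface phonetic differences, here is a minimal sketch. The function name `lcsAlign`, the `match` / `missing` / `extra` labels, and the ARPAbet-style symbols in the usage note are assumptions made for this example; the project's actual data format is not described in the write-up.

```javascript
// Aligns two sequences (strings or arrays of phoneme symbols) with a
// longest common subsequence table and labels each symbol.
function lcsAlign(expected, actual) {
  const n = expected.length;
  const m = actual.length;
  // table[i][j] holds the LCS length of expected[i..] and actual[j..].
  const table = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      table[i][j] = expected[i] === actual[j]
        ? table[i + 1][j + 1] + 1
        : Math.max(table[i + 1][j], table[i][j + 1]);
    }
  }
  // Walk the table from the start to recover the alignment.
  const ops = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (expected[i] === actual[j]) {
      ops.push({ type: 'match', symbol: expected[i] });
      i++;
      j++;
    } else if (table[i + 1][j] >= table[i][j + 1]) {
      ops.push({ type: 'missing', symbol: expected[i] }); // expected but not heard
      i++;
    } else {
      ops.push({ type: 'extra', symbol: actual[j] }); // heard but not expected
      j++;
    }
  }
  while (i < n) ops.push({ type: 'missing', symbol: expected[i++] });
  while (j < m) ops.push({ type: 'extra', symbol: actual[j++] });
  return { length: table[0][0], ops };
}
```

For example, `lcsAlign(['HH', 'AH', 'L', 'OW'], ['HH', 'EH', 'L', 'OW'])` reports three matched symbols, with `AH` marked as missing and `EH` as extra, which maps directly onto feedback such as "the vowel in the first syllable was off".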
Implemented strategies like chain-of-thought prompting to enhance reasoning and output precision in phonetic analysis.
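A chain-of-thought prompt for phonetic analysis might be assembled along the lines below. This is a hypothetical sketch: the function name, the step wording, and the `Feedback:` output convention are assumptions for illustration, not the prompt the team shipped.

```javascript
// Builds a prompt that asks the model to reason step by step before
// giving its final feedback. All wording here is illustrative.
function buildPhoneticPrompt(word, expectedPhonemes, heardPhonemes) {
  return [
    'You are a pronunciation coach.',
    `Target word: "${word}"`,
    `Expected phonemes: ${expectedPhonemes.join(' ')}`,
    `Heard phonemes: ${heardPhonemes.join(' ')}`,
    'Work through these steps in order before answering:',
    '1. Align the two phoneme sequences position by position.',
    '2. List every substituted, missing, or extra phoneme.',
    '3. Judge how much each difference affects intelligibility.',
    '4. Finish with one line starting with "Feedback:" that gives a single, concrete tip.',
  ].join('\n');
}
```

Asking for the final line in a fixed form makes it straightforward to separate the model's intermediate reasoning from the text shown to the learner.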
Working under tight timelines, our team leveraged AI-assisted coding tools such as Gemini CLI to accelerate development, building primarily with React, HTML/CSS/JavaScript, Vite, CRXJS, and Chrome Browser APIs.
A 100% match felt too rigid, yet lowering the bar risked providing misleading positive feedback.
Solution: The team let Gemini Nano, through the Prompt API, set the passing threshold dynamically based on context and phonetic similarity.
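One way to ask the model for a threshold while keeping the result usable is sketched below. It assumes a Prompt API session whose `prompt()` method accepts a `responseConstraint` JSON Schema and resolves to a JSON string. The schema, the question wording, the function name, and the 0.8 fallback are all assumptions made for this example.

```javascript
// JSON Schema that restricts the model's answer to a number in [0, 1].
const THRESHOLD_SCHEMA = {
  type: 'object',
  properties: { threshold: { type: 'number', minimum: 0, maximum: 1 } },
  required: ['threshold'],
};

// Asks the model for a passing threshold; falls back to a fixed value
// (an assumed default) if the call fails or the answer is unusable.
async function pickThreshold(session, word, similarity, fallback = 0.8) {
  const question =
    `A learner attempted the word "${word}" and scored ${similarity.toFixed(2)} ` +
    'on a 0 to 1 phonetic similarity scale. Considering how hard the word is, ' +
    'what minimum score should count as a correct pronunciation?';
  try {
    const raw = await session.prompt(question, { responseConstraint: THRESHOLD_SCHEMA });
    const { threshold } = JSON.parse(raw);
    if (typeof threshold === 'number' && threshold >= 0 && threshold <= 1) {
      return threshold;
    }
  } catch (err) {
    // Ignore and use the fallback below.
  }
  return fallback;
}
```

Validating the parsed value and keeping a fallback guards against the case the write-up worries about, where a threshold that is too lenient would produce misleading positive feedback.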
Matching phonetic sequences proved difficult when users’ input differed drastically from the expected pronunciation (e.g., “halo” vs. “slkjfdksjfdslhfsdklf lo”).
Solution: The team explored and adopted a character-level matching approach to ensure speed, reliability, and easier integration with the local AI system.
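A character-level comparison of this kind can be sketched as a normalized similarity score. The version below is an assumption about how such matching might work, not the team's implementation: it strips everything except letters, computes the longest common subsequence length with a two-row table, and scales the result to the range 0 to 1.

```javascript
// Lowercases the text and keeps only letters, so spacing and punctuation
// in the transcript do not affect the score.
function normalizeText(text) {
  return text.toLowerCase().replace(/[^\p{L}]/gu, '');
}

// Returns a similarity in [0, 1]: twice the LCS length divided by the
// combined length of both normalized strings.
function charSimilarity(expected, heard) {
  const a = Array.from(normalizeText(expected));
  const b = Array.from(normalizeText(heard));
  if (a.length === 0 && b.length === 0) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  // prev[j] is the LCS length of the processed prefix of a and b[0..j).
  let prev = new Array(b.length + 1).fill(0);
  for (const ch of a) {
    const curr = [0];
    for (let j = 1; j <= b.length; j++) {
      curr[j] = ch === b[j - 1] ? prev[j - 1] + 1 : Math.max(prev[j], curr[j - 1]);
    }
    prev = curr;
  }
  return (2 * prev[b.length]) / (a.length + b.length);
}
```

Because the score is divided by the combined length, a long garbled transcript such as the "halo" example above scores low even if a few of its characters happen to line up with the target word.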
Running locally ensures privacy and offline access but introduces a 5–8 second delay while the model initializes.
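A common way to soften a start-up delay like this is to begin creating the model session early and reuse it afterwards. The sketch below is one possible approach under that assumption, not something the write-up says the team did. `createSessionWarmer` is a hypothetical helper, and `languageModel` stands for Chrome's `LanguageModel` object, passed in so the code can run outside the browser.

```javascript
// Starts session creation once and hands the same promise to every caller,
// so the initialization cost is paid a single time.
function createSessionWarmer(languageModel, options = {}) {
  let sessionPromise = null;
  return {
    warmUp() {
      if (!sessionPromise) {
        sessionPromise = languageModel.create(options).catch((err) => {
          sessionPromise = null; // Clear the cache so a later call can retry.
          throw err;
        });
      }
      return sessionPromise;
    },
  };
}
```

The extension could call `warmer.warmUp().catch(() => {})` when the page loads and `await warmer.warmUp()` when the user first hovers over a word, so that most of the waiting happens before the first interaction.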

Word Card (Popup)
These three diagrams visualize the full system architecture of Phonaify.
Word Card
Defines the interface layer and user interaction surface.
System Logic Flow
Maps the user journey and interaction logic.
AI Workflow
Outlines data processing between Chrome’s pre-processing and Gemini Nano’s multimodal feedback generation.