AI-Powered English Pronunciation Assistant

Chrome Extension: Phonaify

Phonaify helps non-native English speakers improve pronunciation and comprehension while reading online.

Built with Gemini Nano and the local Prompt API, it allows users to highlight any word, speak directly in the browser, and receive instant, real-time pronunciation feedback.

Timeframe: 2 weeks, October 2025

Tools: Figma

Role: Lead UX designer & system design

The Team: 1 product designer, 1 engineer


01. Context

What’s the challenge?

Build a local web app or Chrome Extension powered by Gemini Nano, using built-in AI APIs such as Prompt, Proofreader, Summarizer, Translator, Writer, or Rewriter.

02. Problem Framing

Why a Chrome Extension?

Learning without disruption

Hover-based interaction lets users access guidance without leaving the page or breaking focus.

Learn in context

Users can practice directly while reading, turning everyday browsing into an interactive experience.

Accessible across formats

Works on webpages, PDFs, and online publications.

Why a language pronunciation learning tool?

Most existing Chrome extensions focus on grammar and translation, while pronunciation support is limited.

Yet for language learners, speaking accurately and confidently is often the hardest skill to master.

Research highlighted three core gaps…


Speaking Anxiety

Over 60% of learners struggle with speaking due to pronunciation anxiety and limited feedback.

Practice Needs Feedback

Improvement requires real-time feedback to guide accurate practice.

Practice Anytime, Anywhere


People want to learn without tutors or lessons.


Why is AI essential in this case?

Traditional methods can’t provide instant, personalized feedback at scale.


Speech Analysis

Analyzes voice input to assess accuracy and detect phonetic errors.

Adaptive Feedback

Generates personalized, real-time feedback tailored to each user.

Contextual Learning

Recognizes on-page text in real time so practice stays connected to what the user is reading.

Progress in pronunciation depends on feedback. This tool provides real-time, AI-driven guidance so learners can refine their speech independently and continuously.

This gap inspired me to design an AI-powered pronunciation tool that makes language practice intuitive, contextual, and accessible within everyday browsing.

03. Solution

🔊 Turn the volume up to see it in action! You can also try it yourself: it’s live on the Chrome Web Store.


04. Accomplishments

Model Stability

The Gemini Nano model required parameter adjustments to improve consistency and ensure reliable performance during experimentation.
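
As an illustration of the kind of parameter adjustment described above, the sketch below creates a Prompt API session with a low temperature and a narrow topK so repeated runs give more consistent output. The `createStableSession` helper and the specific values are assumptions for illustration, not the settings used in Phonaify. The `LanguageModel` object is passed in as an argument so the function can also be exercised outside Chrome.

```javascript
// Hypothetical helper: the name and the chosen values are illustrative only.
// `languageModel` is expected to look like Chrome's global `LanguageModel`.
async function createStableSession(languageModel) {
  // params() reports the defaults and upper bounds supported by the model.
  const { maxTopK, maxTemperature } = await languageModel.params();

  // A low temperature and a narrow topK reduce run-to-run variation.
  const temperature = Math.min(0.2, maxTemperature);
  const topK = Math.min(3, maxTopK);

  return languageModel.create({ temperature, topK });
}
```

Inside the extension this would be called as `createStableSession(LanguageModel)`.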

Simplified Phonetic Comparison Algorithm


Selected the Longest Common Subsequence (LCS) algorithm to compare phonetic sequences efficiently. Its simplicity made it easy to implement, debug, and interpret, which keeps feedback clear and reliable for users.
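
To make the comparison concrete, here is a minimal sketch of a longest common subsequence comparison over two phoneme sequences. The `phoneticSimilarity` score (matched length divided by the longer sequence) is an assumed scoring rule for illustration. The source does not say how Phonaify turns the match into a score.

```javascript
// Length of the longest common subsequence of two sequences (strings or arrays).
function lcsLength(a, b) {
  // prev[j] holds the LCS length for the processed prefix of a and b[0..j).
  let prev = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const curr = new Array(b.length + 1).fill(0);
    for (let j = 1; j <= b.length; j++) {
      curr[j] = a[i - 1] === b[j - 1]
        ? prev[j - 1] + 1
        : Math.max(prev[j], curr[j - 1]);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical score in [0, 1]: in-order matches relative to the longer sequence.
function phoneticSimilarity(expected, spoken) {
  const longest = Math.max(expected.length, spoken.length);
  return longest === 0 ? 1 : lcsLength(expected, spoken) / longest;
}
```

For example, comparing `["h", "ə", "l", "oʊ"]` with `["h", "ɛ", "l", "oʊ"]` matches three of four phonemes in order, so the score is 0.75.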

Advanced Prompting Techniques


Implemented strategies like chain-of-thought prompting to enhance reasoning and output precision in phonetic analysis.
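
As a rough illustration of chain-of-thought prompting in this setting, the sketch below builds a prompt that asks the model to reason through the comparison in explicit steps before giving advice. The `buildAnalysisPrompt` name and the wording of every step are hypothetical. They are not the prompt used in Phonaify.

```javascript
// Hypothetical prompt builder; the wording is illustrative only.
function buildAnalysisPrompt(word, expectedPhonemes, heardPhonemes) {
  return [
    `Target word: "${word}"`,
    `Expected phonemes: ${expectedPhonemes.join(" ")}`,
    `Heard phonemes: ${heardPhonemes.join(" ")}`,
    "Work through the comparison step by step:",
    "1. Align the heard phonemes with the expected ones.",
    "2. List each substituted, missing, or extra sound.",
    "3. Judge how much each difference affects intelligibility.",
    "4. Only then give a one-sentence tip for the learner.",
  ].join("\n");
}
```

Asking for the intermediate steps before the final tip is what makes this a chain-of-thought prompt rather than a direct question.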

05. Challenges

Working under tight timelines, our team leveraged AI-assisted coding tools such as Gemini CLI to accelerate development, building primarily with React, HTML/CSS/JavaScript, Vite, CRXJS, and Chrome Browser APIs.

While integrating and tuning the model, we encountered three key challenges:

Defining the Accuracy Threshold


A 100% match felt too rigid, yet lowering the bar risked providing misleading positive feedback.

Solution: The team decided to let the Gemini Prompt API dynamically decide the threshold based on context and phonetic similarity.
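
One way to let the model make this call while keeping the result machine-readable is the Prompt API's structured output option. The sketch below passes a JSON Schema as `responseConstraint` and parses the reply. The schema fields, the prompt wording, and the `judgePronunciation` helper are assumptions for illustration, not Phonaify's actual code.

```javascript
// JSON Schema passed as responseConstraint so the reply is machine-readable.
// The field names are hypothetical.
const verdictSchema = {
  type: "object",
  properties: {
    isAcceptable: { type: "boolean" },
    similarity: { type: "number", minimum: 0, maximum: 1 },
    tip: { type: "string" },
  },
  required: ["isAcceptable", "similarity", "tip"],
};

// `session` is expected to behave like a Prompt API session.
async function judgePronunciation(session, word, heard) {
  const prompt =
    `The learner tried to say "${word}" and the recognizer heard "${heard}". ` +
    "Decide whether this is close enough to count as correct for this word.";
  const raw = await session.prompt(prompt, { responseConstraint: verdictSchema });
  return JSON.parse(raw);
}
```

The model still decides where the bar sits for each word, but the UI only ever reads a boolean, a number, and a short tip.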

Handling Phonetic Differences and Edge Cases


Matching phonetic sequences proved difficult when users’ input differed drastically from the expected pronunciation (e.g., “halo” vs. “slkjfdksjfdslhfsdklf lo”).

Solution: The team explored and adopted a character-level matching approach to ensure speed, reliability, and easier integration with the local AI system.
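
A minimal sketch of character-level matching, assuming a simple greedy in-order scan rather than whatever Phonaify actually ships: each expected character is looked up in the spoken text after the previous match, and the score is divided by the longer input so that long, noisy input is penalized. The `normalize` and `matchCharacters` helpers are hypothetical, and the letters-only normalization assumes plain English words.

```javascript
// Lowercase and keep letters only, so spacing and punctuation do not count.
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z]/g, "");
}

// Greedy in-order match: for each expected character, look for its next
// occurrence in the spoken text after the previous match.
function matchCharacters(expectedRaw, spokenRaw) {
  const expected = normalize(expectedRaw);
  const spoken = normalize(spokenRaw);
  let cursor = 0;
  const flags = [];
  for (const ch of expected) {
    const found = spoken.indexOf(ch, cursor);
    flags.push(found !== -1);
    if (found !== -1) cursor = found + 1;
  }
  const matched = flags.filter(Boolean).length;
  // Dividing by the longer length penalizes long, noisy input.
  const longest = Math.max(expected.length, spoken.length);
  return { flags, score: longest === 0 ? 1 : matched / longest };
}
```

With the example above, three of the four letters in “halo” can still be found in order inside the noisy input, but the length penalty keeps the score low. The per-character flags can also drive highlighting of the missed letters.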

Local API Delay


Running locally ensures privacy and offline access but introduces a 5–8 second delay while the model initializes.
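
One common way to hide part of an initialization delay like this is to start creating the session early and reuse it. The sketch below caches the in-flight creation promise so the model is initialized at most once. The `warmUpSession` helper is an assumption for illustration, and the source does not say whether Phonaify does this.

```javascript
// Cache the in-flight creation promise so the model is initialized only once.
let sessionPromise = null;

// `languageModel` is expected to look like Chrome's global `LanguageModel`.
function warmUpSession(languageModel) {
  if (!sessionPromise) {
    sessionPromise = languageModel.create().catch((error) => {
      sessionPromise = null; // allow a retry after a failed start
      throw error;
    });
  }
  return sessionPromise;
}
```

The extension could call `warmUpSession(LanguageModel)` as soon as the user highlights a word and await the same promise once they finish speaking, so part of the wait overlaps with the recording.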

UI - HTML / CSS

Prompt API with structured output - JavaScript

06. Process - System Flow

Word Card (Popup)

These three diagrams visualize the full system architecture of Phonaify:

Word Card

Defines the interface layer and user interaction surface.

System Logic Flow

Maps the user journey and interaction logic.

AI Workflow

Outlines data processing between Chrome’s pre-processing and Gemini Nano’s multimodal feedback generation.

Phonaify System Logic Flow
AI Workflow

07. Process - Visual Identity


08. Next Steps

User Testing

Conduct usability testing with diverse user groups to validate the design and identify accessibility barriers.

Multilingual Translation

Deepen AI integration to enable multilingual translation and support language learning beyond English.

Expand the Learning Ecosystem

Add features that let users save words and track pronunciation history, creating a continuous learning loop.

Develop System Settings

Let users choose pronunciation styles (e.g., American, British, Australian) and voice types (different genders).

