Kandice's Portfolio

AI-Powered English Pronunciation Assistant

Chrome Extension : Phonaify

Phonaify helps non-native English speakers improve pronunciation and comprehension while reading online.

Built with Gemini Nano and Chrome's on-device Prompt API, it lets users highlight any word, say it aloud directly in the browser, and receive instant pronunciation feedback.


2 weeks, October 2025

Timeframe

Best Multimodal AI Award – out of 14,000+ participants

Impact

Lead UX designer & system designer

Role

1 product designer, 1 engineer

The Team

01. Context

What’s the challenge?

Build a local web app or Chrome Extension powered by Gemini Nano, using built-in AI APIs such as Prompt, Proofreader, Summarizer, Translator, Writer, or Rewriter.

02. Problem Framing

Why a Chrome Extension?

Learning without disruption

Hover-based interaction lets users access guidance without leaving the page or breaking focus.

Learn in context

Users can practice directly while reading, turning everyday browsing into an interactive experience.

Accessible across formats

Works on webpages, PDFs, and online publications.

Why a language pronunciation learning tool?

Most existing Chrome extensions focus on grammar and translation, while pronunciation support is limited.

Yet for language learners, speaking accurately and confidently is often the hardest skill to master.

Research highlighted three core gaps…

Speaking Anxiety

Over 60% of learners struggle with speaking due to pronunciation anxiety and limited feedback.

Practice Needs Feedback

Improvement requires real-time feedback to guide accurate practice.

Practice Anytime, Anywhere

Learners want to practice without relying on tutors or scheduled lessons.

Why is AI essential in this case?

Traditional methods can’t provide instant, personalized feedback at scale.

Speech Analysis

Analyzes voice input to assess accuracy and detect phonetic errors.

Adaptive Feedback

Generates personalized, real-time feedback tailored to users.

Contextual Learning

Recognizes text in real time, connecting practice directly to what users are reading.

Progress in pronunciation depends on feedback. This tool provides real-time, AI-driven guidance so learners can refine their speech independently and continuously.

This gap inspired me to design an AI-powered pronunciation tool that makes language practice intuitive, contextual, and accessible within everyday browsing.

03. Solution

🔊 Turn your volume up to see it in action! You can also try it yourself: it’s live on the Chrome Web Store!

04. Accomplishments

Model Stability

The Gemini Nano model required parameter adjustments to improve consistency and ensure reliable performance during experimentation.
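Here is a minimal sketch of what such parameter adjustments can look like. Chrome's Prompt API documents `LanguageModel.params()`, which reports the model's default and maximum `temperature` and `topK`, and it accepts both values in `LanguageModel.create()`. The helper names and the values 0.2 and 3 below are illustrative assumptions, not the settings Phonaify shipped with. Availability of these options can also vary between Chrome versions.

```javascript
// Hypothetical helper: pick low-variance sampling settings, clamped to the model's limits.
// `limits` is shaped like the object returned by LanguageModel.params().
function chooseSamplingParams(limits, { temperature = 0.2, topK = 3 } = {}) {
  return {
    temperature: Math.min(Math.max(temperature, 0), limits.maxTemperature),
    topK: Math.min(Math.max(Math.round(topK), 1), limits.maxTopK),
  };
}

// Hypothetical wrapper: create a session with both sampling values set together.
async function createStableSession() {
  // Guard so the sketch also loads outside Chrome, where the Prompt API is absent.
  if (typeof LanguageModel === "undefined") return null;
  const limits = await LanguageModel.params();
  return LanguageModel.create(chooseSamplingParams(limits));
}
```

A lower temperature and a small `topK` make sampling less random, which tends to make repeated analyses of the same attempt more consistent. The trade-off is less varied feedback wording.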

Simplified Phonetic Comparison Algorithm

Selected the Longest Common Subsequence (LCS) algorithm to compare phonetic differences efficiently. Its simplicity made it easy to implement, debug, and interpret, ensuring clear, reliable feedback for users.
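Below is a minimal sketch of LCS-based comparison. It assumes pronunciations are represented as arrays of phoneme symbols, and the IPA-style tokens in the tests are illustrative. The helper names are hypothetical. The normalization (twice the LCS length divided by the combined length) is one common choice, not the team's documented formula.

```javascript
// Length of the longest common subsequence of two sequences (dynamic programming, O(n*m)).
function lcsLength(a, b) {
  const dp = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = a[i - 1] === b[j - 1]
        ? dp[i - 1][j - 1] + 1 // symbols match: extend the common subsequence
        : Math.max(dp[i - 1][j], dp[i][j - 1]); // otherwise keep the better of the two prefixes
    }
  }
  return dp[a.length][b.length];
}

// Similarity in [0, 1]; dividing by both lengths penalizes extra, unmatched sounds.
function phoneticSimilarity(expected, spoken) {
  if (expected.length + spoken.length === 0) return 1;
  return (2 * lcsLength(expected, spoken)) / (expected.length + spoken.length);
}
```

Because the score divides by both lengths, extra unmatched sounds pull it down. A long garbled attempt therefore scores well below a near miss, even when it contains some of the expected sounds in order.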

Advanced Prompting Techniques

Implemented strategies like chain-of-thought prompting to enhance reasoning and output precision in phonetic analysis.
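The sketch below shows a chain-of-thought style prompt for this task. It assumes the learner's attempt is already available as a transcript. The function name and the wording of the steps are hypothetical. The idea is to have the model work through a phoneme-level comparison before it gives a verdict.

```javascript
// Hypothetical prompt builder: asks the model to reason step by step before judging.
function buildPronunciationPrompt(targetWord, expectedIpa, heardText) {
  return [
    `Target word: "${targetWord}" (expected IPA: /${expectedIpa}/).`,
    `What the learner said (transcribed): "${heardText}".`,
    "Think step by step:",
    "1. Break the expected pronunciation into phonemes.",
    "2. Break the learner's attempt into phonemes.",
    "3. Compare them and list any substituted, missing, or extra sounds.",
    "4. Decide whether the attempt is acceptable and give one short, encouraging tip.",
  ].join("\n");
}
```

You could pass the resulting string to an existing Prompt API session, for example `session.prompt(buildPronunciationPrompt("halo", "ˈheɪloʊ", "hello"))`.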

05. Challenges

Working under a tight timeline, our team used AI-assisted coding tools such as Gemini CLI to speed up development, building primarily with React, HTML/CSS/JavaScript, Vite, CRXJS, and Chrome browser APIs.

While integrating and tuning the model, we encountered three key challenges:

Defining the Accuracy Threshold

A 100% match requirement felt too rigid, yet lowering the bar risked giving learners misleadingly positive feedback.

Solution: The team let Gemini Nano, through the Prompt API, decide the threshold dynamically based on context and phonetic similarity.
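The Prompt API's structured output offers one way to let the model make the pass/fail call while keeping its output machine-readable. `session.prompt()` accepts a `responseConstraint` option that holds a JSON Schema. The schema fields, helper names, and validation below are a hypothetical sketch, not Phonaify's actual schema. The code sets no fixed numeric cutoff: `passed` comes from the model's own judgment.

```javascript
// Hypothetical JSON Schema for the model's verdict; field names are illustrative.
const verdictSchema = {
  type: "object",
  properties: {
    similarity: { type: "number", minimum: 0, maximum: 1 },
    passed: { type: "boolean" },
    feedback: { type: "string" },
  },
  required: ["similarity", "passed", "feedback"],
};

// Parse the model's JSON text and check the fields the UI depends on.
function parseVerdict(jsonText) {
  const v = JSON.parse(jsonText);
  if (typeof v.similarity !== "number" || typeof v.passed !== "boolean" || typeof v.feedback !== "string") {
    throw new Error("Model response does not match the verdict schema");
  }
  return v;
}

// `session` is an existing Prompt API session; responseConstraint requests schema-shaped JSON.
async function judgeAttempt(session, promptText) {
  const raw = await session.prompt(promptText, { responseConstraint: verdictSchema });
  return parseVerdict(raw);
}
```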

Handling Phonetic Differences and Edge Cases

Matching phonetic sequences proved difficult when users’ input differed drastically from the expected pronunciation (e.g., “halo” vs. “slkjfdksjfdslhfsdklf lo”).

Solution: The team explored and adopted a character-level matching approach to ensure speed, reliability, and easier integration with the local AI system.
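Here is a minimal sketch of a character-level guard for edge cases like the one above. It assumes English words written in plain ASCII letters. The function names, the length-ratio cutoff of 2, and the position-by-position comparison are hypothetical choices made for illustration.

```javascript
// Lowercase and keep only the letters a-z, so "Halo!" and "halo" compare equal.
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z]/g, "");
}

// Hypothetical character-level check with an early length guard.
function characterMatch(expectedWord, transcript, maxLengthRatio = 2) {
  const expected = normalize(expectedWord);
  const spoken = normalize(transcript);
  if (expected.length === 0 || spoken.length === 0) {
    return { match: false, score: 0, reason: "empty" };
  }
  const longer = Math.max(expected.length, spoken.length);
  const shorter = Math.min(expected.length, spoken.length);
  // Reject input far longer or shorter than the target before comparing characters.
  if (longer / shorter > maxLengthRatio) {
    return { match: false, score: 0, reason: "length-mismatch" };
  }
  // Count characters that agree position by position.
  let same = 0;
  for (let i = 0; i < shorter; i++) {
    if (expected[i] === spoken[i]) same += 1;
  }
  return { match: same === longer, score: same / longer, reason: "compared" };
}
```

Rejecting extreme length mismatches before any detailed comparison keeps obviously garbled input from being scored as a partial match.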

Local API Delay

Running locally ensures privacy and offline access but introduces a 5–8 second delay while the model initializes.
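The delay itself cannot be removed, but it can be paid once and moved off the critical path. The minimal sketch below assumes a single shared session is acceptable. It caches the session promise so every caller awaits the same initialization, and it clears the cache on failure so a later call can retry. The helper name is hypothetical.

```javascript
// Hypothetical helper: start model initialization once and share the same promise.
function createSessionCache(createSession) {
  let pending = null;
  return function getSession() {
    if (pending === null) {
      pending = createSession().catch((err) => {
        pending = null; // forget the failed attempt so the next call can retry
        throw err;
      });
    }
    return pending;
  };
}
```

In an extension, `createSessionCache(() => LanguageModel.create())` could be set up at startup, and `getSession()` called as soon as a word card opens. Initialization then overlaps with the user reading the card instead of starting after they finish speaking.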

UI - HTML / CSS

Prompt API with structured output - JavaScript

06. Process - System Flow

Word Card (Popup)

These three diagrams visualize Phonaify’s full system architecture:

Word Card

Defines the interface layer and user interaction surface.

System Logic Flow

Maps the user journey and interaction logic.
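The diagram itself is not reproduced in this text. As a rough illustration, the highlight, speak, and feedback journey from the introduction can be modeled as a small state machine. The state and event names are hypothetical and are not taken from Phonaify's code.

```javascript
// Hypothetical interaction states and the events that move between them.
const transitions = {
  idle: { SELECT_WORD: "wordCard" },
  wordCard: { START_RECORDING: "recording", CLOSE: "idle" },
  recording: { STOP_RECORDING: "analyzing", CANCEL: "wordCard" },
  analyzing: { RESULT: "feedback", ERROR: "wordCard" },
  feedback: { RETRY: "recording", CLOSE: "idle" },
};

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Returns the next state, or throws if the event is not allowed in the current state.
function nextState(state, event) {
  if (!hasOwn(transitions, state) || !hasOwn(transitions[state], event)) {
    throw new Error(`Event "${event}" is not valid in state "${state}"`);
  }
  return transitions[state][event];
}
```

Making invalid transitions throw (for example, `RESULT` while `idle`) surfaces UI bugs early, such as feedback arriving after the user has already closed the card.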

AI Workflow

Outlines how data flows from Chrome’s pre-processing to Gemini Nano’s multimodal feedback generation.
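The sketch below covers the multimodal step. It assumes the recorded attempt is passed to Gemini Nano as an audio `Blob`. The message shape (`role`, plus `content` entries with `type` and `value`) and the `expectedInputs` option follow Chrome's Prompt API documentation for multimodal input, which may change between Chrome versions. The helper names and prompt wording are hypothetical. Phonaify's actual pre-processing may also differ; for example, it may transcribe speech first.

```javascript
// Hypothetical: build one user message combining an instruction and the recorded audio.
function buildAudioMessages(targetWord, audioBlob) {
  return [
    {
      role: "user",
      content: [
        {
          type: "text",
          value: `The learner tried to say "${targetWord}". Evaluate their pronunciation from the recording and point out phonetic errors.`,
        },
        { type: "audio", value: audioBlob },
      ],
    },
  ];
}

// Create a session that declares audio input up front.
async function createAudioSession() {
  if (typeof LanguageModel === "undefined") throw new Error("Prompt API is not available here");
  return LanguageModel.create({ expectedInputs: [{ type: "audio" }] });
}

// Send the multimodal message to an existing audio-capable session.
async function analyzeRecording(session, targetWord, audioBlob) {
  return session.prompt(buildAudioMessages(targetWord, audioBlob));
}
```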

Phonaify System Logic Flow

AI Workflow

07. Process - Visual Identity

08. Impact

Award

Best Multimodal AI Application

Recognition

Top project out of 14,000+ participants

Link

https://devpost.com/software/phonaify

09. Next Step

User Testing

Conduct testing with diverse user groups to validate usability and identify accessibility barriers.

Multilingual Translation

Deepen AI integration to enable multilingual translation and support language learning beyond English.

Expand the Learning Ecosystem

Add features that let users save words and track pronunciation history, creating a continuous learning loop.

Develop System Settings

Let users choose pronunciation styles (e.g., American, British, Australian) and voice types (different genders).
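Here is a minimal sketch of the accent setting using the Web Speech API's `speechSynthesis`. It assumes the planned accents map to the BCP 47 tags `en-US`, `en-GB`, and `en-AU`. The mapping and helper names are hypothetical. `SpeechSynthesisVoice` exposes properties such as `name` and `lang` but no gender field, so the voice-type option would need a different mechanism, such as a curated list of voice names.

```javascript
// Hypothetical mapping from the planned accent setting to BCP 47 language tags.
const ACCENT_TAGS = { american: "en-US", british: "en-GB", australian: "en-AU" };

// Pick the first installed voice whose language matches the chosen accent, or null.
function pickVoice(voices, accent) {
  const tag = ACCENT_TAGS[accent];
  if (!tag) return null;
  // Some platforms report tags with an underscore (en_US), so normalize before comparing.
  return voices.find((v) => v.lang.replace("_", "-").toLowerCase() === tag.toLowerCase()) ?? null;
}

// Speak a word in the chosen accent (browser only: uses the global speechSynthesis).
function speakWord(word, accent) {
  const utterance = new SpeechSynthesisUtterance(word);
  const voice = pickVoice(speechSynthesis.getVoices(), accent);
  if (voice) utterance.voice = voice;
  utterance.lang = ACCENT_TAGS[accent] ?? "en-US";
  speechSynthesis.speak(utterance);
}
```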
