Work | Formentia

AI-Powered Fashion Assistant

MyCloset

A full-stack AI fashion platform that transforms how users manage their wardrobe. MyCloset combines multimodal LLM classification, intelligent outfit recommendation, and real-time computer vision processing in a production PWA.

The Challenge

Fashion is deeply personal, contextual, and visual. Building a system that reliably understands clothing taxonomy, generates stylistically coherent outfit combinations, and processes images at consumer-grade quality required bridging multiple AI disciplines in a single product.

Our Approach

1. Multimodal classification pipeline with a fine-tuned Florence-2 vision-language model as the primary classifier and Gemini escalation for hard categories -- extracting type, color, pattern, material, fit, season, occasion, and brand with confidence scoring.
2. LLM-driven outfit recommendation that reasons over a user's full wardrobe, considering occasion, weather, personal style profile, and wear history to generate coherent combinations.
3. Custom background removal pipeline using tight, simple segmentation models running on Cloud Run. Gemini image models were tried for this and turned out to be unreliable -- they tend to modify the image rather than cleanly remove the background, sometimes even painting in a checkered 'transparent' pattern. Purpose-built segmentation models won out.
4. Closet-level style analytics that synthesize wardrobe composition into actionable insights -- dominant colors, style descriptors, gap analysis, and shopping recommendations.

AI & Technical Highlights

Multimodal LLM Classification

Image-to-taxonomy pipeline extracting 10+ attribute dimensions from clothing photos, powered by a fine-tuned Florence-2 vision-language model with Gemini escalation for difficult categories.

Generative Recommendation

LLM-based outfit suggestion engine that reasons compositionally over items, context, and user preferences.

Computer Vision Pipeline

Purpose-built segmentation models for clothing photography, handling the edge cases general models mishandle: garments on hangers, awkward angles, cluttered backgrounds, and flat-lays.

Style Intelligence

Closet-level analysis synthesizing wardrobe patterns into style profiles, gap identification, and personalized recommendations.

Technologies

Next.js 15, TypeScript, Firebase, Genkit, Google Gemini, Cloud Run, Tailwind CSS, PWA

Technical deep dive

MyCloset runs a multi-model inference pipeline with confidence-based routing, custom-trained vision models for clothing-specific tasks, and a personalized RAG layer that grounds outfit generation in the user's actual wardrobe, wear history, and external context like weather.

Pipeline

1. Segmentation
Custom-trained segmentation model specifically for clothing photography. Handles the cases general segmentation models miss: garments on hangers, awkward angles, bad lighting, cluttered backgrounds, and items laid flat. Trained on a curated dataset of real-world user uploads rather than clean product photography.
We specifically chose a purpose-built segmentation approach over using Gemini or other generative image models for this step. Generative models turned out to be unreliable background removers -- they tend to alter the image rather than cleanly isolate the garment, sometimes even hallucinating a checkered 'transparent' pattern. For segmentation, simple and purpose-trained beat large and general.
2. Multimodal Classification
Two-pass classification against a hierarchical taxonomy (type, subtype, body location, fabric, pattern, color, season, occasion, brand). Primary pass uses a LoRA-fine-tuned Florence-2 vision-language model running on Cloud Run with GPU. Carefully tuned pre-prompting and structured JSON output enforcement with confidence scoring per attribute dimension.
Worth noting: once the background is cleanly removed and the garment is isolated on a white or transparent background, Gemini performs reliably on object-level identification -- which is why the fallback tier works. The hard part was the segmentation step before it, not the classification after.
3. Confidence-Based Escalation
When primary-model confidence falls below threshold -- or when the item category is one the local model systematically underperforms on -- the pipeline escalates to Gemini Flash. Most classifications are handled by the cheap local inference; specific categories and edge cases route to frontier inference. This trades cost and latency against quality per inference rather than applying one policy to everything.
4. Image Cleanup (optional)
Gemini image generation and editing models for AI-driven photo cleanup: improving garment hang, removing wrinkles, correcting skew, and generally making user-uploaded photos look catalog-quality. Prompt-engineered edits rather than generic upscaling or filters.
5. Personalized RAG + Outfit Generation
Pre-query construction pulls relevant items from the user's closet using semantic similarity, structured taxonomy matching, and temporal signals (wear frequency, wear recency). Context assembly combines retrieved items, existing outfit history, weather forecast, and seasonal style rules. The assembled prompt is then passed to Gemini with strict structured JSON response enforcement that references our item UUIDs.
6. A/B Presentation + User Feedback
Users see two AI-generated outfit options with LLM-written justifications. Accepts, rejects, and query refinements feed back into the pre-query builder, progressively improving recommendation quality per user. The same infrastructure powers future shopping recommendations against partner catalogs.
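
The escalation rule in step 3 can be sketched as a small pure function. This is a minimal illustration, not the production router: the `AttributeScore` shape, the 0.8 threshold, the category names in `ESCALATE_CATEGORIES`, and the choice to look at the lowest per-attribute confidence are all assumptions made for the example.

```typescript
// Hypothetical shape of one attribute in the primary classifier's JSON output.
interface AttributeScore {
  value: string;
  confidence: number; // assumed to be in [0, 1]
}

// Hypothetical primary-pass result: one entry per taxonomy dimension.
type PrimaryResult = { [dimension: string]: AttributeScore };

// Illustrative values only; real thresholds and categories would be tuned.
const CONFIDENCE_THRESHOLD = 0.8;
const ESCALATE_CATEGORIES = new Set<string>(["shoes", "handbag"]);

type InferenceRoute = "local" | "frontier";

// Escalate when the item type is a known weak category for the local model,
// or when the lowest per-attribute confidence falls below the threshold.
function chooseRoute(result: PrimaryResult): InferenceRoute {
  const dimensions = Object.keys(result);
  if (dimensions.length === 0) {
    return "frontier"; // nothing usable from the primary pass
  }
  const itemType = result["type"];
  if (itemType !== undefined && ESCALATE_CATEGORIES.has(itemType.value)) {
    return "frontier";
  }
  let lowest = 1;
  for (const dimension of dimensions) {
    lowest = Math.min(lowest, result[dimension].confidence);
  }
  return lowest < CONFIDENCE_THRESHOLD ? "frontier" : "local";
}
```

Keeping the decision in one function makes the cost and quality trade-off easy to audit: changing the threshold or the category set changes routing without touching either model.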

Model Infrastructure

Florence-2 (LoRA fine-tuned)

Primary multimodal classifier

PEFT fine-tuning on a curated clothing dataset combining public fashion imagery with synthetic examples generated via Gemini image models to cover taxonomy gaps. Where coverage was thin (e.g., plaid pajamas, specific pattern-fabric combinations), we generated targeted synthetic data to ensure taxonomy completeness rather than relying on whatever the public datasets happened to include. Runs on Cloud Run with GPU for cost-efficient inference.
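
Finding where coverage is thin can be as simple as counting labeled examples per taxonomy combination. The sketch below assumes a hypothetical `LabeledExample` shape with `pattern` and `garment` fields and a caller-supplied minimum count; it only illustrates the idea of checking a dataset against required combinations before generating synthetic data.

```typescript
// Hypothetical label pair; a real taxonomy has more dimensions.
interface LabeledExample {
  pattern: string;
  garment: string;
}

// Returns the required combinations that have fewer than minCount examples.
function findCoverageGaps(
  examples: LabeledExample[],
  required: LabeledExample[],
  minCount: number
): LabeledExample[] {
  const counts = new Map<string, number>();
  for (const example of examples) {
    const key = `${example.pattern}|${example.garment}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return required.filter(
    (combo) => (counts.get(`${combo.pattern}|${combo.garment}`) ?? 0) < minCount
  );
}
```

The returned combinations are the candidates for targeted synthetic generation.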

Custom segmentation model

Background removal and item isolation

Trained specifically for clothing photography edge cases that general segmentation models mishandle. We evaluated generative image models (Gemini) for this step and found them unreliable: they tend to alter the image rather than cleanly remove the background. Simple, purpose-trained segmentation beat large, general-purpose for this specific task.

Gemini Flash

Escalation classifier for difficult item categories

Invoked on low-confidence primary inference and on item categories where the local model systematically underperforms. Shoes and handbags, for example, often read as featureless shapes to smaller vision models when users photograph them from above or against cluttered backgrounds. Gemini Flash handles these cases well out-of-the-box, so we route to it category-aware rather than attempting to force the local model to cover every case.

Gemini (general)

Outfit generation and style reasoning

Constrained to structured JSON responses referencing our taxonomy UUIDs.
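
One way to back that constraint on the application side is to validate the model's JSON before using it. The following is a sketch under stated assumptions: the `OutfitSuggestion` shape and its `itemIds` and `justification` fields are hypothetical, and the check simply rejects any response that is malformed or references an ID that is not in the set the caller passes in.

```typescript
// Hypothetical response shape; field names are assumptions for illustration.
interface OutfitSuggestion {
  itemIds: string[];
  justification: string;
}

// Generic UUID text format (8-4-4-4-12 hex digits).
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns a suggestion only if the raw text is valid JSON, has the expected
// fields, and every referenced ID is a well-formed UUID present in knownIds.
function parseOutfit(raw: string, knownIds: Set<string>): OutfitSuggestion | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) {
    return null;
  }
  const candidate = data as { itemIds?: unknown; justification?: unknown };
  if (!Array.isArray(candidate.itemIds) || typeof candidate.justification !== "string") {
    return null;
  }
  const ids: string[] = [];
  for (const id of candidate.itemIds) {
    if (typeof id !== "string" || !UUID_PATTERN.test(id) || !knownIds.has(id)) {
      return null; // malformed or hallucinated ID
    }
    ids.push(id);
  }
  if (ids.length === 0) {
    return null;
  }
  return { itemIds: ids, justification: candidate.justification };
}
```

Rejecting the whole response on a single unknown ID is a deliberately strict choice for the sketch; a real system might retry or drop only the offending item.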

Gemini image generation and editing models

AI-driven image cleanup

Prompt-engineered photo improvement for skew correction, wrinkle removal, and garment-hang fixes.

Routing

Routing shifts cost and latency between local and frontier inference based on confidence thresholds and item category. We tried the obvious alternative -- more aggressive fine-tuning of the local VLM -- and found diminishing returns: past a point, more training data on the hard categories stopped improving accuracy. Category-aware fallback to Gemini turned out to be more cost-effective than continuing to push the local model, and it keeps our inference stack honest about which cases actually need frontier-class reasoning.
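
The cost side of that trade-off can be reasoned about with simple arithmetic. The sketch below assumes the local model always runs first and that escalated items pay for both passes; the `RoutingProfile` fields are hypothetical inputs in any consistent unit, not measured figures.

```typescript
// All fields are caller-supplied, hypothetical per-inference figures.
interface RoutingProfile {
  localCost: number; // cost of one local inference
  frontierCost: number; // cost of one frontier inference
  escalationRate: number; // fraction of items escalated, in [0, 1]
}

// Expected cost per item when every item gets a local pass and a fraction
// of items additionally gets a frontier pass.
function expectedCostPerItem(profile: RoutingProfile): number {
  if (profile.escalationRate < 0 || profile.escalationRate > 1) {
    throw new RangeError("escalationRate must be between 0 and 1");
  }
  return profile.localCost + profile.escalationRate * profile.frontierCost;
}
```

Under this model, routing pays off whenever the escalation rate stays low enough that the blended cost is well below sending every item to the frontier model.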

Data Architecture

User wardrobe, outfit history, and wear-event logs are indexed with both semantic embeddings and structured metadata. RAG queries combine vector similarity with structured filtering (taxonomy matching, temporal windows) and external context injection (weather forecast, calendar context, location). Context assembly is tuned per-query-type rather than one-size-fits-all.
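
A hybrid query of this kind can be sketched as a structured filter followed by a blended score. Everything specific here is an assumption for illustration: the `WardrobeEntry` fields, the 0.7 and 0.3 weights, the 30-day decay constant, and the choice to favor items that have not been worn recently.

```typescript
// Hypothetical indexed item: structured metadata plus an embedding.
interface WardrobeEntry {
  id: string;
  season: string;
  embedding: number[];
  daysSinceWorn: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("embedding dimensions must match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Illustrative weights, not tuned values.
const SIMILARITY_WEIGHT = 0.7;
const FRESHNESS_WEIGHT = 0.3;

// Filters by season, then ranks by similarity blended with a freshness term
// that grows toward 1 the longer an item has gone unworn.
function rankWardrobe(
  entries: WardrobeEntry[],
  queryEmbedding: number[],
  season: string,
  limit: number
): WardrobeEntry[] {
  return entries
    .filter((entry) => entry.season === season)
    .map((entry) => ({
      entry,
      score:
        SIMILARITY_WEIGHT * cosineSimilarity(entry.embedding, queryEmbedding) +
        FRESHNESS_WEIGHT * (1 - Math.exp(-entry.daysSinceWorn / 30)),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((scored) => scored.entry);
}
```

Tuning context assembly per query type would then amount to swapping the filter and the weights, leaving the ranking skeleton unchanged.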

Feedback Loop

Every outfit A/B shown to the user produces accept, reject, and refinement signals. These feed into the pre-query builder and prompt construction layer, tuning retrieval weights and context assembly for that user over time. We deliberately don't fine-tune the underlying LLM on these signals -- we tried that approach earlier and found it less reliable than keeping the base model stable and improving the layers around it. Prompt-layer improvement gives us faster iteration, interpretable behavior, and no risk of model drift.
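
Tuning retrieval weights from feedback, without touching the model, can be sketched as a small update rule. This is an assumed rule for illustration only: the `RetrievalWeights` fields, the learning rate, the 0.01 floor, and the renormalization step are all hypothetical.

```typescript
type FeedbackSignal = "accept" | "reject";

// Hypothetical per-user weights for the three retrieval signals.
interface RetrievalWeights {
  semantic: number;
  taxonomy: number;
  recency: number;
}

// contributions: how strongly each signal drove the shown outfit, each in [0, 1].
// Accepts nudge the contributing weights up, rejects nudge them down; the
// result is floored at 0.01 and renormalized to sum to 1.
function applyFeedback(
  weights: RetrievalWeights,
  contributions: RetrievalWeights,
  signal: FeedbackSignal,
  learningRate: number = 0.1
): RetrievalWeights {
  const direction = signal === "accept" ? 1 : -1;
  const nudge = (weight: number, contribution: number): number =>
    Math.max(0.01, weight + direction * learningRate * contribution);
  const semantic = nudge(weights.semantic, contributions.semantic);
  const taxonomy = nudge(weights.taxonomy, contributions.taxonomy);
  const recency = nudge(weights.recency, contributions.recency);
  const total = semantic + taxonomy + recency;
  return {
    semantic: semantic / total,
    taxonomy: taxonomy / total,
    recency: recency / total,
  };
}
```

Because the state is three interpretable numbers per user, a bad update is easy to inspect or reset, which is harder to do with fine-tuned model weights.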

Smart Camera Mirror

Mirror

A privacy-first smart mirror PWA that transforms any device into an intelligent mirror with real-time sensor processing. Designed for utility without compromise on privacy -- zero accounts, zero data collection, fully on-device.

The Challenge

Building a truly useful mirror app requires real-time camera processing with smooth performance across a wide range of devices, sensor fusion for features like stabilization and adaptive brightness, and a UX polished enough to replace a physical mirror -- all without any server-side processing.

Our Approach

1. Real-time gyroscopic stabilization using Device Motion APIs with smoothed rotation rate processing to counteract hand tremor.
2. Adaptive brightness system combining ambient light sensors (where available) with Capacitor native bridge fallbacks and time-based heuristics.
3. Performance-optimized camera pipeline using WebRTC with efficient CSS transforms for zoom, pan, and brightness adjustments without re-rendering.
4. Cross-platform native shell via Capacitor for iOS and Android distribution while maintaining the full PWA experience in browsers.
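
The adaptive brightness fallback in item 2 can be sketched as a pure decision function, kept free of browser APIs so it can be tested in isolation. The input shape, the lux-to-brightness mapping, the 400 lux ceiling, and the day and night hours are all assumptions for illustration; the real app would feed in readings from whichever sensor is actually available.

```typescript
// Hypothetical inputs: each sensor reading is present only if that tier works.
interface BrightnessInputs {
  webSensorLux?: number; // reading from the browser's ambient light sensor
  nativeSensorLux?: number; // reading from a native bridge on mobile
  hourOfDay: number; // 0-23 local time, used as the last resort
}

type BrightnessSource = "web-sensor" | "native-sensor" | "time-of-day";

interface BrightnessDecision {
  source: BrightnessSource;
  brightness: number; // multiplier suitable for a CSS brightness() filter
}

// Assumed mapping: darker surroundings get a larger boost, from 1.6 at
// 0 lux down to 1.0 at 400 lux and above.
function luxToBrightness(lux: number): number {
  const clampedLux = Math.min(Math.max(lux, 0), 400);
  return 1 + 0.6 * (1 - clampedLux / 400);
}

// Tries each tier in order and falls through when a reading is missing.
function decideBrightness(inputs: BrightnessInputs): BrightnessDecision {
  if (inputs.webSensorLux !== undefined) {
    return { source: "web-sensor", brightness: luxToBrightness(inputs.webSensorLux) };
  }
  if (inputs.nativeSensorLux !== undefined) {
    return { source: "native-sensor", brightness: luxToBrightness(inputs.nativeSensorLux) };
  }
  const isNight = inputs.hourOfDay < 7 || inputs.hourOfDay >= 19;
  return { source: "time-of-day", brightness: isNight ? 1.4 : 1.0 };
}
```

The returned multiplier could be applied as a CSS filter on the video element, which fits the approach of adjusting the picture without re-rendering frames.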

AI & Technical Highlights

Sensor Fusion

Real-time processing of gyroscope and ambient light data for stabilization and adaptive brightness.

On-Device Processing

All computation happens client-side -- zero server calls, zero data exfiltration, full offline support.

Cross-Platform PWA

Single codebase targeting web, iOS App Store, and Google Play via progressive enhancement.

Technologies

WebRTC, Device Motion API, Ambient Light Sensor, PWA, Capacitor, Firebase Hosting

Technical deep dive

Mirror is a fully on-device PWA with no backend, accounts, or telemetry -- all sensor processing, camera handling, and state management happen client-side. It also serves as our PWA testbed: a low-risk surface for validating new Next.js patterns, Capacitor native bridges, service-worker update strategies, and cross-platform distribution workflows before rolling them into larger products like MyCloset.

Pipeline

1. Camera Pipeline
WebRTC getUserMedia with efficient CSS transforms for zoom, pan, and brightness. No per-frame canvas re-rendering; transforms are GPU-accelerated via CSS, keeping performance smooth across mid-range devices.
2. Gyroscopic Stabilization
Device Motion API rotation-rate data is smoothed with a low-pass filter and mapped to inverse CSS transforms on the video element, counteracting hand tremor in real time without processing the pixel buffer.
3. Adaptive Brightness
Three-tier fallback: Ambient Light Sensor API where available, Capacitor-bridged native ambient sensing on mobile, and time-of-day heuristics as the final fallback.
4. Cross-Platform Distribution
Single codebase targets web (PWA), iOS App Store, and Google Play via a Capacitor wrapper. Progressive enhancement means web gets everything that works without native permissions; the native shell adds sensor access where OS gates require it.
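
The stabilization step above can be sketched with an exponential low-pass filter and a helper that turns the smoothed rate into an inverse CSS rotation. This is a minimal sketch, not the shipped code: the filter form, the `alpha` smoothing factor, and the single-axis treatment are assumptions. In a browser, the samples would come from the `rotationRate` values of `devicemotion` events, which are reported in degrees per second.

```typescript
// Exponential low-pass filter: each output moves a fraction alpha of the way
// from the previous output toward the new sample.
class LowPassFilter {
  private readonly alpha: number;
  private state: number | null = null;

  constructor(alpha: number) {
    if (alpha <= 0 || alpha > 1) {
      throw new RangeError("alpha must be in (0, 1]");
    }
    this.alpha = alpha;
  }

  update(sample: number): number {
    this.state =
      this.state === null ? sample : this.state + this.alpha * (sample - this.state);
    return this.state;
  }
}

// Converts a smoothed rotation rate (deg/s) over an interval (s) into a CSS
// transform that rotates the video the opposite way.
function counterRotation(smoothedRateDegPerSec: number, dtSeconds: number): string {
  const angle = -(smoothedRateDegPerSec * dtSeconds);
  return `rotate(${angle.toFixed(2)}deg)`;
}
```

A caller would feed each new rate sample through `update` and assign the resulting string to the video element's `style.transform`, so the pixel buffer is never touched.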

Built with depth, shipped to production

Let's build your next project