Generative AI
Pose Tracking
"Solving the unpredictability of Generative AI in commercial photography."
This project introduces a pose-guided AI pipeline designed to reduce the high operational costs of commercial photography for small businesses. By integrating real-time pose tracking with Stable Diffusion, I replaced the 'black box' nature of AI with a controlled, professional-grade production workflow.
Year
Role
Duration

Problem Framing & Technical Pivot
During my internship, I observed that the financial and time costs of traditional photoshoots were prohibitive for small businesses. My role involved researching AI-driven solutions to streamline this process. Initially, I used LLMs to craft prompts for Text-to-Image generation, but found the results too inconsistent for professional use.
Financial Barriers
Small brands lack the budget and time for professional models, studios, and lengthy post-production
Unpredictable Output
Text-to-image is a 'blind box'—AI can create, but it lacks the precision for specific poses.
Workflow Gaps
The missing link: Traditional methods are too slow, while basic AI is too random for commercial use
To achieve higher precision, I pivoted to ControlNet, leveraging specific reference images to provide structural guidance and ensure the AI-generated poses aligned perfectly with our requirements.


Opportunity
How might we create brand imagery by giving small businesses precise, intuitive control over AI-generated poses?
Direction
To address this, I developed a hardware-software integrated pipeline using real-time cameras and API-driven automation.
Process
ml5.js
I came across ml5.js and its ability to track poses through a webcam, rendering them as red skeletal lines. With this, I began integrating the ml5.js code into my JavaScript project to bring the interactive elements to life.
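The tracking step can be outlined roughly as follows. This is a minimal illustrative sketch, not the project's exact code: the function names follow the ml5.js v1 `bodyPose` API together with p5.js, and the confidence threshold is my own choice.

```javascript
// Minimal pose-tracking sketch (p5.js + ml5.js; illustrative).
// Draws red skeletal lines over the live webcam feed.
let video, bodyPose, connections;
let poses = [];

function preload() {
  bodyPose = ml5.bodyPose();                 // load the pose model
}

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(640, 480);
  video.hide();
  bodyPose.detectStart(video, gotPoses);     // continuous detection
  connections = bodyPose.getSkeleton();      // pairs of keypoint indices
}

function gotPoses(results) {
  poses = results;                           // latest detections
}

// Keep only segments whose endpoints were both detected confidently.
function visibleSegments(keypoints, pairs, minConf = 0.2) {
  return pairs
    .map(([a, b]) => [keypoints[a], keypoints[b]])
    .filter(([p1, p2]) => p1.confidence > minConf && p2.confidence > minConf);
}

function draw() {
  image(video, 0, 0);
  stroke(255, 0, 0);                         // red skeletal lines
  strokeWeight(3);
  for (const pose of poses) {
    for (const [p1, p2] of visibleSegments(pose.keypoints, connections)) {
      line(p1.x, p1.y, p2.x, p2.y);
    }
  }
}
```

Filtering low-confidence keypoints keeps the skeleton from jittering when a limb leaves the frame.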

JavaScript
Because ml5.js tracks motion in real time, I needed a way to freeze a single frame. I solved this by creating a trigger: when the button is pressed, JavaScript executes a function that extracts only the red skeletal lines from the canvas and saves the result as a PNG file.
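The trigger might look something like this. It is a hedged sketch: it assumes the pose data (`poses` and `connections`) produced by the ml5.js tracking step, uses the p5.js `createGraphics` and `save` calls for the off-screen buffer and download, and the timestamped filename helper is my own addition.

```javascript
// Freeze-frame trigger (illustrative): redraw only the red skeleton onto
// an off-screen buffer and download it as a PNG. Assumes `poses` and
// `connections` are maintained by the pose-tracking sketch.
let skeletonLayer;

function setupCapture() {
  skeletonLayer = createGraphics(640, 480);  // off-screen drawing buffer
  const btn = createButton('Capture pose');
  btn.mousePressed(capturePose);
}

// Build a timestamped filename for the exported skeleton map.
function poseFilename(date) {
  return 'pose-' + date.toISOString().replace(/[:.]/g, '-') + '.png';
}

function capturePose() {
  skeletonLayer.background(0);               // plain background, no video
  skeletonLayer.stroke(255, 0, 0);           // red skeletal lines only
  skeletonLayer.strokeWeight(3);
  for (const pose of poses) {
    for (const [a, b] of connections) {
      const p1 = pose.keypoints[a], p2 = pose.keypoints[b];
      if (p1.confidence > 0.2 && p2.confidence > 0.2) {
        skeletonLayer.line(p1.x, p1.y, p2.x, p2.y);
      }
    }
  }
  save(skeletonLayer, poseFilename(new Date()));  // triggers the download
}
```

Drawing onto a separate buffer, rather than the main canvas, is what keeps the exported PNG free of the video frame.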

Stable Diffusion
After obtaining the skeletal maps, I used ControlNet within Stable Diffusion to generate images based on those poses. This stage was a significant learning curve for me, as I had no prior experience with server setup or API integration. I sought guidance from my instructor, which allowed me to successfully navigate these technical hurdles.
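For the generation step, one common setup is to POST the skeleton PNG to a locally hosted Stable Diffusion server. The sketch below assumes the AUTOMATIC1111 web UI's `/sdapi/v1/txt2img` endpoint with the ControlNet extension; the endpoint, field names, and model name are conventions of that setup rather than a record of my exact configuration, and should be checked against your own server's docs.

```javascript
// Build a txt2img payload with a ControlNet unit (field names follow the
// AUTOMATIC1111 + ControlNet extension convention; verify against your server).
function buildControlNetPayload(skeletonPngBase64, prompt) {
  return {
    prompt,
    steps: 25,
    width: 512,
    height: 512,
    alwayson_scripts: {
      controlnet: {
        args: [{
          image: skeletonPngBase64,
          module: 'none',                        // map is already preprocessed
          model: 'control_v11p_sd15_openpose',   // example model name
          weight: 1.0,
        }],
      },
    },
  };
}

// Send the pose map and prompt; resolves to a base64-encoded result image.
async function generateFromPose(skeletonPngBase64, prompt) {
  const res = await fetch('http://127.0.0.1:7860/sdapi/v1/txt2img', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildControlNetPayload(skeletonPngBase64, prompt)),
  });
  const data = await res.json();
  return data.images[0];
}
```

Because the skeleton map is already a finished pose drawing, the ControlNet preprocessor is disabled (`module: 'none'`) so the model conditions directly on the red lines.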
Afterthoughts
Over these 14 weeks, I successfully delivered a proof of concept that validates my vision. However, there is still significant room for evolution. My future roadmap includes seamlessly integrating diverse models and LoRAs, streamlining the user experience for better intuitiveness, and incorporating LLMs to help users articulate and refine their character prompts more effectively.
This course has been incredibly inspiring. While AI replacement is a common concern today, I believe it's actually a call to action. By identifying and reimagining how specific fields operate through AI-driven systems, we don't just replace old methods—we pave the way for the next wave of innovation.
© 2026 All rights reserved