
This is the Apple Vision Pro Capstone Project for the CS495 class at the University of Alabama. We created a visionOS app that lets users upload or capture data (images and audio) on the Apple Vision Pro and send it to AWS, where Hugging Face models are applied to detect objects in the images. We also implemented speech-to-text so that users can talk to the headset: the recognized text is sent to OpenAI, and the result is displayed back to the user.