Image Intelligence - Seeing the World with Generative AI

Seeing What's Real with Generative AI

Imagine you're a work supervisor. Your goal is to ensure a job site is safe. That all employees are performing their tasks well according to professional standards. Ones that ensure their safety and that of those around them.

This is an important job. It requires you to be present at the site. To observe and evaluate.

How might Generative AI help you? Let's explore that by playing around with ChatGPT+ while using a few still images from a well-known video of a group on photo safari. Notice how prompting the AI will tease out the details we want and offer valuable insights.

Setting the Scene and Job to be Done

In our first image - the "before" shot - our job site looks totally normal. It portrays a group of travelers enjoying a photo safari in Africa. They're in a vehicle, and enjoying a spirited bit of adventure. One of the group is sitting in a very special spot. A front-mounted rig attached to the hood. He has an amazing view!

Photo safari with travelers in a vehicle

Generative AI is backed by a tech called Large Language Model (LLM). These types of AI are trained on a massive data corpus to read and write human text. Text prompts are our basic user interface into LLMs. Here's the prompt I'm using:

“You are a work supervisor. Your goal is to ensure a job site is safe and that all employees are performing according to professional standards that ensure their safety and that of those around. What's going on in this photo? Classify it. Assign % danger rating. Itemize any issues that you discover. If hazardous situation detected, suggest remediation steps.”

Using ChatGPT+, I'll send in this prompt along with the image. It cranks away and responds with a helpful analysis as a text report. This is what ChatGPT+ (4o model) comes back with as I tap enter with my text prompt and picture.

Classification: • Work Environment: Safari field guide or wildlife observer. • Job Task: Wildlife spotting or guiding. Danger Rating: Estimated Danger Level: 65% Issues Detected: 1. Exposure to Wildlife Risks: Being seated in an exposed position on the vehicle makes the individual vulnerable to wildlife encounters, especially from predators. 2. Lack of Visible Restraints: The person is seated in an open position without visible seatbelts or safety harnesses, increasing the risk of falling. 3. Unprotected Positioning: No apparent protective gear (helmet, sturdy harness) for off-road conditions and sudden vehicle movements. Suggested Remediation Steps: 1. Install Safety Harness: Provide a secured harness system to prevent falls, especially if the vehicle moves over uneven terrain. 2. Equip with Emergency Communication Device: Ensure the person has a two-way radio or other communication devices to alert the driver or other team members if needed. 3. Brief on Wildlife Safety Protocols: Regularly train and remind personnel on protocols for wildlife encounters, including emergency dismounting procedures. This assessment identifies moderate to high risks primarily due to the exposed position and lack of visible restraint. Implementing these safety improvements would help reduce the danger level.

It's like having a digital assistant on the scene that reports back to you so that you can take informed actions. You operate at a higher leader as a manager, a leader, a supervisor.

LLMs are MMLLMs

I know what you're thinking. GenAI, backed by LLMs, are trained on text. They can read and write. Fine, I get it, but how can they "see" what's in a picture?

Yeah, you're correct to think that.

By the end of 2023 the big AI R&D labs were successfully training their LLMs on vast collections of text as well as images. Generative AI are now multi-modal, and it's not really talked about. It's not like it's a secret, but people just sort of skip past it.

I suppose most people are still focusing on addressing opportunities around text.

Why Won't Humans Just Stay Safe for a While?

Watch what happens as the scene unfolds in our hypothetical workplace. Safety is job one!

Here's a snapshot from the video advancing along a few seconds. The "after" shot", so to speak. Life happens pretty quickly! Now things get extra exciting for the intrepid photographer when the famous native wildlife (that they traveled far and wide to see) makes an appearance!

Given the same prompt as before, let's see what the AI advises us about the photo. It clearly spots the special guest star when it makes its way on stage.

Classification: Wildlife Observation in Close Proximity to Dangerous Animal % Danger Rating: 85% (High Risk) Issues Detected 1. Close Proximity to a Lion: The person is sitting on an elevated vehicle seat but is extremely close to a lion, which poses a serious risk. 2. Attention and Focus: The person appears to be holding a camera, possibly taking photos, which may divert attention from potential danger and delay reaction in case of sudden movement from the lion. 3. No Escape Route: The individual’s seating position on the side of the vehicle leaves limited options for rapid retreat or safe repositioning if the lion becomes aggressive. Suggested Remediation Steps 1. Maintain Safe Distance: Ensure a minimum safe distance between humans and wild animals. For lions, staying inside a fully enclosed vehicle would be more appropriate. 2. Use Enclosures: Equip vehicles with protective barriers around seating areas to limit animal access. 3. Avoid Distracting Activities: Refrain from taking photos or using equipment that may divert attention when near dangerous animals. This situation is highly hazardous due to the presence of an unrestrained, potentially aggressive animal in close proximity. Following these steps could greatly reduce the risk of harm.

Experimenting shows the AI is highly capable of understanding the distinction between the "before" and "after" images. The most important aspect, a human potentially in danger, was called out, obviously. That's the really important capability to prove in this experiment.

Analysis, like the % danger rating, issues detected, and suggested remediation steps is showing extra ways AI can provide insights, advise, and help human-in-the-loop decision-support.

Consider how you want to read the AI response output. Youp can make it formal and repeatable. That might be very useful, indeed. Give the AI in the prompt an example of how its response should look like. It's referred to as "in context learning" and the AI will be able to follow your lead as you steer it towards success.

AI's Vision Role Across Fields

You might be protecting people from actual lions, but you can see how modern AI is well suited to this type of task. Use it! We're mapping the landscape of what's possible with this amazing technology. Adapt the lessons-learned for your application, industry, company, and users.

Let's brainstorm on how the techniques in this article might be applied to other industries and roles. Hopefully, this motivates you to think about leveraging Generative AI in your work.

User Experience: analyze screen caps from recordings of users interacting with software
Marketing: review photos of competitor products, extracting features, brand cues, and audience context
Q&A: detect before and after differences in user interface screenshots
Retail Display Feedback: monitor in-store displays assessing placement effectiveness to boost awareness and sales, A/B test locations
Real Estate: analyze listing photos identifying properly conditions
Architecture: compare and contrast foot traffic throughout the day and week
Education: monitor laboratory and technical workshop for student safety and following protocols
Tourism: review traveler photos from excursions understanding preferences to enhance future experiences and personalize recommendations

These ideas are meant to get you "thinking in AI" as a new way of working. None of them are guaranteed to work, but if you find them interesting, give it a try. Make a very small experiment to test. Find willing collaborators from your existing or prospective customers. Tell them you're not trying to sell them on anything. You just want to learn together.

Look at the results, and determine if it's worth pursuing for real.

LLMs Can See as Well as Read

I enjoyed writing this blog post. You've seen how Generative AI can analyze visual context. In this case, to hypothetically enhance workplace safety in an adventurous wildlife environment.

Using prompts, we can guide an AI to assess safety risks and suggest preventive measures for real-world scenarios. Multi-modal AI opens unique solutions to complicated problems combining images and text. Use it to empower, inform, and elevate human decision-making and safety oversight.

Share this post with your peers if you're inspired by the potential of Generative AI. It's a powerful tool to help us see more clearly and make better decisions. Excited to reimagine what's possible with you!

Reach out to me on X/Twitter or on LinkedIn, and let me know of your success. Let’s do something awesome together!