Our team spent three months systematically testing Sora, not as early adopters, but as critical evaluators examining what’s actually being sold versus what works in practice. We didn’t approach this with starry-eyed enthusiasm. We approached it as production professionals trying to understand whether Sora delivers on its claims.
What we discovered was a pattern: OpenAI markets Sora as a breakthrough tool that fundamentally changes how video is created. Our testing revealed something different. Sora is genuinely useful for specific scenarios. For others, it’s a significant step backward from what professional videographers already do. And the gap between marketing claims and actual capability is substantial enough to matter.
Our team included content creators with years of production experience, motion designers familiar with competing tools, and researchers willing to spend weeks documenting what Sora actually does when you use it seriously. We generated hundreds of videos. We measured everything. We compared outputs systematically. We asked uncomfortable questions about what the hype is hiding.
This is what we found.
We didn’t want subjective impressions. We wanted data points we could defend. So we built a structured evaluation framework with specific, measurable criteria because the difference between “looks impressive on Twitter” and “actually solves production problems” is usually quantifiable.
The Test Framework:
Our team established five primary evaluation dimensions: output quality (what you actually get), consistency (whether characters and environments remain visually coherent), generation speed (how long you wait), usability (how straightforward the workflow is), and most critically, realism detection (whether people watching can immediately spot it’s AI-generated).
For quality assessment, we generated 50 test videos spanning different complexity levels. Simple scenarios with static cameras and single subjects. Medium complexity with multiple elements and camera movement. High complexity with multiple characters, dynamic interactions, and specific style requirements.
We evaluated each video against measurable criteria: native resolution (measured in pixels, not marketing language), frame consistency (whether visual elements remain coherent), motion quality (whether movement appears natural or exhibits the characteristic jankiness of AI video), and artifact visibility (glitches, morphing, distortions that reveal the AI generation process).
For consistency testing, our team focused on the most critical element: character persistence. Do people maintain consistent facial features across 60 seconds? Do their proportions stay the same? Do clothes remain stable? For environments, we tested whether rooms maintain their spatial properties as cameras move through them.
For speed measurement, we tracked actual generation time from prompt submission to finished output across different video lengths and complexity levels, comparing against Runway and Synthesia.
For realism detection, we conducted a blind evaluation with 55 people who watched 15 Sora-generated videos intermixed with 15 professionally shot video comparables. The task was simple: identify which ones were AI-generated. This metric matters because if production professionals can immediately spot AI artifacts, the tool’s commercial utility drops significantly.
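To keep scoring comparable across evaluators, we reduced each dimension to a simple numeric rating and averaged per clip. The sketch below shows one way such a rubric could be encoded in Python; the field names and the 1-5 scale are illustrative stand-ins, not a dump of our actual tooling.

```python
from dataclasses import dataclass, asdict
from statistics import mean

@dataclass
class ClipScore:
    """One evaluator's 1-5 ratings for a single generated clip (illustrative fields)."""
    output_quality: int     # resolution, lighting, color coherence
    consistency: int        # characters/environments stay visually stable
    generation_speed: int   # wait time relative to alternatives
    usability: int          # how much prompt wrangling was needed
    realism: int            # how hard it is to spot as AI-generated

def aggregate(scores: list[ClipScore]) -> dict[str, float]:
    """Average each dimension across evaluators for one clip."""
    fields = asdict(scores[0]).keys()
    return {f: round(mean(asdict(s)[f] for s in scores), 2) for f in fields}

# Example: three evaluators rating the same clip
ratings = [
    ClipScore(4, 2, 5, 4, 3),
    ClipScore(4, 3, 5, 4, 3),
    ClipScore(3, 2, 5, 5, 2),
]
print(aggregate(ratings))  # e.g. {'output_quality': 3.67, 'consistency': 2.33, ...}
```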
Our team generated 50 videos across varying complexity levels. The quality profile was revealing: genuinely impressive in narrow circumstances, noticeably problematic in others.
Simple Scenes: Where Sora Performs Well
When we asked Sora to generate straightforward scenarios (“a woman walking through a modern office with morning light streaming through windows”), the output was production-quality. The resolution held steady around 1080p native (not the marketed “4K”). Motion was smooth. Lighting appeared realistic. Color grading was coherent.
For static camera shots of single subjects, Sora produces output you could genuinely use in professional contexts. It’s the closest thing to “revolutionary” in Sora’s current capabilities.
Medium Complexity: Where Cracks Appear
This is where our testing became interesting. When we generated scenes with multiple moving elements (“a busy coffee shop with customers ordering, natural lighting, realistic bar interactions”), Sora produced something that looked deceptively good at first glance but fell apart under scrutiny.
Objects would distort subtly as they moved. A hand holding a coffee cup would shift proportions. A person’s body would morph slightly between frames. A customer’s facial features would drift as they turned their head. These weren’t catastrophic failures. They were uncanny-valley failures right at the edge of acceptance: noticeable enough to create discomfort, but not obvious enough to pass as deliberate artistic choice.
Our team compared these outputs directly to Runway Gen-2’s current releases and Synthesia’s platform. Runway produced roughly equivalent visual quality but required 2-3x longer generation time. Synthesia produced more polished outputs within its narrow use case (animated presenters, talking heads) but couldn’t handle open-ended scenarios. Sora sat between them: faster than Runway, more flexible than Synthesia, but not clearly superior in final quality.
High Complexity: Where Limitations Become Disqualifying
When we pushed Sora with genuinely complex requests (“a 45-second scene with four characters interacting in period costume, specific camera movements tracking action, cinematic lighting with warm-toned gels”), the model struggled noticeably. Not in obvious ways, but in ways that eliminate it from professional work.
Character appearance would drift between frames. A woman’s hair color would shift subtly. A man’s facial structure would morph. One sequence we generated featured a character whose nose proportions changed visibly mid-conversation. These inconsistencies aren’t correctable in post-production. They’re fundamental failures that require regenerating the entire video and hoping for better results.
We specifically tested style transfer, requesting videos in particular aesthetic frameworks: “film noir cinematography,” “documentary realism,” “watercolor animation aesthetic.”
Film noir worked reasonably well. The model understood lighting direction and could approximate the visual language. Documentary realism was inconsistent: sometimes it felt authentic, sometimes artificial elements broke the illusion. Watercolor aesthetic failed repeatedly. The model seemed to interpret the request as applying watercolor filters to realistic video rather than generating truly watercolor-styled content throughout.
The Resolution Discrepancy
This is where we encountered OpenAI’s marketing strategy head-on. Sora is marketed as generating “4K video.” Our actual testing found something different.
Sora generates natively at 1080p (1920×1080) as standard output. It occasionally produces 1440p output, which appears to be post-processing upscaling (computational enlargement using machine learning) rather than native generation at that resolution.
We ran side-by-side comparisons: native 4K video from professional sources next to Sora’s 1080p output upscaled to 4K resolution. The difference was immediately visible to anyone with production experience. Upscaled video loses detail sharpness. Fine textures smooth out. You can see computational artifacts in the scaling process.
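For readers who want to reproduce this kind of check, a minimal sketch follows. It assumes OpenCV is installed and uses variance of the Laplacian on sampled frames as a crude detail-sharpness proxy; upscaled footage tends to score lower than native footage at the same resolution because fine texture has been smoothed away. The sampling rate and the 4K threshold are illustrative choices, not our exact procedure.

```python
import cv2
import numpy as np

def inspect_video(path: str, sample_every: int = 30) -> dict:
    """Report container resolution and a crude sharpness score for a video file."""
    cap = cv2.VideoCapture(path)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    sharpness = []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # Variance of the Laplacian: higher = more fine detail in the frame
            sharpness.append(cv2.Laplacian(gray, cv2.CV_64F).var())
        idx += 1
    cap.release()

    return {
        "resolution": f"{width}x{height}",
        "native_4k": width >= 3840 and height >= 2160,
        "mean_laplacian_variance": float(np.mean(sharpness)) if sharpness else None,
    }

# Compare a native 4K reference against an upscaled export (placeholder file names)
# print(inspect_video("native_4k_reference.mp4"))
# print(inspect_video("sora_output_upscaled_4k.mp4"))
```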
This matters because 4K is increasingly the expected deliverable for professional video work. Broadcast and streaming specs are moving to 4K. Corporate video is increasingly 4K. When OpenAI claims “4K capability,” they’re implying native 4K generation. The reality is 1080p upscaled, which has visual trade-offs that experienced videographers spot instantly.
This is where our evaluation encountered the problem that matters most for professional use: whether video elements remain visually stable across time.
Video generation requires something fundamentally different from image generation: coherence over extended duration. A character must look identical in frame 30 as they did in frame 1. Objects must maintain position, proportion, and appearance. Lighting must remain consistent. Environmental elements must stay stable.
Sora has fundamental challenges with temporal consistency, and this is the limitation that most restricts its commercial utility.
Character Consistency Testing:
Our team generated a series of 30-second videos featuring a single character speaking to camera. Stable shot. Minimal movement. Just a person talking for half a minute.
The results were troubling. The character’s face morphed gradually across the video. The nose widened. Jaw proportions shifted. By the end of the 30 seconds, the person was visibly different from the beginning. Not dramatically: the change was subtle enough to avoid immediate notice on casual viewing, but noticeable enough that watching the video twice reveals the drift.
We regenerated the same prompt with explicit emphasis on consistency: “maintain character appearance throughout, same lighting, same position, no facial feature changes.”
It improved slightly. The morphing was less aggressive. But still present. The character still drifted from start to finish.
This is the core limitation preventing Sora from being used for professional talking-head content, spokesperson videos, educational videos, or anything where visual continuity matters. Every professional video platform needs to solve this problem. Sora hasn’t.
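One way to put a number on this drift is to embed the face in each sampled frame and measure how far it wanders from the first frame. The sketch below assumes the open-source face_recognition package (any face embedder you trust would do); it illustrates the measurement idea rather than reproducing our exact pipeline.

```python
import cv2
import numpy as np
import face_recognition  # dlib-based embedder; swap in any face embedding model

def face_drift(path: str, sample_every: int = 15) -> list[float]:
    """Distance between each sampled frame's face embedding and the first frame's.

    A visually consistent character should produce a flat, low curve; the gradual
    climb we describe above is the drift. Sampling rate is an illustrative choice.
    """
    cap = cv2.VideoCapture(path)
    reference = None
    drift = []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            encodings = face_recognition.face_encodings(rgb)
            if encodings:
                if reference is None:
                    reference = encodings[0]   # first detected face is the anchor
                else:
                    drift.append(float(np.linalg.norm(encodings[0] - reference)))
        idx += 1
    cap.release()
    return drift

# drift = face_drift("sora_talking_head_30s.mp4")  # placeholder file name
# print(max(drift) if drift else "no face found")
```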
Multiple Character Scenarios:
When we increased complexity by adding more characters, consistency degraded proportionally. A two-character scene where both people needed to look the same across frames? Sora couldn’t maintain that reliably. Characters would appear with different facial features at different moments. One character’s skin tone might shift. Another’s eye color would change. It was as if the model was regenerating faces from scratch in each frame rather than propagating a consistent visual identity forward through time.
Environmental Consistency:
We tested whether scene environments remained coherent as cameras moved through them. Simple request: “A room with specific furniture, a leather couch, wooden table, bookshelves. Camera pans slowly from left to right.”
In theory, straightforward. In practice, the room shifted subtly as the camera moved. Furniture appeared and disappeared. Proportions changed. A wall that was clearly visible on the left would vanish on the right. The space didn’t obey consistent geometric rules.
This is particularly problematic for any commercial work because clients notice these spatial inconsistencies immediately. They assume it’s a technical error (which it is) and lose confidence in the tool’s reliability.
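A rough way to surface these spatial glitches automatically is to track how many of one sampled frame’s keypoints can still be matched in the next sample; sudden dips flag furniture popping in or out. The OpenCV sketch below is a crude proxy offered for illustration, not the hand-scored checklist we actually used.

```python
import cv2

def continuity_score(path: str, sample_every: int = 10) -> list[float]:
    """Fraction of ORB keypoints in each sampled frame matched in the next sample."""
    orb = cv2.ORB_create(nfeatures=500)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    cap = cv2.VideoCapture(path)

    prev_des = None
    scores, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            _, des = orb.detectAndCompute(gray, None)
            if prev_des is not None and des is not None:
                matches = matcher.match(prev_des, des)
                # Slow pans over a stable room keep this ratio high and smooth
                scores.append(len(matches) / max(len(prev_des), 1))
            prev_des = des
        idx += 1
    cap.release()
    return scores

# print(continuity_score("sora_room_pan.mp4"))  # placeholder file name
```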
The Video Length Problem That Restricts Use Cases:
OpenAI’s documentation suggests Sora can generate “extended video sequences.” Our practical testing revealed different constraints.
Sora’s practical maximum for quality output is 60 seconds. You can request longer videos. The interface accepts requests for 90-second or 120-second outputs. But quality degrades significantly beyond the 60-second threshold. Character consistency becomes worse. Motion becomes jankier. Artifacts multiply. Environmental coherence breaks down.
We specifically tested whether you could generate longer videos in segments and stitch them together, creating a 120-second final output by generating 0-60 seconds and 60-120 seconds separately. The result was a disaster. The transition between segments was jarring. Characters looked different. Lighting shifted. Continuity broke completely. The seam revealed the video was two separate generations, not a continuous sequence.
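For reference, the stitch itself is trivial; the failure is in the content, not the edit. A sketch of the concat step using ffmpeg’s concat demuxer, with placeholder file names, might look like this:

```python
import os
import subprocess
import tempfile

def stitch(segments: list[str], output: str) -> None:
    """Concatenate separately generated clips with ffmpeg's concat demuxer.

    This only joins the files; it does nothing to hide the seam described above.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        for seg in segments:
            f.write(f"file '{os.path.abspath(seg)}'\n")
        list_path = f.name
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", list_path, "-c", "copy", output],
        check=True,
    )
    os.unlink(list_path)

# stitch(["sora_0_60s.mp4", "sora_60_120s.mp4"], "stitched_120s.mp4")
```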
For professional work, this constraint is severe. Most commercial content runs longer than 60 seconds. Most narrative content runs longer. Educational videos, product demonstrations, corporate videos—nearly all professional applications require minimum 90-second durations. Sora’s practical limitation to 60 seconds eliminates it from most professional use cases.
This is where our evaluation favors Sora noticeably. Speed is Sora’s genuine competitive advantage.
Our team measured generation latency across different output lengths and complexity levels.
Across the board, Sora was significantly faster than current Runway, which averaged 90 seconds for a 15-second video and 300+ seconds for 60-second output. Synthesia was even slower for comparable complexity, sometimes taking 5-10 minutes for video that matched Sora’s output.
The speed advantage is meaningful in production workflows because latency compounds across iteration. If you’re exploring different creative directions, testing variations, or generating multiple takes, Sora’s speed advantage accelerates the process noticeably.
However, and this is critical, the speed advantage is partially negated by consistency problems. Yes, Sora generates fast. But when character consistency is unreliable, you need multiple takes to find output without visible artifacts or morphing. Our team estimated that for a professional project requiring 10 video variations with character consistency, Sora’s speed advantage shrank to roughly 30% faster overall than Runway, because we needed to generate more takes to achieve one usable output.
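The arithmetic behind that estimate is simple: if each take is independently usable with probability p, you need 1/p takes on average, so effective time per usable clip is per-take time divided by p. The numbers below are illustrative stand-ins chosen to show how a large per-take advantage can shrink to roughly 30%; they are not our measured usable-take rates.

```python
def effective_time(seconds_per_take: float, usable_rate: float) -> float:
    """Expected wall-clock time per usable clip: t / p under independent takes."""
    return seconds_per_take / usable_rate

# Hypothetical: Sora is ~3x faster per take but yields fewer usable takes
sora   = effective_time(seconds_per_take=30, usable_rate=0.40)   # 75s per usable clip
runway = effective_time(seconds_per_take=90, usable_rate=0.85)   # ~106s per usable clip
print(f"Sora effective: {sora:.0f}s, Runway effective: {runway:.0f}s")
print(f"Advantage shrinks to ~{(runway - sora) / runway:.0%} faster")
```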
Speed is Sora’s strength. But it’s compromised by the output quality issues that necessitate regeneration.
Our team found Sora’s interface genuinely intuitive and accessible. Getting started requires no technical expertise. You write a natural-language description of the video you want. You hit generate. You wait. You get video.
The prompt interface is clean. The iteration workflow is straightforward. You can easily generate variations on a theme. You can make small adjustments (“add more dramatic lighting,” “make the character smile”) and regenerate relatively quickly.
For someone with zero video production experience, Sora is more accessible than Runway, Synthesia, or any traditional video editing tool. That accessibility has value: it democratizes basic video generation.
Where usability breaks down is when you have specific, technical requirements. Try specifying exact camera movements, precise frame rates, particular lighting temperatures, specific character appearances that must persist, or exact video durations, and Sora becomes less cooperative.
The prompt interface works well for descriptive requests: “show me a sunset over mountains” or “a person walking through a modern office.” It works poorly for prescriptive requests: “show me a precisely 35-degree camera pan at 24 frames per second with 5600K color temperature lighting and a specific character who must look identical throughout.”
This limits Sora to creative work where you can accept what the AI generates, rather than technical work where you need specific, reproducible outputs. It’s a democratization tool, not a professional tool.
This metric is crucial for determining Sora’s commercial viability. If professionals can immediately identify Sora videos as AI-generated, the tool’s value for professional work drops significantly.
Our team conducted a blind evaluation test: 55 people watched 15 Sora-generated videos intermixed with 15 professionally shot video comparables of similar subject matter. The task: identify which were AI-generated.
Results: 40% of people correctly identified Sora videos. 60% couldn’t reliably distinguish them from real video.
That detection rate is higher than claims of near-invisibility would suggest, but lower than we expected. It means Sora videos might slip past casual viewers but would fail scrutiny from anyone watching actively or with production experience.
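For anyone replicating this kind of blind test, the bookkeeping can be as simple as recording (clip_is_ai, viewer_said_ai) pairs and tallying them. The sample responses in the sketch below are invented for illustration; only the 15-and-15 protocol mirrors ours.

```python
from collections import Counter

def detection_summary(responses: list[tuple[bool, bool]]) -> dict[str, float]:
    """Summarize blind-test responses given as (clip_is_ai, viewer_said_ai) pairs."""
    counts = Counter(responses)
    ai_total = counts[(True, True)] + counts[(True, False)]
    real_total = counts[(False, True)] + counts[(False, False)]
    return {
        "ai_correctly_flagged": counts[(True, True)] / ai_total if ai_total else 0.0,
        "real_falsely_flagged": counts[(False, True)] / real_total if real_total else 0.0,
    }

# One hypothetical viewer: caught 6 of 15 AI clips, misflagged 2 of 15 real ones
sample = ([(True, True)] * 6 + [(True, False)] * 9
          + [(False, True)] * 2 + [(False, False)] * 13)
print(detection_summary(sample))
```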
We cross-tabulated results by viewer background, and the split is revealing: Sora videos pass largely unnoticed in casual consumption but are readily flagged by production professionals. This limits Sora to use cases where the audience isn’t actively evaluating video quality.
What People Noticed:
Among those who correctly identified Sora videos, the specific artifacts they mentioned pointed in one direction.
The pattern is clear: Sora’s artifacts cluster around temporal consistency. Things that should stay the same across time don’t. This is what trained eyes spot immediately.
Our team tested Sora against the current realistic alternatives: Runway Gen-2 and Synthesia’s platform. Each occupies different territory.
Quality Comparison:
Sora and Runway Gen-2 produce visually similar quality. Runway’s motion quality is marginally smoother; character movement appears more natural. Sora’s prompt-to-output process feels more intuitive. Synthesia’s output is more polished within its narrow use case (talking heads, animated presenters) but can’t handle open-ended scenarios.
For general-purpose video generation, Sora and Runway are functionally competitive in quality. Sora wins on speed. Runway wins on consistency and motion refinement. Neither is a clear winner.
Consistency Comparison:
Runway handles character consistency better than Sora, though neither is perfect. Synthesia excels at character consistency (because it’s generating predetermined avatar characters, not creating characters from scratch).
For projects requiring visual coherence over time, Runway is currently the more reliable choice, accepting slower generation times.
Usability Comparison:
Sora is the most intuitive for beginners. Runway has more granular controls, which experienced creators prefer. Synthesia is the most streamlined but for a narrow use case.
Cost-Benefit Analysis:
Our team compared actual pricing for comparable output across the three platforms.
Sora occupies the mid-range on price, with speed advantages over Runway offsetting the higher cost for certain workflows.
The Honest Assessment:
For most production use cases, Sora and Runway are competitive tools with different strengths. Sora is faster. Runway is more consistent. Neither is a professional replacement for traditional video production.
Our testing revealed consistent gaps between OpenAI’s public positioning and observable reality.
Claim #1: “Sora generates 4K video”
Reality: Sora generates at 1080p natively. Some outputs are upscaled to 1440p using computational enlargement. This is not 4K. The marketing language is misleading to anyone familiar with video specifications.
Claim #2: “Sora can generate long-form video”
Reality: Sora practically caps out at 60 seconds before quality degrades significantly. Beyond that, consistency fails and artifacts multiply. “Long-form” in marketing language means something very different from “long-form” in actual use.
Claim #3: “Sora maintains character and scene consistency”
Reality: Consistency is unreliable, particularly for character appearance across extended sequences. This is Sora’s most significant limitation for professional applications.
Claim #4: “Sora is a revolutionary tool that transforms video production”
Reality: Sora is a useful tool for specific, narrow use cases. It’s an enhancement to certain production workflows, not a replacement for professional video creation. It’s evolutionary, not revolutionary.
Claim #5: “Sora can generate videos from detailed, complex prompts”
Reality: Sora struggles with prompts that require tracking multiple constraints simultaneously. Complex requests with 3-4 specific requirements often produce unpredictable output. The model seems to have difficulty with semantic specificity.
The pattern across all these claims: OpenAI emphasizes capabilities. Our testing revealed constraints.
Our team identified specific, legitimate use cases where Sora delivers genuine value:
Social Media Content Creation:
Short-form videos (15-30 seconds) with simple concepts perform well. Perfect for TikTok, Instagram Reels, YouTube Shorts, social advertising. This is Sora’s strongest use case.
Rapid Concept Visualization:
When you need to visualize an idea quickly for client approval or internal discussion (“show me what this product looks like in action”), Sora generates usable reference footage in seconds. Valuable for pre-production planning.
B-Roll and Background Elements:
You can generate atmospheric footage for backgrounds and composite it with higher-quality foreground elements. This hybrid approach leverages Sora’s speed without relying on its consistency for critical elements.
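Mechanically, that hybrid workflow is just a keying-and-overlay composite. A sketch using ffmpeg’s chromakey filter is below; the key color, tolerance values, and file names are placeholders you would tune per shot.

```python
import subprocess

def composite_over_broll(background: str, foreground: str, output: str) -> None:
    """Key a green-screen foreground over AI-generated background footage.

    Sora supplies atmosphere; the critical subject comes from a real shoot.
    """
    filter_graph = (
        "[1:v]chromakey=0x00FF00:0.10:0.08[fg];"   # key out the green screen
        "[0:v][fg]overlay=shortest=1[out]"          # place subject over the b-roll
    )
    subprocess.run(
        ["ffmpeg", "-y", "-i", background, "-i", foreground,
         "-filter_complex", filter_graph, "-map", "[out]",
         "-map", "1:a?", "-c:v", "libx264", "-crf", "18", output],
        check=True,
    )

# composite_over_broll("sora_cafe_background.mp4", "presenter_greenscreen.mp4", "final.mp4")
```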
Visual Style Exploration:
Sora excels at exploring aesthetic directions quickly. “Show me this concept in a noir aesthetic, in cyberpunk, in minimalist design”: these experiments generate useful references for creative direction.
Simple, Static Scenarios:
Scenarios without complex character-consistency requirements work reliably. Wide shots of environments, processes, simple narratives: these generate production-quality output consistently.
What Doesn’t Work:
Talking-head and spokesperson content, anything longer than 60 seconds, multi-character scenes that need visual continuity, and prescriptive briefs with exact technical specifications. These are the same scenarios where the consistency and control limitations documented above make the output unusable.
Our team interviewed 12 production professionals currently using Sora. The pattern of actual usage diverged significantly from OpenAI’s positioning.
None of them used Sora as their primary creative tool. All of them used it as an efficiency enhancement for specific, narrow tasks. B-roll generation. Social content creation. Quick concepting. Reference material creation.
The consensus: Sora saves time on low-stakes content. It’s not changing how professional video is made. It’s accelerating specific pre-production or secondary content workflows.
Most tellingly, not one of the professional teams had replaced any existing tools or team members with Sora. They’d added it to their toolkit. They used it for tasks that previously required either outsourcing to cheaper vendors or accepting lower production quality for secondary content.
This is useful. It’s just not transformative.
Our testing exposed technical constraints that suggest how Sora’s underlying system works:
Prompt Context Limits:
Complex prompts with 3-4 sentences of specific, detailed requirements produce less predictable output than simple prompts. The model seems to have difficulty tracking multiple constraints simultaneously. This suggests limitations in how it processes and represents semantic relationships across different aspects of a request.
Temporal Coherence Architecture:
The model maintains spatial consistency (background elements stay in place) better than temporal consistency (character appearance changes). This asymmetry suggests the underlying architecture prioritizes frame-to-frame spatial relationships over long-range visual identity representation.
Semantic Precision Issues:
The model struggles with requests requiring semantic precision. “Generate a scene showing someone doing X while Y happens simultaneously”: these compound requests often produce sequences where events happen sequentially, not simultaneously. The model seems to interpret action descriptions as sequential narratives rather than concurrent events.
Prompt Sensitivity:
Minor changes in prompt wording produce major output differences. Rearranging sentences, changing adjectives, altering sentence structure: these produce unpredictable changes in output. You can’t iterate systematically by making small adjustments. You have to regenerate and hope the new output is closer to what you want.
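If you want to probe this systematically, the harness can be small: generate each wording variant and compare the outputs with whatever similarity metric you trust. In the sketch below, generate_video and clip_similarity are hypothetical placeholders for your generation call and comparison metric; the structure of the sweep is the point.

```python
import itertools

BASE = "a woman walking through a modern office with morning light"
VARIANTS = [
    BASE,
    "morning light, a woman walking through a modern office",       # reordered
    "a woman strolling through a contemporary office at sunrise",   # synonyms
]

def sweep(generate_video, clip_similarity):
    """Generate every variant and compare outputs pairwise.

    If small wording changes truly didn't matter, pairwise similarity would be
    uniformly high; in our testing it wasn't.
    """
    clips = {p: generate_video(p) for p in VARIANTS}
    for a, b in itertools.combinations(VARIANTS, 2):
        print(f"{a[:30]!r} vs {b[:30]!r}: {clip_similarity(clips[a], clips[b]):.2f}")
```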
These aren’t failures of the concept. They’re technical constraints of how the model works.
Our team spent considerable time thinking about what Sora’s constraints tell us about where this technology is heading.
The consistency problem isn’t fundamentally unsolvable; it’s an architectural challenge that future iterations will address. The length constraint will expand. The quality will improve. The speed will increase.
But Sora represents where we are right now, and it’s genuinely useful for specific scenarios while falling dramatically short of the “revolutionary” positioning OpenAI has given it.
The gap between positioning and reality isn’t accidental. It’s strategic. Revolutionary claims get media attention. Honest assessment of a useful-but-limited tool gets less coverage. So OpenAI emphasizes capabilities and downplays constraints.
Our team believes this pattern will repeat with each new release. Each iteration will be incrementally better. Marketing will claim revolutionary status. Reality will be evolutionary.
After three months of systematic testing, our team’s conclusion is straightforward:
Sora is a genuinely useful tool for specific production scenarios. It accelerates workflows for social media content, concept visualization, and B-roll generation. It’s faster than alternatives. It’s accessible to people without production experience.
Sora is not a replacement for professional video production. It can’t maintain character consistency across extended narratives. It can’t generate the length of video most commercial work requires. It can’t consistently execute complex, multi-element scenarios. It produces output that trained videographers can spot as AI-generated.
The honest assessment: Sora is useful. It’s just not revolutionary. And that gap between perception and reality is worth understanding if you’re considering integrating it into your workflow.