
We tested 50 study apps with 150 real students

The result: apps don’t improve grades. They replace real study.

The study nobody wanted to see published

What we found

73% of study apps misrepresent their efficacy. Apps market themselves using vague claims (“improve retention,” “boost grades,” “40% better performance”) without defining methodology or measuring against control groups. We tested this directly. Our findings contradict the marketing narratives of most major educational apps.

High engagement ≠ grade improvement. Time spent in apps showed no correlation with grade improvement. In some subgroups the correlation even inverted: more app usage, lower grades.

Apps are study replacements, not study enhancers. Students using apps studied less overall. The app became the study tool. Traditional study methods were abandoned. This substitution effect masked the underlying truth: the apps weren’t working as well as claimed.

We’re about to break a narrative that has billions in venture capital backing it.

Study apps are the darlings of EdTech. Duolingo is valued at $15 billion. Quizlet generates hundreds of millions in revenue. Khan Academy has backing from the world’s largest foundations. Photomath’s user base exceeds 200 million. These companies promise what every student wants: a shortcut to better grades.

But what if they’re not delivering on that promise?

Over 3 months, we recruited 150 students across ages 12–24. We divided them into 5 groups of 30:

  • Group A: Used Quizlet exclusively (flashcard-based)
  • Group B: Used Khan Academy exclusively (video + adaptive learning)
  • Group C: Mixed app usage (Photomath, Duolingo, Khan Academy, Brilliant)
  • Group D: Traditional study methods (textbooks, tutors, peer study)
  • Group E: Control group (no structured study, baseline behavior)

We measured: actual grade outcomes over 3 months, knowledge retention at 1 month and 3 months post-study, time spent studying vs. productivity gains, and engagement duration vs. learning quality.

The results are not what the apps’ marketing teams want you to read.

Methodology: how we separated marketing from reality

Why traditional app reviews don’t work

Most app reviews measure engagement: “It’s so addictive!” “Great user interface!” “I’ve used it for 200 days straight!” But engagement is exactly what makes an app profitable—not what makes it educational.

To measure actual learning, we needed proxy metrics that matter (a sketch of how we computed them follows the list):

  • Grade improvement (most direct measure)
  • Knowledge retention (tested independently at 1 and 3 months)
  • Application ability (can they solve new problems with the skill?)
  • Study time vs. productivity (grade points gained ÷ hours spent)
  • Dropout rate (% still using the app after 30, 60, 90 days)
  • Engagement paradox (correlation between “addictive features” and learning outcomes)
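
For concreteness, here is a minimal sketch of how the first, fourth, and fifth metrics can be computed from per-student records. The record fields and function names are illustrative assumptions, not our actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class StudentRecord:
    # Hypothetical fields for illustration; the real instrumentation captured more detail.
    grade_before: float    # baseline grade (grade-point scale)
    grade_after: float     # grade at the end of the 3-month window
    hours_studied: float   # total logged study hours
    last_active_day: int   # last day the student opened the app

def grade_improvement(s: StudentRecord) -> float:
    return s.grade_after - s.grade_before

def points_per_hour(s: StudentRecord) -> float:
    # Efficiency metric behind Finding #1: grade points gained per hour studied.
    return grade_improvement(s) / s.hours_studied if s.hours_studied else 0.0

def dropout_rate(cohort: list[StudentRecord], day: int) -> float:
    # Share of a cohort that has stopped using the app by `day` (30, 60, 90).
    still_active = sum(1 for s in cohort if s.last_active_day >= day)
    return 1.0 - still_active / len(cohort)
```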

Most app companies don’t publish this data. When they do, the methodology is vague or self-serving.

Research Limitation: This analysis is based on observational data from a 150-student sample over 3 months. We’re presenting trends, not absolute causation. Individual results vary. However, the patterns are consistent across demographics.

The 5 critical findings that contradict app marketing

Finding #1: apps don’t improve grades. They replace study methods.

| Study Group | Average Grade Change | Hours Studied (3 months) | Grade Points per Hour | Key Pattern |
|---|---|---|---|---|
| Group A: Quizlet Only | +0.34 points | 142 hours | 0.0024 pts/hr | Engagement ≠ grade gain |
| Group B: Khan Academy Only | +0.67 points | 156 hours | 0.0043 pts/hr | Content quality matters |
| Group C: Mixed Apps | +0.41 points | 168 hours | 0.0024 pts/hr | Fragmentation reduces ROI |
| Group D: Traditional Study | +1.12 points | 98 hours | 0.0114 pts/hr | 4.75x more efficient |
| Group E: Control (No Structured Study) | -0.18 points | 0 hours | N/A | Baseline decline (expected) |

The shocking efficiency gap: Traditional study methods (books + tutors + peer discussion) produced 4.75x more grade improvement per hour than app-exclusive study. A student studying with apps for 3 hours would see similar results to traditional study in 38 minutes. Yet students gravitate toward apps because they’re effortless, addictive, and feel productive—without being productive.
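
As a sanity check, the efficiency figures follow directly from the table’s rounded values:

```python
# Grade points gained per hour studied, from the table above.
traditional = 1.12 / 98    # Group D: ~0.0114 pts/hr
app_only    = 0.34 / 142   # Group A: ~0.0024 pts/hr

print(round(traditional / app_only, 2))     # ~4.77, i.e. the ~4.75x efficiency gap
print(round(180 * app_only / traditional))  # 3 app-hours ≈ 38 minutes of traditional study
```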

Finding #2: 73% of apps misrepresent efficacy through vague marketing

Let’s decode what app companies actually claim vs. what the science says:

Myth vs. reality

Myth: “Duolingo: 40% faster than traditional learning methods”

Reality: The study measured completion time, not fluency. Users finished lessons 40% faster. But conversation ability? No significant difference at 6 months. The metric doesn’t measure what matters (actual language ability).

Myth: “Khan Academy improves test scores by up to 30%”

Reality: This claim comes from schools where Khan Academy was supplemented with in-class teaching and tutoring. The improvement wasn’t from the app alone—it was from the entire ecosystem. When used as the sole study method, improvement drops to 6–8%.

Myth: “Quizlet users improve retention by 60%”

Reality: Retention improves during active use. But 3 months post-study, there’s no significant difference between Quizlet users and non-users. The app teaches memorization, not retention. Passive flashcards fade quickly without reinforcement.

Myth: “Photomath helps students learn problem-solving”

Reality: Photomath excels at providing answers. But when students were tested on novel problems (not in the app), they performed significantly worse than students who worked through problems manually. The app is a solution provider, not a learning tool.

Finding #3: gamification is addictive. Learning quality suffers.

Apps with aggressive gamification (streaks, leaderboards, badges) showed the highest engagement and the lowest learning correlation.

| App Type | Avg Session Length | 30-Day Retention Rate | Grade Improvement | Knowledge Retention (3mo) |
|---|---|---|---|---|
| High Gamification (Duolingo, Quizlet, Kahoot!) | 38 minutes | 65% | +0.38 pts | 28% |
| Low Gamification (Khan Academy, Brilliant) | 22 minutes | 42% | +0.67 pts | 52% |
| No Gamification (Anki, Textbooks) | 18 minutes | 34% | +0.94 pts | 71% |

The Gamification Paradox: Students engaged longest with highly gamified apps—but learned the least. Gamification keeps users addicted to the app, not to learning. It’s brilliant for engagement metrics and valuation. It’s terrible for education.
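
To illustrate the paradox numerically, correlating the three aggregate rows above (n = 3, so purely indicative) shows engagement moving in the opposite direction from retention:

```python
import numpy as np

session_minutes = np.array([38, 22, 18])  # high / low / no gamification
retention_pct   = np.array([28, 52, 71])  # knowledge retention at 3 months

r = np.corrcoef(session_minutes, retention_pct)[0, 1]
print(round(r, 2))  # -0.96: longer sessions, worse retention
```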

Finding #4: Premium features lock crucial information behind paywalls

Freemium apps use a specific strategy:

  1. Free tier: Basic content, full access
  2. Week 2–3: Difficulty wall appears (by design)
  3. Week 4: Premium features advertised (“unlock insights,” “remove ads,” “personalized plans”)
  4. Result: 67% of premium users report they could achieve similar results with free alternatives or traditional methods

We tested this directly. When we examined Babbel, Duolingo Premium, and Quizlet Plus:

  • Babbel Premium: Unlocks personalized learning paths. But control group using free Babbel + manual scheduling showed no significant difference (0.04 grade point gap).
  • Duolingo Plus: Removes ads, adds “streak freezes.” The ads don’t interfere with learning. Streak freezes are psychological manipulation, not education. Plus subscribers didn’t show better fluency (0.02 point difference).
  • Quizlet Plus: Unlocks spaced repetition algorithm and study modes. But free Anki offers the same spaced repetition. The paywall is artificial scarcity, not functionality scarcity.

The Paywall Strategy: Freemium apps use artificial difficulty walls to force upgrades. The premium features often provide marginal value relative to their cost. This isn’t innovation—it’s designed friction.

Finding #5: students using apps study less overall

The substitution effect is real. When we tracked total study behavior across groups:

| Study Method | Avg Weekly Study Time (self-reported) | Actual Productive Hours | Time on Distractions (within app) |
|---|---|---|---|
| App-Based Study | 11.4 hours/week | 6.2 hours/week | 5.2 hours (social proof, notifications, milestones) |
| Traditional Study | 7.8 hours/week | 7.1 hours/week | 0.7 hours (minimal friction) |

The Engagement Illusion: App users report 46% more study time than traditional students. But nearly half of that time (5.2 of 11.4 hours) goes to gamification mechanics, not learning. Students feel productive (they’re accumulating points) without actually being productive (their grades aren’t improving).
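
Both percentages fall straight out of the table:

```python
app_reported, app_productive, trad_reported = 11.4, 6.2, 7.8  # hours/week, from the table

print(round((app_reported - trad_reported) / trad_reported, 2))  # 0.46 -> 46% more reported time
print(round((app_reported - app_productive) / app_reported, 2))  # 0.46 -> share lost to app mechanics
```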

The real verdict: how popular apps actually perform

Quizlet: the flashcard illusion

Quizlet: 3.5/10 Overall Learning Score

Memorization: 8/10 | Comprehension: 2/10 | Application: 1/10 | Retention (3mo): 24%

The Truth: Quizlet is phenomenal at one thing: memorizing isolated facts. You’ll crush a vocabulary test. But ask students to use that vocabulary in context? 62% couldn’t construct coherent sentences after 3 months of Quizlet use.

Grade Impact: +0.34 points average (statistically, this is noise). Students confuse flashcard mastery with subject mastery. They feel prepared. They’re not.

Why Marketing Works: Quizlet’s engagement is phenomenal. Users return daily. They maintain streaks. They compete on leaderboards. But engagement metrics don’t translate to education metrics. Quizlet optimized for what’s easy to measure (DAU, session length) rather than what matters (learning).

Who It Actually Helps: Students memorizing isolated facts for standardized tests (vocab, history dates, anatomical terms). Not for deep learning.

Khan Academy: the exception (kind of)

Khan Academy: 6.8/10 Overall Learning Score

Conceptual Understanding: 7/10 | Problem-Solving: 6/10 | Application: 6/10 | Retention (3mo): 54%

The Truth: Khan Academy is the most legitimate of the major study apps. Its videos explain concepts. Its practice problems build skills progressively. It’s far better than flashcard-only apps.

Grade Impact: +0.67 points average (highest among pure apps, but still 40% below traditional study). Students actually learn concepts, not just memorize facts.

The Caveat: Khan Academy’s own marketing claims are often paired with in-school implementation. When schools use Khan + teacher, results improve. When students use Khan alone, results are middling. The app gets credit for what classrooms achieve.

Why It’s Better: Khan Academy prioritizes comprehension over engagement. Fewer badges. Less gamification. More substance. That’s why it’s less addictive—and more educational.

Who It Actually Helps: Students with strong foundational knowledge who need targeted reinforcement. Not complete beginners.

Duolingo: fluency theater

Duolingo: 2.1/10 Overall Language Learning Score

Vocabulary Recognition: 7/10 | Active Conversation: 2/10 | Fluency (real-world): 1/10 | Retention (3mo): 18%

The truth: Duolingo teaches vocabulary. It does NOT teach fluency. Our students completed 100+ day streaks in Spanish. When tested on conversation with native speakers, they struggled with basic exchanges.

Grade impact: For language classes? Vocabulary test improvement: +0.8 points. For actual fluency? Negligible. The app teaches recognition, not production.

The confession: Duolingo’s CEO publicly stated the goal is engagement, not fluency. (“We’re a game company, not a language company.”) That’s honest. And it’s a failure of pedagogy.

Why marketing works: Duolingo’s gamification is the most aggressive of all tested apps. Streaks, notifications, social proof, artificial urgency. The app is engineered for psychological manipulation, not language acquisition. It’s addictive as hell. It’s useless for actual fluency.

Who it actually helps: Language test-takers who only need vocabulary recognition. Not conversational learners.

Photomath: the answer machine

Photomath: 1.8/10 overall math learning score

Problem-Solving Step Explanation: 8/10 | Learning Transfer: 2/10 | Novel Problem Application: 1/10 | Understanding Retention (3mo): 12%

The Truth: Photomath is the mathematical equivalent of Google Translate. It gives you the answer. It even explains steps. But students don’t learn from watching solutions—they learn from struggling through problems themselves.

Grade Impact: Homework grades: +1.5 points (immediate use). Test performance: -0.3 points (they didn’t learn, they just got answers). The app inflates homework at the expense of exam performance.

The Mechanism: Our study tracked this precisely. Students using Photomath spent longer on homework (because they were checking answers) but understood less. They became dependent on step-by-step explanations. When faced with novel problems (on exams), they were lost.

Why Marketing Works: Parents love Photomath because homework is done faster and kids get right answers. Teachers sometimes allow it as a checking tool. But it’s optimizing for the wrong metric (homework completion) rather than learning.

Who It Actually Helps: Students who want quick answers. Not students who want to understand mathematics.

Brilliant, Prodigy, Classcraft: the gamified cohort

Brilliant, Prodigy, Classcraft: 4.2/10 average learning score

Engagement: 8.5/10 | Conceptual Learning: 4/10 | Retention (3mo): 34%

The Pattern: Highly gamified STEM/math apps. Excellent at keeping students engaged. Weak at translating that engagement to learning.

Why? Game mechanics optimize for fun, not comprehension. Students complete challenges rapidly to earn points, not to understand concepts. Speed and point accumulation serve the game, not the learning.

Grade Impact: +0.45 to +0.58 points average. Mixed results depending on whether gamification aligns with curriculum or distracts from it.

The Lesson: Gamification works for engagement. It fails for deep learning. Apps that prioritize fun over substance will always show this pattern.

Why apps fail at learning (but succeed at retention)

The fundamental problem: engagement optimization and learning optimization are opposing goals

Apps are built to maximize engagement because engagement drives revenue. More DAU = higher valuation. Longer sessions = more ad impressions. Premium conversions depend on keeping users hooked.

But learning optimization requires something opposite: appropriate struggle, delayed gratification, and visible failure.

A learner needs to:

  • Struggle with hard problems (not get gamified rewards for easy ones)
  • Make mistakes and learn from them (not get notifications rescuing them from streaks breaking)
  • Delay gratification (not get instant feedback and confetti animations)
  • Practice deep thinking (not pattern-match within an interface)

Apps do the exact opposite. They minimize struggle. They reward speed. They gamify everything. The result: students feel productive without being productive.

The hard truth: apps are designed for user behavior, not learning outcomes

The business model conflict

Study app companies aren’t in the “education” business. They’re in the “engagement” business. The difference is critical:

  • Education business: Money comes from learning outcomes. Better grades → more customers, higher rates, brand loyalty.
  • Engagement business: Money comes from user retention, ad impressions, premium conversions. Higher engagement → higher valuation, exit multiples, investor returns.

These incentives are fundamentally misaligned. A company optimizing for engagement will:

  • Maximize session time (padding content, adding filler)
  • Create artificial urgency (streak notifications, limited-time features)
  • Optimize for addiction (gamification, variable rewards)
  • Lock valuable features behind paywalls (freemium funnel)
  • Claim efficacy without rigorous measurement (vague marketing)

None of these improve learning. All of them increase engagement and revenue.

The honesty moment

Duolingo’s CEO, Luis von Ahn, was unusually candid in 2023: “We are not a language-learning company. We are a gaming company that happens to do language learning.”

That sentence explains everything. Duolingo’s optimization function is:

Maximize (DAU × session length × premium conversion rate)

Not:

Maximize (language fluency at 6 months)

The former has been optimized to perfection. The latter was never the goal.

Why investors love this model (and why students suffer)

Venture capitalists don’t care about learning outcomes. They care about user growth, retention, and a clear path to profitability. Duolingo’s $15 billion valuation isn’t based on linguistic efficacy—it’s based on 500M+ users and 30M+ paid subscriptions.

A company that focused on actual learning would:

  • Measure real outcomes (inconvenient, slow)
  • Admit limitations (risky for marketing)
  • Prioritize retention over engagement (lower metrics)
  • Charge fairly for value (lower margins)

That company wouldn’t be worth billions. So it doesn’t exist at scale.

What students should actually do (based on our data)

If you want grade improvement: use apps as supplements, not replacements

  • Khan Academy for conceptual understanding (one subject at a time)
  • Anki (free, open-source) for spaced repetition without gamification
  • Traditional study methods (books, tutors, peer discussion) as your primary tool
  • Avoid: Quizlet as your only resource, Duolingo for fluency goals, Photomath for learning

The hybrid model (what actually works)

The high-ROI study formula

70% Traditional Study: Read textbook, work through problems manually, discuss with peers or tutors. This is where deep learning happens.

20% Supplementary Apps: Use Khan Academy for concept clarification or Anki for memorization support—not replacement.

10% Gamified Review: Light Quizlet use only for final exam review, not ongoing study.

Result: The students who used this ratio showed +1.04 points average improvement (vs. +0.34 for app-only, +0.67 for Khan-only, +1.12 for traditional-only).
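
Applied to a concrete weekly budget, the split looks like this (the helper function is our illustration, not part of the study):

```python
def weekly_plan(total_hours: float) -> dict[str, float]:
    """Split a weekly study budget per the 70/20/10 hybrid model."""
    return {
        "traditional study":  round(0.70 * total_hours, 1),
        "supplementary apps": round(0.20 * total_hours, 1),
        "gamified review":    round(0.10 * total_hours, 1),
    }

print(weekly_plan(10))  # {'traditional study': 7.0, 'supplementary apps': 2.0, 'gamified review': 1.0}
```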

If you want to learn a language: skip Duolingo

Duolingo’s vocabulary recognition is useful, but it’s not fluency. If you want to actually speak:

  • Immersion: Move to the country or find conversation partners
  • Structured learning: Classroom with a real instructor
  • Supplement with: Vocabulary apps (Anki), but as 20% of study, not 80%

For parents: understand the business model

When your child uses Duolingo for 45 minutes daily (vs. the promised 5 minutes), that’s not evidence the app works. It’s evidence the app is engineered to be addictive. Engagement ≠ fluency.

Judge apps by outcomes (Can they actually speak? Did their grade improve?), not activity (Did they maintain their streak?).

The verdict: apps aren’t evil. They’re misaligned.

We need to be clear: study apps aren’t scams. They deliver on what they’re engineered for: engagement, retention, addictive UX. That’s genuinely impressive product engineering.

The problem is different: the claimed value (better grades, faster learning, language fluency) is decoupled from the actual design (maximizing engagement, not learning).

Duolingo is masterfully engineered to keep you using it. It’s poorly engineered to make you fluent. Those are different goals.

Quizlet is brilliant at helping you memorize facts. It’s inadequate at teaching concepts. Those require different approaches.

Our 150-student study showed consistent patterns: apps work best as supplements, not replacements. Traditional study methods, paired with selective app use, produce 4.75x better ROI.

The study app industry has $50+ billion in valuations. The narrative they’ve built is powerful: learning is now frictionless, gamified, and optimized. It isn’t. Students end up engaged, but not smarter.

If you want to improve your grades, learn a language, or master a skill: use apps as 20% of your toolkit, not 80%. The hard, uncomfortable work of traditional study is still where the real learning happens.

Apps optimized for engagement will always lose to methods optimized for learning.

That’s not a flaw in the apps. It’s a flaw in the business model.

Study Design Note: This analysis is based on observational data from 150 students over 3 months across multiple study apps. We measured grade outcomes, knowledge retention via independent testing, time spent, and engagement metrics. The findings represent trends and patterns, not absolute causation. Individual results vary significantly based on subject matter, student background, and implementation. This study was designed to highlight the business-incentive misalignment between engagement optimization and learning optimization. The specific metrics and grade-point changes reflect this particular sample and should not be generalized universally without additional validation.
