Reddit Early Engagement Dynamics

Jan 1, 2026

Structured Visibility Reporting Under Snapshot-Based Collection Constraints

Objective

Build a structured, audit-focused reporting workflow that measures early post visibility within defined r/AskReddit listings using public snapshot data collected at fixed intervals.

The project defines measurement boundaries explicitly and restricts conclusions to what can be directly observed under known collection limits.

No causal inference or ranking-model assumptions are introduced.


Dataset Context

  • Subreddit: r/AskReddit
  • Collection window: 7 consecutive days
  • Cadence: ~15-minute intervals
  • Listings captured: /new, /hot, /rising

Reddit does not provide listing history or impression data.

All measurements reflect observed appearances in public snapshots only.

Unobserved time is treated as non-inferable.


Reporting Structure & Measurement Controls

The workflow defines reporting grain and segmentation rules before aggregation.

Unit of Analysis

Primary unit:

(post_id, segment_id)

A segment represents an uninterrupted collection period.

If a gap exceeds 60 minutes, a new segment_id begins.

No analysis crosses segment boundaries.


Segmentation Rule

  • Hard gap threshold: 60 minutes
  • Gaps >60 minutes create new segment_id
  • No stitching across gaps
  • Run-level completeness explicitly recorded

This keeps every denominator limited to snapshots that were actually collected and prevents any assumption of continuity across gaps.
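The segmentation rule can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the function name `assign_segments`, the zero-based integer segment ids, and the example timestamps are all assumptions made for the sketch.

```python
from datetime import datetime, timedelta

# Hard gap threshold from the segmentation rule: gaps strictly greater
# than 60 minutes start a new segment.
GAP_THRESHOLD = timedelta(minutes=60)


def assign_segments(snapshot_times):
    """Return (timestamp, segment_id) pairs for the given snapshot times.

    Hypothetical helper: segment ids are zero-based integers, and a new id
    begins whenever the gap to the previous snapshot exceeds GAP_THRESHOLD.
    """
    segments = []
    segment_id = 0
    previous = None
    for ts in sorted(snapshot_times):
        if previous is not None and ts - previous > GAP_THRESHOLD:
            segment_id += 1  # no stitching: the gap closes the old segment
        segments.append((ts, segment_id))
        previous = ts
    return segments


# Made-up timestamps: three snapshots at 15-minute cadence, a 90-minute
# gap, then two more snapshots.
start = datetime(2026, 1, 1, 0, 0)
times = [start + timedelta(minutes=m) for m in (0, 15, 30, 120, 135)]
segment_ids = [seg for _, seg in assign_segments(times)]
print(segment_ids)  # [0, 0, 0, 1, 1]
```

A gap of exactly 60 minutes does not start a new segment under this reading of the rule, because the threshold is applied as "exceeds 60 minutes".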


Analysis Table Architecture

All reporting derives from three structured tables built from raw snapshots.

All summary metrics are computed directly from structured tables and reconcile to raw snapshot counts within uninterrupted segments.

1. run_level

Primary key: run_id

Defines:

  • Collection cadence
  • Segment boundaries
  • Gap classification

2. post_level

Primary key: (post_id, segment_id)

Aggregates:

  • Chronological visibility in /new
  • Number of observed appearances
  • Initial comment state
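As a rough sketch of how post_level could be derived from raw snapshot rows, the example below groups /new observations by (post_id, segment_id). The row fields (`snapshot_index`, `listing`, `num_comments`), the output field names, and the sample rows are assumptions for illustration and may not match the repository's schema.

```python
from collections import defaultdict


def build_post_level(snapshot_rows):
    """Aggregate /new observations to one record per (post_id, segment_id).

    Hypothetical helper: each input row is a dict describing one post seen
    in one listing in one snapshot.
    """
    grouped = defaultdict(list)
    for row in snapshot_rows:
        if row["listing"] == "new":
            grouped[(row["post_id"], row["segment_id"])].append(row)

    post_level = {}
    for key, observations in grouped.items():
        observations.sort(key=lambda r: r["snapshot_index"])
        first = observations[0]
        post_level[key] = {
            "first_snapshot": first["snapshot_index"],
            "last_snapshot": observations[-1]["snapshot_index"],
            "n_appearances": len(observations),
            # Comment count at the first observed /new appearance.
            "initial_comments": first["num_comments"],
        }
    return post_level


# Made-up rows, not data from the study.
rows = [
    {"post_id": "a1", "segment_id": 0, "snapshot_index": 0, "listing": "new", "num_comments": 0},
    {"post_id": "a1", "segment_id": 0, "snapshot_index": 1, "listing": "new", "num_comments": 3},
    {"post_id": "a1", "segment_id": 0, "snapshot_index": 1, "listing": "hot", "num_comments": 3},
    {"post_id": "b2", "segment_id": 0, "snapshot_index": 1, "listing": "new", "num_comments": 1},
    {"post_id": "a1", "segment_id": 1, "snapshot_index": 9, "listing": "new", "num_comments": 7},
]
post_level = build_post_level(rows)

# Reconciliation check: aggregated appearances must equal raw /new rows.
new_row_count = sum(1 for r in rows if r["listing"] == "new")
total_appearances = sum(rec["n_appearances"] for rec in post_level.values())
assert total_appearances == new_row_count
```

Because the key includes segment_id, the same post observed on both sides of a gap yields two separate records rather than one stitched history.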

3. ranked_intersections

Primary key: (post_id, segment_id)

Records:

  • Observed intersections with /hot
  • Observed intersections with /rising
  • Number of snapshot intervals between the first observed /new appearance and the first observed ranked appearance

Ranked intersections are lower-bound measurements due to top-N truncation.
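One way to compute the interval for a single (post_id, segment_id) is sketched below. It assumes each snapshot carries a `snapshot_index` that increases by one per collected snapshot within a segment; that index, the function name `first_ranked_lag`, and the sample observations are hypothetical.

```python
def first_ranked_lag(observations, ranked_listing="hot"):
    """Snapshot intervals from first observed /new to first observed ranked appearance.

    `observations` is a list of (snapshot_index, listing) pairs for one
    (post_id, segment_id). Returns None when either appearance was never
    observed, since unobserved time is treated as non-inferable.
    """
    new_indexes = [idx for idx, listing in observations if listing == "new"]
    ranked_indexes = [idx for idx, listing in observations if listing == ranked_listing]
    if not new_indexes or not ranked_indexes:
        return None
    return min(ranked_indexes) - min(new_indexes)


# Made-up observations: first seen in /new at snapshot 3, in /hot at snapshot 5.
observed = [(3, "new"), (4, "new"), (5, "new"), (5, "hot"), (6, "hot")]
print(first_ranked_lag(observed))            # 2
print(first_ranked_lag(observed, "rising"))  # None
```

Returning None instead of a default value keeps posts without an observed ranked appearance out of any lag distribution.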


Reporting Outputs

Collection Coverage & Gaps

Shows uninterrupted segments and hard gaps across the study window.

All downstream reporting is restricted to uninterrupted periods.

[Figure: Collection Coverage & Gaps]


Observation Depth in /new

Displays the number of snapshots in which each post appears in /new.

Median observed appearances: 27 snapshots.

The distribution shows a large spike at exactly one observed appearance.

[Figure: Observation Depth in /new]


Ranked Listing Exposure

Approximate observed rates:

  • ~13% of observed posts appear in /hot
  • ~1-2% appear in /rising

These represent lower-bound exposure rates.
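The rate calculation itself is simple; the sketch below shows one possible form, with the denominator restricted to posts that were actually observed. The function name `observed_rate` and the toy data are assumptions, and the printed values are not the study's results.

```python
def observed_rate(listings_by_post, listing):
    """Share of observed (post_id, segment_id) units seen at least once in `listing`.

    This is a lower bound on true exposure: a post that sat below the
    captured top-N, or appeared between snapshots, is counted as not observed.
    """
    if not listings_by_post:
        return 0.0
    hits = sum(1 for seen in listings_by_post.values() if listing in seen)
    return hits / len(listings_by_post)


# Toy data keyed by (post_id, segment_id); values are the listings in which
# the post was observed at least once.
listings_by_post = {
    ("p1", 0): {"new"},
    ("p2", 0): {"new", "hot"},
    ("p3", 0): {"new"},
    ("p4", 1): {"new", "hot", "rising"},
}
print(observed_rate(listings_by_post, "hot"))     # 0.5
print(observed_rate(listings_by_post, "rising"))  # 0.25
```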

Posts are observed in ranked listings far less often than in the chronological /new listing.

[Figure: Ranked Listing Exposure]


Observed Lag to /hot

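Among posts that appear in /hot, the first observed /hot appearance typically follows the first observed /new appearance within 1-2 snapshot intervals (~15-30 minutes).

Lags are measured in snapshot intervals, so their resolution is limited to the ~15-minute collection cadence.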

[Figure: Observed Lag to /hot]


Structured Audit: Observation Depth Spike

The distribution spike was audited before interpretation to confirm it reflected collection mechanics rather than processing error.

Audit steps included:

  • Recomputing distributions from raw snapshots
  • Identifying boundary-related variance near collection gaps
  • Comparing singleton vs multi-observed posts
  • Evaluating churn under normal cadence

Result:

The spike reflects a mixture of:

  • Fast-loss posts
  • Gap-adjacent observations
  • Normal queue variation

The audit ensured that the anomaly was investigated before any conclusions were drawn.
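The singleton comparison could be approached along the lines below. The source does not define "gap-adjacent" precisely, so this sketch assumes it means a single observation that falls on the first or last snapshot of its segment; the function name, labels, and sample data are likewise hypothetical.

```python
def classify_observation_depth(appearances, segment_bounds):
    """Label each (post_id, segment_id) unit by how it was observed in /new.

    appearances:    {(post_id, segment_id): [snapshot_index, ...]}
    segment_bounds: {segment_id: (first_snapshot_index, last_snapshot_index)}

    Assumption: a singleton counts as gap-adjacent when its only observation
    is on a boundary snapshot of its segment.
    """
    labels = {}
    for (post_id, segment_id), indexes in appearances.items():
        if len(indexes) != 1:
            labels[(post_id, segment_id)] = "multi_observed"
            continue
        first, last = segment_bounds[segment_id]
        if indexes[0] in (first, last):
            labels[(post_id, segment_id)] = "gap_adjacent_singleton"
        else:
            labels[(post_id, segment_id)] = "interior_singleton"
    return labels


# Made-up example: segment 0 spans snapshots 0-10.
segment_bounds = {0: (0, 10)}
appearances = {
    ("a1", 0): [0],        # seen once, on the segment's first snapshot
    ("b2", 0): [4],        # seen once, mid-segment (candidate fast-loss post)
    ("c3", 0): [2, 3, 4],  # seen in several snapshots
}
labels = classify_observation_depth(appearances, segment_bounds)
print(labels)
```

Separating boundary singletons from interior ones makes it possible to see how much of the spike is attributable to collection gaps rather than to posts leaving /new quickly.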


Reporting Highlights (Constraint-Bounded)

Within uninterrupted segments:

  • Many posts receive comments while still visible in /new.
  • Ranked-surface exposure is rare relative to chronological visibility.
  • Observed /hot appearances, when they occur, typically follow the first /new observation within 1-2 snapshot intervals.

All highlights reflect observed appearances only.

No inference is made about ranking algorithms, impressions, or total exposure.


Limits & Interpretation Boundaries

  • No causal inference.
  • No ranking algorithm modeling.
  • No cross-segment stitching.
  • No impression measurement.
  • Ranked appearance rates are lower bounds.
  • No lifetime engagement estimates.
  • Unobserved time treated as non-inferable.

This project documents observable properties under explicit collection constraints.


What This Demonstrates

  • Explicit unit-of-analysis definition prior to aggregation.
  • Hard segmentation rules preventing continuity assumptions.
  • Separation of run-level, post-level, and ranked-intersection reporting layers.
  • Lower-bound interpretation discipline.
  • Audit-driven anomaly validation.
  • Reproducible, table-first reporting workflow.

The result is a structured visibility report built under fixed observational constraints, designed to prevent over-interpretation of incomplete data.


Repository

https://github.com/wguDataNinja/reddit-early-dynamics