Reddit Early Engagement Dynamics

Project Summary

This project analyzes early post visibility in r/AskReddit using a seven-day dataset of repeated public listing snapshots collected at approximately 15-minute intervals.

Because collection includes gaps, the dataset is segmented into uninterrupted observation periods using a 60-minute gap threshold. The unit of analysis is the post-segment pair (post_id, segment_id): one post within one uninterrupted segment.

The goal is descriptive: measure what can actually be observed about early visibility and ranked-surface intersections under snapshot-based collection constraints.

Problem Framing

r/AskReddit is one of Reddit’s highest-volume communities, with new posts appearing continuously throughout the day.

Reddit does not expose listing history or impression data. Public APIs provide only the current state of listings such as /new, /hot, and /rising.

When data is collected through snapshots, any time between runs is unobserved. Posts may appear and disappear between snapshots, and ranked surfaces are truncated to top-N positions.

Many engagement analyses implicitly assume continuous visibility. This project instead treats observability itself as the object of measurement. All findings are restricted to what can be directly observed within known collection limits.

The study does not attempt to infer total exposure, ranking logic, or lifetime performance.

Dataset and Observation Model

Collection window: 7 consecutive days
Cadence: approximately every 15 minutes
Subreddit: r/AskReddit
Listings captured: /new, /hot, /rising

Data processing was performed in Python using pandas, with SQL used to construct analysis tables and aggregations.
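
As a rough illustration of the raw input, the sketch below flattens one listing response into one row per post. It assumes the public listing JSON shape (posts under data.children, each with id, num_comments, and created_utc). The function name flatten_listing, the run_id and rank columns, and the sample payload are hypothetical and are not taken from the project's collector.

```python
from datetime import datetime, timezone

def flatten_listing(payload, listing, run_id, fetched_at):
    """Turn one listing response into one row per post, keeping listing order as rank."""
    rows = []
    for rank, child in enumerate(payload["data"]["children"], start=1):
        post = child["data"]
        rows.append({
            "run_id": run_id,          # hypothetical identifier for this snapshot run
            "listing": listing,        # "new", "hot", or "rising"
            "rank": rank,              # 1-based position within the truncated listing
            "post_id": post["id"],
            "num_comments": post["num_comments"],
            "created_utc": post["created_utc"],
            "fetched_at": fetched_at,
        })
    return rows

# Synthetic payload for illustration only; not real data.
sample = {"data": {"children": [
    {"kind": "t3", "data": {"id": "abc", "num_comments": 0, "created_utc": 1700000000}},
    {"kind": "t3", "data": {"id": "def", "num_comments": 4, "created_utc": 1699999000}},
]}}

rows = flatten_listing(
    sample, "new", run_id=1,
    fetched_at=datetime(2023, 11, 14, 22, 30, tzinfo=timezone.utc),
)
```

Rows in this shape can be passed to pandas.DataFrame(rows) and appended per run, which keeps every snapshot attributable to exactly one run.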

Segmentation model

A hard gap threshold of 60 minutes defines uninterrupted collection segments.

If the time between runs exceeds 60 minutes, a new segment_id begins.
No analysis crosses segment boundaries.
Unobserved time is treated as non-inferable.
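
The segmentation rule above can be sketched in pandas as follows. This is a minimal example under stated assumptions: the column names run_id and run_ts, the helper assign_segments, and the sample timestamps are illustrative, not the project's actual schema.

```python
import pandas as pd

GAP_THRESHOLD = pd.Timedelta(minutes=60)

def assign_segments(runs: pd.DataFrame) -> pd.DataFrame:
    """Label each run with a segment_id that increments after every hard gap."""
    runs = runs.sort_values("run_ts").reset_index(drop=True)
    gap = runs["run_ts"].diff()  # NaT for the first run
    # A new segment begins when the gap strictly exceeds the threshold.
    # NaT > threshold evaluates to False, so the first run starts segment 0.
    runs["segment_id"] = (gap > GAP_THRESHOLD).cumsum()
    return runs

# Synthetic run timestamps: three runs, an 89-minute gap, then two more runs.
runs = pd.DataFrame({
    "run_id": [1, 2, 3, 4, 5],
    "run_ts": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 00:15", "2024-01-01 00:31",
        "2024-01-01 02:00", "2024-01-01 02:15",
    ]),
})
segmented = assign_segments(runs)
```

Computing segment_id once at the run level means every downstream table inherits the same boundaries through a join on run_id.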

Unit of analysis

The analytical unit is (post_id, segment_id).

Each uninterrupted observation period is treated as independent. A post seen before and after a hard gap is not stitched across segments.

One run equals one observation opportunity.

This makes coverage limits explicit and avoids assumptions about continuity.

Figure 1 - Collection coverage and gaps

Timeline of snapshot runs across the study window, showing uninterrupted segments and hard gaps.
Analysis is restricted to uninterrupted periods where coverage is known.

Analysis Tables

All findings are derived from three structured tables built from raw snapshots. Separating run-level, post-level, and ranked-intersection data allows coverage, visibility, and ranked appearance to be analyzed independently.

run_level

Primary key: run_id
Defines the run universe, cadence, and segmentation boundaries.

post_level

Primary key: (post_id, segment_id)
Aggregates /new observations into structured visibility records.
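
One way such a table could be built is a groupby over /new observations keyed on (post_id, segment_id). The sketch below assumes a hypothetical observation frame with run_id and num_comments columns; the output column names (n_obs, first_run, and so on) are illustrative.

```python
import pandas as pd

# Synthetic /new observations: one row per (post, run) in which the post was seen.
new_obs = pd.DataFrame({
    "post_id":      ["a", "a", "a", "b", "a", "c"],
    "segment_id":   [0, 0, 0, 0, 1, 1],
    "run_id":       [1, 2, 3, 2, 4, 4],
    "num_comments": [0, 2, 5, 1, 9, 0],
})

post_level = (
    new_obs.sort_values("run_id")  # so that "first"/"last" follow run order
    .groupby(["post_id", "segment_id"], as_index=False)
    .agg(
        n_obs=("run_id", "nunique"),              # observation depth within the segment
        first_run=("run_id", "min"),
        last_run=("run_id", "max"),
        first_comments=("num_comments", "first"),
        last_comments=("num_comments", "last"),
    )
)
```

Post "a" appears on both sides of a gap and therefore yields two independent records, one per segment, rather than a single stitched history.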

ranked_intersections

Primary key: (post_id, segment_id)
Links chronological visibility to observed appearances in /hot and /rising.

Ranked intersections are lower bounds due to top-N truncation.
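
A minimal sketch of the linkage, assuming a hypothetical ranked_obs frame holding /hot and /rising observations. Joining on both post_id and segment_id keeps every intersection inside a single uninterrupted segment. All column names here are assumptions for illustration.

```python
import pandas as pd

# Synthetic post-segment records from /new.
post_level = pd.DataFrame({
    "post_id": ["a", "b", "c"],
    "segment_id": [0, 0, 0],
    "first_run": [1, 1, 2],
})

# Synthetic ranked-listing observations within the same segment.
ranked_obs = pd.DataFrame({
    "post_id": ["a", "a", "c"],
    "segment_id": [0, 0, 0],
    "listing": ["hot", "rising", "hot"],
    "run_id": [2, 1, 5],
})

first_seen = (
    ranked_obs.groupby(["post_id", "segment_id", "listing"])["run_id"].min()
    .unstack("listing")
    .reindex(columns=["hot", "rising"])  # keep both columns even if one listing is empty
    .rename(columns={"hot": "first_hot_run", "rising": "first_rising_run"})
    .reset_index()
)

# Left join: posts never seen in a ranked listing keep NaN, i.e. "not observed".
ranked_intersections = post_level.merge(
    first_seen, on=["post_id", "segment_id"], how="left"
)
ranked_intersections["seen_hot"] = ranked_intersections["first_hot_run"].notna()
ranked_intersections["seen_rising"] = ranked_intersections["first_rising_run"].notna()
```

A False flag here means only that no appearance was captured; the post may still have appeared below the truncation cutoff or between snapshots.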

Key Findings

Visibility in /new

Most posts persist across multiple observations in /new.
The median post appears in 27 snapshots, which at the approximate 15-minute cadence corresponds to roughly 6.5 hours of observed chronological presence.

The distribution also shows a large spike at a single observation. Investigation indicates this reflects a mixture of:

fast-loss posts
posts observed near collection gaps
normal variation in chronological visibility

All measurements reflect observed appearances only. Posts may have existed outside observation windows.

Figure 2 - Observation depth in /new

Distribution of how many snapshots each post appears in /new during uninterrupted segments.
The singleton spike reflects fast-loss posts, gap-adjacent observations, and normal queue variation.

Engagement During Early Visibility

Many posts receive comments while still visible in /new, even without ranked-surface exposure.

A majority of observed posts already have comments present at their first captured appearance. Among posts initially observed with zero comments, many accumulate comments during subsequent observations while they remain visible.

This suggests chronological visibility alone is often sufficient for some engagement in high-volume communities, even when posts never appear in ranked listings.

These measurements reflect observed engagement only, not total lifetime interaction.
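
The two shares described above could be computed along these lines. This is a sketch on synthetic data; the columns first_comments, max_comments, and n_obs are assumed names, and restricting the second share to posts with more than one observation is an illustrative choice rather than the study's documented rule.

```python
import pandas as pd

# Synthetic post-segment records; values are illustrative, not study results.
post_level = pd.DataFrame({
    "post_id": ["a", "b", "c", "d"],
    "segment_id": [0, 0, 0, 0],
    "n_obs": [3, 1, 4, 2],
    "first_comments": [0, 3, 0, 1],
    "max_comments": [5, 3, 0, 2],
})

# Share of post-segments that already had comments at first capture.
has_comments_at_first = post_level["first_comments"] > 0
share_with_comments_at_first = has_comments_at_first.mean()

# Among posts first seen with zero comments and observed again,
# the share that showed any comments while still visible in /new.
zero_start = post_level[~has_comments_at_first & (post_level["n_obs"] > 1)]
share_gaining = (zero_start["max_comments"] > 0).mean()
```

Singletons are excluded from the second share because a post seen only once offers no later observation in which a change could be detected.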

Ranked Surface Scarcity

Only a small fraction of observed posts appear in ranked listings.

Approximately:

13% of observed posts appear in /hot
1-2% appear in /rising

These rates are lower bounds: ranked listings are truncated to the top 100 positions, and a post that enters and leaves a ranked listing between snapshots is never observed.

Appearance in /rising is not required for appearance in /hot. The two surfaces are not nested in the observed data.
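
A short sketch of how the rates and the non-nesting check might be computed from per-post flags. The seen_hot and seen_rising column names and the sample values are assumptions for illustration and do not reproduce the study's figures.

```python
import pandas as pd

# Synthetic flags, one row per (post_id, segment_id).
ri = pd.DataFrame({
    "seen_hot":    [True, False, False, True, False, False, False, False],
    "seen_rising": [False, False, True, True, False, False, False, False],
})

# Observed appearance rates (lower bounds under truncation and cadence).
hot_rate = ri["seen_hot"].mean()
rising_rate = ri["seen_rising"].mean()

# The surfaces are nested only if one of these counts is zero.
hot_without_rising = int((ri["seen_hot"] & ~ri["seen_rising"]).sum())
rising_without_hot = int((~ri["seen_hot"] & ri["seen_rising"]).sum())
not_nested = hot_without_rising > 0 and rising_without_hot > 0
```

Because a missed /rising appearance also inflates hot_without_rising, the check supports "not nested in the observed data" and nothing stronger.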

This suggests ranked listings function as selective attention surfaces rather than extensions of chronological visibility.

No claim is made about algorithmic behavior or post quality.

Figure 3 - Ranked surface scarcity

Fraction of observed posts that appear in ranked listings.
Ranked surfaces are structurally rare relative to chronological visibility.

Timing of Promotion

Among posts that reach /hot, the first observed /hot appearance usually falls within one or two snapshot intervals (approximately 15-30 minutes) of the first observed appearance in /new.

The lag distribution is strongly front-loaded, with few delayed promotions.

These lags are measured in snapshot intervals, so they are resolved only to the collection cadence and do not imply algorithm timing precision.
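
The lag could be expressed in whole snapshot intervals by differencing run positions within a segment, as in the sketch below. It assumes hypothetical columns first_new_seq and first_hot_seq holding the ordinal position of the relevant run inside its segment; the values are synthetic.

```python
import pandas as pd

# Synthetic records; None marks posts never observed in /hot.
ri = pd.DataFrame({
    "post_id": ["a", "b", "c", "d"],
    "segment_id": [0, 0, 0, 1],
    "first_new_seq": [0, 3, 5, 0],
    "first_hot_seq": [1, 4, None, 6],
})

# Keep only posts with an observed /hot appearance in the same segment.
promoted = ri.dropna(subset=["first_hot_seq"]).copy()

# Lag in snapshot intervals between first /new and first /hot observation.
promoted["lag_intervals"] = (
    promoted["first_hot_seq"] - promoted["first_new_seq"]
).astype(int)

# Number of posts at each lag, ordered by lag.
lag_counts = promoted["lag_intervals"].value_counts().sort_index()
```

Counting in run positions rather than clock time keeps the measure honest about resolution: a lag of 1 means "by the next snapshot", not a specific number of minutes.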

Figure 4 - Cadence to /hot

Snapshot intervals between first appearance in /new and /hot.
Observed promotions typically occur early within the visibility window.

Audit: Observation Depth Investigation

The large spike at a single observation in the /new depth distribution triggered a structured audit.

The investigation proceeded by:

recomputing the distribution from raw snapshots
testing posts observed near collection gaps
examining churn under normal cadence
comparing singleton and multi-observed posts

The spike reflects a mixture of fast-loss posts, gap-adjacent observations, and normal queue variation.
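
One of the audit steps, testing posts observed near collection gaps, might look like the sketch below. It flags singleton posts whose only observation falls on the first or last run of a segment. The table layouts, the column names, and this particular definition of "gap-adjacent" are assumptions for illustration.

```python
import pandas as pd

# Synthetic run universe: two segments of three runs each.
run_level = pd.DataFrame({
    "run_id": [1, 2, 3, 4, 5, 6],
    "segment_id": [0, 0, 0, 1, 1, 1],
})

# Synthetic post-segment records.
post_level = pd.DataFrame({
    "post_id": ["a", "b", "c", "d"],
    "segment_id": [0, 0, 1, 1],
    "n_obs": [1, 1, 1, 3],
    "first_run": [1, 2, 6, 4],
    "last_run": [1, 2, 6, 6],
})

# First and last run of each segment.
bounds = (
    run_level.groupby("segment_id")["run_id"]
    .agg(seg_first="min", seg_last="max")
    .reset_index()
)

audit = post_level.merge(bounds, on="segment_id", how="left")
audit["singleton"] = audit["n_obs"] == 1
# Hypothetical rule: a post touching either edge of its segment is gap-adjacent.
audit["gap_adjacent"] = (
    (audit["first_run"] == audit["seg_first"])
    | (audit["last_run"] == audit["seg_last"])
)

singletons = audit[audit["singleton"]]
n_gap_adjacent = int(singletons["gap_adjacent"].sum())
n_interior = int((~singletons["gap_adjacent"]).sum())
```

Interior singletons are the ones that cannot be attributed to a segment edge, which makes them the candidates for fast loss or ordinary queue churn.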

This audit ensured that patterns in the distribution were examined before interpretation.

Limits and Interpretation Boundaries

No causal inference is made.
No ranking algorithm modeling is attempted.
No cross-segment stitching occurs.
Ranked appearance rates are lower bounds.
No lifetime engagement measurement is performed.
Unobserved time is treated as non-inferable.

This study documents observable properties of listing behavior under known data constraints.

It demonstrates structured dataset construction, segmentation under gaps, and disciplined interpretation within observational limits.