Structured Visibility Reporting Under Snapshot-Based Collection Constraints
Objective
Build a structured, audit-focused reporting workflow that measures early post visibility within defined r/AskReddit listings using public snapshot data collected at fixed intervals.
The project defines measurement boundaries explicitly and restricts conclusions to what can be directly observed under known collection limits.
No causal inference or ranking-model assumptions are introduced.
Dataset Context
- Subreddit: r/AskReddit
- Collection window: 7 consecutive days
- Cadence: ~15-minute intervals
- Listings captured: /new, /hot, /rising
Reddit does not provide listing history or impression data.
All measurements reflect observed appearances in public snapshots only.
Unobserved time is treated as non-inferable.
Reporting Structure & Measurement Controls
The workflow defines reporting grain and segmentation rules before aggregation.
Unit of Analysis
Primary unit:
(post_id, segment_id)
A segment represents an uninterrupted collection period.
If a gap exceeds 60 minutes, a new segment_id begins.
No analysis crosses segment boundaries.
Segmentation Rule
- Hard gap threshold: 60 minutes
- Gaps >60 minutes create a new segment_id
- No stitching across gaps
- Run-level completeness explicitly recorded
This preserves denominator discipline and prevents continuity assumptions.
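The segmentation rule can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `assign_segments`, the zero-based integer segment IDs, and the sample timestamps are all assumptions made for the example. Only the 60-minute threshold and the "exceeds" comparison come from the rule above.

```python
from datetime import datetime, timedelta

GAP_THRESHOLD = timedelta(minutes=60)  # hard gap threshold from the segmentation rule

def assign_segments(snapshot_times):
    """Return one segment_id per snapshot time (input sorted ascending).

    A new segment starts whenever the gap to the previous snapshot
    exceeds GAP_THRESHOLD. Nothing is stitched across gaps.
    """
    segment_ids = []
    current = 0
    previous = None
    for ts in snapshot_times:
        if previous is not None and ts - previous > GAP_THRESHOLD:
            current += 1  # hard gap: start a new segment
        segment_ids.append(current)
        previous = ts
    return segment_ids

# Illustrative timestamps only (not project data).
start = datetime(2024, 1, 1, 0, 0)
times = [start + timedelta(minutes=m) for m in (0, 15, 30, 120, 135)]
print(assign_segments(times))  # [0, 0, 0, 1, 1]
```

The 90-minute gap between the third and fourth snapshots opens a new segment. A gap of exactly 60 minutes would not, because the rule applies only to gaps that exceed the threshold.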
Analysis Table Architecture
All reporting derives from three structured tables built from raw snapshots.
All summary metrics are computed directly from structured tables and reconcile to raw snapshot counts within uninterrupted segments.
1. run_level
Primary key: run_id
Defines:
- Collection cadence
- Segment boundaries
- Gap classification
2. post_level
Primary key: (post_id, segment_id)
Aggregates:
- Chronological visibility in /new
- Number of observed appearances
- Initial comment state
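One way the post_level aggregation could look is sketched below. The raw-row layout (`Row` and its field names), the use of a snapshot index rather than a timestamp, and the output field names are assumptions for illustration. The source specifies only the primary key and the three aggregated quantities.

```python
from collections import namedtuple

# Hypothetical raw-snapshot row; these field names are assumptions.
Row = namedtuple("Row", "snapshot_idx segment_id listing post_id num_comments")

def build_post_level(rows):
    """Aggregate /new rows into one record per (post_id, segment_id)."""
    table = {}
    for r in sorted(rows, key=lambda r: r.snapshot_idx):
        if r.listing != "new":
            continue  # ranked listings belong to a separate table
        key = (r.post_id, r.segment_id)
        if key not in table:
            table[key] = {
                "first_seen": r.snapshot_idx,
                "last_seen": r.snapshot_idx,
                "n_appearances": 0,
                "initial_comments": r.num_comments,  # state at first observation
            }
        rec = table[key]
        rec["last_seen"] = r.snapshot_idx
        rec["n_appearances"] += 1
    return table

# Illustrative rows only (not project data).
rows = [
    Row(0, 0, "new", "p1", 0),
    Row(1, 0, "new", "p1", 3),
    Row(1, 0, "hot", "p1", 3),
    Row(9, 1, "new", "p1", 8),  # same post after a hard gap: separate record
]
post_level = build_post_level(rows)

# Reconciliation: aggregated appearances must equal the raw /new row count.
assert sum(rec["n_appearances"] for rec in post_level.values()) == \
    sum(1 for r in rows if r.listing == "new")
```

Keying on (post_id, segment_id) means a post seen on both sides of a hard gap produces two independent records, so no appearance count ever spans unobserved time.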
3. ranked_intersections
Primary key: (post_id, segment_id)
Records:
- Observed intersections with /hot
- Observed intersections with /rising
- Snapshot interval between first /new and ranked appearance
Ranked intersections are lower-bound measurements due to top-N truncation.
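The interval between first /new and first ranked appearance might be computed as in this sketch. It assumes each snapshot has an integer index within its segment. The function name `first_ranked_lag` and the sample indices are hypothetical.

```python
def first_ranked_lag(new_idx, ranked_idx):
    """Snapshot intervals from first /new sighting to first ranked sighting.

    Both arguments are snapshot indices for one (post_id, segment_id).
    Returns None when either listing has no observation in the segment.
    """
    if not new_idx or not ranked_idx:
        return None
    return min(ranked_idx) - min(new_idx)

CADENCE_MINUTES = 15  # approximate cadence stated in the dataset context

# Illustrative indices only (not project data).
lag = first_ranked_lag(new_idx=[4, 5, 6], ranked_idx=[6, 7])
print(lag, lag * CADENCE_MINUTES)  # 2 intervals, roughly 30 minutes
```

Returning None rather than 0 keeps "never observed in a ranked listing" distinct from "observed in the same snapshot". Because of top-N truncation, a missing ranked observation does not show that the post was never ranked.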
Reporting Outputs
Collection Coverage & Gaps
Shows uninterrupted segments and hard gaps across the study window.
All downstream reporting is restricted to uninterrupted periods.

Observation Depth in /new
Displays the number of snapshots in which each post appears in /new.
Median observed appearances: 27 snapshots.
A large single-observation spike appears in the distribution.
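The two summary figures for this distribution, the median and the size of the single-observation spike, could be derived from per-post appearance counts along these lines. The function name `depth_summary` and the sample counts are illustrative assumptions, not project data.

```python
from statistics import median

def depth_summary(n_appearances):
    """Summarise /new observation depth across (post_id, segment_id) records."""
    counts = list(n_appearances)
    if not counts:
        return None
    singletons = sum(1 for c in counts if c == 1)
    return {
        "median": median(counts),
        "singleton_share": singletons / len(counts),
    }

# Illustrative counts only (not project data).
example = [1, 1, 1, 5, 27, 30, 41]
summary = depth_summary(example)
print(summary["median"])  # 5
```

Reporting the singleton share next to the median makes the spike visible in the summary itself instead of leaving it implicit in the plot.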

Ranked Listing Exposure
Approximate observed rates:
- ~13% of observed posts appear in /hot
- ~1-2% appear in /rising
These represent lower-bound exposure rates.
Ranked listings represent an infrequent observation surface relative to chronological visibility.
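A rate of this kind can be expressed as a set intersection over (post_id, segment_id) keys. The sketch below assumes the denominator is the set of posts observed in /new, which the source does not state explicitly. The function name `observed_rate` and the sample keys are hypothetical.

```python
def observed_rate(new_keys, ranked_keys):
    """Share of posts observed in /new that were also observed in a ranked listing.

    Keys are (post_id, segment_id) tuples. The result is a lower bound:
    ranked listings are truncated to the top N, and appearances between
    snapshots are never seen.
    """
    new_set = set(new_keys)
    if not new_set:
        return None
    return len(new_set & set(ranked_keys)) / len(new_set)

# Illustrative keys only (not project data).
new_keys = [("a", 0), ("b", 0), ("c", 0), ("d", 0)]
hot_keys = [("a", 0), ("z", 0)]  # ("z", 0) was never seen in /new, so it is excluded
print(observed_rate(new_keys, hot_keys))  # 0.25
```

Restricting the numerator to posts in the denominator set keeps the rate between 0 and 1 and ties it to the same population as the /new depth metrics.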

Observed Lag to /hot
Among posts that appear in /hot, the first observed /hot appearance typically falls within 1-2 snapshot intervals (~15-30 minutes) of the first /new observation.
Intervals reflect snapshot cadence only.

Structured Audit: Observation Depth Spike
The distribution spike was audited before interpretation to confirm it reflected collection mechanics rather than processing error.
Audit steps included:
- Recomputing distributions from raw snapshots
- Identifying boundary-related variance near collection gaps
- Comparing singleton vs multi-observed posts
- Evaluating churn under normal cadence
Result:
The spike reflects a mixture of:
- Fast-loss posts
- Gap-adjacent observations
- Normal queue variation
The audit was completed before any conclusions were drawn from the distribution.
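One audit step, separating gap-adjacent singletons from the rest, might be sketched as follows. The source does not define "gap-adjacent". This example assumes it means a single observation falling on the first or last snapshot of a segment. The function name and data layout are also assumptions.

```python
def classify_singletons(seen_in_new, segment_bounds):
    """Count single-observation posts by position within their segment.

    seen_in_new:    {(post_id, segment_id): [snapshot indices seen in /new]}
    segment_bounds: {segment_id: (first_snapshot_idx, last_snapshot_idx)}
    """
    counts = {"gap_adjacent": 0, "interior": 0}
    for (_post_id, segment_id), seen in seen_in_new.items():
        if len(seen) != 1:
            continue  # only singletons are audited here
        first, last = segment_bounds[segment_id]
        if seen[0] in (first, last):
            counts["gap_adjacent"] += 1  # may have been visible during the gap
        else:
            counts["interior"] += 1      # appeared and vanished under normal cadence
    return counts

# Illustrative data only (not project data).
seen_in_new = {
    ("p1", 0): [0],        # first snapshot of segment 0
    ("p2", 0): [5],        # interior singleton
    ("p3", 0): [3, 4, 5],  # multi-observed, ignored
    ("p4", 1): [20],       # last snapshot of segment 1
}
segment_bounds = {0: (0, 10), 1: (12, 20)}
print(classify_singletons(seen_in_new, segment_bounds))
# {'gap_adjacent': 2, 'interior': 1}
```

Interior singletons correspond to the fast-loss and normal-variation components listed above. Boundary singletons are the ones whose true depth cannot be determined from the snapshots.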
Reporting Highlights (Constraint-Bounded)
Within uninterrupted segments:
- Many posts receive comments while still visible in /new.
- Ranked-surface exposure is rare relative to chronological visibility.
- Ranked promotion, when observed, typically occurs early.
All highlights reflect observed appearances only.
No inference is made about ranking algorithms, impressions, or total exposure.
Limits & Interpretation Boundaries
- No causal inference.
- No ranking algorithm modeling.
- No cross-segment stitching.
- No impression measurement.
- Ranked appearance rates are lower bounds.
- No lifetime engagement estimates.
- Unobserved time treated as non-inferable.
This project documents observable properties under explicit collection constraints.
What This Demonstrates
- Explicit unit-of-analysis definition prior to aggregation.
- Hard segmentation rules preventing continuity assumptions.
- Separation of run-level, post-level, and ranked-intersection reporting layers.
- Lower-bound interpretation discipline.
- Audit-driven anomaly validation.
- Reproducible, table-first reporting workflow.
The result is a structured visibility report built under fixed observational constraints, designed to prevent over-interpretation of incomplete data.