Ghostwriter

Reach is earned in dwell time: encoding LinkedIn's feed rules into a posting tool

Shipped

ghostwriter writes LinkedIn posts in my voice and publishes them after I approve. v0.3.0 does three things. It bakes a researched, sourced reach playbook into the skill as config the tool reads on every run, instead of advice I try to remember. It adds a flow diagram card type. And it ships end-to-end carousel support: a renderer that screenshots HTML slides and stitches them into a PDF, plus a publisher path that uploads that PDF through LinkedIn’s documents API, which LinkedIn then renders as a swipeable carousel in the feed. I verified the whole path against the real API. As usual I kept treating the skill like code, not a prompt: 121 tests at 100% coverage, shellcheck clean.

The reach rules are concrete. Put the hook up top, before the “see more” fold (~210 chars on desktop, less on mobile). Keep the body around 900 to 1500 characters. Optimize for saves over likes. Never put an external link in the post body. Run a “golden hour” engagement routine right after posting. The part worth writing about is why those rules are true, and then the small chain of browser and API plumbing that makes a native carousel possible at all. This is the end-to-end build: the playbook as data the generator reads, the slide-to-PDF render, the upload, and the verification gate.

What the feed actually measures: dwell time, not clicks

The single most useful thing to understand about LinkedIn ranking is that how long people spend on your post is a first-class quality signal, weighted heavily alongside explicit actions like a tap on “like.” LinkedIn’s own engineering team published this: they rank partly on dwell time because it’s measurable on every viewed post including passive reads, it’s a real-valued signal instead of the binary yes or no of a like, and it dodges the “click bounce” problem where someone opens a thing and immediately backs out (LinkedIn Engineering: Understanding dwell time). They model the likelihood that a post is skipped quickly and down-rank posts likely to be abandoned.

Once you accept that, the rest of the rules stop looking like folklore. The hook goes near the top because the opening slice is what has to earn the “see more” tap, and the tap is the first chunk of dwell time. The exact fold is an observed heuristic that varies by client: I use ~210 characters as a desktop figure, but mobile truncates earlier (often around 140), so treat the number as a knob, not a law. Posts land in the 900 to 1500 character range because they need to be long enough to hold attention but short enough to finish. “Optimize for saves over likes” is really “optimize for the signal that means this was worth coming back to,” which is a far stronger proxy for value than a reflexive like. The “golden hour” routine falls out of the same model: early engagement is the test LinkedIn uses to decide whether to widen distribution, so a short burst of genuine replies right after posting is the difference between a post that travels and one that quietly stalls.

The “never put a link in the body” rule has the same root. The feed is optimized to keep people on the platform, so a post that sends them off it gets deprioritized; LinkedIn’s marketing guidance says as much and recommends putting the link in the first comment instead (LinkedIn: Do links lower post reach). An outbound link is a dwell-time leak. None of this is a trick; it’s aligning what you publish with what the ranker can actually see.
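As a rough illustration of the "link goes in the first comment" advice, the sketch below pulls URLs out of a draft and returns them separately. It is a hypothetical helper, not something ghostwriter ships today; split_links and its regex are assumptions made for the example, and posting the comment itself is left out.

```python
import re

# Simple URL matcher for illustration. It will also capture punctuation glued
# to the end of a URL, so a real tool would want something stricter.
URL = re.compile(r"https?://\S+", re.IGNORECASE)

def split_links(body: str) -> tuple[str, str]:
    """Return (body with links removed, text for the first comment)."""
    links = URL.findall(body)
    stripped = URL.sub("", body)
    # Tidy the whitespace a removed URL leaves behind, one line at a time,
    # so paragraph breaks in the post are preserved.
    lines = [" ".join(line.split()) for line in stripped.split("\n")]
    return "\n".join(lines).strip(), "\n".join(links)

body, first_comment = split_links(
    "Full write-up here: https://example.com/post\nMore below."
)
print(body)
print(first_comment)
```

The post body stays link-free, and the URL is available to paste (or later, to post automatically) as comment one.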

The playbook as data the generator reads

Here’s the design decision I care about most. All of the above lives in config the skill loads on every run, rather than in my head or buried in a prompt I rewrote once and forgot. The research is encoded as machine-readable rules the tool applies automatically, each with the source that justifies it, so the rules stay auditable. When LinkedIn changes how the feed behaves, I edit one file and every future draft inherits the new rule.

# voice/playbook.py
# Each rule carries the research that justifies it, so the config is auditable:
# open the file, read the claim, follow the source, challenge it if it's stale.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ReachPlaybook:
    hook_max_chars: int = 210        # observed desktop "see more" fold; mobile truncates earlier (~140), so tune per surface
    body_min_chars: int = 900        # long enough to hold attention
    body_max_chars: int = 1500       # short enough to finish in one sitting
    allow_body_links: bool = False   # outbound links leak dwell time; move to comment 1
    optimize_for: str = "saves"      # saves imply real dwell + intent to return
    sources: dict[str, str] = field(default_factory=lambda: {
        "dwell_time": "https://www.linkedin.com/blog/engineering/feed/understanding-feed-dwell-time",
        "links": "https://www.linkedin.com/top-content/marketing/linkedin-content-and-ads/do-links-lower-linkedin-post-reach/",
    })

PLAYBOOK = ReachPlaybook()

The generator then enforces the playbook on the draft before I ever see it, so a reach-suppressing mistake never reaches the approval step. The check returns structured violations rather than throwing, so the tool can show me exactly what to fix.

# voice/lint.py
import re
from voice.playbook import PLAYBOOK

LINK = re.compile(r"https?://", re.IGNORECASE)

def lint_draft(body: str) -> list[str]:
    """Apply the reach playbook to a draft. Empty list means it's clean."""
    issues: list[str] = []
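    # LinkedIn's "see more" fold is character-based across the whole post, so
    # the visible hook is the first hook_max_chars characters, newlines included.
    # Slice one extra character so a line break sitting exactly at the fold
    # still counts as ending the hook in time.
    hook = body[: PLAYBOOK.hook_max_chars + 1]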
    if len(body) > PLAYBOOK.hook_max_chars and "\n" not in hook:
        # No paragraph break inside the visible window: the opening line runs
        # past the fold, so the hook gets cut mid-thought before "see more".
        issues.append(
            f"no line break within the first {PLAYBOOK.hook_max_chars} chars; "
            "the hook runs past the 'see more' fold"
        )
    if not (PLAYBOOK.body_min_chars <= len(body) <= PLAYBOOK.body_max_chars):
        issues.append(
            f"body is {len(body)} chars; aim for "
            f"{PLAYBOOK.body_min_chars}-{PLAYBOOK.body_max_chars}"
        )
    if not PLAYBOOK.allow_body_links and LINK.search(body):
        issues.append("link in body suppresses reach; move it to the first comment")
    return issues

A playbook that only exists as my memory degrades the moment I forget a rule under deadline. A playbook that’s a file the tool enforces does not have that failure mode, and it makes every rule challengeable by anyone who reads the config.

Rendering slides into a PDF carousel

The carousel feature comes down to one constraint: LinkedIn’s document posts accept a PDF and render each page as one swipeable slide. PDF is the format that survives the round trip because its layout is locked, so what you design is what shows up. The pipeline is therefore: design each slide as HTML, render each slide to an image, and stitch those images into a multi-page PDF where one page equals one slide.

The render clips a fixed rectangle, so each slide’s HTML has to be sized to exactly that rectangle with zeroed body margins; otherwise the clip crops content or leaves blank gutters. A minimal slide looks like this, with width/height matching the SLIDE_W/SLIDE_H constants the renderer clips to.

<!-- slide_1.html -->
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <style>
      html, body { margin: 0; padding: 0; }
      /* Must match SLIDE_W x SLIDE_H in the renderer, or the clip won't line up. */
      .slide {
        width: 1080px;
        height: 1350px;
        box-sizing: border-box;
        display: flex;
        align-items: center;
        justify-content: center;
        background: #0a0a0a;
        color: #fff;
        font: 600 64px/1.2 system-ui, sans-serif;
      }
    </style>
  </head>
  <body>
    <div class="slide">Reach is earned in dwell time</div>
  </body>
</html>

For the render step I drive a headless browser. Start with the dependencies.

python -m venv .venv && source .venv/bin/activate
pip install playwright img2pdf pypdf requests
playwright install chromium

There are two clean ways to get a PDF out of Chromium. The native one is Playwright’s page.pdf(), which prints the page using print CSS media and needs print_background=True plus -webkit-print-color-adjust: exact to keep backgrounds and brand colors from being stripped for “printing” (Playwright: page.pdf(); MDN: -webkit-print-color-adjust). The other, which is what ghostwriter does, is to screenshot each slide at a fixed pixel size and assemble the PNGs into a PDF. Screenshots use screen media, so what you see in the browser is exactly what lands in the file, with no print-stylesheet surprises. For pixel-perfect, image-like slides, that predictability is worth more than page.pdf()'s pagination smarts.
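For comparison, the page.pdf() route that ghostwriter does not use might look roughly like the sketch below. It assumes a slide sized 1080x1350 CSS pixels; PRINT_CSS, pdf_options, and print_slide are names invented for this example. Playwright is imported inside the function so the option helper can be inspected without a browser installed.

```python
# Injected so Chromium keeps backgrounds and brand colors when "printing".
PRINT_CSS = "* { -webkit-print-color-adjust: exact; print-color-adjust: exact; }"

def pdf_options(width_px: int = 1080, height_px: int = 1350) -> dict:
    """Keyword arguments for page.pdf(): one slide-sized page, backgrounds on."""
    return {
        "width": f"{width_px}px",
        "height": f"{height_px}px",
        "print_background": True,
    }

def print_slide(html_uri: str, out_path: str) -> None:
    """Print a single HTML slide to a one-page PDF using print media."""
    # Imported lazily: only this function needs Playwright and Chromium.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(html_uri, wait_until="load")
        page.add_style_tag(content=PRINT_CSS)
        page.pdf(path=out_path, **pdf_options())
        browser.close()
```

With this route the layout goes through print media, so it is worth checking the page count of the output: a slide that overflows its page by even a fraction can spill onto a second one.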

# render_carousel.py
import tempfile
from pathlib import Path

import img2pdf
from playwright.sync_api import sync_playwright

# LinkedIn renders each PDF page as one swipeable slide. A 4:5 portrait
# frame (1080x1350) is the most mobile-friendly, so we capture at that size.
SLIDE_W, SLIDE_H = 1080, 1350
SCALE = 2  # device_scale_factor: render at 2x for retina-crisp text

def render_slides_to_pdf(slide_html: list[Path], out_pdf: Path) -> Path:
    """Screenshot each HTML slide, then stitch the images into one PDF."""
    png_paths: list[str] = []
    # Write intermediate PNGs to a scratch dir so we never clobber files next
    # to the input HTML. The context manager deletes it once the PDF exists.
    with tempfile.TemporaryDirectory(prefix="carousel_") as scratch:
        tmp = Path(scratch)
        with sync_playwright() as p:
            browser = p.chromium.launch()
            # SCALE doubles the pixel density, so a SLIDE_W x SLIDE_H CSS clip
            # produces a (SLIDE_W*SCALE) x (SLIDE_H*SCALE) PNG.
            page = browser.new_page(
                viewport={"width": SLIDE_W, "height": SLIDE_H},
                device_scale_factor=SCALE,
            )
            for i, slide in enumerate(slide_html):
                # wait_until="load" is reliable; networkidle is discouraged and flaky.
                page.goto(slide.resolve().as_uri(), wait_until="load")
                page.evaluate("async () => { await document.fonts.ready; }")  # let web fonts settle
                png = tmp / f"slide_{i}.png"
                # Clip to the exact frame so every slide is identically sized.
                page.screenshot(
                    path=str(png),
                    clip={"x": 0, "y": 0, "width": SLIDE_W, "height": SLIDE_H},
                )
                png_paths.append(str(png))
            browser.close()

        # One image becomes exactly one PDF page, which is exactly one slide.
        # Convert while the scratch dir still exists: img2pdf reads the PNGs here.
        out_pdf.write_bytes(img2pdf.convert(png_paths))
    return out_pdf
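
If you want to confirm the 2x claim on the intermediate images, a PNG's pixel size can be read straight from its header with the standard library. This is an optional sanity check I am sketching here, not part of the renderer; png_size is a hypothetical helper, and the hand-built header below stands in for a real screenshot.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_size(data: bytes) -> tuple[int, int]:
    """Read (width, height) in pixels from the IHDR chunk of a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then big-endian 32-bit width and height.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG")
    return struct.unpack(">II", data[16:24])

# Stand-in for the first bytes of a screenshot taken at 1080x1350 with SCALE = 2.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 2160, 2700)
print(png_size(header))  # (2160, 2700)
```

For a real file, pass the result of Path(...).read_bytes() and compare against (SLIDE_W * SCALE, SLIDE_H * SCALE).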

Uploading the PDF through LinkedIn’s documents API

A carousel is a document post, not an image or text post, so it takes a different upload path. LinkedIn’s documents API is a three-step handshake: register the upload to get a one-time URL and a document URN, PUT the raw bytes to that URL, then attach the returned URN to a post (LinkedIn: Documents API). The version header is mandatory and is a bare YYYYMM string. One gotcha that isn’t obvious from the scope name: the /rest/documents and /rest/posts endpoints live under LinkedIn’s Community Management API product, which has to be explicitly enabled on your app. A plain Sign-In/Share app holding only the w_member_social scope returns 403 at initializeUpload until that product is approved.
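Because a malformed version header is an easy mistake to make, a small format check can catch it before any request goes out. The sketch below only validates the bare YYYYMM shape described above; it cannot tell whether LinkedIn currently supports a given version, and is_valid_version is a name made up for this example.

```python
import re

def is_valid_version(value: str) -> bool:
    """True if value is a bare YYYYMM string with a real month (01-12)."""
    match = re.fullmatch(r"(\d{4})(\d{2})", value)
    return bool(match) and 1 <= int(match.group(2)) <= 12

print(is_valid_version("202602"))   # True
print(is_valid_version("2026-02"))  # False: contains a separator
print(is_valid_version("202613"))   # False: month 13 does not exist
```

Pinning the version as a constant and validating it once at startup keeps the failure local instead of surfacing as an API error mid-publish.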

Two inputs come from your LinkedIn app, not from this code: token is a member access token with the w_member_social scope (read it from an env var, e.g. os.environ["LINKEDIN_TOKEN"]), and author_urn is the posting member’s person URN in the form urn:li:person:{id}, where {id} is the sub claim returned by the /userinfo (OpenID) endpoint. That /userinfo call needs the OpenID Connect scopes (openid profile) on the token, a separate product enablement from w_member_social, so request all three when you set the app up.

# publish_document.py
import requests

API = "https://api.linkedin.com/rest"

def _headers(token: str) -> dict:
    return {
        "Authorization": f"Bearer {token}",
        "LinkedIn-Version": "202602",        # mandatory; bare YYYYMM, not a date
        "X-Restli-Protocol-Version": "2.0.0",
    }

def publish_carousel(token: str, author_urn: str, pdf: bytes, title: str) -> str:
    h = _headers(token)

    # 1) Register the upload. LinkedIn hands back a one-time URL and the URN
    #    that will represent this document everywhere it's reused.
    r = requests.post(
        f"{API}/documents?action=initializeUpload",
        headers=h,
        json={"initializeUploadRequest": {"owner": author_urn}},
        timeout=30,
    )
    r.raise_for_status()  # surface the API error body, not a later KeyError
    init = r.json()["value"]
    upload_url, doc_urn = init["uploadUrl"], init["document"]

    # 2) PUT the raw PDF bytes to that URL. No JSON wrapper, just the file.
    put = requests.put(
        upload_url, headers={"Authorization": h["Authorization"]}, data=pdf, timeout=60
    )
    put.raise_for_status()

    # 3) Attach the document URN to a post. This is what makes the feed
    #    render it as a swipeable carousel instead of a file attachment.
    post = requests.post(f"{API}/posts", headers=h, json={
        "author": author_urn,
        "commentary": "",
        "visibility": "PUBLIC",
        "distribution": {"feedDistribution": "MAIN_FEED"},
        "content": {"media": {"id": doc_urn, "title": title}},
        "lifecycleState": "PUBLISHED",
    }, timeout=30)
    post.raise_for_status()
    return doc_urn

The document URN is reusable; once a PDF is uploaded, the same URN can back more than one post without re-uploading the bytes. The carousel is just that document wearing a swipe gesture.
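
A sketch of what that reuse could look like: build the post payload from an already-uploaded document URN and skip the first two steps. The payload fields mirror the ones in publish_carousel; document_post_payload is a hypothetical helper, and the URNs below are placeholders, not real identifiers.

```python
def document_post_payload(
    author_urn: str, doc_urn: str, title: str, commentary: str = ""
) -> dict:
    """Build the /rest/posts body for a document that is already uploaded."""
    return {
        "author": author_urn,
        "commentary": commentary,
        "visibility": "PUBLIC",
        "distribution": {"feedDistribution": "MAIN_FEED"},
        "content": {"media": {"id": doc_urn, "title": title}},
        "lifecycleState": "PUBLISHED",
    }

# Placeholder URNs for illustration only.
AUTHOR = "urn:li:person:EXAMPLE"
DOC = "urn:li:document:EXAMPLE"

first = document_post_payload(AUTHOR, DOC, "Reach playbook", "Original post")
repost = document_post_payload(AUTHOR, DOC, "Reach playbook", "Resharing the deck")

# Both posts point at the same uploaded document; no bytes are sent again.
print(first["content"]["media"]["id"] == repost["content"]["media"]["id"])  # True
```

Each payload would then be sent with the same requests.post call used in step 3 above.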

Verify the PDF before you publish

The upload is the expensive, hard-to-undo step, so the check belongs before it, not after. The two things that actually break a carousel are silent: a render that dropped or duplicated a slide gives the wrong page count, and a slide that came out the wrong aspect ratio gets letterboxed in the feed. Both are cheap to assert from the finished PDF, and both became a gate the publisher runs before it ever calls the API.

# verify_carousel.py
from pathlib import Path
from pypdf import PdfReader

from render_carousel import SLIDE_W, SLIDE_H, SCALE

# Derive the expected page size from the render constants so the two can't drift.
# device_scale_factor=SCALE makes the screenshot (SLIDE_W*SCALE) x (SLIDE_H*SCALE)
# pixels. Chromium screenshots don't write a pHYs DPI chunk, so img2pdf falls
# back to its 96-DPI default; PDF points are 1/72 inch, so each pixel becomes
# 72/96 of a point.
PT_PER_PX = 72 / 96
EXPECTED_W = SLIDE_W * SCALE * PT_PER_PX   # 1080*2*0.75 = 1620.0 pt
EXPECTED_H = SLIDE_H * SCALE * PT_PER_PX   # 1350*2*0.75 = 2025.0 pt
TOLERANCE = 2.0                            # absorb rounding from the export
# Caveat: this size gate assumes no pHYs DPI chunk in the PNGs. A Chromium or
# toolchain build that embeds DPI would change img2pdf's points-per-pixel and
# shift the page size, tripping the gate; re-derive PT_PER_PX if that happens.

def verify_carousel(pdf: Path, expected_slides: int) -> None:
    """Fail loudly before upload if the PDF is the wrong shape."""
    # Raise explicitly rather than assert: `python -O` strips asserts, which
    # would silently switch this gate off.
    pages = PdfReader(str(pdf)).pages
    if len(pages) != expected_slides:
        raise ValueError(f"expected {expected_slides} slides, PDF has {len(pages)}")
    for i, page in enumerate(pages):
        w, h = float(page.mediabox.width), float(page.mediabox.height)
        if abs(w - EXPECTED_W) >= TOLERANCE:
            raise ValueError(f"slide {i} width {w} != {EXPECTED_W}")
        if abs(h - EXPECTED_H) >= TOLERANCE:
            raise ValueError(f"slide {i} height {h} != {EXPECTED_H}")
        if h <= w:  # 4:5 must be taller than wide
            raise ValueError(f"slide {i} is not portrait ({w}x{h})")

Wiring the pieces together, the publish path is render, verify, then upload, and the verify step is the one allowed to stop it.

import os
from pathlib import Path

import requests

from render_carousel import render_slides_to_pdf
from verify_carousel import verify_carousel
from publish_document import publish_carousel

# Two inputs come from your LinkedIn app, not from this code (see the section above).
token = os.environ["LINKEDIN_TOKEN"]   # member token with w_member_social + openid profile
resp = requests.get(
    "https://api.linkedin.com/v2/userinfo",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()  # a bad or expired token fails here, not as a KeyError on "sub"
author_urn = f"urn:li:person:{resp.json()['sub']}"

slides = [Path(f"slide_{i}.html") for i in range(1, 6)]
pdf = render_slides_to_pdf(slides, Path("carousel.pdf"))
verify_carousel(pdf, expected_slides=len(slides))   # raises before anything is uploaded
publish_carousel(token, author_urn, pdf.read_bytes(), title="Reach playbook")

A dropped slide or a botched aspect ratio now fails on my machine in milliseconds, instead of going live as a malformed post I have to delete and re-run the golden hour on.
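
The aspect-ratio half of that gate can also be checked without any DPI assumption, since a ratio does not change when every page is scaled by the same points-per-pixel factor. The sketch below is a complementary check under that reasoning, not what ghostwriter runs; is_four_by_five and its 1% tolerance are choices made for the example.

```python
import math

def is_four_by_five(width: float, height: float, rel_tol: float = 0.01) -> bool:
    """True if width:height is within rel_tol of the 4:5 portrait ratio."""
    if width <= 0 or height <= 0:
        return False
    return math.isclose(width / height, 4 / 5, rel_tol=rel_tol)

print(is_four_by_five(1620.0, 2025.0))  # True: the expected page size in points
print(is_four_by_five(1080, 1350))      # True: the CSS frame itself
print(is_four_by_five(1080, 1080))      # False: square
print(is_four_by_five(1350, 1080))      # False: landscape
```

Pairing this with the absolute size check means a toolchain change that shifts DPI trips only the size gate, which makes the cause easier to spot.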

Next

The immediate follow-up is automating “link in the first comment,” so a reach-suppressing link can never accidentally land in the body, which closes the last gap between the playbook and the published behavior. I also want to template the slide HTML so a carousel can be generated from an outline rather than hand-authored per post.
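
For the templating idea, one possible shape is a standard-library template that turns an outline into one HTML document per slide. This is a sketch of the direction, not an implemented feature; SLIDE_TEMPLATE and slides_from_outline are hypothetical, and the styling simply mirrors the hand-written slide shown earlier.

```python
from html import escape
from string import Template

SLIDE_W, SLIDE_H = 1080, 1350  # must match the renderer's clip rectangle

# Hypothetical template; ${w}, ${h}, and ${text} are filled in per slide.
SLIDE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <style>
      html, body { margin: 0; padding: 0; }
      .slide {
        width: ${w}px; height: ${h}px; box-sizing: border-box;
        display: flex; align-items: center; justify-content: center;
        background: #0a0a0a; color: #fff;
        font: 600 64px/1.2 system-ui, sans-serif;
      }
    </style>
  </head>
  <body>
    <div class="slide">${text}</div>
  </body>
</html>
""")

def slides_from_outline(outline: list[str]) -> list[str]:
    """Render one HTML document per outline entry, escaping the text."""
    return [
        SLIDE_TEMPLATE.substitute(w=SLIDE_W, h=SLIDE_H, text=escape(line))
        for line in outline
    ]

pages = slides_from_outline(["Reach is earned in dwell time", "Saves > likes"])
print(len(pages))  # 2
```

Each string could be written to slide_N.html and handed to the existing renderer unchanged, since the frame size comes from the same constants.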

Changelog

  • [ghostwriter] feat(ghostwriter): reach optimization, flow diagrams & carousels (v0.3.0) (1cf7e86)