Quick summary: The single most common redaction mistake — drawing a black rectangle over sensitive text — leaves the underlying text fully intact and recoverable by anyone who selects, copies, or scrapes the document. Real redaction removes the text from the PDF's content stream entirely. This guide shows the difference, walks through doing it correctly, and gives you a 20-second verification step that catches almost every failed redaction before it leaves your hands.
The mistake that leaks documents every week
Someone needs to share a PDF but hide a few things: a bank account number, a patient identifier, a witness's name, a clause in a contract. They open a PDF editor, draw a solid black rectangle over each piece of text, save the file, and send it. The document looks redacted on screen and on paper.
It isn't.
The black rectangle is just a visual layer painted on top. The original text still exists in the PDF underneath, in the page's content stream (the list of drawing instructions that makes up the page), exactly where it has always been. Anyone who receives the file can:
- Click and drag to select the "hidden" text through the box.
- Copy it and paste it into any text editor — the black box is ignored.
- Search the document and the hidden text matches.
- Open the PDF in a developer tool or run a one-line script and dump every word on the page.
This is not a theoretical weakness. It has happened in court filings, government documents released under FOIA, and corporate disclosures. Each time, the organization believed the text was gone because it looked gone. Visual appearance and data removal are different things, and PDFs exploit that confusion mercilessly.
Why this happens: how a PDF stores text
A PDF page is a list of drawing instructions: "draw the glyph 'H' at coordinates (72, 700)," "draw a filled rectangle from (60, 690) to (200, 710)," and so on. The black rectangle is simply drawn after the text, so it covers it visually. But both instructions are still in the file. Drawing the cover does nothing to the text instruction, and removing the text instruction is the only thing that counts as redaction.
Genuine redaction means editing that instruction list: deleting the text-drawing instructions for the redacted words (and any references to them), then re-flowing or padding so the page still looks the way you want.
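To make the instruction-list idea concrete, here is a minimal sketch in Python. It assumes a toy, uncompressed content stream written by hand for illustration; real streams are usually compressed and encode text in more varied ways (hex strings, TJ arrays, custom font encodings), so the regular expression below is an illustration, not a general extractor.

```python
import re

# Toy content stream (an assumption for illustration): the text is drawn
# first, then a filled black rectangle is painted over the same area.
CONTENT_STREAM = b"""
BT
/F1 12 Tf
72 700 Td
(Account 12345678) Tj
ET
0 0 0 rg
60 690 140 20 re
f
"""

# Matches only simple literal-string text operators of the form "(...) Tj".
TJ_PATTERN = re.compile(rb"\(((?:[^()\\]|\\.)*)\)\s*Tj")


def visible_strings(stream: bytes) -> list[str]:
    """Return every literal string passed to a Tj (show text) operator."""
    return [m.group(1).decode("latin-1") for m in TJ_PATTERN.finditer(stream)]


# The rectangle ("re" followed by the fill operator "f") changes nothing here:
# the text operator is still in the stream and still readable.
print(visible_strings(CONTENT_STREAM))
```

The rectangle and the text are independent instructions, which is why covering one leaves the other untouched.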
The correct way: remove, don't cover
A proper redaction tool does three things the black-box method does not:
- Finds the text in the content stream, not just on the rendered page.
- Removes the text-drawing instructions for the matched regions, optionally replacing them with a drawn rectangle so the page still looks redacted.
- Cleans up related artifacts — see the gotchas below, because text hides in more places than the visible page.
This is what the Find & Redact tool does: it searches the actual text layer, shows you every match, and once you confirm, removes the underlying text rather than masking it.
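The difference between covering and removing can be sketched on the same kind of toy content stream. This is not how Find & Redact or any particular tool is implemented; the function name `redact_stream` is hypothetical, and the sketch handles only simple, uncompressed "(...) Tj" operators. It also drops the whole matching string rather than just the matched word, which a real tool would handle more precisely.

```python
import re

# Matches only simple literal-string text operators of the form "(...) Tj".
TJ_PATTERN = re.compile(rb"\(((?:[^()\\]|\\.)*)\)\s*Tj")


def redact_stream(stream: bytes, term: str) -> bytes:
    """Delete every simple '(...) Tj' operator whose string contains term."""
    needle = term.lower().encode("latin-1")

    def drop_if_match(m: re.Match) -> bytes:
        # Remove the text-drawing instruction entirely on a match;
        # otherwise keep the original instruction unchanged.
        return b"" if needle in m.group(1).lower() else m.group(0)

    return TJ_PATTERN.sub(drop_if_match, stream)


# Text followed by a covering rectangle, as in the black-box mistake.
covered = b"BT /F1 12 Tf 72 700 Td (Account 12345678) Tj ET 0 0 0 rg 60 690 140 20 re f"
redacted = redact_stream(covered, "12345678")
print(redacted)
```

After the call, the rectangle instruction is still present so the page still looks redacted, but the account number no longer exists anywhere in the stream.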
Step-by-step: redact a document properly
1. Make a copy first
Always redact a copy, never your only copy of the original. Once text is truly removed, it is gone: there is no "undo" after the file is saved and closed.
2. Search for every variant of what you're hiding
People redact "Acme Corp" but forget "ACME", "Acme Corporation", the ticker symbol, and the email domain. Redaction that misses one variant has failed. Use the tool's search to find every form, including partial matches. If your tool supports regular expressions, use them — a pattern like \b[A-Z]{2}\d{6}\b catches ID formats you might miss by eye.
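As a rough illustration of variant hunting, the sketch below runs a few patterns over text that has already been extracted from the document. The company name comes from the example above; the ticker symbol `ACMX`, the email domain, and the sample sentence are invented for illustration, and the ID pattern is the one quoted in this step.

```python
import re

# Patterns for the variants discussed above. The ticker "ACMX" is a
# made-up example value, not a real symbol.
VARIANT_PATTERNS = {
    "name": re.compile(r"\bacme\b(?:\s+corp(?:oration)?\b)?", re.IGNORECASE),
    "ticker": re.compile(r"\bACMX\b"),
    "id": re.compile(r"\b[A-Z]{2}\d{6}\b"),
}


def find_variants(text: str) -> list[tuple[str, str]]:
    """Return (label, matched text) pairs in order of appearance."""
    hits = []
    for label, pattern in VARIANT_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), label, m.group(0)))
    hits.sort()
    return [(label, matched) for _, label, matched in hits]


sample = (
    "Acme Corp (ticker ACMX) paid ACME staff; contact jo@acme.example, "
    "ref AB123456. Acme Corporation agreed."
)
for label, matched in find_variants(sample):
    print(f"{label}: {matched}")
```

Because the name pattern is case-insensitive and word-bounded, it also catches the bare "acme" inside the email address, which is exactly the kind of variant that is easy to miss by eye.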
3. Review matches before applying
A good tool shows every hit in context before removing anything. Confirm each one. False positives (redacting a word that appears in an innocent context) are just as much a failure as misses.
4. Apply true redaction
Confirm removal. The tool rewrites the content stream without the matched text.
5. Verify — the non-negotiable step
See the next section. Do this every single time.
The 20-second verification that catches almost every failure
Before you send the file anywhere, do all three of these:
- Select All (Ctrl/Cmd+A), Copy (Ctrl/Cmd+C), Paste into a plain text editor. Read the result. Any redacted text that appears here means your redaction failed. This single test catches the black-box mistake instantly.
- Search (Ctrl/Cmd+F) for each redacted term. Type the exact text you tried to hide, then its common abbreviations and partial forms. Zero matches is the only acceptable result.
- If the document was OCRed or is a scanned image with a text layer, repeat both checks against that layer. OCR text layers are a notorious hiding spot: the text is usually invisible on screen, so visual redaction often misses it.
If all three pass, the visible-text layer is clean. Now check the other places data hides.
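The copy-paste and search checks can also be scripted for batches of files. The sketch below is one possible approach, not a replacement for the manual checks: `find_leaks` works on any extracted text, while `extract_pdf_text` assumes the third-party pypdf package is installed and is only one of several ways to get text out of a PDF.

```python
import re


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_leaks(extracted_text: str, redacted_terms: list[str]) -> list[str]:
    """Return every redacted term that still appears in the extracted text."""
    haystack = normalize(extracted_text)
    return [term for term in redacted_terms if normalize(term) in haystack]


def extract_pdf_text(path: str) -> str:
    """Extract text from every page (assumption: pypdf is installed)."""
    from pypdf import PdfReader  # imported lazily so find_leaks works without it

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Usage on already-extracted text; an empty list is the only passing result.
leaks = find_leaks("Pay to Acme\nCorp account 12345678", ["acme corp", "99999999"])
print(leaks)
```

A term that wraps across a line in the extracted text would slip past a naive substring search, which is why both sides are whitespace-normalized before comparison.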
Where text hides besides the visible page
Even after the body text is removed, sensitive data frequently survives in:
- Metadata — the document's Author, Title, Subject, and Keywords fields often contain names, paths, or emails. Strip them with Remove Metadata or a full Sanitize.
- Comments and annotations — a reviewer's comment may quote the redacted text. Remove annotations before finalizing.
- Hidden layers (Optional Content Groups) — a PDF can contain layers that are invisible by default but contain text. If your editor exposes layers, check them.
- Embedded files — a PDF can have other files attached, including earlier un-redacted drafts.
- Document JavaScript or actions — rare, but actions can reference field values.
- Images of text — if the "text" is actually a picture of text (a scan), redacting the text layer does nothing; you must redact the image pixels. This is the one case where a drawn box is part of the solution, but only over the image, and only if there's no machine-readable text layer underneath.
For high-stakes documents, a full Sanitize pass removes metadata, scripts, hidden content, and attachments in one step. Treat sanitize as the final gate before a redacted file leaves your control.
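For a quick triage of the hiding places listed above, a raw byte scan for well-known PDF dictionary keys can flag files that deserve a closer look. This is a heuristic sketch only: objects stored in compressed object streams will not show up in a raw scan, and a hit such as `/Annots` can be harmless (ordinary links are annotations too). A clean result is not proof that the file is clean.

```python
# PDF name keys that commonly indicate the hiding places discussed above.
MARKERS = {
    b"/Author": "Info dictionary Author field",
    b"/Metadata": "XMP metadata stream",
    b"/Annots": "annotations or comments",
    b"/OCProperties": "optional content groups (hidden layers)",
    b"/EmbeddedFiles": "embedded file attachments",
    b"/JavaScript": "document JavaScript",
    b"/OpenAction": "action run when the document opens",
}


def scan_for_residue(pdf_bytes: bytes) -> list[str]:
    """Return a description for each marker found in the raw file bytes."""
    return [desc for marker, desc in MARKERS.items() if marker in pdf_bytes]


# Hand-written fragment standing in for a real file's bytes.
sample = (
    b"%PDF-1.7\n"
    b"1 0 obj << /Type /Catalog /OCProperties << >> >> endobj\n"
    b"2 0 obj << /Author (J. Doe) >> endobj\n"
)
for finding in scan_for_residue(sample):
    print("check:", finding)
```

Treat any finding as a prompt to inspect the file in a proper editor or run a full Sanitize, not as a verdict.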
Redaction for regulated work (GDPR, HIPAA, PII, legal privilege)
If you're redacting to meet a legal or regulatory obligation, the standard is higher and the verification step is not optional:
- Assume the recipient is adversarial. Lawyers, journalists, and organizations responding to data subject access requests regularly send documents to people who would prefer the hidden text to leak. Verify accordingly.
- Document your redaction process. In some contexts you may need to show how you redacted, not just that the output looks clean.
- Don't rely on a single tool silently. Run the verification steps above on every file. A tool that worked yesterday can ship a bug tomorrow.
- Consider the output format. If you must guarantee no hidden text survives, exporting the redacted pages to images and reassembling them into a new PDF is a brute-force but extremely safe final step: the new file has no text layer, so there is nothing to recover. (The PDF to JPG → Image to PDF path does this.)
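One way to document the process is to write a small audit record for each file. The sketch below is an assumption about what such a record might contain, not a regulatory requirement; the function and field names are hypothetical. It deliberately stores only a count of the redacted terms, since writing the terms themselves into a log would recreate the leak in a different file.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, used to identify exact file versions."""
    return hashlib.sha256(data).hexdigest()


def build_redaction_record(
    original: bytes, redacted: bytes, terms: list[str], tool: str
) -> dict:
    """Describe one redaction run without recording the sensitive terms."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "original_sha256": sha256_hex(original),
        "redacted_sha256": sha256_hex(redacted),
        # Only the count is stored; the terms never enter the log.
        "term_count": len(terms),
    }


record = build_redaction_record(
    b"original file bytes", b"redacted file bytes", ["secret-term"], tool="example-tool"
)
print(json.dumps(record, indent=2))
```

The two hashes tie the record to specific file versions, so you can later show which input produced which output.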
Frequently asked questions
If I draw a black box and then "flatten" the PDF, is the text gone? No. Flattening merges annotations and form fields into the page content, but the underlying text was already in the page content, and flattening doesn't remove it.
What about printing to PDF as an image? That works as a nuclear option — it destroys the text layer entirely. But you lose all real text (searchability, accessibility, copyable headings) and blow up the file size. It's a fallback, not a primary method.
Can redaction be undone? If done correctly (text removed from the content stream), no — that's the point. If done incorrectly (visual cover only), "undoing" it is trivial, which is the problem.
Does CommandPDF store my file during redaction? No. The entire redaction happens in your browser. For legal and medical documents, that is the safest possible model — the sensitive text never reaches a server to be leaked, retained, or compelled.
Conclusion
Redaction is a data-removal problem dressed up as a visual problem. Treat it as data removal: remove the text from the content stream, strip the metadata, kill the annotations, and verify with Select-All-Copy-Paste and a search before the file leaves your hands. The black box is a lie the document tells your eyes. Trust the text layer, not the pixels.
Redact a PDF safely — text removed locally in your browser →