HL7 for documents and PDFs

HL7 document interfaces usually mean MDM^T02 messages, a TXA segment, and sometimes document payloads carried in OBX using the ED datatype. The biggest mistake is treating the PDF or RTF as the only important part. Without the document ID, patient, encounter, author, status, date, type, and order or accession context, the document cannot be safely filed.

A PDF inside HL7 is typically Base64 text. It can look like noise in a raw message, but HL7 Soup Web makes it easier to see the segment structure around that payload before you extract or write files.

MSH|^~\&|TRANSCRIPTION|CITYHOSP|EHR|CITYHOSP|20260715140500||MDM^T02^MDM_T02|DOC000501|P|2.5.1
EVN|T02|20260715140500
PID|1||123456^^^CITYHOSP^MR||Smith^Jane^Anne^^Ms^^L||19800314|F
PV1|1|O|RAD^REPORT^1^CITYHOSP||||12345^Careful^Clara^^^^^^NPI
TXA|1|RADRPT^Radiology Report^HL70270|AP|20260715135500|67890^Reporter^Riley^^^^^^NPI|20260715140500|20260715140500|||||DOC7788^TRANSCRIPTION||RAD4488^EHR|ACC778899||AU
OBX|1|ED|PDF^Report PDF^LOCAL||^application^pdf^Base64^JVBERi0xLjQKJcTl8uXrCg==||||||F

This is synthetic sample data for learning and testing. Open it in HL7 Soup Web before mapping it so the segment groups, repeated fields, and coded values are visible.
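Once the structure is clear, pulling the payload out is mostly splitting and decoding. The Python below is a minimal sketch of that step, assuming default delimiters, an ED value laid out as source application^type of data^data subtype^encoding^data, and a payload that fits in a single OBX with no HL7 escape sequences. The function name extract_ed_payload is illustrative and not part of any library.

```python
import base64

# Synthetic fragment; real HL7 separates segments with carriage returns.
SAMPLE = "\r".join([
    "MSH|^~\\&|TRANSCRIPTION|CITYHOSP|EHR|CITYHOSP|20260715140500||MDM^T02^MDM_T02|DOC000501|P|2.5.1",
    "OBX|1|ED|PDF^Report PDF^LOCAL||^application^pdf^Base64^JVBERi0xLjQKJcTl8uXrCg==||||||F",
])


def extract_ed_payload(message: str) -> tuple[str, bytes]:
    """Return (mime_type, decoded_bytes) for the first OBX whose value type is ED."""
    # Accept \r, \n, or \r\n so samples pasted from an editor also work.
    segments = message.replace("\r\n", "\r").replace("\n", "\r").split("\r")
    for segment in segments:
        fields = segment.split("|")
        if fields[0] != "OBX" or len(fields) < 6 or fields[2] != "ED":
            continue
        parts = fields[5].split("^")
        if len(parts) < 5:
            raise ValueError("OBX-5 does not have the five ED components")
        _source, data_type, subtype, encoding, data = parts[:5]
        if encoding.lower() != "base64":
            raise ValueError(f"unsupported ED encoding: {encoding!r}")
        # validate=True rejects stray characters instead of silently skipping them.
        return f"{data_type}/{subtype}", base64.b64decode(data, validate=True)
    raise ValueError("no OBX segment with value type ED found")


mime_type, pdf_bytes = extract_ed_payload(SAMPLE)
print(mime_type, len(pdf_bytes), pdf_bytes[:8])  # application/pdf 16 b'%PDF-1.4'
```

Checking the first bytes for the %PDF marker is a cheap way to catch a payload that was truncated or double-encoded before it reaches a viewer.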

Metadata First, Payload Second

TXA is the document header. It tells the receiver what kind of document this is, who authored or authenticated it, what date applies, what unique document number should be used, and what status is being reported. The OBX payload may contain the actual text, PDF, RTF, or encoded content, but TXA usually decides the filing behavior.
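To make the header role concrete, here is a small Python sketch that reads the filing-relevant TXA fields by position. The positions follow the HL7 v2.5.1 TXA definition as I understand it (TXA-2 type, TXA-12 unique document number, TXA-13 parent document number, TXA-14 and TXA-15 order numbers, TXA-17 completion status); confirm them against the sender's interface specification. The names TXA_FIELDS and parse_txa are hypothetical.

```python
# Assumed field positions, taken from the HL7 v2.5.1 TXA segment definition.
TXA_FIELDS = {
    "document_type": 2,
    "activity_datetime": 4,
    "unique_document_number": 12,
    "parent_document_number": 13,
    "placer_order_number": 14,
    "filler_order_number": 15,
    "completion_status": 17,
}

# Synthetic TXA segment used only for this example.
SAMPLE_TXA = (
    "TXA|1|RADRPT^Radiology Report^HL70270|AP|20260715135500|"
    "67890^Reporter^Riley^^^^^^NPI|20260715140500|20260715140500|||||"
    "DOC7788^TRANSCRIPTION||RAD4488^EHR|ACC778899||AU"
)


def parse_txa(segment: str) -> dict[str, str]:
    """Return the raw values of the filing-relevant TXA fields."""
    fields = segment.split("|")
    if fields[0] != "TXA":
        raise ValueError("expected a TXA segment")
    header = {}
    for name, position in TXA_FIELDS.items():
        # Trailing empty fields are often omitted, so treat missing as empty.
        header[name] = fields[position] if position < len(fields) else ""
    if not header["unique_document_number"]:
        raise ValueError("TXA-12 unique document number is missing")
    return header


header = parse_txa(SAMPLE_TXA)
print(header["unique_document_number"], header["completion_status"])
# DOC7788^TRANSCRIPTION AU
```

The sketch keeps the full field value, including the assigning namespace, because the same document number issued by two systems should not collide.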

Do not overwrite a final document because a preliminary version has the same patient and type. Do not file an addendum as a second unrelated report. The document unique ID and status rules need to be agreed before the payload extraction is built.

  • Keep the TXA-12 unique document number.
  • Preserve the document type (TXA-2) and completion status (TXA-17) fields.
  • Keep order, accession, and encounter identifiers (for example TXA-14, TXA-15, and PV1-19) when reports relate to an order.
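The filing rules above can be sketched as a small decision function. This is an illustration only: the status codes come from HL7 table 0271, but the ranking between them, the action names, and the DocumentVersion type are assumptions that would need to be agreed with the sending system. The caller is expected to look up the stored version by document ID, never by patient and document type.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ranking of HL7 table 0271 completion statuses, lowest first.
STATUS_RANK = {"DI": 0, "IP": 1, "PA": 2, "AU": 3, "LA": 4}


@dataclass(frozen=True)
class DocumentVersion:
    document_id: str      # TXA-12
    status: str           # TXA-17
    parent_id: str = ""   # TXA-13, populated on addenda


def filing_action(incoming: DocumentVersion, stored: Optional[DocumentVersion]) -> str:
    """Decide what to do with an incoming version given the stored one, if any."""
    if not incoming.document_id or incoming.status not in STATUS_RANK:
        return "reject"
    if incoming.parent_id:
        # An addendum is linked to its parent, not filed as an unrelated report.
        return "attach_to_parent"
    if stored is None:
        return "file_new"
    if STATUS_RANK[incoming.status] < STATUS_RANK.get(stored.status, -1):
        # A less complete version must not overwrite a more complete one.
        return "hold_for_review"
    return "replace"


final = DocumentVersion("DOC7788", "AU")
preliminary = DocumentVersion("DOC7788", "DI")
print(filing_action(preliminary, final))  # hold_for_review
print(filing_action(final, preliminary))  # replace
```

Returning an action name rather than performing the write keeps the rule easy to test with the sample messages before any file or database is touched.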

PDF, RTF, Text, And References

Some senders put plain text directly in OBX-5. Others send ED values with Base64 PDFs or RTF. Some send a pointer rather than the content itself. Your receiver should decide whether it stores the original HL7, writes the decoded file, stores metadata in a database, or forwards a document to another repository.
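A receiver can usually tell these payload styles apart from OBX-2 before it looks at OBX-5. The sketch below assumes the common value types TX, FT, and ST for inline text, ED for encoded data, and RP for a reference pointer; the category names and the example URL are made up for illustration.

```python
def classify_obx(segment: str) -> str:
    """Classify an OBX segment by payload style using its OBX-2 value type."""
    fields = segment.split("|")
    if fields[0] != "OBX" or len(fields) < 6:
        raise ValueError("not a usable OBX segment")
    value_type, value = fields[2], fields[5]
    if not value:
        return "empty"
    if value_type in {"TX", "FT", "ST"}:
        return "inline_text"
    if value_type == "ED":
        return "embedded_encoded"
    if value_type == "RP":
        return "external_reference"
    return "unsupported"


# Synthetic segments; the URL is a hypothetical placeholder.
samples = [
    "OBX|1|TX|RPT^Report Text^LOCAL||No acute abnormality.||||||F",
    "OBX|1|ED|PDF^Report PDF^LOCAL||^application^pdf^Base64^JVBERi0xLjQKJcTl8uXrCg==||||||F",
    "OBX|1|RP|PDF^Report PDF^LOCAL||https://docs.example.org/DOC7788.pdf^^application^pdf||||||F",
]
styles = [classify_obx(s) for s in samples]
print(styles)  # ['inline_text', 'embedded_encoded', 'external_reference']
```

Routing on the classification result keeps each storage decision (store the raw HL7, write a decoded file, record metadata, forward elsewhere) in its own branch.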

The ED datatype gives HL7 a way to carry encoded content, but it does not force the receiving application to implement document viewing or filing. Confirm the receiver's supported payload style before go-live: embedded Base64, external reference, MDM/TXA document notification, separate file transfer, or no payload at all. In Integration Soup, keep that decision visible as a branch in the workflow rather than burying it in a transformer.

A Practical Integration Soup Workflow

The "PDF, binary, and RTF in HL7" tutorial is the natural Integration Soup companion for this article. It shows extracting a PDF from an incoming HL7 message, appending a PDF into a new OBX segment, and creating an HL7 file when a PDF arrives in a directory.

In Integration Soup, keep the decoded file path, document ID, patient ID, visit number, document status, and message control ID in logs. If the workflow writes files, use the file-processing best practices so downstream systems never read partial output.
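Outside Integration Soup, the same two habits (log the identifiers, never expose a half-written file) can be sketched in plain Python. This is a generic illustration, not Integration Soup's API: it writes to a temporary file in the target directory and renames it into place, which is atomic on POSIX systems when both paths are on the same filesystem. The function name and the context keys are assumptions.

```python
import logging
import os
import tempfile
from pathlib import Path

logger = logging.getLogger("document_interface")


def write_document_atomically(data: bytes, target: Path, context: dict[str, str]) -> Path:
    """Write data to target via a temporary file so readers never see partial output."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives beside the target so the rename stays on one filesystem.
    fd, temp_name = tempfile.mkstemp(dir=target.parent, prefix=".partial-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(temp_name, target)
    except BaseException:
        # Remove the leftover temp file if the write or rename failed.
        if os.path.exists(temp_name):
            os.unlink(temp_name)
        raise
    details = " ".join(f"{key}={value}" for key, value in sorted(context.items()))
    logger.info("filed document path=%s %s", target, details)
    return target


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with tempfile.TemporaryDirectory() as folder:
        write_document_atomically(
            b"%PDF-1.4\n",
            Path(folder) / "DOC7788.pdf",
            {
                "document_id": "DOC7788",
                "patient_id": "123456",
                "status": "AU",
                "message_control_id": "DOC000501",
            },
        )
```

A downstream poller that only matches *.pdf will ignore the temporary name, so it sees either the previous file or the complete new one.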

The Test Pack I Would Ask For

Ask for preliminary, final, corrected, and addendum documents; PDF and plain-text payloads; a missing or duplicate document ID; a cancelled document if supported; and a payload large enough to prove your interface handles real file sizes.
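If the sender cannot supply a large sample straight away, a synthetic one is easy to build while you wait. The sketch below wraps arbitrary bytes in an ED-style OBX and checks that they survive a round trip. The 5 MiB size, the OBX-3 code, and both function names are arbitrary assumptions, and random bytes are not a valid PDF, so this exercises size handling only.

```python
import base64
import os


def build_ed_obx(payload: bytes, set_id: int = 1) -> str:
    """Wrap raw bytes in an OBX segment carrying a Base64 ED value."""
    # The standard Base64 alphabet contains none of the default HL7 delimiters.
    encoded = base64.b64encode(payload).decode("ascii")
    return f"OBX|{set_id}|ED|PDF^Report PDF^LOCAL||^application^pdf^Base64^{encoded}||||||F"


def decode_ed_obx(segment: str) -> bytes:
    """Recover the raw bytes from an OBX built by build_ed_obx."""
    data = segment.split("|")[5].split("^")[4]
    return base64.b64decode(data, validate=True)


# Roughly 5 MiB of random bytes behind a PDF-style marker.
payload = b"%PDF-1.4\n" + os.urandom(5 * 1024 * 1024)
segment = build_ed_obx(payload)
round_trip_ok = decode_ed_obx(segment) == payload
print(len(payload), len(segment), round_trip_ok)
```

Sending the generated segment through the whole interface, not just the decoder, is what shows whether any hop truncates or rewraps long fields.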