HL7 Data Types Guide
HL7 data types are the grammar of a v2 message. The segment tells you the broad subject, the field tells you the named fact, and the data type tells you how that fact is shaped: one value, repeated values, components, subcomponents, coded identifiers, display text, dates, names, addresses, identifiers, locations, payloads, or some mixture of all of them.
The searchable list on the left is the full index. This guide is the working map: how to look at a data type and decide how to parse it, validate it, map it, and preserve the meaning that is easy to lose when a field is treated as one flat string.
The data type tells you shape, not the whole meaning
A data type is a structural contract. It tells you whether a field is a simple string like ST, a number like NM, a coded value like CWE, an identifier like CX, or a person name like XPN. It does not, by itself, tell you the business rule. PID-3 Patient Identifier List and PV1-19 Visit Number both commonly use identifier-style data, but they do not identify the same thing and should not be mapped into the same bucket.
That distinction matters when you build transformations. A receiver may care about a CX assigning authority, a CWE coding system, or a DTM precision even if the visible value looks readable to a human. If you only map the first component, you may accidentally remove the part that made the value trustworthy.
The HL7 v2 control chapter describes data type as the restriction on a field's contents, and the current HL7 terminology table for data types lists common codes such as CWE, CX, DTM, ED, HD, XCN, XPN, and XTN. See HL7 v2.4 Chapter 2 and HL7 Terminology table 0440 for the standards-side view.
Delimiters are part of the contract
HL7 v2 messages usually use | for fields, ^ for components, ~ for repetitions, & for subcomponents, and \ for escapes. The field separator is declared in MSH-1 Field Separator and the other four characters in MSH-2 Encoding Characters; all of them are then used throughout the message.
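As a minimal sketch of reading the delimiters from the message instead of hard-coding them: the field separator is the character directly after the literal MSH, and the next four characters are the component, repetition, escape, and subcomponent characters. The Delimiters class, the read_delimiters function, and the sample header are illustrative names and data, not part of any HL7 library, and the sketch ignores any fifth encoding character that later versions may add.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delimiters:
    field: str
    component: str
    repetition: str
    escape: str
    subcomponent: str


def read_delimiters(message: str) -> Delimiters:
    """Read the separators from the first characters of the MSH segment."""
    if not message.startswith("MSH") or len(message) < 8:
        raise ValueError("message does not start with a complete MSH header")
    field = message[3]  # MSH-1: the character right after "MSH"
    # First four characters of MSH-2, in their standard order.
    component, repetition, escape, subcomponent = message[4:8]
    return Delimiters(field, component, repetition, escape, subcomponent)


# Hypothetical sample header using the usual delimiter set.
delims = read_delimiters("MSH|^~\\&|LAB|CITYHOSP|EHR|CITYHOSP|20260701||ORU^R01|1|P|2.5")
print(delims)
```

Passing the resulting object into every later split keeps the parser honest when a sender uses a non-default delimiter set.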
The data type tells you what each position means. In 123456^^^CITYHOSP^MR, the first component is the identifier, the fourth component is an assigning authority, and the fifth component is an identifier type. In 718-7^Hemoglobin^LN, the third component says the code comes from LOINC. The same ^ character is just a separator; the data type is what gives each piece a job.
Repetition is separate from components. A repeated PID-3 value such as 123456^^^CITYHOSP^MR~A98765^^^NATIONAL^NI is two identifiers, not one identifier with extra components bolted on the end. This is one of the quiet places where naive string splitting causes real data loss.
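The nesting can be sketched as three splits applied in order: repetitions first, then components, then subcomponents. This assumes the default delimiters and does not decode escape sequences; split_field is a hypothetical helper name.

```python
def split_field(raw, component="^", repetition="~", subcomponent="&"):
    """Split one raw field into repetitions -> components -> subcomponents."""
    return [
        [comp.split(subcomponent) for comp in rep.split(component)]
        for rep in raw.split(repetition)
    ]


pid3 = split_field("123456^^^CITYHOSP^MR~A98765^^^NATIONAL^NI")

# Each repetition is its own identifier with its own authority and type.
for rep in pid3:
    identifier = rep[0][0]  # component 1
    authority = rep[3][0]   # component 4, first subcomponent
    id_type = rep[4][0]     # component 5
    print(identifier, authority, id_type)
```

Splitting on ^ alone would instead produce one nine-part list in which MR~A98765 looks like a single component, which is exactly the data loss described above.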
Common data type families
| Family | Examples | What to watch |
|---|---|---|
| Text and numbers | ST, TX, FT, NM, SI, SN | Do not treat formatted text as plain text unless you have deliberately chosen to lose formatting. Numeric-looking identifiers are not always numbers. |
| Dates and times | DT, TM, DTM, TS, DR | Precision and timezone matter. A birth date, collection time, result time, and discharge time have different business consequences. |
| Coded values | ID, IS, CE, CWE, CNE, CF | The code is not enough unless the receiver also knows the table or coding system. Local extensions must be explicit. |
| Identifiers | CX, EI, EIP, HD | Preserve assigning authority, assigning facility, identifier type, and universal ID details. These are what keep identifiers from colliding. |
| People, organizations, addresses, phones | XPN, XCN, XON, XAD, XTN | Do not flatten the whole value into a display label if the receiver needs family name, given name, identifier, address type, or telecom use. |
| Location and workflow | PL, PT, MSG, VID | These often drive routing, parser selection, facility separation, and environment handling. |
| Payloads and references | ED, RP | Keep media type, subtype, encoding, source application, and external pointer rules together with the payload. |
Coded values need code system discipline
Coded values are where HL7 interfaces become dialects. The same visible code can mean different things in different tables, and the same concept can be represented by different code systems. That is why the coded data types matter.
ID is usually a coded value drawn from an HL7-defined table. IS is usually a coded value drawn from a user-defined table. Older CE values carry an identifier, text, coding system, alternate identifier, alternate text, and alternate coding system. CWE and CNE are more explicit successors for many coded fields. CWE is for coded values where site extensions or exceptions may exist; CNE is for values where the sender is expected to use a code from a non-extendable value set. The HL7 v2+ CWE notes make that contrast directly, and they also point out that equivalent alternate codes inside one CWE are different from repeating the field for distinct meanings.
Table 0396 Coding System is a small table with a large effect. In values such as 718-7^Hemoglobin^LN, J45.909^Asthma, unspecified^I10, or mg/dL^milligram per deciliter^UCUM, the coding-system component is what tells the receiver how to interpret the code. If a mapping drops the third component, it may turn a precise coded value into a local string.
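A small sketch of keeping the coding system attached to the code, assuming the first three components are identifier, text, and coding system and that default delimiters are in use. CodedValue, parse_coded, and concept_key are hypothetical names; the point is that a lookup key is only built when the coding system is present.

```python
from typing import NamedTuple, Optional, Tuple


class CodedValue(NamedTuple):
    code: str
    text: str
    system: str


def parse_coded(raw: str, component: str = "^") -> CodedValue:
    """Read the first three components of a CE/CWE/CNE-style value."""
    parts = raw.split(component) + ["", "", ""]  # pad so short values still unpack
    return CodedValue(parts[0], parts[1], parts[2])


def concept_key(value: CodedValue) -> Optional[Tuple[str, str]]:
    """Return (system, code) for lookups, or None if either part is missing."""
    if not value.code or not value.system:
        return None
    return (value.system, value.code)


full = parse_coded("718-7^Hemoglobin^LN")
bare = parse_coded("718-7")
print(concept_key(full))
print(concept_key(bare))
```

Returning None for the bare code forces the mapping to decide what a code without a system means, rather than silently treating it as LOINC.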
For details, start with CWE, CNE, CE, ID, IS, and the HL7 v2+ CWE data type notes.
Identifiers are not safe without authority
Patient numbers, visit numbers, order numbers, accession numbers, placer IDs, filler IDs, provider IDs, and organization IDs are only safe when you know who assigned them. The value 123456 is not globally meaningful. It becomes usable when the message also carries authority, facility, namespace, identifier type, or universal ID information.
CX is the workhorse for patient and visit identifiers. EI and EIP carry entity identifiers, such as placer and filler order numbers. HD is the hierarchic designator used inside many richer data types to identify assigning authorities, facilities, applications, and organizations. The HL7 v2+ HD notes describe HD as a way to identify assigning authorities and assigning facilities, with local namespace IDs or universal IDs.
A practical identifier mapping keeps these pieces together:
- The literal identifier. The visible ID number or entity identifier.
- The assigning authority. The system, organization, or namespace that created the identifier.
- The identifier type. Medical record number, national identifier, account number, visit number, order number, and so on.
- The assigning facility. Important in multi-site systems where one authority may still need facility context.
The common trap is to map only CX.1 and discard the rest. That works until two sites both send patient 123456, or one system changes from local IDs to enterprise IDs and the receiver cannot tell which world the value belongs to.
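The collision can be sketched with two CX values that share the same literal ID. This assumes default delimiters and keeps the assigning authority and facility as raw HD text so that no universal ID detail is dropped; Identifier, parse_cx, and the NORTHCLINIC authority are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    value: str                # CX.1
    assigning_authority: str  # CX.4, kept as raw HD text
    type_code: str            # CX.5
    assigning_facility: str   # CX.6, kept as raw HD text


def parse_cx(raw: str, component: str = "^") -> Identifier:
    parts = raw.split(component)
    parts += [""] * (6 - len(parts))  # pad short values up to CX.6
    return Identifier(parts[0], parts[3], parts[4], parts[5])


site_a = parse_cx("123456^^^CITYHOSP^MR")
site_b = parse_cx("123456^^^NORTHCLINIC^MR")  # hypothetical second site

print(site_a.value == site_b.value)  # same literal ID
print(site_a == site_b)              # different identifiers once authority is kept
```

Using the whole Identifier as the match key, rather than value alone, is what stops the two patients from merging.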
Names, addresses, and telecom values are structured data
XPN, XCN, XAD, XON, and XTN are easy to flatten because humans like display strings. Real interfaces often need more than display.
A person name may need family name, given name, suffix, prefix, degree, name type, representation code, and assembly order. A provider field may include both a person identifier and name detail. An address may need address type, county, census tract, validity range, and country. A telecom value may need equipment type, use code, country code, area code, local number, extension, email address, or URL depending on the version and profile.
Flattening can still be correct when the receiver only displays the value. It is a poor default when the receiver searches, deduplicates, validates, formats letters, routes messages, or matches provider and patient master data.
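One way to get both is to parse the structure first and derive the display string from it, as in this sketch. It assumes default delimiters and covers only the first seven XPN components (family name, given name, further given names, suffix, prefix, degree, name type code); the field labels, function names, and sample name are illustrative.

```python
XPN_FIELDS = ("family", "given", "middle", "suffix", "prefix", "degree", "name_type")


def parse_xpn(raw, component="^", subcomponent="&"):
    """Return the leading XPN components as a dict of named parts."""
    parts = raw.split(component)
    parts += [""] * (len(XPN_FIELDS) - len(parts))  # pad short values
    name = dict(zip(XPN_FIELDS, parts))
    # The family name can carry subcomponents; the surname is the first one.
    name["family"] = name["family"].split(subcomponent)[0]
    return name


def display_name(name):
    """Build a display label from the structured parts, skipping empty ones."""
    ordered = [name["prefix"], name["given"], name["middle"], name["family"], name["suffix"]]
    return " ".join(part for part in ordered if part)


patient = parse_xpn("SMITH^JOHN^Q^JR^DR^^L")
print(patient["family"], patient["name_type"])
print(display_name(patient))
```

The display label is then a view over the structured record, so search and matching logic can still reach the individual parts.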
Integration Soup workflows can still help when your source data starts as ordinary text. For recognized common fields, the designer can suggest data-table-backed values that fill the right parts of patient names, addresses, and phone numbers instead of making you hand-build every XPN, XAD, or XTN component. Built-in tables such as Person, Name, Address, and Phone can supply values like ${DataTable:Person.GivenName}, ${DataTable:Person.FamilyName}, ${DataTable:Address.Address1}, and ${DataTable:Phone.Phone}.
That is a workflow convenience, not permission to forget the structure. Use the helper to turn plain source values into the components the receiver expects, then keep those components visible in your mapping and validation rules.
Dates and times carry precision
HL7 date/time values often express precision by how much of the value is present. 202607 is not the same as 20260701000000. The first says July 2026. The second says midnight on July 1, 2026. Padding missing precision can invent facts the sender did not send.
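A sketch of recording precision rather than padding it away: the function below reports how much of a date/time value the sender actually supplied, based on the number of digits before any fractional seconds or offset. The function name and precision labels are assumptions for illustration.

```python
import re

PRECISION_BY_LENGTH = {4: "year", 6: "month", 8: "day", 10: "hour", 12: "minute", 14: "second"}


def dtm_precision(value: str) -> str:
    """Report the precision a DTM-style value actually carries."""
    # Drop an optional +ZZZZ / -ZZZZ offset, then separate fractional seconds.
    main = re.split(r"[+-]", value, maxsplit=1)[0]
    digits, _, fraction = main.partition(".")
    if not digits.isdigit() or len(digits) not in PRECISION_BY_LENGTH:
        raise ValueError(f"not a recognizable DTM value: {value!r}")
    if fraction:
        if len(digits) != 14:
            raise ValueError(f"fractional seconds without full seconds: {value!r}")
        return "fractional second"
    return PRECISION_BY_LENGTH[len(digits)]


print(dtm_precision("202607"))
print(dtm_precision("20260701000000"))
```

Storing the precision label next to the parsed value lets a downstream system tell a real midnight from a date that was padded to fit a timestamp column.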
DT is for dates, TM is for times, DTM is for date/time, TS is the older timestamp shape, and DR is a date/time range. The old TS precision component is retained for backward compatibility in older feeds, but later v2 usage relies more on the precision expressed by the date/time value itself. HL7 Terminology table 0529 is useful when you are interpreting old TS precision values.
Use the field context as well as the data type. OBR-7 Observation Date/Time, OBX-14 Date/Time of the Observation, PV1-44 Admit Date/Time, and MSH-7 Date/Time of Message are all date/time values, but they answer different operational questions.
When timezone is present, preserve it. When timezone is absent, do not silently pretend the value is UTC unless the interface agreement says so. That one assumption can move collections, admissions, medications, and audit trails across a day boundary.
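The day-boundary risk can be sketched with Python's standard datetime module. The helper below handles only full-second values, with or without a +ZZZZ/-ZZZZ offset; it returns a timezone-aware value when an offset was sent and a naive one when it was not, so the missing offset stays visible. The function name and sample timestamps are illustrative.

```python
from datetime import datetime, timezone


def parse_dtm_seconds(value: str) -> datetime:
    """Parse YYYYMMDDHHMMSS with an optional +ZZZZ / -ZZZZ offset."""
    if len(value) == 19:
        # Offset present: keep it, producing an aware datetime.
        return datetime.strptime(value, "%Y%m%d%H%M%S%z")
    if len(value) == 14:
        # Offset absent: stay naive instead of guessing UTC.
        return datetime.strptime(value, "%Y%m%d%H%M%S")
    raise ValueError(f"expected 14 digits with an optional offset: {value!r}")


with_offset = parse_dtm_seconds("20260701233000-0500")
without_offset = parse_dtm_seconds("20260701233000")

print(with_offset.date(), with_offset.astimezone(timezone.utc).date())
print(without_offset.tzinfo)
```

The first value falls on July 1 locally and July 2 in UTC, which is the shift that appears when a receiver assumes UTC for a value that was sent in local time.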
OBX-2 and OBX-5 are the data type stress test
OBX-2 Value Type tells the receiver how to interpret OBX-5 Observation Value. This is why OBX parsing cannot be hard-coded as "the result is a string". A lab result might be NM, a coded organism might be CWE, a narrative result might be TX or FT, a structured numeric might be SN, and a document payload might be ED.
The current HL7 Terminology entry for table 0440 describes the data type code system as specifying the format of the observation value in OBX and related segments. In practical terms, that means OBX-2 is a parser instruction. If OBX-2 says NM, validate numeric content. If it says CWE, preserve code, text, and coding system. If it says ED, preserve source application, type of data, subtype, encoding, and the data itself.
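Treating OBX-2 as a parser instruction can be sketched as a dispatch table. This assumes default delimiters, covers only a handful of value types, and uses hypothetical function names; unknown value types raise an error instead of falling back to a string.

```python
from decimal import Decimal, InvalidOperation


def parse_nm(raw):
    try:
        number = Decimal(raw)
    except InvalidOperation:
        raise ValueError(f"OBX-5 is not numeric: {raw!r}") from None
    if not number.is_finite():
        raise ValueError(f"OBX-5 is not a finite number: {raw!r}")
    return number


def parse_cwe(raw):
    parts = (raw.split("^") + ["", "", ""])[:3]
    return dict(zip(("code", "text", "system"), parts))


def parse_ed(raw):
    parts = (raw.split("^") + [""] * 5)[:5]
    keys = ("source_application", "type_of_data", "data_subtype", "encoding", "data")
    return dict(zip(keys, parts))


PARSERS = {"NM": parse_nm, "CWE": parse_cwe, "ED": parse_ed, "ST": str, "TX": str, "FT": str}


def parse_obx5(value_type, raw):
    """Choose the OBX-5 parser from the OBX-2 value type."""
    if value_type not in PARSERS:
        raise ValueError(f"unsupported OBX-2 value type: {value_type!r}")
    return PARSERS[value_type](raw)


print(parse_obx5("NM", "13.5"))
print(parse_obx5("CWE", "718-7^Hemoglobin^LN"))
```

Failing loudly on an unexpected OBX-2 is a design choice: it surfaces new value types during testing instead of storing them as unvalidated text.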
This is one of the best quick checks for a receiving interface: send representative OBX values for NM, CWE, TX, FT, and ED if the profile supports them. You will learn quickly whether the receiver truly respects OBX-2 or just happens to parse the one example it was built from.
Best practices that prevent quiet data loss
- Parse by the declared delimiters and data type. Do not split a field once and hope the first component is all that matters.
- Preserve repetitions as repetitions. Multiple patient identifiers, phone numbers, allergies, and coded observations are separate facts.
- Keep coding systems with coded values. A code without its coding system can be ambiguous or unusable.
- Keep assigning authority with identifiers. The authority is often what prevents duplicate ID collisions.
- Do not flatten rich names, addresses, telecom values, and locations by default. Display text is useful, but it is not a substitute for structured components.
- Respect date/time precision and timezone. Do not invent missing day, time, or offset values just to satisfy a database column.
- Validate required components, not just required fields. A field can be present while the component you actually need is missing.
- Profile local choices. Data types allow structure; your interface agreement must still say which components, repeats, code systems, and local values are expected.
- Test partial and messy values. Empty components, repeated identifiers, local codes, alternate codings, old TS values, and unexpected OBX-2 values are where production feeds reveal the truth.
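The point about validating components rather than fields can be sketched as a small check. The required positions below are an example of what an interface agreement might demand for a CX value, not a rule from the standard; missing_components and CX_REQUIRED are hypothetical names, and default delimiters are assumed.

```python
def missing_components(raw, required, component="^"):
    """Return the names of required components that are empty or absent."""
    parts = raw.split(component)
    missing = []
    for position, name in sorted(required.items()):
        value = parts[position - 1] if position <= len(parts) else ""
        if not value:
            missing.append(name)
    return missing


# Example profile: positions are 1-based component numbers.
CX_REQUIRED = {1: "identifier", 4: "assigning authority", 5: "identifier type"}

print(missing_components("123456^^^CITYHOSP^MR", CX_REQUIRED))
print(missing_components("123456", CX_REQUIRED))
```

The second value would pass a "PID-3 is present" check while still lacking the parts that make the identifier safe to use.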
A practical mapping workflow
When reviewing an HL7 interface, I usually work through the data types in this order:
- Find fields that repeat, especially identifiers, contact details, observations, and diagnoses.
- Find coded fields and list their coding systems or table numbers.
- Find identifier fields and preserve assigning authority, facility, and identifier type.
- Find date/time fields and confirm precision, timezone, and local-time assumptions.
- Find variable fields such as OBX-5 and confirm the parser follows the controlling field.
- Find large text or payload fields and confirm the receiver keeps formatting, encoding, and attachment metadata.
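A rough first pass over that checklist can be automated. The sketch below lists the fields in a message that contain repetition, component, or subcomponent separators, so you know where to look first. It assumes the default delimiters, does not decode escape sequences, and uses a made-up sample message and function name.

```python
def structured_fields(message):
    """List fields that contain repeats, components, or subcomponents."""
    markers = (("~", "repeats"), ("^", "components"), ("&", "subcomponents"))
    findings = []
    for segment in message.replace("\n", "\r").split("\r"):
        if not segment:
            continue
        fields = segment.split("|")
        name = fields[0]
        # In MSH the field separator itself is MSH-1, so numbering shifts by one.
        offset = 1 if name == "MSH" else 0
        for index, value in enumerate(fields[1:], start=1):
            number = index + offset
            if name == "MSH" and number == 2:
                continue  # MSH-2 holds the delimiter characters themselves
            traits = [label for char, label in markers if char in value]
            if traits:
                findings.append((f"{name}-{number}", traits))
    return findings


sample = (
    "MSH|^~\\&|LAB|CITYHOSP|EHR|CITYHOSP|20260701||ORU^R01|1|P|2.5\r"
    "PID|1||123456^^^CITYHOSP^MR~A98765^^^NATIONAL^NI||SMITH^JOHN\r"
)
findings = structured_fields(sample)
for field, traits in findings:
    print(field, traits)
```

This only shows where structure exists in one sample message; which components are required is still a question for the interface agreement.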
In HL7 Soup Web, open a real message and click through fields like PID-3, PV1-19, OBR-4, OBX-2, and OBX-5 to see where components, repeats, tables, and coding systems sit. In Integration Soup, keep mappings component-aware: map the identifier, authority, coding system, text, repeats, and error behavior deliberately rather than treating the whole field as a blob. The transformer tutorial is the natural next stop when you need to turn that understanding into a production mapping.
What to do next
Pick one message from a real interface and mark every field where a component, repeat, coding system, assigning authority, timezone, or OBX-2 value changes the meaning. Those are the fields that deserve explicit mapping and validation. The rest of the message may still matter, but these are the places where a visually valid HL7 string can quietly become wrong.