The growing adoption of health information technologies in the United States expands their potential to support beneficial studies that combine large, complex data sets from multiple sources. De-identification, the process by which identifiers are removed from health information, mitigates privacy risks to individuals and thereby supports the secondary use of data for comparative effectiveness studies, policy assessment, life sciences research, and other endeavors. The HIPAA Privacy Rule provides two methods for de-identification: Expert Determination (§164.514(b)(1)) and Safe Harbor (§164.514(b)(2)).
Satisfying either method demonstrates that a covered entity has met the de-identification standard in §164.514(a). Health information de-identified by either method is no longer protected by the Privacy Rule because it does not fall within the definition of protected health information (PHI). However, de-identification causes information loss that may limit the usefulness of the resulting data in certain circumstances, so covered entities may wish to select de-identification strategies that minimize such loss.
Expert Determination
A covered entity may determine that health information is not individually identifiable health information only if a person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable:
• Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information.
• Documents the methods and results of the analysis that justify such determination.
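The rule does not prescribe a particular statistical method or a numeric threshold for "very small" risk. One kind of analysis an expert might perform and document is a k-anonymity measurement, which is illustrated in the Python sketch below. The sketch groups records by quasi-identifiers and flags records that fall in groups smaller than k. The field names (a three-digit ZIP code, birth year, and sex), the function names, and the value of k are all hypothetical. A k-anonymity figure alone does not constitute an Expert Determination.

```python
from collections import Counter
from typing import Hashable, List, Mapping, Sequence

Record = Mapping[str, Hashable]

def _key(record: Record, quasi_identifiers: Sequence[str]) -> tuple:
    """Project a record onto its quasi-identifier values."""
    return tuple(record[q] for q in quasi_identifiers)

def class_sizes(records: Sequence[Record], quasi_identifiers: Sequence[str]) -> Counter:
    """Size of each equivalence class (records sharing identical quasi-identifier values)."""
    return Counter(_key(r, quasi_identifiers) for r in records)

def smallest_class_size(records: Sequence[Record], quasi_identifiers: Sequence[str]) -> int:
    """The data set's k: each record matches at least k - 1 others. Returns 0 for no records."""
    return min(class_sizes(records, quasi_identifiers).values(), default=0)

def records_below_k(records: Sequence[Record], quasi_identifiers: Sequence[str], k: int) -> List[int]:
    """Indices of records whose equivalence class has fewer than k members."""
    sizes = class_sizes(records, quasi_identifiers)
    return [i for i, r in enumerate(records) if sizes[_key(r, quasi_identifiers)] < k]

if __name__ == "__main__":
    sample = [
        {"zip3": "021", "birth_year": 1980, "sex": "F"},
        {"zip3": "021", "birth_year": 1980, "sex": "F"},
        {"zip3": "021", "birth_year": 1975, "sex": "M"},
    ]
    print(records_below_k(sample, ("zip3", "birth_year", "sex"), k=2))  # [2]
```

Flagged records would typically be generalized further or suppressed, and the risk would then be re-measured. The expert must still justify the choice of quasi-identifiers and k with respect to the anticipated recipient.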
Safe Harbor
A covered entity may treat health information as de-identified if the following identifiers of the individual, or of relatives, employers, or household members of the individual, are removed:
• Names.
• All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes. The exception is the initial three digits of the ZIP code, which may be kept if, according to the current publicly available data from the Bureau of the Census, the geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people. For all such geographic units containing 20,000 or fewer people, the initial three digits must be changed to 000.
• All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, and date of death. Also removed are all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older.
• Telephone numbers.
• Vehicle identifiers and serial numbers, including license plate numbers.
• Fax numbers.
• Device identifiers and serial numbers.
• E-mail addresses.
• Web Universal Resource Locators (URLs).
• Social Security numbers.
• Internet Protocol (IP) addresses.
• Medical record numbers.
• Biometric identifiers, including finger and voice prints.
• Health plan beneficiary numbers.
• Full-face photographs and any comparable images.
• Account numbers.
• Certificate/license numbers.
• Any other unique identifying number, characteristic, or code, except as permitted by §164.514(c).
In addition, the covered entity must not have actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual who is a subject of the information.
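Most Safe Harbor identifiers are simply removed. The three-digit ZIP code exception and the treatment of dates and ages are instead rules for generalization, and the Python sketch below illustrates only those two items, not the full list.

The sketch makes these assumptions:
• The caller supplies a mapping from three-digit ZIP prefixes to populations taken from current Census Bureau data.
• Prefixes missing from that mapping are treated conservatively as having 20,000 or fewer people.
• Age is measured on a caller-chosen reference date.

The function names are hypothetical, and the population figures in the usage lines are illustrative only.

```python
from datetime import date
from typing import Mapping

def generalize_zip(zip_code: str, zip3_population: Mapping[str, int]) -> str:
    """Return the 3-digit ZIP prefix if its combined area has more than 20,000 people, else '000'."""
    zip3 = zip_code.strip()[:3]
    # Prefixes absent from the caller-supplied Census table are treated as small areas.
    if zip3_population.get(zip3, 0) > 20_000:
        return zip3
    return "000"

def generalize_date(d: date) -> int:
    """Keep only the year of a date directly related to the individual (e.g., an admission date)."""
    return d.year

def age_in_years(birth: date, as_of: date) -> int:
    """Completed years of age on the reference date."""
    had_birthday = (as_of.month, as_of.day) >= (birth.month, birth.day)
    return as_of.year - birth.year - (0 if had_birthday else 1)

def generalize_age(age: int) -> str:
    """Aggregate all ages over 89 into a single '90+' category."""
    return "90+" if age > 89 else str(age)

def generalize_birth_date(birth: date, as_of: date) -> str:
    """Keep the birth year, except for people over 89, whose birth year is itself indicative of age."""
    if age_in_years(birth, as_of) > 89:
        return "90+"
    return str(birth.year)

if __name__ == "__main__":
    # Illustrative populations only; real values must come from current Census data.
    populations = {"100": 50_000, "200": 15_000}
    print(generalize_zip("10012", populations))  # 100
    print(generalize_zip("20001", populations))  # 000
    print(generalize_birth_date(date(1933, 6, 1), as_of=date(2024, 3, 1)))  # 90+
```

Using the same "90+" label for reported ages and for suppressed birth years places both in the single aggregate category the rule permits.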
Re-identification
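The implementation specifications in §164.514(c) also address re-identification. A covered entity may assign a code or other means of record identification that allows de-identified information to be re-identified by the covered entity, provided that:
• The code is not derived from or related to information about the individual and cannot otherwise be translated to identify the individual.
• The covered entity does not use or disclose the code for any other purpose and does not disclose the mechanism (for example, an algorithm) for re-identification.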
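These conditions point toward codes that are random rather than computed from patient data. In the Python sketch below, each code comes from the standard library's `secrets` module. The sketch assumes the covered entity keeps the code-to-record table in a secured store that it never discloses, and the class and method names are hypothetical. A hash of a medical record number, by contrast, is computed from an identifier, so it would likely fail the first condition.

```python
import secrets
from typing import Dict, Optional

class ReidentificationKey:
    """Hypothetical helper that issues random record codes and keeps the code-to-record map private."""

    def __init__(self) -> None:
        # The covered entity must store this mapping securely and never disclose it.
        self._code_to_record: Dict[str, str] = {}

    def assign(self, internal_record_id: str) -> str:
        """Issue a fresh random code for a record; the code carries no information about the individual."""
        while True:
            code = secrets.token_hex(16)  # 128 random bits rendered as 32 hex characters
            if code not in self._code_to_record:  # guard against an (extremely unlikely) collision
                break
        self._code_to_record[code] = internal_record_id
        return code

    def reidentify(self, code: str) -> Optional[str]:
        """Look up the internal record for a code; returns None for unknown codes."""
        return self._code_to_record.get(code)

if __name__ == "__main__":
    key = ReidentificationKey()
    code = key.assign("internal-record-0001")
    print(len(code))  # 32
    print(key.reidentify(code))  # internal-record-0001
```

Each call to `assign` issues a new code, even for a record that already has one. This keeps codes in separate data releases from being linkable unless the covered entity chooses to reuse them.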
For more information, see:
https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html