The National Institutes of Health (NIH) distinguishes primary from secondary data analysis as follows: primary data analysis is limited to analysis by members of the research team that collected the data, carried out to answer the original hypotheses proposed in the research. All other analyses of data collected for specific research studies, and analyses of data collected for other purposes (including registry data), are considered secondary analyses of existing data.
Determining whether proposed research involves secondary analysis of existing data can be challenging. The IRB will treat a proposed analysis as secondary data analysis only if the investigator had no involvement in the prior data collection or if the data were originally collected for a purpose other than contributing to generalizable knowledge (such as program evaluation or institutional research).
The definition of existing data may include both data provided to the investigator from any source and data already in the possession of the investigator.
Public use data sets (such as portions of U.S. Census data and data from the National Center for Education Statistics or the National Center for Health Statistics) are prepared with the intent of making them available to the public. The data available to the public are not individually identifiable, and therefore their analysis would not involve human subjects.
Human subject means a living individual about whom an investigator (whether professional or student) conducting research:
(i) Obtains information or biospecimens through intervention or interaction with the individual, and uses, studies, or analyzes the information or biospecimens; or
(ii) Obtains, uses, studies, analyzes, or generates identifiable private information or identifiable biospecimens.
Data are "identifiable" if they contain direct or indirect identifiers. Direct identifiers include items such as participants’ names, Social Security numbers, ID numbers, and dates of birth. Indirect identifiers include a coding system in which codes (such as letters, numbers, and symbols) replace direct identifiers and a key to decipher the code exists, enabling linkage of the identifying information to the private information or specimens.
Proposed analysis of existing data which does not meet the definition of human subject (or participant) does not need review by the IRB.
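For investigators who screen data sets programmatically, the definition above can be sketched as a simple check. This is only an illustrative sketch, not an official determination tool: the class name, field names, and function names are hypothetical, and it assumes the information in question is private. The IRB's determination always governs.

```python
from dataclasses import dataclass


@dataclass
class ExistingDataset:
    """Hypothetical description of a data set proposed for analysis."""
    subjects_living: bool          # the data are about living individuals
    obtained_by_interaction: bool  # obtained through intervention or interaction
    has_direct_identifiers: bool   # e.g. names, Social Security numbers, dates of birth
    coded_with_existing_key: bool  # codes replace identifiers and a linking key exists


def is_identifiable(data: ExistingDataset) -> bool:
    # Identifiable means direct identifiers, or indirect identifiers
    # (a code whose key still exists).
    return data.has_direct_identifiers or data.coded_with_existing_key


def meets_human_subject_definition(data: ExistingDataset) -> bool:
    # Clause (i): interaction or intervention; clause (ii): identifiable
    # private information. Both apply only to living individuals.
    if not data.subjects_living:
        return False
    return data.obtained_by_interaction or is_identifiable(data)


# Example: a de-identified public use file versus a coded file with a key.
public_use = ExistingDataset(True, False, False, False)
coded_file = ExistingDataset(True, False, False, True)
print(meets_human_subject_definition(public_use))  # False
print(meets_human_subject_definition(coded_file))  # True
```

In this sketch, a result of False corresponds to data that fall outside the human subject definition, and a result of True means the remaining criteria in this guidance still need to be considered.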
Proposed analysis of existing data that meets the definition of human subject and contains both direct and indirect identifiers and does not require consent is considered exempt from IRB review if at least one of the following criteria is met:
- The identifiable private information or identifiable biospecimens are publicly available;
- Information is recorded by the investigator in such a manner that the identity of the human subjects cannot readily be ascertained directly or through identifiers linked to the subjects, the investigator does not contact the subjects, and the investigator will not re-identify subjects;
- The research involves only information collection and analysis involving the investigator’s use of identifiable health information when that use is regulated under HIPAA – 45 CFR Parts 160 and 164, for the purposes of “health care operations” or “research” as those terms are defined at 45 CFR 164.501 or for “public health activities and purposes” as described under 45 CFR 164.512(b);
- The research is conducted by, or on behalf of, a Federal department or agency using government-generated or government-collected information obtained for non-research activities, if the research generates identifiable private information that is or will be maintained on information technology that is subject to and in compliance with section 208(b) of the E-Government Act of 2002, 44 U.S.C. 3501