Data and Evidence in Science: Collection, Analysis, and Interpretation

Scientific knowledge doesn't emerge from intuition alone — it's built on data, tested against evidence, and shaped by the methods used to collect and interpret both. This page covers the core practices of scientific data collection, the analytical frameworks researchers apply, and the interpretive principles that separate a defensible conclusion from a convenient one. The distinctions matter because flawed data handling is one of the primary drivers of the replication crisis in science, a problem that has touched fields from social psychology to cancer biology.


Definition and scope

Data in scientific research refers to recorded observations or measurements — quantitative values like temperature readings and cell counts, or qualitative descriptions like interview transcripts and field notes. Evidence is the interpretation layer: data becomes evidence when it's evaluated in relation to a specific hypothesis or theoretical claim.

The scope of data collection spans every scientific discipline. A clinical trial tracking blood pressure across 800 participants (NIH National Library of Medicine, ClinicalTrials.gov) operates on fundamentally the same logic as an astronomer cataloguing the spectral signatures of exoplanet atmospheres — observations are systematically gathered, organized, and subjected to scrutiny. What varies is the instrument, the scale, and the acceptable margin of error.

Two broad categories define most scientific data:

  - Quantitative data: numerical measurements, such as temperature readings, cell counts, or test scores, suited to mathematical and statistical analysis.
  - Qualitative data: descriptive records, such as interview transcripts, field notes, or open-ended survey responses, analyzed for patterns and themes.

The distinction between the two is explored in depth at Quantitative vs. Qualitative Research, but the short version is that neither type is inherently more rigorous. A poorly controlled quantitative study is far less informative than a carefully documented qualitative one.


How it works

Data collection begins with research design — the architectural decision about what kind of data will actually answer the research question. That decision determines everything downstream: the instruments used, the sample selected, the controls applied, and the statistical tests chosen.

The practical pipeline looks like this:

  1. Operationalization: Abstract concepts are translated into measurable variables. "Stress" becomes salivary cortisol concentration in nanomoles per liter. "Learning" becomes pre- and post-test score differentials.
  2. Instrument selection and validation: Tools — whether a mass spectrometer, a survey instrument, or a structured observation protocol — must demonstrate reliability (consistent results across repeated use) and validity (measuring what they claim to measure).
  3. Data collection: Observations are recorded under controlled or documented conditions. Laboratory research protocols establish the procedural standards that make data from one lab comparable to data from another.
  4. Data cleaning and management: Raw datasets almost always contain errors, missing values, or outliers that require documented handling. Research data management standards — including those from the National Science Foundation's data management plan requirements — govern how data is stored, labeled, and archived.
  5. Statistical analysis: Methods are selected based on data type and study design. Descriptive statistics summarize distributions; inferential statistics test whether observed patterns exceed what chance alone would produce. A p-value of 0.05 — the conventional threshold in many fields — means that if the null hypothesis were true, a result at least this extreme would occur about one time in twenty; it is not the probability that the finding is "true," nor the probability that the result arose by chance. (American Statistical Association Statement on P-Values, 2016) A sketch of steps 4 and 5 follows this list.
  6. Interpretation: Results are placed in context — what the data supports, what it doesn't, and where alternative explanations remain plausible. This is where the peer review process provides its most critical function.
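
A minimal sketch of steps 4 and 5 in Python, assuming a small two-group measurement study: the values, the median-absolute-deviation outlier rule, and the choice of Welch's t-test are illustrative assumptions, not a prescribed protocol.

    import numpy as np
    from scipy import stats

    # Step 4: hypothetical raw measurements with a missing value and an outlier
    treatment = np.array([5.1, 4.8, np.nan, 5.3, 4.9, 21.0, 5.0])
    control = np.array([5.6, 5.9, 5.7, 6.1, 5.8, 5.5])

    # Drop missing values, then exclude points far from the median using the
    # median absolute deviation (MAD); each handling decision is documented
    treatment = treatment[~np.isnan(treatment)]
    mad = np.median(np.abs(treatment - np.median(treatment)))
    treatment = treatment[np.abs(treatment - np.median(treatment)) < 5 * mad]

    # Step 5: Welch's t-test (does not assume equal group variances)
    t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    # A small p says results this extreme would be rare if the null were true;
    # it is not the probability that the hypothesis itself is true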

The full landscape of analytical approaches, including machine learning and simulation modeling, is covered at Statistical Analysis in Research and Computational and Data-Driven Research.


Common scenarios

Across the sciences, a few data scenarios recur with enough frequency to have developed their own methodological traditions.

Controlled experiments — randomized assignment to treatment and control conditions — remain the gold standard for establishing causal relationships. Clinical trials, as described at Clinical Trials Overview, apply this model with the highest degree of regulatory oversight, including IRB approval and pre-registration requirements.
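
As a concrete illustration, randomized assignment can be as simple as shuffling a participant list and splitting it; the IDs and fixed seed below are hypothetical, and real trials follow pre-registered randomization schemes.

    import random

    # 80 hypothetical participant IDs
    participants = [f"P{i:03d}" for i in range(1, 81)]

    rng = random.Random(42)  # fixed seed keeps the allocation auditable
    rng.shuffle(participants)

    # Split the shuffled list evenly into treatment and control
    midpoint = len(participants) // 2
    treatment_group = participants[:midpoint]
    control_group = participants[midpoint:]
    # Because allocation ignores every participant attribute, known and
    # unknown confounders are balanced across groups in expectation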

Observational studies collect data without manipulating variables. Epidemiologists tracing disease incidence across populations, ecologists monitoring species distribution over decades, and economists analyzing wage data from administrative records all work in this mode. Causation is harder to establish, but the ecological and logistical realities of studying humans and ecosystems often make experimentation impossible or unethical.

Systematic reviews and meta-analyses aggregate data across multiple independent studies, applying statistical methods to synthesize findings at a scale no single study can achieve. The Cochrane Collaboration has produced over 8,000 systematic reviews in health research, each following a documented protocol designed to minimize selection bias.
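
The arithmetic at the core of a fixed-effect meta-analysis is inverse-variance weighting. The sketch below uses hypothetical effect estimates and standard errors from five imagined studies, not Cochrane data.

    import math

    # Hypothetical per-study effect estimates and their standard errors
    effects = [0.42, 0.31, 0.55, 0.28, 0.47]
    std_errs = [0.10, 0.15, 0.20, 0.12, 0.18]

    # More precise studies (smaller standard errors) receive larger weights
    weights = [1 / se ** 2 for se in std_errs]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))

    print(f"pooled effect = {pooled:.3f} +/- {1.96 * pooled_se:.3f} (95% CI)")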


Decision boundaries

Interpretation is where scientific judgment and its vulnerabilities are most visible. Three boundaries matter most:

Signal vs. noise: Statistical significance is not the same as practical significance. A drug that reduces symptom duration by 0.3 hours with p < 0.001 across a sample of 50,000 participants has cleared the statistical bar while offering negligible clinical value. Effect size measures like Cohen's d or odds ratios carry the interpretive weight that p-values alone cannot.
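
A short simulation, with made-up symptom-duration numbers mirroring the example above (and assuming the 50,000 participants split evenly between arms), shows how a 0.3-hour difference clears p < 0.001 while Cohen's d stays negligible.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 25_000  # per group, 50,000 participants total

    # Simulated symptom duration in hours; the true difference is 0.3 hours
    drug = rng.normal(loc=23.7, scale=6.0, size=n)
    placebo = rng.normal(loc=24.0, scale=6.0, size=n)

    t_stat, p_value = stats.ttest_ind(drug, placebo)
    pooled_sd = np.sqrt((drug.var(ddof=1) + placebo.var(ddof=1)) / 2)
    cohens_d = (placebo.mean() - drug.mean()) / pooled_sd

    print(f"p = {p_value:.1e}")           # far below 0.001: "significant"
    print(f"Cohen's d = {cohens_d:.3f}")  # ~0.05: practically negligible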

Correlation vs. causation: Observational data can establish association with high confidence. Causal inference requires ruling out confounders, reverse causation, and spurious correlation — a process that demands design-level thinking, not just post-hoc analysis.
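
A brief simulation makes the point concrete: when a hidden confounder Z drives both X and Y, the two correlate strongly despite no causal link between them. All data below are synthetic.

    import numpy as np

    rng = np.random.default_rng(1)
    z = rng.normal(size=10_000)  # unobserved confounder

    # X and Y each depend on Z, never on each other
    x = z + rng.normal(scale=0.5, size=10_000)
    y = z + rng.normal(scale=0.5, size=10_000)

    r = np.corrcoef(x, y)[0, 1]
    print(f"corr(X, Y) = {r:.2f}")  # ~0.8: strong association, zero causation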

Within-sample vs. generalizable: A finding derived from a sample of 240 undergraduate students at a single university (a documented limitation in psychology research) may not generalize to different populations, ages, or cultural contexts.

Navigating these boundaries is the defining skill of scientific reasoning — and understanding them is a prerequisite for evaluating any claim that invokes data as its foundation. The broader principles of how science builds knowledge from evidence are grounded in the scientific method, and the full scope of research approaches is mapped across the National Science Authority's reference index.

