Data Analysis


The technological discipline of data analysis emerged from the union of statistics, computer science, pattern recognition, artificial intelligence, and machine learning. Data analysis is the process of transforming data into information, and that information emerges from data in the course of answering a question. The question is therefore the key part of the process. Computers have made it possible to aggregate several data sets into huge databases. Managing and making sense of large data sets raises problems, storage among them, but the time invested leads to the development of new software tools, with the expectation that the mined data will lead to better decision making.

In 1970, statisticians began to question the traditional paradigm of data analysis. The traditional process began with formulating a hypothesis, continued with collecting data, and ended with testing the hypothesis. In response, statisticians began to use open-ended data exploration methods. John W. Tukey of Princeton University and AT&T Bell Labs established a new approach called exploratory data analysis. Tukey (1977) suggested investigating data as a detective investigates a crime scene, with an open mind and few, if any, assumptions. Tukey saw data analysis as a mixture of science and art: the process of analyzing data includes the creative search for meaning as well as a systematic method for guiding the search. The goal of exploratory data analysis (EDA) is quite different from that of the traditional paradigm of hypothesis testing, also known as confirmatory data analysis. EDA seeks to find “patterns in data for hypothesis generation and refinement” (Behrens & Smith, 1996, p. 952).
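As a rough illustration of exploring data before committing to a hypothesis, the sketch below computes a five-number summary and flags unusually distant values, two summaries commonly associated with EDA. The data set, the function name `five_number_summary`, and the choice of the 1.5 × IQR cutoff are assumptions made for this example; they are not taken from Tukey (1977) or from the text above.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # "inclusive" treats the data as the whole population of interest.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical data: minutes of daily reading reported by nine students.
reading_minutes = [10, 15, 20, 20, 25, 30, 35, 40, 90]

summary = five_number_summary(reading_minutes)
minimum, q1, median, q3, maximum = summary

# Flag values far outside the middle half of the data
# (more than 1.5 interquartile ranges beyond Q1 or Q3).
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in reading_minutes if v < low_fence or v > high_fence]

print("Five-number summary:", summary)
print("Values worth a closer look:", outliers)
```

A flagged value is not an error to be deleted; in the exploratory spirit it is a prompt for a new question, such as why one student reads so much more than the rest.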

Computers are problem-solving tools that help implement a systematic method of exploration. Certain valuable cognitive skills warrant special emphasis for PreK–12 students:

  • The ability to analyze a variety of problems and understand how to select and use productivity tools to find solutions
  • The ability to understand the theoretical background needed to use software programs to solve the different types of problems encountered in both personal and professional activities

Problem-solving activities help students learn how to collect, interpret, and represent data. In the long run, these computer-assisted activities help students deepen their understanding of using data analysis to answer questions, solve problems, and make decisions in business, politics, and research.

The National Council of Teachers of Mathematics (NCTM) advocates having students generate questions that require collecting and exploring data. The NCTM Data Analysis and Probability Standard (2000) for PreK–12 students has four goals:

  1. Formulate questions that can be addressed with data, and collect, organize, and display relevant data to answer them
  2. Select and use appropriate statistical methods to analyze data
  3. Develop and evaluate inferences and predictions that are based on data
  4. Understand and apply basic concepts of probability
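The four goals above can be sketched with a small classroom example. The survey question, the responses, and the school size of 30 are hypothetical values chosen for illustration; the sketch shows one simple way to touch each goal and is not a method prescribed by the NCTM.

```python
from collections import Counter

# Goal 1: formulate a question and collect, organize, and display data.
# Hypothetical question: "How do students in our class get to school?"
responses = ["bus", "walk", "car", "bus", "bike",
             "bus", "walk", "car", "bus", "walk"]

counts = Counter(responses)  # organize the raw answers into a frequency table
for transport, count in counts.most_common():
    print(f"{transport:<5} {'#' * count}")  # display as a simple text bar chart

# Goal 2: select a statistical method suited to categorical data (the mode).
most_common_transport, most_common_count = counts.most_common(1)[0]

# Goal 4: apply a basic probability concept (relative frequency).
p_bus = counts["bus"] / len(responses)

# Goal 3: make a prediction based on the data, assuming another class of
# 30 students behaves like this sample (an assumption worth questioning).
other_class_size = 30
predicted_bus_riders = round(p_bus * other_class_size)

print("Most common:", most_common_transport)
print("Estimated probability of riding the bus:", p_bus)
print("Predicted bus riders in a class of 30:", predicted_bus_riders)
```

Evaluating the prediction, for instance by surveying the other class and comparing the result, completes the third goal and usually raises the next question.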

The NCTM recommends a strong emphasis on developing data analysis skills in all grades, with the sophistication of the concepts and procedures increasing progressively. The aim is that, by the end of high school, students have a strong statistical background and know how to use computers as problem-solving tools.