The Role of Statistics in Computer Science

Statistics plays an intrinsic role in computer science, and vice versa. Statistics is used in data mining, speech recognition, vision and image analysis, data compression, artificial intelligence, and network and traffic modeling. A statistical background is essential for understanding the algorithms and statistical properties that form the backbone of computer science.

Roles of Statisticians

Statistician John Tukey (1915-2000) was key in developing ideas that statisticians have since embraced, such as exploratory data analysis: techniques for better understanding the data before moving on to procedures such as hypothesis testing. Statisticians place great importance on the rigor of their analyses and draw on theory to solve problems involving uncertainty. That theory informs their methods and helps give both problems and their solutions a sound scientific footing.

Roles of Computer Scientists

Computer scientists tend to focus on data acquisition/cleaning, retrieval, mining, and reporting. They are often tasked with developing algorithms for prediction and for system efficiency. Machine learning (an aspect of artificial intelligence) is also a focus, particularly for data mining: finding patterns and associations in data for purposes such as marketing and finance.

Application of Statistics in Computer Science

There are a number of ways the roles of statisticians and computer scientists merge; consider model development and data mining. Typically, the statistical approach to modeling involves stochastic (random) models that draw on prior knowledge of the data. The computer science approach, on the other hand, leans more toward algorithmic models that do not rely on prior knowledge of the data. Ultimately, the two approaches come together in attempts to solve problems.
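
To make the contrast concrete, here is a minimal Python sketch on made-up data. The function names `fit_line` and `knn_predict` are hypothetical, and ordinary least squares and k-nearest neighbors are simply convenient stand-ins for the two styles: the first assumes the data follow a straight line plus random noise, while the second makes no assumption about how the data were generated and just averages nearby observations.

```python
from statistics import mean


def fit_line(xs, ys):
    """Statistical style: assume y = a + b*x + random noise and estimate
    the intercept a and slope b by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def knn_predict(xs, ys, x_new, k=3):
    """Algorithmic style: no model of how the data were generated; just
    average the responses of the k training points closest to x_new."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x_new))[:k]
    return mean(y for _, y in nearest)


# Hypothetical toy data, roughly linear with some noise.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.0]

a, b = fit_line(xs, ys)
print(f"OLS line: y = {a:.2f} + {b:.2f}x; prediction at x=3.8: {a + b * 3.8:.2f}")
print(f"3-NN prediction at x=3.8: {knn_predict(xs, ys, 3.8):.2f}")
```

The two predictions differ because each approach encodes different beliefs about the data; neither is automatically right, which is one reason the two traditions can learn from each other.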

Data mining processes for computer science have statistical counterparts. Consider the following:

Steps in Computer Science → Steps in Statistics

  • Data acquisition/enrichment → experimental design for the collection of data/noise reduction
  • Data exploration → discerning the distribution/variability
  • Analysis and modeling → group differences, dimension reduction; prediction; classification
  • Representation and reporting → visualization; communication
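
The data exploration step can be illustrated with Python's standard `statistics` module. This sketch assumes a small, hypothetical sample of response times in milliseconds, and uses Tukey's fences (points more than 1.5 interquartile ranges beyond the quartiles), a simple outlier rule associated with Tukey's exploratory approach. The helper names are illustrative only.

```python
from statistics import mean, median, quantiles, stdev


def summarize(data):
    """Center and spread of a sample: mean, median, standard deviation,
    and the first and third quartiles with their interquartile range."""
    q1, _, q3 = quantiles(data, n=4)  # default "exclusive" quartile method
    return {
        "mean": mean(data),
        "median": median(data),
        "stdev": stdev(data),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


def tukey_outliers(data, k=1.5):
    """Return the points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    s = summarize(data)
    low = s["q1"] - k * s["iqr"]
    high = s["q3"] + k * s["iqr"]
    return [x for x in data if x < low or x > high]


# Hypothetical response times in milliseconds, with one suspicious value.
times = [12, 15, 14, 10, 13, 15, 11, 14, 48, 13]
print(summarize(times))
print("outliers:", tukey_outliers(times))
```

Here the single extreme value pulls the mean (16.5) well above the median (13.5), which is exactly the kind of distributional feature exploration is meant to surface before any modeling begins.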

How else is statistics used in computer science? Simulations, which are used to gain a greater understanding of a wide variety of systems, are truly a marriage of computing capability and statistics: using statistics within a program improves understanding of the underlying system and leads to more meaningful results. In software engineering, statistics supports more conclusive judgments about software quality and performance.
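
As a small illustration of statistics inside a simulation, the sketch below estimates the expected maximum of two fair dice by Monte Carlo and attaches a 95% confidence interval based on the normal approximation. The toy system, the function name, the trial count, and the seed are all assumptions chosen for illustration; because the exact answer (161/36, about 4.47) is known here, the estimate can be checked against it.

```python
import math
import random
from statistics import mean, stdev


def simulate_max_of_two_dice(trials, seed=0):
    """Monte Carlo estimate of E[max(D1, D2)] for two fair six-sided dice,
    returned with a normal-approximation 95% confidence interval."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    samples = [max(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(trials)]
    est = mean(samples)
    # 1.96 is the standard normal quantile for a two-sided 95% interval.
    half_width = 1.96 * stdev(samples) / math.sqrt(trials)
    return est, (est - half_width, est + half_width)


est, (lo, hi) = simulate_max_of_two_dice(100_000)
exact = 161 / 36  # sum over k of k * (2k - 1) / 36
print(f"estimate {est:.4f}, 95% CI [{lo:.4f}, {hi:.4f}], exact {exact:.4f}")
```

The interval is what turns a single simulated number into a statement about how far to trust it; with more trials it narrows roughly in proportion to one over the square root of the number of trials.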

The Conflicts

Given the intertwined workings of both disciplines, where are the conflicts? Common complaints computer scientists have had regarding statisticians include:

  • Lack of programming sophistication
  • The imposition of standard techniques over innovative techniques
  • Being overly immersed in theory over solving real-world problems

Statisticians tend to take issue with computer scientists over their:

  • Lack of statistical foundations for data collection and analysis
  • Insufficient consideration of the objectives
  • Disregard for the representative nature of data

Consider model development: A computer scientist may believe that a statistician is developing a theory for the sake of developing a theory, while the statistician may be concerned that the computer scientist has given no thought to model fit.
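
One basic check of model fit is the coefficient of determination, R², the share of the variance in the data that a model explains. This hypothetical Python sketch fits a straight line by ordinary least squares to two made-up data sets: one truly linear and one following a parabola. The line fits the first perfectly (R² = 1) but explains none of the variation in the second (R² = 0), even though it still happily produces predictions.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def r_squared(xs, ys, a, b):
    """Share of the variance in ys explained by the line a + b*x."""
    my = mean(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # truly linear data
curved = [x * x for x in xs]      # a parabola that no straight line captures
for name, ys in [("linear", linear), ("curved", curved)]:
    a, b = fit_line(xs, ys)
    print(f"{name}: y = {a:.2f} + {b:.2f}x, R^2 = {r_squared(xs, ys, a, b):.2f}")
```

A statistician would typically also look at the residuals: for the curved data they fall and then rise in a U shape, a pattern signaling that the form of the model, not just its parameters, is wrong.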

Joining Forces

When practitioners of these disciplines combine forces (either within one individual or through collaboration), the results combine rigorous science with efficiency. What computer scientists learn from statisticians can help them understand what is already known and avoid reinventing the wheel (e.g., neural networks are not fundamentally different from regression and classification models). That grounding, in turn, helps computer scientists move beyond what is already known and develop genuinely innovative techniques.
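
The point that neural networks are not fundamentally different from regression can be seen in its simplest case. In this sketch, a single "neuron" with no activation function, trained by gradient descent on mean squared error, converges to the same intercept and slope as closed-form least squares. The data, learning rate, epoch count, and function names are illustrative assumptions, and deeper networks with nonlinear activations go well beyond this case.

```python
from statistics import mean


def ols(xs, ys):
    """Closed-form ordinary least squares: returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def train_linear_neuron(xs, ys, lr=0.05, epochs=5000):
    """One neuron with identity activation (prediction = w*x + bias),
    trained by full-batch gradient descent on mean squared error.
    Returns (bias, w) to match the (intercept, slope) order of ols()."""
    w, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + bias) - y for x, y in zip(xs, ys)]
        # Gradients of (1/n) * sum(error^2) with respect to w and bias.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        bias -= lr * grad_b
    return bias, w


xs = [0, 1, 2, 3, 4]
ys = [1.2, 2.9, 5.1, 7.0, 8.8]
print("least squares:  intercept %.4f, slope %.4f" % ols(xs, ys))
print("trained neuron: intercept %.4f, slope %.4f" % train_linear_neuron(xs, ys))
```

Both lines print the same coefficients because minimizing squared error over a linear function is, by definition, least-squares regression; the "network" simply reaches the answer iteratively instead of in closed form.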

Many statisticians also have a solid grasp of the soft skills needed to ensure that the techniques they develop help decision-makers understand issues and apply solutions. Michigan Technological University’s Master’s in Applied Statistics Online offers an in-depth exploration of the use of applied statistics in computer science.

Statisticians, in turn, can benefit from learning how computer scientists work: how to move beyond theory and apply their sophisticated skills to real-world problems. This is where an applied statistics degree comes in, helping students build the computational strengths needed to turn theory into solutions. Michigan Tech offers a robust online master’s degree in applied statistics that teaches these skills and how to integrate them into your organization.

Both fields are trying to solve the same problems. This is where the rubber of statistics meets the computer science road. When the forces of statistics and computer science are combined, we all benefit.