In creating a Data Science solution, think of the components that define the solution space. You must consider the data types with which you will be working, the classes of analytics you'll use to generate your solution, how the models embodied in the solution will operate and evolve, and the deployment models that will govern how the models will be run.

Data Types

  • Structured | Structured data exists when information is clearly broken out into fields that have an explicit meaning and are highly categorical, ordinal, or numeric. Think SQL databases!
  • Unstructured | Unstructured data, such as natural language text, has less clearly delineated meaning. Still images, video, and audio often fall under the category of unstructured data. Data in this form requires preprocessing to identify and extract relevant ‘features.’
  • Data Speed | This is an added dimension to the two above, which describes the speed of data creation and consumption, from drips and batch files to streaming and fire hose.
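
As a rough illustration of the distinction above, the Python sketch below contrasts a structured record, whose fields can be used directly, with a piece of free text that must be preprocessed into features first. The record fields, the sample sentence, and the `bag_of_words` helper are all hypothetical, and word counting is only one of many possible feature-extraction steps.

```python
import re
from collections import Counter

# Structured: each field has an explicit meaning and type (hypothetical record).
structured_row = {"customer_id": 1042, "plan": "premium", "monthly_spend": 79.5}

# Unstructured: free text whose meaning must be extracted before modeling.
unstructured_text = "Great service, but the app is slow. Support was great!"


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


features = bag_of_words(unstructured_text)

# The structured field is usable as-is; the text only becomes usable
# once it has been turned into countable features.
print(structured_row["monthly_spend"])
print(features.most_common(3))
```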

Analytic Classes

  • Transforming | Aggregation, Additive, and Changing
  • Learning | Regression, Clustering, Classification, and Recommendation
  • Predictive | Simulation and Optimization
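
One small example from each analytic class may make the categories more concrete. The sketch below is an illustration under stated assumptions, not a prescribed method: `aggregate_by_key` stands in for a transforming analytic (aggregation), `fit_line` for a learning analytic (least-squares regression), and `simulate_mean_roll` for a predictive analytic (a simple Monte Carlo simulation). All function names and sample data are hypothetical.

```python
import random
from collections import defaultdict


def aggregate_by_key(rows):
    """Transforming: sum the values of (key, value) pairs per key."""
    totals = defaultdict(float)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def fit_line(xs, ys):
    """Learning: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate_mean_roll(trials, seed=0):
    """Predictive: estimate the mean of a six-sided die by simulation."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    return sum(rng.randint(1, 6) for _ in range(trials)) / trials


print(aggregate_by_key([("east", 1.0), ("west", 2.0), ("east", 3.0)]))
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
print(simulate_mean_roll(10_000))
```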


Learning Models

  • Learning | Unsupervised and Supervised. Supervised learning takes place when a model is trained using a labeled data set that has a known class or category associated with each data element. The model relates the features found in training instances with the labels so that predictions can be made for unlabeled instances. Unsupervised learning models have no a priori knowledge about the classes into which data can be placed. They use the features in the dataset to form groupings based on feature similarity.
  • Training | Offline and Online. A useful distinction among learning models is between those that are trained in a single pass, which are known as offline models, and those that are trained incrementally over time, known as online models. Many learning approaches have online or offline variants. The decision to use one or the other is based on the analytic goals and execution models chosen.
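
The two distinctions above can be sketched in a few lines of plain Python. This is a minimal sketch under simplifying assumptions (one numeric feature, toy data, at least two distinct values for clustering), and every name in it is hypothetical: `train_centroids` and `predict` show supervised learning from labeled pairs, `two_means` shows unsupervised grouping by feature similarity, and `OnlineMean` shows a model updated incrementally rather than in a single pass.

```python
def train_centroids(samples):
    """Supervised: samples are (feature, label) pairs; learn one mean per label."""
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Assign an unlabeled value to the label with the nearest learned mean."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two groups by similarity.

    Assumes the values contain at least two distinct numbers.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return sorted(group_low), sorted(group_high)


class OnlineMean:
    """Online training: the estimate is updated one observation at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        self.count += 1
        self.mean += (x - self.mean) / self.count


data = [2.0, 4.0, 6.0, 8.0]

offline_mean = sum(data) / len(data)  # offline: a single pass over all the data

online = OnlineMean()
for value in data:  # online: one incremental update per arriving value
    online.update(value)

print(offline_mean, online.mean)
```

In this toy case both training styles arrive at the same estimate; the practical difference is that the online version never needs the full data set in memory and can keep learning as new data arrives.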

Deployment Options

  • Scheduling | Batch and Streaming. Batch models imply discrete sets of processing and delivery times. They also imply processing times greater than minutes. Streaming, on the other hand, implies real-time delivery of results and continuous analytics. The choice between batch and streaming execution models often hinges on analytic latency and timeliness requirements. Latency refers to the amount of time required to analyze a piece of data once it arrives at the system, while timeliness refers to the average age of an answer or result generated by the analytic system.
  • Sequencing | Serial and Parallel. This is an added dimension and is a function of both data size and delivery requirements: how much data is required to train the model, how often that data changes, and how fast you want the results.
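
To make the latency trade-off between batch and streaming more tangible, here is a deliberately simplified Python sketch. It assumes that a batch job starts on a fixed interval and takes a fixed time to finish, and that a streaming system handles each event independently with a fixed per-event cost (no queueing). The function names, arrival times, and runtimes are hypothetical.

```python
import math


def batch_result_times(arrivals, interval, batch_runtime):
    """Each event is handled by the first batch starting at or after its arrival."""
    results = []
    for t in arrivals:
        batch_start = math.ceil(t / interval) * interval
        results.append(batch_start + batch_runtime)
    return results


def streaming_result_times(arrivals, per_event_runtime):
    """Each event is analyzed as soon as it arrives (queueing is ignored)."""
    return [t + per_event_runtime for t in arrivals]


def mean_latency(arrivals, results):
    """Average time between an event's arrival and its result being available."""
    return sum(r - a for a, r in zip(arrivals, results)) / len(arrivals)


arrivals = [0, 10, 25, 59]  # arrival times in seconds (made-up values)

batch = batch_result_times(arrivals, interval=60, batch_runtime=5)
stream = streaming_result_times(arrivals, per_event_runtime=1)

print(mean_latency(arrivals, batch))
print(mean_latency(arrivals, stream))
```

Under these assumptions, most of the batch latency comes from waiting for the next scheduled run rather than from the processing itself, which is why latency requirements tend to drive the choice of execution model.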


The dimensions that will slow the process down:

  • Speed | How fast do you want the results? (e.g., real time, hourly, daily.)
  • Analytic Complexity | How complex are the required algorithms?
  • Data Complexity | How many dimensions are required? What are the data types and how easily can they be prepared for model consumption?
  • Data Size | How much data do you need?
  • Accuracy & Precision | How precise an answer do you need and at what levels of confidence?
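
Data size and precision are linked, and a quick calculation can show how. The sketch below is one possible illustration, assuming the quantity of interest is a mean, the standard deviation is known, and a normal approximation is acceptable. The helper name `margin_of_error` and the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(std_dev, sample_size, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return z * std_dev / sqrt(sample_size)


# Made-up example: the same spread measured with increasing amounts of data.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(10.0, n), 3))
```

Under these assumptions the margin of error shrinks with the square root of the sample size, so halving it takes roughly four times as much data.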


© Copyright 2018-2020, Bit Economics LLC - All Rights Reserved