What is Data Science?

The Stack Exchange network has a site called Data Science SE. As an active member of the R tag, I have noticed that there are few discussions on this subject here on the Portuguese-language Stack Overflow. This gap surprises me, especially in the python tag, since some surveys indicate that Python is the most popular language for working in this area.

That said, I have the following questions to ask:

  • What is Data Science?

  • How does it differ from Statistics?

  • Are Data Science, Data Mining, Artificial Intelligence, Big Data and Machine Learning synonyms, or are there differences between these terms?

  • In addition to statistics, mathematics and computing, what other skills should a data scientist have?

I created this topic inspired by the question "What is Data Mining?", asked here on Stack Overflow PT.

  • I don't understand what Python and R have to do with the question, since it seems purely conceptual.

  • Your point is perfectly valid. I used these tags to draw more attention from a likely target audience.

  • I watched this video and quickly got an overview: https://www.youtube.com/watch?v=X3paOmcrTjQ

What is Data Science?

Data science combines subjects such as computer science, software engineering, mathematics, statistics, programming, economics and business management. It covers the collection, preparation, analysis, management, visualization and storage of large volumes of information; in simple terms, it has strong connections to databases, including big data.

How does it differ from Statistics?

Data science combines multidisciplinary fields and computing to interpret data for decision-making, while statistics refers to mathematical analysis that uses quantitative models to represent a given data set.

Data science is more oriented to big data, seeking to extract insights from large volumes of complex data. Statistics, on the other hand, provides the methodology to collect, analyze and draw conclusions from data.

Data science uses tools, techniques and principles to filter and categorize large volumes of data into appropriate data sets or models. Statistics, by contrast, relies mainly on tools such as frequency analysis, mean, median, analysis of variance, correlation and regression, to name a few. Data science investigates and inspects data to draw factual, quantitative and statistical inferences, whereas statistics focuses on analysis using standard techniques based on mathematical formulas and methods.
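
To make the statistical side of this comparison concrete, here is a minimal Python sketch of the classic tools listed above (frequency, mean, median, variance, correlation and simple linear regression) using only the standard library. The function name `describe` and the sample data are hypothetical, chosen just for illustration; correlation and the regression line are computed by hand from the sample covariance.

```python
import statistics
from collections import Counter

def describe(xs, ys):
    """Summarize two paired numeric samples with classic statistical tools."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Sample covariance, then Pearson correlation, computed by hand
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)
    r = cov / (statistics.stdev(xs) * statistics.stdev(ys))
    # Ordinary least-squares line y = intercept + slope * x
    slope = cov / statistics.variance(xs)
    intercept = mean_y - slope * mean_x
    return {
        "frequencies": Counter(xs),
        "mean": mean_x,
        "median": statistics.median(xs),
        "variance": statistics.variance(xs),
        "correlation": r,
        "slope": slope,
        "intercept": intercept,
    }

# Hypothetical data: hours studied vs. test score
hours = [1, 2, 2, 3, 4]
scores = [52, 55, 61, 64, 70]
summary = describe(hours, scores)
print(summary)
```

Each of these is a fixed, well-defined formula; the data science workflow described in this answer wraps such tools inside larger steps of collection, cleaning and modeling.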

In addition to statistics, mathematics and computing, what other skills must a data scientist have?

A data scientist must be able to analyze and simplify problems, using complex data sets to uncover information.

Some approaches a data scientist should know:

  • Apply scientific methods to problem-solving with random data
  • Identify the data requirements for a particular problem
  • Identify techniques to obtain the desired results
  • Deliver value to organizations that use data

Data Science vs Data Mining

Data Mining is an activity within a broader process called Knowledge Discovery in Databases (KDD), while Data Science is a field of study, like Applied Mathematics or Computer Science. Data Science is usually understood in a broad sense, while Data Mining is considered a niche.

Some Data Mining activities, such as statistical analysis, recording data flows and pattern recognition, overlap with Data Science. In that sense, Data Mining can be seen as a subset of Data Science.

In Data Mining, Machine Learning is used mostly for pattern recognition, while in Data Science it has more general use.
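
As a small illustration of pattern recognition in Data Mining, the sketch below counts pairs of items that are frequently bought together, which resembles the pair-counting stage of frequent-itemset algorithms such as Apriori. It is a simplified sketch, not a full Apriori implementation; the function name `frequent_pairs`, the baskets and the support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) removes duplicates and gives each pair a canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical shopping baskets
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]
result = frequent_pairs(baskets, min_support=2)
print(result)  # {('bread', 'milk'): 2, ('butter', 'milk'): 2}
```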

Note

Data Science and Data Mining should not be confused with Big Data Analysis; it is possible to have both "miners" and "scientists" working on large data sets.


Data Science vs Machine Learning

Below are the differences between Data Science and Machine Learning:

Components - Data Science systems cover the entire data life cycle and typically have components for the following:

  • Collection and profiling - ETL (Extract, Transform, Load) pipelines and data profiling
  • Distributed computing - horizontally scalable data distribution and processing
  • Intelligence automation - automated ML models for online responses (forecasting, recommendations) and fraud detection
  • Data visualization - visually explore data to build better intuition about it; an integral part of ML modeling
  • Dashboards and BI - predefined dashboards with slice-and-dice features for top-level stakeholders
  • Data engineering - ensure that hot and cold data are always accessible; covers data backup, security and disaster recovery
  • Deployment in production - migrate the system to production following industry-standard practices
  • Automated decisions - running business logic over the data, or a complex mathematical model trained with an ML algorithm
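
The "Collection and profiling" component relies on ETL pipelines. The following is a minimal sketch of the Extract, Transform and Load steps in Python, assuming a small CSV source held in memory and an in-memory SQLite database as the destination (both stand-ins for real systems). The table name `sales` and the cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: messy names and one missing amount
RAW_CSV = """id,name,amount
1, Alice ,10.5
2,Bob,
3, carol ,7
"""

def extract(text):
    """Extract: read raw rows from a CSV source (here an in-memory string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount, clean names, cast types."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        clean.append((int(row["id"]), row["name"].strip().title(), float(row["amount"])))
    return clean

def load(records, conn):
    """Load: write the cleaned records into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT name, amount FROM sales ORDER BY id").fetchall())
# [('Alice', 10.5), ('Carol', 7.0)]
```

Keeping the three steps as separate functions makes each one testable on its own and lets the source or destination be swapped without touching the cleaning logic.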

Machine Learning modeling starts from existing data, and its typical components are as follows:

Understand the problem - Make sure that ML is an efficient way to solve the problem. Note that not every problem can be solved with ML.

Explore data - Build intuition about the features to be used in the ML model. This may take more than one iteration, and data visualization plays a critical role here.

Prepare data - This is an important step with a high impact on the accuracy of the ML model. It deals with data issues such as what to do when a feature has missing values: replace them with a placeholder value such as zero, replace them with the mean of the other values, or drop the feature from the model. Feature scaling, which ensures that all feature values are in the same range, is critical for many ML models. Many other techniques, such as polynomial feature generation, are also used here to derive new features.
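
The three preparation techniques mentioned above (mean imputation, feature scaling and polynomial feature generation) can be sketched in plain Python as follows. Min-max scaling is used here as one common form of feature scaling; standardization is another. The function names and data are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import mean

def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def min_max_scale(column):
    """Rescale values to the [0, 1] range (one common form of feature scaling)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant feature: nothing to scale
    return [(v - lo) / (hi - lo) for v in column]

def polynomial_features(row, degree=2):
    """Derive new features: all products of the inputs up to the given degree."""
    features = list(row)
    for d in range(2, degree + 1):
        for combo in combinations_with_replacement(row, d):
            product = 1.0
            for value in combo:
                product *= value
            features.append(product)
    return features

age = [20, None, 40, 60]
age_filled = impute_mean(age)           # [20, 40, 40, 60]
age_scaled = min_max_scale(age_filled)  # [0.0, 0.5, 0.5, 1.0]
print(polynomial_features([2.0, 3.0]))  # [2.0, 3.0, 4.0, 6.0, 9.0]
```

In practice, scikit-learn provides `SimpleImputer`, `MinMaxScaler` and `PolynomialFeatures` for these steps; the hand-written versions above only show what they compute.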

Select a model and train - The model is selected based on the type of problem (regression, classification, etc.) and on the shape of the feature set (some algorithms work well with few instances and many features, others with the opposite).
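
A toy sketch of choosing a model by problem type: if the target is numeric (regression/forecasting) it fits a least-squares line, and if it is categorical (classification) it uses a 1-nearest-neighbor rule. This is an illustrative assumption about how such a choice could be automated, with a single feature and hypothetical function names; it ignores the feature-set criterion mentioned above.

```python
from statistics import mean

def fit_linear_regression(xs, ys):
    """Regression (forecasting a number): least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_nearest_neighbor(xs, labels):
    """Classification (predicting a category): 1-nearest-neighbor on one feature."""
    data = list(zip(xs, labels))
    return lambda x: min(data, key=lambda pair: abs(pair[0] - x))[1]

def select_and_train(xs, ys):
    """Pick a model family from the target type, then fit it."""
    if all(isinstance(y, (int, float)) for y in ys):
        return fit_linear_regression(xs, ys)
    return fit_nearest_neighbor(xs, ys)

predict_price = select_and_train([1, 2, 3, 4], [10.0, 20.0, 30.0, 40.0])
predict_label = select_and_train([1, 2, 8, 9], ["low", "low", "high", "high"])
print(predict_price(5), predict_label(7))  # 50.0 high
```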

Performance measure - In Data Science, performance measures are not standardized; they vary case by case. Typically, they indicate data timeliness, data quality, querying capability, concurrency limits on data access, interactive visualization capability, etc.
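
Two of these measures, data quality and data timeliness, can be expressed as simple ratios. The sketch below assumes "quality" is measured as completeness (share of records with a non-missing field) and "timeliness" as the share of records updated within a maximum age; both definitions, the field names and the dates are hypothetical choices, since, as noted above, such measures vary case by case.

```python
from datetime import datetime, timedelta

def completeness(records, field):
    """Share of records in which the field is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)

def timeliness(records, field, now, max_age):
    """Share of records whose timestamp is no older than max_age."""
    if not records:
        return 0.0
    fresh = sum(1 for r in records if now - r[field] <= max_age)
    return fresh / len(records)

now = datetime(2024, 1, 10)
rows = [
    {"email": "a@example.com", "updated": datetime(2024, 1, 9)},
    {"email": None, "updated": datetime(2024, 1, 1)},
    {"email": "c@example.com", "updated": datetime(2024, 1, 8)},
    {"updated": datetime(2023, 12, 1)},
]
print(completeness(rows, "email"))                          # 0.5
print(timeliness(rows, "updated", now, timedelta(days=7)))  # 0.5
```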


Data Science vs Artificial Intelligence

Both Data Science and Artificial Intelligence are popular choices in the market. Let's discuss some of the main differences between them:

  • Data Science is the collection and curation of large amounts of data for analysis, while Artificial Intelligence feeds this data to a machine so that it can understand it.
  • Data Science is a collection of skills, such as statistical techniques, while Artificial Intelligence is a set of algorithmic techniques.
  • Data Science uses statistical learning, while Artificial Intelligence uses machine learning.
  • Data Science looks for patterns in data to support decision-making, while AI seeks to produce an intelligent report for decision-making.
  • Data Science is one part of the perception loop of the cycle, while AI covers planning and action.
  • In Data Science, processing is mid-level data manipulation, while AI involves high-order scientific data processing.
  • Data Science involves graphical representation, while Artificial Intelligence involves algorithms and network-node representation.
  • Artificial Intelligence techniques are involved in robotic control processes, while Data Science is involved in data mining and manipulation.

Data Science vs Big Data

Below are some of the main differences between the concepts of big data and data science:

  • Organizations need big data to improve efficiency, understand new markets, and increase competitiveness, while data science provides the methods or mechanisms to understand and utilize the potential of big data in a timely manner.

  • Today there is practically no limit to the amount of valuable data organizations can collect, but using all of it to extract meaningful information for organizational decisions requires data science.

  • Big data is characterized by its volume, variety and velocity (popularly known as the 3Vs), while data science provides the methods or techniques for analyzing data with these characteristics.

  • Big data offers potential for better performance. However, uncovering insights in big data to leverage that potential is a significant challenge. Data science uses theoretical and experimental approaches in addition to deductive and inductive reasoning. It takes on the job of uncovering hidden insights from complex networks of unstructured data, helping organizations realize the potential of big data.

  • Big data analysis mines useful information from large data sets. Unlike that kind of analysis, data science uses machine learning algorithms and statistical methods to train computers to make predictions from large data without much explicit programming. Therefore, data science should not be confused with big data analysis.

  • Big Data is more related to technology (Hadoop, Java, Hive, etc.), distributed computing, and analytics tools and software. Data science, by contrast, focuses on strategies for business decisions and on data dissemination using mathematics, statistics, data structures and the methods mentioned earlier.
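
Hadoop popularized the MapReduce model for distributed computing. The sketch below only simulates its three phases (map, shuffle, reduce) for a word count inside a single Python process; it does not use Hadoop's API, and in a real cluster each phase would run in parallel on different nodes. The function names and input splits are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework would across nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key (here, by summing the counts)."""
    return {key: sum(values) for key, values in groups.items()}

splits = ["big data needs tools", "data science needs data"]
mapped = chain.from_iterable(map_phase(s) for s in splits)  # one map call per split
counts = reduce_phase(shuffle_phase(mapped))
print(counts["data"], counts["needs"])  # 3 2
```

Because map calls are independent and reduce only needs the values for one key, both phases can be spread across machines, which is what makes the model horizontally scalable.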

From the differences above, one can see that data science is contained within the concept of big data. Data science plays an important role in many application areas: it works with big data to gain useful insights through predictive analytics, whose results are used to make smart decisions. Therefore, data science is included in big data, not the other way around.

References:

What is Data Science?

Comparisons Between Data Science vs Statistics

Useful Difference Between Data Science vs Machine Learning

Data Science Vs Data Engineering - Which One Is More Useful

Difference Between Data Science Vs Data Mining

Data Science vs Artificial Intelligence

Big Data vs Data Science - How Are They Different?