Common Data Science Techniques:
Workflow of data science:
Raw data -> processing -> information -> Business Intelligence analysis
Techniques are:
1) Traditional Data techniques:
Data collection -> preprocessing:
a) Class labelling (categorical vs. numerical) -> Categorical data can be stored and identified by names or labels. Numerical data are numbers, not words or descriptions.
b) Data cleaning -> the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled.
c) Dealing with missing values -> the first step in handling missing values is to look carefully at the complete data and find all the missing values.
d) Balancing & shuffling the dataset -> a balanced dataset is one where each output class (or target class) is represented by the same number of input samples. Balancing can be performed with one of the following techniques: oversampling, undersampling, or class weights. Shuffling randomizes the order of the samples so that the model does not learn from the ordering.
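The preprocessing steps above (cleaning, missing values, balancing and shuffling) can be sketched in a few lines. This is only a minimal illustration using the Python standard library; the toy records, the helper names (`clean`, `fill_missing`, `oversample`) and the choice of mean imputation are assumptions made for the example, not a prescribed method.

```python
import random
from collections import Counter

# Hypothetical toy records: (age, label); None marks a missing value.
rows = [
    (25, "yes"), (25, "yes"),          # exact duplicate
    (31, "no"), (None, "no"), (40, "no"), (52, "no"),
    (47, "yes"),
]

def clean(records):
    """Drop exact duplicates while keeping first-seen order."""
    return list(dict.fromkeys(records))

def fill_missing(records):
    """Replace a missing age with the mean of the observed ages (one simple choice)."""
    observed = [age for age, _ in records if age is not None]
    mean_age = sum(observed) / len(observed)
    return [(mean_age if age is None else age, label) for age, label in records]

def oversample(records, seed=0):
    """Balance classes by resampling minority rows with replacement, then shuffle."""
    rng = random.Random(seed)
    by_label = {}
    for row in records:
        by_label.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced

prepared = oversample(fill_missing(clean(rows)))
class_counts = Counter(label for _, label in prepared)
print(class_counts)
```

Oversampling is used here only because it is the shortest to show; undersampling or class weights would be equally valid choices depending on the dataset size.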
2) Big Data Techniques:
Data Collection -> preprocessing:
a) Class labelling (image, number, text, audio, video)
b) Data cleaning (as described above)
c) Dealing with missing values (null, NaN values): as described above
d) Data mining -> the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis. Data mining techniques and tools enable enterprises to predict future trends and make more-informed business decisions.
e) Data masking -> a way to create a fake but realistic version of your organizational data. The goal is to protect sensitive data while providing a functional alternative when real data is not needed, for example in user training, sales demos, or software testing.
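As a rough illustration of data masking, the sketch below hides most of an email address and replaces a name with a stable token. The record, the salt value, and the helper names (`mask_email`, `pseudonymize`) are hypothetical, and real masking tools usually offer much more (format-preserving masking, key management, and so on).

```python
import hashlib

def mask_email(email):
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain

def pseudonymize(value, salt="demo-salt"):
    """Replace a value with a stable token so masked rows can still be matched."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "user_" + digest[:8]

# Hypothetical record containing sensitive fields.
record = {"name": "Jane Doe", "email": "jane.doe@example.com", "plan": "pro"}

masked = {
    "name": pseudonymize(record["name"]),
    "email": mask_email(record["email"]),
    "plan": record["plan"],  # non-sensitive field left unchanged
}
print(masked)
```

The masked record keeps the same shape as the original, so it can stand in for real data in a demo or a test.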
3) Business Intelligence Techniques: analyze the data, extract information, and present it in the form of performance or progress.
a) Metrics -> (measures + business meaning)
b) KPIs (Key Performance Indicators) -> (business objective + metrics)
c) Reports -> formal documents which can include headings, sub-headings, numbered sections, bullet point text, and graphics such as flow charts, diagrams or graphs. All of these devices may be used to help the reader navigate the report and understand its content.
d) Dashboard -> a data dashboard is an interactive tool that allows you to track, analyze, and display KPIs and metrics. Modern dashboards can combine real-time data from multiple sources and provide AI-assisted data preparation, chart creation, and analysis.
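The difference between a measure, a metric, and a KPI is easier to see with numbers. The figures and the 4% target below are made up for illustration only.

```python
# Hypothetical monthly measures: raw numbers with no meaning attached yet.
visitors = 4000
orders = 180

# Metric = measure + business meaning: the share of visitors who placed an order.
conversion_rate = orders / visitors

# KPI = metric + business objective: an assumed target of 4% conversion.
target_rate = 0.04
kpi_attainment = conversion_rate / target_rate
on_track = conversion_rate >= target_rate

print(f"conversion rate: {conversion_rate:.1%}, on track: {on_track}")
```

A report or dashboard would then present values like `conversion_rate` and `on_track` over time rather than the raw counts.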
4) Traditional Methods: Techniques:
-> Predictive Analysis: the process of using data to forecast future outcomes. The process uses data analysis, machine learning, artificial intelligence, and statistical models to find patterns that might predict future behavior.
a) Regression -> Linear Regression: linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable.
b) Logistic Regression -> this type of statistical model (also known as the logit model) is often used for classification and predictive analytics. Logistic regression estimates the probability of an event occurring, such as voted or didn't vote, based on a given dataset of independent variables.
c) Clustering -> grouping the observations together.
d) Factor Analysis -> grouping the explanatory variables together.
e) Time Series -> time series regression is a statistical method for predicting a future response based on the response history (known as autoregressive dynamics) and the transfer of dynamics from relevant predictors.
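To make the two regression items more concrete, here is a small sketch using only the Python standard library. `fit_line` fits a straight line by ordinary least squares, and `logistic_probability` shows how the logit model turns a linear score into a probability. The data, the variable names, and the weight and bias values are invented for the example; in practice a library such as scikit-learn or statsmodels would normally do the fitting.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one independent variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def logistic_probability(x, weight, bias):
    """Probability of the event under a logit model with the given coefficients."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

# Hypothetical data: independent variable (ad spend) and dependent variable (sales).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 6.0 + intercept  # forecast for an unseen spend of 6.0

# Assumed (not fitted) coefficients, just to show the shape of the output.
p_vote = logistic_probability(2.0, weight=1.5, bias=-3.0)

print(slope, intercept, predicted_sales, p_vote)
```

Linear regression returns a number on the scale of the dependent variable, while the logistic output always lies between 0 and 1, which is why it suits yes/no outcomes.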
5) Machine Learning Techniques: creating an algorithm which a computer then uses to find a model that fits the data as well as possible and makes accurate predictions based on that data.
Flow of ML ==>
Data --> Model --> objective function --> optimization Algorithm
Types of ML ==>
a) Supervised Learning -> training the algorithm resembles a teacher supervising her students: the algorithm learns from labeled examples for which the correct answer is known.
b) Unsupervised Learning -> a type of machine learning (ML) technique that uses algorithms to identify patterns in data sets that are neither classified nor labeled.
c) Reinforcement Learning -> developers devise a method of rewarding desired behaviors and punishing negative behaviors. This method assigns positive values to the desired actions to encourage the agent to use them, while negative values are assigned to undesired behaviors to discourage them.
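The flow Data --> Model --> objective function --> optimization Algorithm can be sketched with a tiny supervised example. Everything below is an assumption chosen for illustration: the data follow the rule y = 2x + 1, the model is a straight line, the objective is mean squared error, and the optimizer is plain gradient descent with a hand-picked learning rate.

```python
# Data: inputs and targets generated by the (hypothetical) rule y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def model(x, w, b):
    """Model: a straight line with weight w and bias b."""
    return w * x + b

def objective(w, b):
    """Objective function: mean squared error between predictions and targets."""
    return sum((model(x, w, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(steps=2000, lr=0.05):
    """Optimization algorithm: gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (model(x, w, b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (model(x, w, b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train()
print(w, b, objective(w, b))
```

The optimizer never sees the rule itself, only the data and the objective, yet the learned `w` and `b` end up close to 2 and 1.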
Hope this helps you get started with #datascience
Thanks to Codebasics, Dhaval Patel, and Hemanand Vadivel.