### Data Science – Machine Learning using R / Python and Visualization using Tableau

Detailed Topic Description:

1. Descriptive Statistics: Introduction to the course; Descriptive Statistics; Probability Distributions
• Types of data
• Measures of Central Tendency
• Measures of Variance
• Probability Rules
• Probability Distributions: Normal Distribution / Binomial Distribution / Poisson Distribution
• Estimations and Proportions
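
As a small illustration of the measures listed above, the following sketch uses Python's standard `statistics` module. The sample values and variable names are made up for illustration and are not course data.

```python
import statistics

# Hypothetical sample (illustrative values only)
scores = [62, 70, 70, 75, 81, 88, 94]

# Measures of central tendency
mean_score = statistics.mean(scores)      # arithmetic mean
median_score = statistics.median(scores)  # middle value of the sorted sample
mode_score = statistics.mode(scores)      # most frequent value

# Measures of variance (sample versions, dividing by n - 1)
sample_variance = statistics.variance(scores)
sample_std = statistics.stdev(scores)

print(mean_score, median_score, mode_score)
print(round(sample_variance, 2), round(sample_std, 2))
```
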
1. Inferential Statistics: Inferential statistics through hypothesis tests; Permutation & Randomization Tests
• Hypothesis Testing Basics
• Error Types
• Hypothesis test for a one-sample population mean
• Hypothesis test for two-sample population means
• Paired t-test
• Tests for variance / correlation
• Confidence Intervals
• Chi-square test
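
The permutation test named in this module can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions: the function name `permutation_test`, its parameters, and the two measurement lists are hypothetical, and the statistic used is the absolute difference in group means.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical measurements (illustrative values only)
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]
treated = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2]
p_value = permutation_test(control, treated)
print(p_value)
```

A small p-value suggests the observed difference would be unusual if the group labels were interchangeable, which is the null hypothesis of this test.
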
1. Analysis of variance
• One-way Analysis of Variance
• Two-way Analysis of Variance
1. Basics of R Programming
• How to install R and RStudio
• R data types, variables, and operators
• R data frames, basic functions in R
• Subsetting, merging, recoding, aggregating, ordering, binding data in R
• User-defined functions
• Packages in R
• apply(), lapply(), sapply(), tapply() in R
• Missing Value Imputation in R
• Removing Outliers from data
1. Machine Learning: Introduction and Concepts; Differentiating algorithmic and model-based frameworks; Regression: Ordinary Least Squares in R
• Difference between Supervised and Unsupervised Learning
• Visualization techniques
• To formulate simple and multiple regression models
• To give an account of the principle of least squares
• To carry out tests of linear hypotheses
• To perform validation of a regression model
• To perform Cross Validation / Stepwise Regression
• To select the important explanatory variables
• To use R for analyzing real data sets
• To be able to interpret the results in practical examples.
1. Binary Logistic Regression in R
• The log-odds (logit) transformation
• Logistic models and Logit models
• Implementation of BLR in R
• Interpretation of results
• ROC curve
• Confusion Matrix
1. K-Nearest Neighbors Regression & Classification in R
• KNN algorithm
• Distance Measure Methods
• Majority Vote Concept
• KNN Regression in R
• KNN Classification in R

8. Decision Tree and Random Forest in R

• Classification Trees
• Regression Trees
• Regularization and pruning
• Ensemble Models
• Bagging / Boosting / Out-of-Bag Error
• Random Forest and Decision Tree implementation in R
1. Support Vector Machine
• Construction of SVM
• Support Vector and Hyperplanes
• Kernel Trick
• Hard vs Soft Margin SVM
• Implementation and Result Interpretation in R
1. Naïve Bayes
• Conditional Probability
• Bayes' Theorem
• Implementation and Result Interpretation in R
1. Introduction to Artificial Neural Networks
• How an ANN works
• Similarity between ANNs and biological neural systems
• Single and Multilayer networks
1. Unsupervised Machine Learning
• Intra-cluster and inter-cluster analysis
• Hierarchical Clustering
• K-means Clustering
1. Dimension Reduction Techniques
• Principal Component Analysis
• Principal Components Regression
1. Association Rule Mining
• Market Basket Analysis
• Understanding support, confidence, and lift
• Implementation and interpretation of Market Basket Analysis in R
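
The three rule metrics in this module have simple definitions that can be checked by hand. The course implements Market Basket Analysis in R; the sketch below uses plain Python only to illustrate the definitions, and the transactions and function names are hypothetical.

```python
# Hypothetical transactions (each one is a set of purchased items)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Share of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to the baseline support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 indicates the items appear together more often than independence would predict; a lift below 1 indicates the opposite.
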

1. Introduction and Setting Up Your Integrated Analysis Environment
• Python Shell
• Custom environment settings
• Jupyter Notebooks
• Script editor
• Packages: NumPy, SciPy, scikit-learn, Pandas, Matplotlib, Seaborn, etc.
1. Using Python to Control and Document Your Data Science Processes
• Python Essentials
• Data types and objects
• Reading and writing data
• Simple plotting
• Control flow
• Debugging
• Code profiling
1. Accessing and Preparing Data
• Accessing SQL databases
• Cleansing Data with Python
• Stripping out extraneous information
• Normalizing data
• Formatting data
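
The cleansing and normalizing steps listed above can be sketched with the Python standard library alone. The helper names `min_max_scale` and `z_score` and the raw values are assumptions for illustration; a course exercise would more likely use Pandas or scikit-learn for the same task.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center on the mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical raw text fields with extraneous whitespace
raw = ["  42 ", "17", " 8", "25  "]

cleaned = [float(s.strip()) for s in raw]  # strip whitespace, convert to numbers
scaled = min_max_scale(cleaned)            # values in [0, 1]
standardized = z_score(cleaned)            # mean 0, standard deviation 1

print(cleaned)
print(scaled)
print(standardized)
```
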

1. Numerical Analysis, Data Exploration, and Data Visualization with NumPy Arrays, Matplotlib, and Seaborn
• NumPy Essentials
• The NumPy array
• N-dimensional array operations and manipulations
• Memory mapped files
• Data Visualization
• 2D plotting with Matplotlib
• Advanced data visualization with Seaborn

19. Exploring Data with Pandas

• Searching for Gold in a Pile of Pyrite
• Data manipulation with Pandas
• Statistical analysis with Pandas
• Time series analysis with Pandas
1. Machine Learning with scikit-learn
• Predicting the Future Can Be Good for Business
• Input: 2D arrays of samples and features
• Estimator, predictor, transformer interfaces
• Pre-processing data
• Regression
• Classification
• Model selection
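
The topics above fit together in a short scikit-learn workflow. The sketch below is one possible illustration, not the course's own material: the choice of the bundled iris dataset, the logistic regression model, and the split parameters are all assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Input: X is a 2D array of shape (n_samples, n_features), y holds the labels
X, y = load_iris(return_X_y=True)

# Hold out a test set so the score reflects unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Transformer (StandardScaler) and predictor (LogisticRegression) in one estimator
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

predictions = model.predict(X_test)          # predicted class labels
test_accuracy = model.score(X_test, y_test)  # mean accuracy on the test set

# Model selection: 5-fold cross-validated accuracy on the full dataset
cv_scores = cross_val_score(model, X, y, cv=5)

print(test_accuracy)
print(cv_scores.mean())
```

Wrapping the scaler and the classifier in a pipeline means the scaling is fitted only on the training portion of each split, which avoids leaking test data into pre-processing.
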
1. Data Visualization using Tableau

For Further Details and Enrollment mail to [email protected]