User:LI AR/Books/Cracking the DataScience Interview
Cracking the DataScience Interview
Basic Stuff To Know
- Generic pages
- Glossaire_de_l'exploration_de_données
- Big_data
- Inspired from books like:
- "A collection of Data Science Interview Questions Solved in Python and Spark vol I & II"
- "120 real data science interview questions"
- Tips / Known Limits of DS
- DataScience is (very) experimental (Andrew Ng): https://pbs.twimg.com/media/CBXshmjWgAAgLKa.jpg
- Overfitting
- Bias–variance_tradeoff / http://www.ritchieng.com/machinelearning-learning-curve/
- Sampling_bias
- Survivorship_bias
- Selection_bias
- Concept_drift
- Correlation_does_not_imply_causation
- Curse_of_dimensionality
- Machine Learning definition and types
- Artificial_intelligence
- List_of_machine_learning_concepts
- Machine_learning
- Data_mining
- Knowledge_extraction
- Knowledge_extraction#Knowledge_discovery
- Pattern_recognition
- Signal_processing
- Supervised_learning
- Semi-supervised_learning
- Unsupervised_learning
- Reinforcement_learning
- Online_machine_learning
- Incremental_learning
- Q-learning
- One-shot_learning / https://www.quora.com/What-is-zero-shot-learning
- Feature_learning
- Learning_to_rank
- Similarity_learning
- Biclustering
- Natural_language_processing
- Biomimetics
- Collective_intelligence
- Data_stream_mining
- Sequential_pattern_mining
- Clickstream
- Semantics
- Semantic_Web
- Speech_recognition
- Speech_synthesis
- Collaborative_filtering
- Competitions
- https://www.kaggle.com/
- https://www.datascience.net/fr/home/
- http://dreamchallenges.org/
- https://www.drivendata.org/competitions/
- https://www.testdome.com/tests/data-analysis-test/65
- http://www.crowdanalytix.com/
- https://www.topcoder.com/community/data-science/
- https://www.datasciencechallenge.org/
- http://tunedit.org/challenges
- https://datasciencebowl.com/competitions/
- https://www.innocentive.com/ar/challenge/browse
- http://tamids.tamu.edu/2018-tamids-data-science-competition/
- https://hackerearth.com
- https://www.analyticsvidhya.com/blog/2018/03/comprehensive-collection-deep-learning-datasets/
- http://www.kdnuggets.com/datasets/index.html
- https://aws.amazon.com/public-datasets/
- https://www.kaggle.com/datasets
- https://data.fivethirtyeight.com
- https://www.quandl.com/
- https://opendata.socrata.com/
- https://cloud.google.com/bigquery/public-data/
- https://github.com/BuzzFeedNews
- https://en.wikipedia.org/wiki/Wikipedia:Database_download
- http://mlr.cs.umass.edu/ml/datasets.html
- https://data.world/
- https://www.data.gov/
- https://www.data.gouv.fr/fr/
- https://data.worldbank.org/
- https://www.reddit.com/r/datasets/top/?sort=top&t=all
- http://academictorrents.com/browse.php?cat=6
- http://www.kdnuggets.com/2015/04/awesome-public-datasets-github.html
- http://www.kdnuggets.com/?s=datasets
- https://www.springboard.com/blog/free-public-data-sets-data-science-project/
- https://www.dataquest.io/blog/free-datasets-for-projects/
- https://github.com/awesomedata/awesome-public-datasets
- https://elitedatascience.com/datasets
- https://blog.journeyofanalytics.com/50-free-datasets-for-data-science-projects/
- https://www.datascienceweekly.org/data-science-resources/data-science-datasets
- Software
- http://www.databaseetl.com/data-mining-tools/
- IDEs / DS-GUI
- R
- (DS-GUI) :Rattle_GUI http://rattle.togaware.com/
- (IDE) :RStudio https://www.rstudio.com
- Python
- Java
- Online
- Paid Software
- (DS-GUI) :Minitab https://minitab.com/
- (DS-GUI) :Tableau_Software https://www.tableau.com/
- R
- R/Packages
- https://cran.r-project.org/
- https://cran.r-project.org/web/views/
- https://cran.r-project.org/web/views/MachineLearning.html
- https://cran.r-project.org/web/views/Bayesian.html
- https://cran.r-project.org/web/views/Cluster.html
- https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
- https://cran.r-project.org/web/views/Survival.html
- https://cran.r-project.org/web/views/TimeSeries.html
- Python
- C++
- Alteryx
- https://www.alteryx.com/ [Commercial]
- Comparison
- DeepLearning
- GANs (Generative Adversarial Networks)
- DataViz
- https://matplotlib.org/
- https://plot.ly/
- :GGobi http://www.ggobi.org/
- http://ggplot2.org/
- http://ggvis.rstudio.com/
- https://d3js.org/
- https://datascienceplus.com/creating-graphs-with-python-and-goopycharts/
- https://www.tableau.com/ [Commercial]
- http://bokeh.pydata.org/en/latest/ [Python]
- http://pyqtgraph.org/ [Python]
- https://uber.github.io/deck.gl [Uber's internal DataViz tool]
- http://rawgraphs.io/
- http://scidavis.sourceforge.net/
- http://home.gna.org/veusz/
- http://jwork.org/dmelt/
- Graphs
- GUI
- Data Manipulation
- Annotate examples: https://prodi.gy/
- Data_pre-processing
- Data_cleansing
- Data_reduction
- Data_wrangling
- Data_scrubbing
- Data_editing
- Data_scraping
- Data_curation
- Data_fusion
- Data_integration
- Data_binning
- Sanitization_(classified_information)
- Extract,_transform,_load
- Imputation_(statistics)
- Interpolation
- Outlier
- https://github.com/Quartz/bad-data-guide
- https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis
- Local_case-control_sampling#Imbalanced_datasets
- Sampling_(statistics)
- Sampling_(statistics)#Stratified_sampling
- Stratified_sampling
- Jackknife_resampling
- Oversampling_and_undersampling_in_data_analysis
- Oversampling_and_undersampling_in_data_analysis#SMOTE
- AdaBoost
- "Essay Why Most Published Research Findings Are False"
- "A Few Useful Things to Know about Machine Learning"
- Working with text
- Unicode_equivalence#Normalization
- URL_normalization
- Text_segmentation
- N-gram
- Tokenization_(lexical_analysis)
- Stemming
- Word2vec https://www.tensorflow.org/tutorials/word2vec
- https://google.github.io/seq2seq/
- NLP in Python
- https://github.com/explosion/thinc
- Working with spatial data
- Spatial_data
- Trend_surface_analysis
- Variogram
- Geary's_C
- Moran's_I
- Spatial_descriptive_statistics#Ripley.27s_K_and_L_functions
- Signal processing
- Signal processing - Images
- Techniques for Feature/Attribute Selection/Dimensionality Reduction
- High-dimensional_statistics
- Dimensionality_reduction
- Factor_analysis
- Principal_component_analysis
- Independent_component_analysis
- Singular_value_decomposition
- Multidimensional_scaling
- T-distributed_stochastic_neighbor_embedding
- Autoencoder
- Deep_learning#Stacked_.28de-noising.29_auto-encoders
- Elastic_map
- Linear_discriminant_analysis
- Signal processing
- Working with spatial data
- Maths (Stats / Algebra)
- Inspiration for this section: https://github.com/soulmachine/machine-learning-cheat-sheet
- Pseudo-random_number_sampling
- Glossary_of_probability_and_statistics
- Bijection,_injection_and_surjection
- Mean
- Harmonic_mean
- Median
- Mode_(statistics)
- Range_(mathematics)
- Quartile
- Interquartile_range
- Variance
- Covariance
- Standard_deviation
- Collinearity#Usage_in_statistics_and_econometrics
- ANOVA
- ANCOVA
- MANOVA
- ANORVA
- Moving_average
- EWMA_chart
- Exponential_smoothing
- Autoregressive_model
- Autoregressive–moving-average_model
- Autoregressive_integrated_moving_average
- Autocorrelation
- Cross-correlation
- Entropy_in_thermodynamics_and_information_theory
- Moment_(mathematics)
- Residual
- Expected_value
- Likelihood_function
- Cumulative_distribution_function
- Probability
- Probability_mass_function
- Probability_density_function
- Prior_probability
- Prior_knowledge_for_pattern_recognition
- Permutation https://fr.wikipedia.org/wiki/Arrangement
- Combination https://fr.wikipedia.org/wiki/Combinaison_(math%C3%A9matiques)
- Dependent_and_independent_variables
- Independence_(probability_theory)
- Hoeffding's_inequality
- Pareto_efficiency
- Nash_equilibrium
- Pareto_principle
- Tensor
- Tensor_product
- Cross_product
- Taxicab_geometry
- Norm_(mathematics)#Euclidean_norm
- Lp_space
- Norm_(mathematics)
- Determinant
- Trace_(linear_algebra)
- Eigenvalues_and_eigenvectors
- Projection_(mathematics)
- Curvature
- Convolution
- Hadamard_product_(matrices)
- Kernel_(statistics)
- Radial_basis_function
- Logit
- Latent_variable
- Inference
- Statistical_inference
- Inductive_reasoning
- Deduction_and_induction
- Transduction_(machine_learning)
- Stochastic
- Stochastic_process
- Probability_theory
- Probability
- Posterior_probability
- Statistic
- Statistics
- Gaussian_noise
- Bayesian_inference
- Bayes_rule
- Bayes'_theorem
- Bayesian_network
- Naive_Bayes_spam_filtering
- Naive_Bayes_classifier
- Belief_propagation#Approximate_algorithm_for_general_graphs
- Loss_function
- Regularization_(mathematics)
- Normalization_(statistics)
- Quantile_normalization
- Nyström_method (+PCA)
- Preference_(economics)
- Delaunay_triangulation
- Neighbourhood_(mathematics)
- Genetic Algorithms
- Mutation_(genetic_algorithm)
- Crossover_(genetic_algorithm)
- Selection_(genetic_algorithm)
- Fitness_function
- Utility#Utility_functions
- SVM
- Neural Networks
- Rectifier_(neural_networks)
- Backpropagation
- Gradient
- Gradient_descent
- Stochastic_gradient_descent
- Gradient_boosting
- http://www.wildml.com/deep-learning-glossary/#gradient-clipping
- http://www.wildml.com/deep-learning-glossary/#batch-normalization
- http://www.wildml.com/deep-learning-glossary/#backpropagation
- http://www.wildml.com/deep-learning-glossary/#momentym
- http://www.wildml.com/deep-learning-glossary/#sgd
- https://visualstudiomagazine.com/articles/2015/07/01/variation-on-back-propagation.aspx
- Softmax is a "discriminant learning metric": examples of every class j != i also help to learn class i, because the class scores are forced to sum to 1 (the method couples the evaluations of all classes)
- Sigmoid_function
- Hyperbolic_function#Tanh
- Dropout_(neural_networks)
- Radial_basis_function
- Hebbian_theory
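
The Softmax remark in this subsection can be illustrated with a minimal sketch, assuming NumPy and a cross-entropy loss (the loss is an assumption, the list does not name one). The helper names `softmax` and `cross_entropy_grad` are hypothetical. It shows that a single example of class 0 produces a non-zero gradient on the scores of the other classes, because the outputs must sum to 1.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy_grad(logits, true_class):
    # Gradient of -log(softmax(logits)[true_class]) with respect to the logits:
    # softmax(logits) minus the one-hot vector of the true class.
    grad = softmax(logits)
    grad[true_class] -= 1.0
    return grad

logits = np.array([2.0, 1.0, 0.1])   # illustrative scores for 3 classes
probs = softmax(logits)
grad = cross_entropy_grad(logits, true_class=0)

print("probabilities:", probs)
print("gradient on each class score:", grad)
```

The gradient is negative for the true class and positive for every other class, so one labelled example updates the scores of all classes at once.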
- Signal processing
- Signal_processing
- Low-pass_filter
- High-pass_filter
- Energy_(signal_processing)
- Fast_Fourier_transform
- Wavelet
- Discrete_wavelet_transform
- Coherence_(signal_processing)
- Kalman_filter
- Time Series
- Time_series
- Decomposition_of_time_series
- Seasonal_adjustment
- Seasonality
- Frequency_domain
- Time_domain
- Spectral_density
- Games
- Distances
- Distance
- Euclidean_distance [dim1]
- Edit_distance
- Hamming_distance
- Manhattan_distance [dim1]
- Levenshtein_distance
- Needleman–Wunsch_algorithm
- Minkowski_distance [dim n == generalization]
- Mahalanobis_distance
- Canberra_distance
- Distance_correlation
- Angular_distance
- String_metric
- Jaro–Winkler_distance
- Jaccard_index
- Kendall_tau_distance
- Chebyshev_distance
- Tf–idf
- Neural_coding
- For graphs: http://blog.smola.org/post/33412570425
- https://fr.wikipedia.org/wiki/Algorithme_de_Needleman-Wunsch
- Clouds
- Hausdorff_distance [between clouds of points, a point and a cloud]
- Distance#Distances_between_sets_and_between_a_point_and_a_set
- Distributions
- Discrete_uniform_distribution
- Normal_distribution
- Bernoulli_distribution
- Binomial_distribution
- Poisson_distribution
- Chi-squared_distribution
- Log-normal_distribution
- Pareto_distribution
- Gibbs_distribution
- Weibull_distribution
- Gamma_distribution
- Beta_distribution
- Hypergeometric_distribution
- Dirac_delta_function
- https://ercim-news.ercim.eu/en107/special/robust-and-adaptive-methods-for-sequential-decision-making [Characterization of the simplicity of a distribution: BernsteinExponent+TsybakovMarginCondition]
- Evaluation
- Performance_indicator
- Mean_absolute_percentage_error
- Mean_absolute_scaled_error
- Symmetric_mean_absolute_percentage_error
- Regression-kriging
- https://www.kaggle.com/wiki/RootMeanSquaredLogarithmicError
- http://weka.sourceforge.net/packageMetaData/percentageErrorMetrics/index.html
- http://weka.sourceforge.net/packageMetaData/logarithmicErrorMetrics/index.html
- Information_gain_ratio
- Kullback–Leibler_divergence
- Gini_coefficient
- Pearson_correlation_coefficient
- Entropy http://www.cbcb.umd.edu/~salzberg/docs/murthy_thesis/node15.html
- Akaike_information_criterion https://twitter.com/DataSciFact/status/963129411250933760
- Bayesian_information_criterion
- Brier_score == mean squared error (MSE) of the predicted probabilities
- Structural_similarity
- Type_I_and_type_II_errors
- False_positive_rate
- False_coverage_rate
- False_discovery_rate
- Confusion_matrix
- Accuracy_and_precision
- Precision_and_recall
- F1_score
- Sensitivity_and_specificity
- Receiver_operating_characteristic
- Receiver_operating_characteristic#Area_under_the_curve
- Discounted_cumulative_gain
- Cross-validation_(statistics)
- Errors_and_residuals
- If the residual is consistently >0 or <0 over a range of the training set => the model has failed to capture something in the data, or the wrong type of model is being used (e.g. linear regression on parabolic data; see DataSkeptic/Heteroskedasticity)
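
The residual check described in the last item above can be sketched as follows, assuming NumPy and a noiseless parabola as illustrative data (both are assumptions, not part of the original list). A straight line is fitted to the parabola and the residuals are inspected by range.

```python
import numpy as np

# Illustrative data: a noiseless parabola on a symmetric interval.
x = np.linspace(-3.0, 3.0, 61)
y = x ** 2

# Fit the wrong type of model on purpose: a straight line.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Residuals grouped by range of x.
center = residuals[np.abs(x) < 1.0]
edges = residuals[np.abs(x) > 2.5]

print("all residuals negative near the center:", bool((center < 0).all()))
print("all residuals positive near the edges:", bool((edges > 0).all()))
```

The residuals average out to zero overall, yet keep the same sign over whole ranges of x, which is the symptom that the linear model is missing the curvature.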
- Clustering
- See also the Calinski-Harabasz Index: http://stats.stackexchange.com/questions/97429/intuition-behind-the-calinski-harabasz-index
- Others
- Working with Text
- Part_of_speech
- Semantic_similarity
- Tf–idf
- Cosine_similarity
- Okapi_BM25
- See also Mr Gomez page on Weka: http://www.esp.uem.es/jmgomez/tmweka/
- Named-entity_recognition
- Conditional_random_field
- Latent_Dirichlet_allocation
- Sentiment_analysis
- Web_mining
- Web_crawler
- Text_mining
- Document_classification
- Automatic_summarization
- Working with Images
- http://mirror.imagej.net/plugins/mexican-hat/index.html
- If your model seeks to penalize near misses, the Mexican hat function is a good choice.
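
As a rough illustration of the Mexican hat remark above, here is a minimal sketch of an unnormalised Mexican hat (Ricker) profile, assuming NumPy. The function name `mexican_hat` and the `sigma` parameter are hypothetical choices for this example. The profile is positive at the center and negative in a surrounding ring, which is what lets it penalize responses that land close to, but not on, the target.

```python
import numpy as np

def mexican_hat(r, sigma=1.0):
    # Unnormalised Ricker profile: (1 - (r/sigma)^2) * exp(-(r/sigma)^2 / 2).
    scaled = (np.asarray(r, dtype=float) / sigma) ** 2
    return (1.0 - scaled) * np.exp(-scaled / 2.0)

# Distance 0 is a hit, 1*sigma is the zero crossing, 2*sigma is a near miss.
values = mexican_hat([0.0, 1.0, 2.0])
print(values)
```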
- Working with concepts (Ontologies)
- https://en.wikipedia.org/wiki/YAGO_%28database%29
- http://wiki.dbpedia.org/
- http://conceptnet.io/
- http://cogcomp.org/Data/QA/QC/definition.html
- Visualization
- Data_visualization
- Exploratory_data_analysis
- List_of_graphical_methods
- Category:Statistical_charts_and_diagrams
- Statistical_graphics
- Visual_perception
- Heat_map
- Misleading_graph
- Pareto_chart
- Need to develop "critical thinking":
- (Statistical) tests
- A/B_testing
- Evaluating a hypothesis
- Statistical_power
- Statistical_hypothesis_testing
- P-value
- Student's_t-test
- Chi-squared_test
- Type_I_and_type_II_errors
- Detecting abrupt changes in time series
- Stationary_process
- Structural_break
- Chow_test
- Kruskal–Wallis_one-way_analysis_of_variance
- F-test
- F-statistics
- Pairwise_summation
- CUSUM
- MOSUM: https://cran.r-project.org/web/packages/strucchange/vignettes/strucchange-intro.pdf
- Time series / Chaos
- Machine Learning Techniques
- Statistical_classification
- One-class_classification
- Binary_classification
- Multiclass_classification
- Multi-label_classification
- Structured_prediction
- Cluster_analysis
- Elbow_method_(clustering)
- Nearest_neighbor_search#Approximate_nearest_neighbor
- Regression_analysis
- Linear_regression
- Logistic_regression
- Ridge_regression
- Kriging
- Multivariate_adaptive_regression_splines
- Association_rule_learning
- Apriori_algorithm
- Survival_analysis
- Monte_Carlo_method
- Monte_Carlo_algorithm
- Multinomial_logistic_regression
- Lasso_(statistics)
- Expectation–maximization_algorithm
- Markov_chain_Monte_Carlo
- Hidden_Markov_Models
- Viterbi_algorithm
- Convolutional_code
- Forward–backward_algorithm
- Markov_random_field
- Mean_field_theory
- Mean_field_particle_methods
- CART
- Decision_tree_learning
- Decision_tree
- Pruning_(decision_trees)
- ID3_algorithm
- C4.5_algorithm
- Random_forest
- Support_vector_machine
- Support_vector_machine#Support_vector_clustering_.28SVC.29
- Support_vector_machine#Regression
- Conditional_random_field
- Latent_semantic_analysis
- Genetic_algorithm
- Evolutionary_algorithm
- Evolutionary_computation
- Voronoi_diagram
- Local_outlier_factor
- Ordered_weighted_averaging_aggregation_operator
- Support_vector_machine
- Neural Networks
- History: http://www.chronicle.com/article/The-Believers/190147/
- The various types of NN as a picture: http://www.asimovinstitute.org/wp-content/uploads/2016/09/neuralnetworks.png
- Types_of_artificial_neural_networks
- Comparison_of_deep_learning_software/Resources
- Artificial_neural_network
- Perceptron
- Feedforward_neural_network
- Multilayer_perceptron
- Radial_basis_function_network
- Long_short-term_memory
- SNNS
- Time_delay_neural_network
- Recursive_neural_network
- Recurrent_neural_network
- Hopfield_network
- Content-addressable_memory
- Boltzmann_machine
- Self-organizing_map
- Learning_vector_quantization
- Liquid_state_machine
- Autoassociative_memory
- Convolutional_neural_network
- Autoencoder
- Neuroevolution
- Neuroevolution_of_augmenting_topologies
- Deep_learning
- Deep_learning#Deep_neural_network_architectures
- Deep_belief_network
- Generative_adversarial_networks
- Signal Processing
- Fuzzy Logic
- Fuzzy_logic
- Inference_engine
- Type-2_fuzzy_sets_and_systems
- T-norm_fuzzy_logics
- Adaptive_neuro_fuzzy_inference_system
- Fuzzy_control_system
- Working with spatial data
- Ensemble Techniques
- Ensemble Learning = Boosting, Bagging or Stacking: http://stats.stackexchange.com/questions/18891/bagging-boosting-and-stacking-in-machine-learning#19053
- Applying Bagging should help reduce variance and overfitting.
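
The variance-reduction claim above can be illustrated with a minimal sketch, assuming NumPy, a 1-nearest-neighbour regressor as the high-variance base model, and synthetic noisy sine data (all of these are assumptions made for the example; the helper names are hypothetical). The prediction at one query point is compared across many freshly drawn training sets, once for a single model and once for a bagged average.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_nn_predict(x_train, y_train, x0):
    # 1-nearest-neighbour regression: return the target of the closest point.
    return y_train[np.argmin(np.abs(x_train - x0))]

def bagged_predict(x_train, y_train, x0, n_models, rng):
    # Bagging: train each model on a bootstrap sample, then average.
    n = len(x_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # sample n indices with replacement
        preds.append(one_nn_predict(x_train[idx], y_train[idx], x0))
    return float(np.mean(preds))

x0 = 1.0                 # query point
single, bagged = [], []
for _ in range(200):     # 200 independent training sets
    x = rng.uniform(0.0, 2 * np.pi, size=50)
    y = np.sin(x) + rng.normal(0.0, 0.5, size=50)
    single.append(one_nn_predict(x, y, x0))
    bagged.append(bagged_predict(x, y, x0, n_models=25, rng=rng))

var_single = float(np.var(single))
var_bagged = float(np.var(bagged))
print("variance of single model:", var_single)
print("variance of bagged model:", var_bagged)
```

The bagged prediction behaves like a weighted average over several nearby neighbours, so it fluctuates less from one training set to the next than the single 1-nearest-neighbour prediction.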
- Applications
- Bayesian_spam_filtering
- Root_cause_analysis
- Inpainting
- Experimentation framework
- Goal: test various parameters on various algorithms to determine the best model(s)
- Weka's "Experimenter" mode: http://weka.sourceforge.net/manuals/ExplorerGuide.pdf
- AutoWeka: http://www.cs.ubc.ca/labs/beta/Projects/autoweka/
- R::mlrMBO: https://github.com/mlr-org/mlrMBO
- Coding / Exposing API to the rest of the application
- Microservices
- Map-Reduce framework
- Scraping
- Storage
- Apache_Hadoop#HDFS https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
- Apache_HBase http://hbase.apache.org/
- Apache_Hive https://hive.apache.org/
- Transfers - to/from RelationalDB
- Transfers - serialization/streaming
- Storage - In memory
- Admin
- Apache_ZooKeeper http://zookeeper.apache.org/
- Apache_Cassandra https://cassandra.apache.org
- Ambari http://ambari.apache.org/
- Apache_Oozie http://oozie.apache.org/
- Programming
- ML
- Working with text
- Working with text - Data Viz
- Small/Micro Data
- Multi-Agent Systems
- Agent-based_model
- Multi-agent_system
- Agent-oriented_software_engineering
- https://www.researchgate.net/publication/266182243_Agent_Groupe_Role_et_Service_Un_modele_organisationnel_pour_les_systemes_multi-agents_ouverts [JFerber: AGR Methodology]
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.7968&rep=rep1&type=pdf [YDemazeau: Vowels Methodology]
- Quantum Machine Learning
- Quantum_machine_learning
- Quantum_tunnelling
- Quantum_annealing
- Adiabatic_quantum_computation
- Resources
- http://www.wildml.com/deep-learning-glossary/
- http://deeplearning.net
- https://www.datacamp.com
- http://www.learnpython.org
- https://www.codecademy.com/learn/python
- http://www.dataschool.io/how-to-get-better-at-data-science/
- http://simplystatistics.org/2015/03/17/data-science-done-well-looks-easy-and-that-is-a-big-problem-for-data-scientists/
- Social network for DataScientists
- Books
- https://github.com/janishar/mit-deep-learning-book-pdf
- http://neuralnetworksanddeeplearning.com/
- http://deeplearning.net/tutorial/deeplearning.pdf
- https://cours.etsmtl.ca/sys843/REFS/Books/ebook_Haykin09.pdf
- http://hagan.ecen.ceat.okstate.edu/nnd.html
- http://www.dkriesel.com/en/science/neural_networks
- https://torres.ai/research-teaching/tensorflow/first-contact-with-tensorflow-book/
- https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/DeepLearning-NowPublishing-Vol7-SIG-039.pdf
- http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
- http://www.greenteapress.com/thinkstats/thinkstats.pdf
- http://www.greenteapress.com/thinkbayes/thinkbayes.pdf
- http://www.greenteapress.com/thinkpython/thinkpython.pdf
- http://r4ds.had.co.nz/
- https://web.stanford.edu/~hastie/Papers/ESLII.pdf
- https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print10.pdf
- http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf
- http://infolab.stanford.edu/~ullman/mmds/booka.pdf
- http://www.guidetodatamining.com/assets/guideChapters/Guide2DataMining.pdf
- https://github.com/ajaymache/machine-learning-yearning
- Paid Books
- "Artificial Intelligence for Humans, Volume 1: Fundamental Algorithms", Jeff Heaton, 2013, ISBN:9781493682225
- "Artificial Intelligence for Humans, Volume 2: Nature-Inspired Algorithms", Jeff Heaton, 2014, ISBN: 978-1499720570
- "Artificial Intelligence for Humans, Volume 3: Deep Learning and Neural Networks", Jeff Heaton, 2015, ISBN: 978-1505714340
- "Introduction to Machine Learning (Adaptive Computation and Machine Learning)", E. Alpaydin, MIT Press, 2004, ISBN: 978-0262012430
- "Machine Learning: An Artificial Intelligence Approach", R.S. Michalski, J.G. Carbonell, T.M. Mitchell, Symbolic Computation, 1983, ISBN:978-3540132981
- "A collection of Data Science Interview Questions Solved in Python and Spark vol I & II", Antonio Gulli, CreateSpace, 2015, ISBN:978-1517216719
- "Artificial Intelligence a Modern Approach", Stuart Russell and Peter Norvig, Prentice Hall, 1995, ISBN:978-0131038059
- "An Introduction to MultiAgent Systems", Michael Wooldridge, John Wiley & Sons, 2009 (2nd ed), ISBN:978-0470519462
- "Data Mining: Practical Machine Learning Tools and Techniques", Ian H. Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal, Morgan Kaufmann, ISBN:978-0128042915
- "Agent Intelligence Through Data Mining", Andreas L. Symeonidis, Pericles A. Mitkas, Springer/Apress, ISBN:978-0387257570
- "Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence", Gerhard Weiss, 2000, ISBN:978-0262232036
- "Data science at the command line", Janssens, O'Reilly.
- Also look for MachineLearning, DeepLearning, Spark, Mahout, R, Python, SciKit-Learn, Data/Text Mining, ElasticSearch, Natural Language, Statistics @ O'Reilly, Packt, Manning/In Action, HeadFirst
- Lists of good books
- News/Blogs/RSS
- https://blog.acolyer.org/
- https://www.reddit.com/r/machinelearning
- https://www.reddit.com/r/statistics
- https://www.reddit.com/r/datascience
- https://www.reddit.com/r/bigdata
- http://www.kdnuggets.com/
- http://www.becomingadatascientist.com/
- https://rdatamining.wordpress.com/
- http://www.r-bloggers.com/
- https://dataaspirant.com/
- http://www.joyofdata.de/blog/
- https://www.dataiku.com/blog/
- https://www.datacamp.com/community/
- http://beautifuldata.net/
- http://www.datatau.com/news
- http://dataelixir.com/
- http://www.oreilly.com/data/newsletter.html
- http://blog.kaggle.com/
- http://blog.yhathq.com/
- http://simplystatistics.org/
- http://fastml.com/
- http://www.win-vector.com/blog/
- http://fivethirtyeight.com/
- http://www.dataschool.io/
- https://research.facebook.com/blog/datascience/
- http://deeplearning.net/feed/
- http://learningwithdata.com/
- http://blog.plot.ly/
- https://datasciencelab.wordpress.com/
- https://shapeofdata.wordpress.com/
- http://datalab.lu/
- http://www.pythonweekly.com/
- http://pbpython.com/
- https://plus.google.com/communities/105141578068503684401 ( https://plus.google.com/+JaanaNystr%C3%B6m/posts/MKCV3vNsn1g )
- http://blog.revolutionanalytics.com/2012/12/the-most-influential-data-scientists-on-twitter.html
- http://www.kdnuggets.com/2012/12/most-influential-data-scientists-on-twitter.html
- https://journal.r-project.org/
- Podcasts
- http://www.learningmachines101.com/
- http://www.thetalkingmachines.com/
- http://dataskeptic.com/
- http://www.partiallyderivative.com/
- http://www.ocdqblog.com/podcast/
- http://blog.pivotal.io/podcasts-pivotal
- https://www.udacity.com/podcasts/linear-digressions
- http://datastori.es/
- http://radar.oreilly.com/tag/oreilly-data-show-podcast
- http://freakonomics.com/radio/freakonomics-radio-podcast-archive/
- http://simplystatistics.org/category/podcast/
- http://data-informed.com/multimedia/podcasts/
- http://www.bbc.co.uk/programmes/p02nrss1
- YT Channels
- https://www.youtube.com/user/keeroyz
- https://www.youtube.com/channel/UCWN3xxRkmTPmbKwht9FuE5A
- https://www.youtube.com/channel/UCioEIe1o73G-oGR4b34E7Dg
- https://www.youtube.com/channel/UCNIkB2IeJ-6AmZv7bQ1oBYg
- https://www.youtube.com/channel/UC9LfrPNcIyHspci0t2W4T_w
- https://www.youtube.com/channel/UCHBWJGoZMkhJyElgvuN1U1w
- https://www.youtube.com/user/dataschool
- https://www.youtube.com/channel/UCtY8JjMQpzYb5FFvUr2JnUw
- https://www.youtube.com/channel/UCRhUp6SYaJ7zme4Bjwt28DQ
- https://www.youtube.com/user/sentdex
- https://www.youtube.com/user/DataScienceDojo
- MOOCs
- Generic
- Weka
- Andrew Ng
- Yann Lecun
- Hans Rosling (visualization)
- From renowned universities
- https://www.coursera.org/specializations/jhu-data-science
- https://www.coursera.org/specializations/machine-learning
- https://www.coursera.org/specializations/data-science-python
- https://www.coursera.org/specializations/big-data
- https://www.coursera.org/learn/machine-learning
- https://www.coursera.org/learn/r-programming
- https://www.coursera.org/learn/data-scientists-tools
- https://www.coursera.org/learn/python-data-analysis
- http://www.holehouse.org/mlclass/
- http://online.stanford.edu/course/statistical-learning
- http://work.caltech.edu/telecourse.html
- https://www.udacity.com/course/data-analyst-nanodegree--nd002
- https://www.thinkful.com/courses/learn-data-science-online/
- https://www.edx.org/course/introduction-computer-science-mitx-6-00-1x7
- https://www.coursetalk.com/
- https://github.com/justmarkham/DAT7#bonus-resources
- http://datasciencemasters.org/
- http://www.wolfram.com/broadcast/c?c=99
- http://www.wolfram.com/broadcast/c?c=97
- http://www.wolfram.com/broadcast/c?c=397
- DataSchool
- Jobs
- https://datajobs.com/
- http://www.analytictalent.com/
- http://www.kdnuggets.com/jobs/index.html
- https://fr.hired.com/
- Teaching
- http://edison-project.eu/edison/edison-data-science-framework-edsf
- Curated list of similar pages
- https://github.com/search?utf8=%E2%9C%93&q=curated+list+awesome+frameworks&type=
- https://github.com/josephmisiti/awesome-machine-learning
- https://github.com/onurakpolat/awesome-bigdata
- https://github.com/onurakpolat/awesome-analytics
- https://github.com/analyticalmonk/awesome-neuroscience
- https://github.com/igorbarinov/awesome-data-engineering
- https://github.com/quantmind/awesome-data-science-viz
- https://github.com/fasouto/awesome-dataviz
- https://github.com/qinwf/awesome-R
- https://github.com/datascience-python/awesome-datascience-python
- https://github.com/caesar0301/awesome-public-datasets