Table of Contents generated with DocToc
- DataScience & Machine Learning Reference
- Introduction & Overview
- Collections
- Video Courses
- Blogs & Forum
- Data Processing
- Machine Learning
- Natural Language Processing
- Deep Learning
- Recommender System
- CrawlerSE: Crawlers and Search Engines
- Search Engine
- Data Visualization
- Collections
- Cross-disciplinary Databases and Search Engines
- Social Network
- Driving Data
- Collections
- Competition: Machine Learning Competitions
DataScience & Machine Learning Reference
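This document collects the resources I gathered while studying data science. It focuses on introductory and survey-style material for each area and does not dig deeply into cutting-edge research; readers who want more can follow my DataScience and Machine Learning Handbook for Coders (程序猿的数据科学与机器学习实战手册). It starts with an introductory overview of data science and machine learning, then lists resources, books, and tutorials, then reference articles for specific areas, and finally a set of practical tools. My overview diagram of data science and machine learning is part of my series on programming worldview and methodology.
This document will keep improving as my own perspective and skills grow through study and practice. I am not a pure machine learning or data mining researcher; I mostly look, from an engineering standpoint, for the parts that can be applied in engineering work.
Introduction & Overview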
Introduction
Machine Learning
- Visual Intro To Machine Learning :an illustrated walkthrough of using decision trees to classify homes in New York and San Francisco
- A Gentle Guide to Machine Learning
- Machine Learning basics for a newbie
- What is machine learning, and how does it work?
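The decision-tree idea mentioned in the first link can be sketched with a single threshold split, that is, a one-level tree. The snippet below is only an illustration under stated assumptions: the elevation feature and every data value are invented toy data, and `stump_accuracy` and `best_threshold` are hypothetical helper names that do not come from any of the linked articles.

```python
# Toy data (hypothetical, invented for illustration): (elevation in meters, city)
homes = [
    (5, "NY"), (10, "NY"), (12, "NY"), (20, "NY"),
    (35, "SF"), (60, "SF"), (75, "SF"), (8, "SF"),
]


def stump_accuracy(threshold, data):
    """Accuracy of the rule: elevation > threshold -> "SF", otherwise "NY"."""
    correct = sum(
        1 for elevation, city in data
        if ("SF" if elevation > threshold else "NY") == city
    )
    return correct / len(data)


def best_threshold(data):
    """Try the midpoints between sorted elevations and keep the most accurate."""
    values = sorted({elevation for elevation, _ in data})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: stump_accuracy(t, data))


threshold = best_threshold(homes)
accuracy = stump_accuracy(threshold, homes)
print(f"split at {threshold} m, training accuracy {accuracy:.2f}")
```

A full decision tree repeats this search recursively on each side of the split and over several features; the sketch stops at the first split to keep the idea visible.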
Deep Learning
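- A Fun Tour of Machine Learning Concepts (有趣的机器学习概念纵览) :from multivariate fitting and neural networks to deep learning, for anyone interested
- [Translation] An Intuitive Explanation of Neural Networks (神经网络的直观解释) :a very accessible explanation of convolutional neural networks.
- Deep-Learning-Papers-Reading-Roadmap :a paper reading roadmap compiled for anyone interested in deep learning
- A Programmer's Introductory Guide to Deep Learning (程序员的深度学习入门指南) :from the talk given by Fei Lianghong (费良宏) at QCon Shanghai 2016.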
Statistics
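News: Industry and News
Application: Real-World Use Cases of Data Mining, Machine Learning, and Deep Learning
Resources
Collections
- An Incomplete Collection of Introductory Machine Learning Resources (机器学习入门资源不完全汇总) :a themed collection from Machine Learning Daily (机器学习日报).
- Top-down learning path: Machine Learning for Software Engineers :a machine learning learning path aimed at software engineers
Books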
- 2014 - DataScience From Scratch
- 2012 - Li Hang (李航): Statistical Learning Methods (统计学习方法)
- 2015 - Data Mining, The Textbook
- 2016 - Zhou Zhihua (周志华): Machine Learning (机器学习)
- 2012 - Machine Learning A Probabilistic Perspective
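- 2012 - Machine Learning Made Simple (深入浅出机器学习), Chinese edition
- Nanjing University, Department of Computer Science and Technology: Data Mining course
Video Courses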
- University of Illinois at Urbana-Champaign:Text Mining and Analytics
- National Taiwan University: Machine Learning Techniques (机器学习技法)
- Stanford: Machine Learning course
- CS224d: Deep Learning for Natural Language Processing
- Unsupervised Feature Learning and Deep Learning :a tutorial series from Stanford on unsupervised feature learning and deep learning
- Xiaoxiang (小象): deep learning video course
Blogs & Forum
Methodology
Data Processing
Machine Learning
Natural Language Processing
Deep Learning
Application
Recommender System
CrawlerSE: Crawlers and Search Engines
Crawler
Search Engine
Toolkits
Language
Python
- Jupyter :interactive programming and data presentation
- data-science-ipython-notebooks :a collection of IPython-based data science code examples
- The Open Source Data Science Masters
Java
Matlab
R
ClusterComputing
- Mahout
- MLlib
DeepLearning: Deep Learning Toolkits
- Evaluation of Deep Learning Toolkits
- A Code-Level Look at Deep Learning System Programming Models: TensorFlow vs. CNTK (代码解析深度学习系统编程模型)
- tensorflow-playground :Play with neural networks!
- dl-docker :bundles commonly used deep learning tools into a single Docker image
- deep-learning-models:Keras code and weights files for popular deep learning models.
- Top Deep Learning Projects
Data Visualization
Books
Video Courses
Toolkits
Data Sets
Collections
- awesome-public-datasets :An awesome list of high-quality open datasets in public domains (on-going).
- Wikimedia Dumps :downloadable dumps of wiki data
- Reddit Datasets :the Reddit board for discussing datasets
- Militarized Interstate Disputes :Nearly 200 years of international threats, conflicts, etc. for modelling or prediction. Includes action taken, level of hostility, fatalities, and outcomes. Multiple datasets, e.g., 962KB, 179KB. http://www.correlatesofwar.org/data-sets/MIDs
Single-source Databases
- http://archive.ics.uci.edu/ml/
- http://crawdad.org/
- http://data.austintexas.gov
- http://snap.stanford.edu/data/index.html
- http://data.cityofchicago.org
- http://data.govloop.com
- http://data.gov.uk/
- http://data.gov.in
- http://data.medicare.gov
- http://www.dados.gov.pt/pt/catalogodados/catalogodados.aspx
- http://data.sfgov.org
- http://data.sunlightlabs.com
- https://datamarket.azure.com/
- http://econ.worldbank.org/datasets
- http://gettingpastgo.socrata.com
- http://public.resource.org/
- http://timetric.com/public-data/
- http://www.bls.gov/
- http://www.crunchbase.com/
- http://www.dartmouthatlas.org/
- http://www.data.gov/
- http://www.datakc.org
- http://dbpedia.org
- http://www.factual.com/
- http://www.freebase.com/
- http://www.infochimps.com
- http://build.kiva.org/
- http://www.imdb.com/interfaces
- http://knoema.com
- http://daten.berlin.de/
- http://www.qunb.com
- http://databib.org/
- http://datacite.org/
- http://data.reegle.info/
- http://data.wien.gv.at/
- http://data.gov.bc.ca
Cross-disciplinary Databases and Search Engines
- https://www..com/datasets
- http://usgovxml.com
- http://aws.amazon.com/datasets
- http://databib.org
- http://datacite.org
- http://figshare.com
- http://linkeddata.org
- http://thewebminer.com/
- http://thedatahub.org
- http://ckan.net
- http://quandl.com
- Open Data Inception (lists 2,500+ open data portals)
Text
- 20 Newsgroups :The text from 20000 messages taken from 20 Usenet newsgroups for text analysis, classification, etc. 61.6MB
- Amazon Reviews :Over 142 million product reviews for sentiment analysis, recommender systems, and more. 20GB
- SMS Spam Collection :A collection of 5,574 SMS (text) messages, some spam, some normal, for spam filtering. 204KB. http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/
Social Network
- http://enigma.io
- http://www.ufindthem.com/
- http://NetworkRepository.com (a machine learning data repository with interactive visual analytics)
- http://MLvis.com
- Yahoo Instant Messenger Friends Connectivity Graph :Connections between Yahoo users who communicate with each other using Yahoo Messenger; can be used to identify key social contacts/influencers. Add dataset to cart to access. 28MB total.
Media: Audio, Video, and Images
- Labeled Faces in the Wild :13,000 named faces for facial recognition. Multiple training and test sets. 173MB total
- Mushroom Identification :For hypothetically classifying mushrooms as edible or poisonous based on their characteristics. 3 files, 480KB
- NORB 3D Object Recognition :Binocular images of 50 toy figurines for 3D object recognition from images. Multiple files, over 5GB total
- One Million Songs :Audio features and metadata for a subset (10,000) of the one million popular songs dataset for recognition/classification. 1.8GB
- Hate Speech Identification :A sampling of Twitter posts that have been judged based on whether they are offensive or contain hate speech, as a training set for text analysis. 2.66MB
- Hidden Beauty of Flickr Pictures :15,000 Flickr photo IDs that have received ratings based on aesthetics, for image analysis. 138KB; use the Flickr API to get the images
Recognition
- Human Activity Recognition with Smartphones :Sensor data for recognizing human activity (walking, sitting, etc.). 25MB. https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones
Driving Data
Domain: Domain-Specific Data
Sports
- Football Strategy :Thousands of scenarios to make the best coaching decisions. 876KB total
- Horses for Courses :Horse-racing data for predicting race results. 19MB total
- NBA & MLB Stats :Current and past season stats for teams and players for fantasy sports predictions.
Medicines
- National Survey on Drug Use and Health :Predict drug use based on health survey questions. 2GB total
- Prostate Cancer :Tumor and nontumor samples, used to recognize prostate cancer. 4.8MB total
- Record of Heart Sound :Recordings of normal and abnormal heartbeats, used to recognize heart murmur, etc. 47.7MB total
Alien
- UFO Reports :80,000 historic reports for classification or regression. This dataset has been standardized from the source data at nuforc.org. 14.6MB total.
Foods
- Wine Quality :Chemical properties of red and white wines (separately) and quality, for classification. 3 files, 343KB total.
Finance
Others
Competition: Machine Learning Competitions
- Alibaba Tianchi (阿里天池) :beginner practice competitions
- Kaggle :official beginner competitions, a good way to get started
- Kaggle Tutorial :a complete tutorial based on a hotel recommendation competition
- Driven Data
- Innocentive
- Crowdanalytix
- Tunedit
- DataFountain :DF, a professional Chinese data competition platform designated by the CCF
Career
Source: https://github.com/wxyyxc1992/DataScience-And-MachineLearning-Handbook-For-Coders/blob/master/DataScience-Reference.md