
SP11OPIM410672

Page history last edited by shawndra@... 13 years ago

DSS/Data Mining and Machine Learning for Business Intelligence

 

OPIM  410/672: Spring 2011

 

 

 

Class times: (410/672.002) MW 10:30 - 12, (672.001) MW 1:30 - 3

 

First/Last Class: Jan 12 - April 25

 

Classroom: JMHH G86, G88

 

Instructor: Shawndra Hill

               Office Hours: Friday 2-5pm (sign up for a time slot on webcafe), or by appointment.

               Email: shawndra@wharton.upenn.edu (subject: [DSS class] … <- note!)

               Telephone: email me

 

TA: Santiago Gallino (sgallino@wharton.upenn.edu)

 

Prerequisites: None

 

Text: Data Mining Techniques, Second Edition, by Michael Berry and Gordon Linoff, Wiley, 2004, ISBN: 0-471-47064-3. See slides for links to resources as well.

 

Remaining Project Due Dates

 

 

Group Formation (email your group members to me) | 0 | 1/28/2011
2-page proposal (must have data!) / Consulting Session | 5 | 2/13/2011 at 5pm (schedule consulting appt!)
Mid-semester presentation/Proposal refinement | 10 | 2/28/2011 and 3/2/2011* (send presentation by 2/27/2011 10pm EST; presentation order will be determined by a random selection process)
Feedback to colleagues | 5 | 3/2/2011*
Full presentation | 20 | 4/18/2011 – 4/25/2011 (send presentation by 4/17/2011 10pm EST)
DM Report | 50 | 5/4/2011 (Note: This is the major deliverable. You may hand it in earlier if you like.)
Reviews/Contribution Report | 10 | Last Day of Finals (5/10/2011)

 

 

 

 

Supporting Documents

 

 

 

 

Outside Resources

 

 

Student Comments/Posts:

 

Human Subjects Disclosure: The completion of some of the assignments in this course may result in data of value for research on data mining/machine learning.  If the data generated in the class are used in research, no information will be revealed about the identities of individuals or about the specific intellectual content of student work. 

 


News and Announcements 

 


Session Outline

 

 

Week

Lecture 1

Lecture 2

Monday

Wednesday

1

 

Jan12:

 

Introduction to the Course

 

Required Reading:

Chapter 1 and 2

 

Jan. 12

2

MLK

Jan 19:

 

Introduction to Data Mining/Classification/EDA

 

Required Reading:

Chapter 1 and 2

 

Reference Reading: 

How Verizon Cut Customer Churn, Das M., Financial Express, 10-2003

Mining Business Databases, Brachman R.J, Khabaza, T., Klosgen W., Piatetsky-Shapiro, G. and Simoudis, E. Communications of the ACM, 1996, 39:11, pp.42-48

12 IT Skills That Employers Can’t Say No To, Brandel, M., Computerworld, 7-11-2007

NO CLASS

Jan. 19

3

Jan 24:

 

Classification: Recursive partitioning and Decision Trees

 

Required Reading:

Chapters 3, 6 (pp 165 - 194)

 

Reference Reading:

Recursive Portfolio Selection with Decision Trees, Anton Andriyashin, Wolfgang Härdle, Roman Timofeev 

Our Technology And Data, Farecast article

How To Buy Data Mining: A Framework For Avoiding Costly Project Pitfalls In Predictive Analytics, King, E.A., DM Review, October 2005

An Insurance Policy For Low Airfares, Tedeschi, B., NY Times, January 22, 2007

 

 

Jan 26:

 

Classification: Recursive partitioning and Decision Trees

 

Required Reading:

Chapter 6 (pp 165 - 194)

Recursive Portfolio Selection with Decision Trees, Anton Andriyashin, Wolfgang Härdle, Roman Timofeev 

 

 

Reference Reading:

 

Joined-up thinking, The Economist, Apr 4th 2007

Taking Retailers' Cues Harrah's Taps Into the Science of Gambling, WSJ, 11-22-2004

Jan. 25

 

HW1 Due (Jan 24)

Jan. 27

4

Jan 31:

 

Classification Model Evaluation

 

Required Reading:

Chapter 4 (pp 95-108)

 

Reference Reading:

Crafting Papers on Machine Learning, P. Langley

The Case Against Accuracy Estimation for Comparing Classifiers, Provost, F., T. Fawcett, and R. Kohavi, In Proceedings of the Fifteenth International Conference on Machine Learning (ICML-98).

 

 

 

 

Feb 2:

 

Cost Sensitive Learning

 

 

Required Reading:

Chapter 4 (pp 95-108)

The Relationship Between Default Prediction And Lending Profits: Integrating ROC Analysis And Loan Pricing, Stein, R., Journal of Banking & Finance, 29 (2005) 1213-1236

 

 

 

HW2 Due (Jan 30)

 

5

Feb 7:

 

Naïve Bayes

 

Required Reading:

Chapter 8: pp.257-271

 

Reference Reading:

Learning and Evaluating Classifiers under Sample Selection Bias, Zadrozny, B.

On the Optimality of the Simple Bayesian Classifier under Zero-One Loss, Domingos, P. and Pazzani, M., Machine Learning, 29, 103-130, 1997

What You Need To Know About Bayesian Spam Filtering, Tschabitscher, H.

A Plan For Spam, Zdziarski, J.

The State Of Spam, A monthly Report, Generated by Symantec Messaging and Web Security, February 2007

Spam And The Ongoing Battle For The Inbox, Goodman, J., G.V. Cormack, and D. Heckerman, Communications of the ACM, February 2007, Vol. 50, No. 2, pp. 25-33

 

Feb 9:  LAB in JMHH Room 380

 

 

 

Project Proposal Due

(Feb 6)

HW3 Due (Feb 6)

 

 

 

 

 

6

Feb 14:

 

Association Rules/k-nearest Neighbor/Clustering

 

Required Reading

Chapter 9: Pages 287-315

Chapter 8: pp 257 - 271  

Chapter 11: 349-365 

 

 

Reference Reading:

 

TBA

 

 

Feb  16:  LAB JMHH Room 380

 

 

Reference Reading:

Weka Tutorial

An Intelligent Assistant for the Knowledge Discovery Process: An Ontology-based Approach, Bernstein, A., Provost, F., Hill, S. IEEE Transactions on Knowledge and Data Engineering 17(4), pp. 503-518, 2005. (PDF)

 

 

 

 

 

 

 

7

Feb 21

Genetic Algorithms

 

Required Reading

Chapter 13  (or become familiar with GAs using online resources) 

Reference Reading:

Discovering Interesting Patterns For Investment Decision Making With GLOWER – A Genetic Learner Overlaid With Entropy Reduction, Dhar, V., D. Chou, and F. Provost, Data Mining and Knowledge Discovery, Vol. 4, No. 4, October 2000

 

 

 

Feb 23

Neural Networks

 

Required Reading

Chapter 7 (or become familiar with Neural Networks using online resources)

 

 

 

HW4 Due (Feb 20)

 

8

Feb 28:

 

Group Presentations (Progress Report)

Mar 2:

 

Group Presentations (Progress Report)

 

 

 

Spring Break

Spring Break

 

 

9

Mar 14:

Data Mining Competitions (Work on competition in class) 

Mar 16:

Relational Learning 

(come with an open mind) 

Recommended Reading

Social Graph-iti, Oct 18th 2007

On Facebook, Scholars Link Up With Data, Stephanie Rosenbloom, December 17, 2007

Friend Accepted, The Economist, Oct 25th 2007

Six Degrees of Messaging, NatureNews, Katharine Sanderson, March 13, 2008

HW5 Due (Mar 16)

 

10

Mar 21:

Text Mining

Mar 23:

Recommendation Systems/Collaborative Filtering

 

Required Reading

Amazon.com Recommendations: Item-to-Item Collaborative Filtering, Linden, G., B. Smith, & J. York, IEEE Computer Society, IEEE Internet Computing, Jan./Feb. 2003, pp. 76-80

Speaking out: Amazon.com's Jeff Bezos, The McGraw-Hill Companies, BusinessWeek Online, August 25, 2003

Netflix Prize Still Awaits a Movie Seer, Katie Hafner, NY Times, June 4, 2007

You Want Innovation? Offer A Prize, Leonhardt, D., NY Times, Economix section, January 31, 2007

MySpace to Discuss Effort to Customize Ads, Brad Stone, NY Times, September 18, 2007 

Assignments:

OUT: Assignment 5 

 

Suggested Reading

 The Economist - A different game - 02-27-2010
The Economist - All too much - 02-27-2010
The Economist - Clicking for gold - 02-27-2010
The Economist - Data, data everywhere - 02-27-2010
The Economist - Leaders The data deluge - 02-27-2010
The Economist - Needle in a haystack - 02-27-2010
The Economist - New rules for big data - 02-27-2010
The Economist - The open society - 02-27-2010

   

11

Mar 28:

Guest Speaker: Cong Yu, Google

Mar 30:

Large Scale Mining/Cloud Computing

 

Recommended:

 

http://computer.howstuffworks.com/cloud-computing.htm 

 

http://www.zdnet.com/blog/hinchcliffe/eight-ways-that-cloud-computing-will-change-business/488

 

Check out these Hadoop tutorials!

 

Lecture 1 in a 5 part Series: http://www.youtube.com/watch?v=yjPBkvYh-ss&feature=relmfu

 

 

 

 

 

12

Apr 4:

Guest Speaker:

Chris Volinsky, AT&T Labs Research 

Apr 6:

Guest Speaker:

Greg Levitt, 33 Across

 

 

13

April 11:

Guest Speaker:

Nick Lim, Sonamine 

April 13

Guest Speaker:

Sheldon Gilbert, Proclivity Systems

 

 

14

Group Presentations 1

Group Presentations 2

 

 

15

Class Optional -- Use for extra office hours

 

 

 

 


