SP10OPIM410672

Page history last edited by shawndra@... 13 years, 3 months ago

DSS/Data Mining and Machine Learning for Business Intelligence

 

OPIM  410/672: Spring 2011

 

 

 

Classtimes: (410/672.002) MW 10:30 - 12, (672.001) MW  1:30 - 3

 

First/Last Class: Jan 12 - April 25

 

Classroom: JMHH G86, G88

 

Instructor: Shawndra Hill

               Office Hours: Friday 2-5pm (sign up for a time slot on webcafe), or by appointment.

               Email: shawndra@wharton.upenn.edu (subject: [DSS class] … <- note!)

               Telephone: email me

 

TA: Santiago Gallino (sgallino@wharton.upenn.edu)

 

Prerequisites: None

 

Text: Data Mining Techniques, Second Edition by Michael Berry and Gordon Linoff Wiley, 2004 ISBN: 0-471-47064-3.  See slides for links to resources as well.

 

Remaining Project Due Dates

 

Mid-semester presentation (15 minutes ppt) -- Sunday February 27th by email (presentations the following week)

Final-semester presentation (15 minutes ppt) -- Sunday April 17th by email (presentations the following week)

Final Paper -- May 5th

Reviews -- May 11th -- last day of final exams

 

Supporting Documents

 

 

 

 

Outside Resources

 

 

Student Comments/Posts:

 

Human Subjects Disclosure: The completion of some of the assignments in this course may result in data of value for research on data mining/machine learning.  If the data generated in the class are used in research, no information will be revealed about the identities of individuals or about the specific intellectual content of student work. 

 


News and Announcements 

 


Session Outline

 

 

Week

Lecture 1

Lecture 2

Monday

Wednesday

1

 

Introduction to the Course

 

Required Reading:

Chapter 1 and 2

 

Jan. 12

2

MLK

Introduction to Data Mining/Classification/EDA

 

Required Reading:

Chapter 1 and 2

 

Reference Reading: 

How Verizon Cut Customer Churn, Das M., Financial Express, 10-2003

Mining Business Databases, Brachman R.J, Khabaza, T., Klosgen W., Piatetsky-Shapiro, G. and Simoudis, E. Communications of the ACM, 1996, 39:11, pp.42-48

12 IT Skills That Employers Can’t Say No To, Brandel, M., Computerworld, 7-11-2007

NO CLASS

Jan. 19

3

Classification: Recursive partitioning and Decision Trees

 

Required Reading:

Chapters 3,6 (pp 165 - 194)

 

Reference Reading:

Recursive Portfolio Selection with Decision Trees, Anton Andriyashin, Wolfgang Härdle, Roman Timofeev 

Our Technology And Data, Farecast article

How To Buy Data Mining: A Framework For Avoiding Costly Project Pitfalls In Predictive Analytics, King, E.A., DM Review, October 2005

An Insurance Policy For Low Airfares, Tedeschi, B., NY Times, January 22, 2007

 

 

Classification: Recursive partitioning and Decision Trees

 

Required Reading:

Chapter 6 (pp 165 - 194)

Recursive Portfolio Selection with Decision Trees, Anton Andriyashin, Wolfgang Härdle, Roman Timofeev 

 

 

Reference Reading:

 

Joined-up thinking, The Economist, Apr 4th 2007

Taking Retailers' Cues Harrah's Taps Into the Science of Gambling, WSJ, 11-22-2004

Jan. 25

 

HW1 Due (Jan 24)

Jan. 27

4

Classification Model Evaluation

 

Required Reading:

Chapter 4 (pp 95-108)

 

Reference Reading:

Crafting Papers on Machine Learning, P. Langley

The Case Against Accuracy Estimation for Comparing Classifiers, Provost, F., T. Fawcett, and R. Kohavi, In Proceedings of the Fifteenth International Conference on Machine Learning (ICML-98).

 

 

 

 

Cost Sensitive Learning

 

 

Required Reading:

Chapter 4 (pp 95-108)

The Relationship Between Default Prediction And Lending Profits: Integrating ROC Analysis And Loan Pricing, Stein, R., Journal of Banking & Finance, 29 (2005) 1213-1236

 

Feb. 1

 

HW2 Due

Feb. 3

5

Naïve Bayes

 

Required Reading:

Chapter 8: pp.257-271

 

Reference Reading:

Learning and Evaluating Classifiers under Sample Selection Bias, Zadrozny, B.

On the Optimality of the Simple Bayesian Classifier under Zero-One Loss, Domingos, P. and Pazzani, M., Machine Learning, 29, 103-130, 1997

What You Need To Know About Bayesian Spam Filtering, Tschabitscher, H.

A Plan For Spam, Zdziarski, J.

The State Of Spam, A monthly Report, Generated by Symantec Messaging and Web Security, February 2007

Spam And The Ongoing Battle For The Inbox, Goodman, J., G.V. Cormack, and D. Heckerman, Communications of the ACM, February 2007, Vol. 50, No. 2, pp. 25-33

 

 

Feb. 8

 

 

Project Proposal Due

Feb. 10

 

HW3 Due

 

NO CLASS/SNOW DAY

6

Association Rules/k-nearest Neighbor/Clustering

 

Required Reading

Chapter 9: Pages 287-315

Chapter 8: pp 257 - 271  

Chapter 11: 349-365 

 

 

Reference Reading:

 

TBA

 

 

Practical Weka Demo (Features you may want to use)

 

Reference Reading:

Weka Tutorial

An Intelligent Assistant for the Knowledge Discovery Process: An Ontology-based Approach, Bernstein, A., Provost, F., Hill, S. IEEE Transactions on Knowledge and Data Engineering 17(4), pp. 503-518, 2005. (PDF)

 

Feb. 15

 

(proposal)

 

 

Feb. 17

 

MEETING IN LAB

JMHH 380

HW4 Due (remember, it was postponed due to snow)

7

Web Mining

 

Required Reading:

 

Content-Search Deals Make Twitter Profitable: Spencer E. Ante, BusinessWeek, December 21, 2009

How Many Facebook Users Will Go Public?: Douglas MacMillan, BusinessWeek, June 28, 2009

Mining the Web for Feelings, Not Facts: Alex Wright, NY Times, August 24, 2009

Put Ad on Web. Count Clicks. Revise. : Stephanie Clifford, NY Times, May 31, 2009

 

 

Related Technologies (OLAP, RDB, etc)

Business Intelligence & Cloud Computing

 

Recommended Reading:

 

The Two Flavors of Google, BusinessWeek, December 13, 2007

 

The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, Chris Anderson, Wired

Feb. 22

Feb. 24

8

Group Presentations (Progress Report)

Group Presentations (Progress Report)

Mar. 1

Mar. 3

 

Spring Break

Spring Break

Mar. 8

Mar. 10

9

Genetic Algorithms

 

Required Reading

Chapter 13

 

Reference Reading:

Discovering Interesting Patterns For Investment Decision Making With GLOWER – A Genetic Learner Overlaid With Entropy Reduction, Dhar, V., D. Chou, and F. Provost, Data Mining and Knowledge Discovery, Vol. 4, No. 4, October 2000

 

 

Assignments:

Out:

 

Neural Networks

 

Required Reading:

Chapter 7

 

Reference Reading:

The Ultimate Money Machine, Kelley J. Bloomberg Markets, June 2007 (A must read:)) 

 

 

 

Mar. 15

Mar. 17

10

Work on DM Project in Class

Recommendation Systems/Collaborative Filtering

 

Required Reading

Amazon.com Recommendations: Item-to-Item Collaborative Filtering, Linden, G., B. Smith, & J. York, IEEE Computer Society, IEEE Internet Computing, Jan./Feb. 2003, pp. 76-80

Speaking out: Amazon.com's Jeff Bezos, The McGraw-Hill Companies, BusinessWeek Online, August 25, 2003

Netflix Prize Still Awaits a Movie Seer, Katie Hafner, NY Times, June 4, 2007

You Want Innovation? Offer A Prize, Leonhardt, D., NY Times, Economix section, January 31, 2007

MySpace to Discuss Effort to Customize Ads, Brad Stone, NY Times, September 18, 2007 

Assignments:

OUT: Assignment 5 

 

Suggested Reading

The Economist - A different game - 02-27-2010
The Economist - All too much - 02-27-2010
The Economist - Clicking for gold - 02-27-2010
The Economist - Data, data everywhere - 02-27-2010
The Economist - Leaders The data deluge - 02-27-2010
The Economist - Needle in a haystack - 02-27-2010
The Economist - New rules for big data - 02-27-2010
The Economist - The open society - 02-27-2010

 

Mar. 22

Mar. 24

11

Guest Speaker: AT&T/Netflix $1 million prize winner

Relational Learning

 

(come with an open mind) 

Recommended Reading

Social Graph-iti, Oct 18th 2007

On Facebook, Scholars Link Up With Data, Stephanie Rosenbloom, December 17, 2007

Friend Accepted, The Economist, Oct 25th 2007

Six Degrees of Messaging, NatureNews, Katharine Sanderson, March 13, 2008

 

Mar. 29

Mar. 31

12

Guest Speaker: Data Mining in Finance, V. Dhar

Guest Speaker: 33 Across

Apr. 5

Apr. 7

13

Guest Speaker: SAS/JPM

Guest Speaker: Yahoo!

Apr. 12

Apr. 14

14

Group Presentations 1

Group Presentations 2

Apr. 19

Apr. 21

15

Class Optional -- Use for extra office hours

 

Apr. 26

 
