SP08OPIM410672

Page history last edited by PBworks 16 years ago

 
OPIM410/672: Decision Support Systems

MW 1:30-3:00P, JMHH F65

 

 

Instructor: Shawndra Hill, JMHH 567, office hours Mondays 3:30-4:30, Fridays 2-5, or by appointment.

Teaching Assistant: Shachi Pandey [shachi.pandey@gmail.com]

 

Text: Data Mining Techniques, Second Edition by Michael Berry and Gordon Linoff Wiley, 2004 ISBN: 0-471-47064-3

 

Webcafe Page (You will need a Wharton account. If you don't have one, go to http://accounts.wharton.upenn.edu.)

 

Syllabus

 

Weka - You'll need to download this software to complete course assignments.

 

Human Subjects Disclosure: The completion of some of the assignments in this course may result in data of value for research on data mining/machine learning.  If the data generated in the class are used in research, no information will be revealed about the identities of individuals or about the specific intellectual content of student work.  

 

Outside Resources

data mining textbooks

datasets 

 

News and Announcements 


Optional:    Sign up for Lunch with Shawndra Hill at POD 

 

     Mondays March 17, April 7, 14 and Wednesdays April 9, 16 

     Departing Huntsman Hall Walnut Street Lobby at 11:40am sharp (Those who have class are welcome to join at 12)

 

 

Session Outline


Wed January 16

 

S1: Introduction to the Course 

 

Required Reading:

How Verizon Cut Customer Churn, Das M., Financial Express, 10-2003

 

Reference Reading: (not required before class)

Chapters 1-2

Mining Business Databases, Brachman, R.J., Khabaza, T., Klosgen, W., Piatetsky-Shapiro, G. and Simoudis, E., Communications of the ACM, 1996, 39:11, pp. 42-48

12 IT Skills That Employers Can’t Say No To, Brandel, M., Computerworld, 7-11-2007

 

 

Assignments:

Out:

Homework Assignment 1

Personal DM profile (Save it to your computer, rename it with your last name first, and upload it to the profiles directory)


Wed January 23

 

S2: Introduction to Data Mining 

 

 

Required Reading:

Chapters 1-2 

 

Reference Reading:

A Golden Vein,  The Economist, 1-04

Network-based Marketing: Identifying Likely Adopters via Consumer Networks, Hill, S., Provost, F., Volinsky, C., Statistical Science, 2006, 22:11, pp. 256-276

 

Assignments:

In:

Homework Assignment 1

Personal DM profile

 

Out:

Homework Assignment 2

Anonymous Feedback on Proposed Research Ideas 

 


 Mon January 28

 

S3: Introduction to Decision Trees 

 

Required Reading:

Chapters 3,6 (pp 165 - 194)

 

Reference Reading:

Our Technology And Data, Farecast article

How To Buy Data Mining: A Framework For Avoiding Costly Project Pitfalls In Predictive Analytics, King, E.A., DM Review, October 2005

An Insurance Policy For Low Airfares, Tedeschi, B., NY Times, January 22, 2007


Wed January 30 

 

S4: Decision Trees Continued 

 

Required Reading:

Chapter 6 (pp 165 - 194)

 

Reference Reading:

Joined-up thinking, The Economist, Apr 4th 2007

Taking Retailers' Cues, Harrah's Taps Into the Science of Gambling, WSJ, 11-22-2004

 

Assignments:

In:

Homework Assignment 2

Anonymous Feedback on Proposed Research Ideas

 

Out:

Homework Assignment 3 

 

 


 

Mon February 4

 

S5:  Evaluation in Machine Learning 

 

 

Required Reading

Chapter 4 (pp 95-108)

Crafting Papers on Machine Learning, P. Langley

The Case Against Accuracy Estimation for Comparing Classifiers, Provost, F., T. Fawcett, and R. Kohavi, In Proceedings of the Fifteenth International Conference on Machine Learning (ICML-98).

 


Wed February 6

 

S6:  Cost Sensitive Learning

 

 

Required Reading

Chapter 4 (pp 95-108)

The Relationship Between Default Prediction And Lending Profits: Integrating ROC Analysis And Loan Pricing, Stein, R., Journal of Banking & Finance, 29 (2005) 1213-1236

 

Assignments:

In:

Homework Assignment 3 (Extended to Friday Feb 8)

 

Out:

2 Page Proposal for Group DM Project


Mon February 11

 

S7:  Naive Bayes 

 

Required Reading:

Chapter 8: pp.257-271

 

Reference Reading:

Learning and Evaluating Classifiers under Sample Selection Bias, Zadrozny, B.

On the Optimality of the Simple Bayesian Classifier under Zero-One Loss, Domingos, P. and Pazzani, M., Machine Learning, 29, 103-130, 1997

What You Need To Know About Bayesian Spam Filtering, Tschabitscher, H.

A Plan For Spam, Zdziarski, J.

The State Of Spam, A Monthly Report, Generated by Symantec Messaging and Web Security, February 2007

Spam And The Ongoing Battle For The Inbox, Goodman, J., G.V. Cormack, and D. Heckerman, Communications of the ACM, February 2007, Vol. 50, No. 2, pp. 25-33

 

 


Wed February 13 

 

S8: Association Rules, KNN, Clustering 

 

Required Reading

Chapter 9: Pages 287-315

Chapter 8: pp 257 - 271  

Chapter 11: 349-365 

 

Assignments:

In:

2 Page Proposal for Group DM Project  

 

Out:

Sign up for 20-30 minute consulting time to discuss your project

 

 


Mon February 18

 

S9:  Weka Demo (MEETING IN LAB HH 380)

 

Reference Reading:

Weka Tutorial

An Intelligent Assistant for the Knowledge Discovery Process: An Ontology-based Approach, Bernstein, A., Provost, F., Hill, S. IEEE Transactions on Knowledge and Data Engineering 17(4), pp. 503-518, 2005. (PDF)

 

Assignments:

Out:

Homework Assignment 4

 


Wed February 20 

 

S10: Genetic Algorithms 

 

Required Reading

Chapter 13

 

Reference Reading:

Discovering Interesting Patterns For Investment Decision Making With GLOWER – A Genetic Learner Overlaid With Entropy Reduction, Dhar, V., D. Chou, and F. Provost, Data Mining and Knowledge Discovery, Vol. 4, No. 4, October 2000

 

Assignments:

 

Out:

First of two DM competition datasets (Friday Feb 22)

 


Mon February 25

 

S11:  Neural Networks 

 

Required Reading:

Chapter 7

 

Reference Reading:

The Ultimate Money Machine, Kelley, J., Bloomberg Markets, June 2007 (A must read :))

 

Assignments:

In:

Homework Assignment 4 

 

 

Out:  


Wed February 27 

 

S12: Data Mining for Business Intelligence

 

 

Required Reading

 

 

 

Reference Reading:

 

Assignments:

In: Group Presentations in PPT form (by Saturday March 1 10am)


Mon March 3

 

S13:  Group Presentations

 

 


Wed March 5

 

S14: Group Presentations

 

 

Assignments:

 

Out: Have a Great Spring Break!

 



Mon March 17

 

S15:  Weka Lab (MEETING IN LAB HH 380)

 

Reference Reading:

Weka Tutorial

 

 

Assignments:

Out:

Homework Assignment 5


Wed March 19 

 

S16: Relational Learning P1

(come with an open mind) 

 

Recommended Reading

Social Graph-iti, Oct 18th 2007

On Facebook, Scholars Link Up With Data, Stephanie Rosenbloom, December 17, 2007

Friend Accepted, The Economist, Oct 25th 2007

Six Degrees of Messaging, NatureNews, Katharine Sanderson, March 13, 2008

 


Mon March 24 

 

S17: Relational Learning P2

 

 

Required Reading

The New Focus Groups: Online Networks, Emily Steel, WSJ, January 14, 2008 

Fun List of Social Networking Sites from Mashable

Data Mining: Staking A Claim On Your Privacy, Cavoukian, A., Ph.D., Commissioner, Information and Privacy Commissioner/Ontario, January 1998, pp. i, ii, iii, 1-22

Fair Information Practice Principles

Online Ads vs. Privacy, Dan Mitchell, NY Times, May 12, 2007

Big Brother Just wants to help, The Economist, March 8, 2007 

 

 

 

Recommended Reading

Privacy Preserving Data Mining, Rakesh Agrawal, et al., ACM SIGMOD International Conference on Management of Data (SIGMOD), 2000.

k-Anonymity: a model for protecting privacy, Latanya Sweeney, International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems, 2002.

Mondrian Multidimensional K-Anonymity, Kristen LeFevre, et al., IEEE International Conference on Data Engineering, 2006.

 


Wed March 26 

 

S18: Harrah's Case

 

 

Required Reading

Revisit Chapters 1-2  

 

Assignments:

In: Homework Assignment 5


Monday March 31

 

S19: Recommendation Systems/Collaborative Filtering

 

 

Recommended Reading

Amazon.com Recommendations: Item-to-Item Collaborative Filtering, Linden, G., B.Smith, & J. York, IEEE Computer Society, IEEE Internet Computing, Jan./Feb. 2003, pp. 76-80

Speaking out: Amazon.com's Jeff Bezos, The McGraw-Hill Companies, BusinessWeek Online, August 25, 2003

 

Netflix Prize Still Awaits a Movie Seer, Katie Hafner, NY Times, June 4, 2007

You Want Innovation? Offer A Prize, Leonhardt, D., NY Times, Economix section, January 31, 2007

MySpace to Discuss Effort to Customize Ads, Brad Stone, NY Times, September 18, 2007 

 


Wed April 2

 

S20: Weka Lab for DM Competition (MEETING IN LAB HH 380)

 


Monday April 7

 

S21: Guest Speaker: Claudia Perlich, IBM Research

Claudia Perlich received her M.Sc. in Computer Science from the University of Colorado at Boulder, her Diplom in Computer Science from Technische Universitaet in Darmstadt, and her Ph.D. in Information Systems from the Stern School of Business, New York University. Her Ph.D. thesis concentrated on probability estimation in multi-relational domains that capture information about multiple entity types and the relationships between them. Her dissertation was recognized as an additional winner of the International SAP Doctoral Support Award Competition, and her submission placed second in the yearly data mining competition in 2003 (KDD-Cup 03).

 

Claudia joined the Data Analytics Research group as a Research Staff Member in October 2004. She interned during summer 1999 at Deep Computing for Commerce Research Group under Murray Campbell working on financial trading behavior on Treasury Bonds. Her research interests are in machine learning for complex real-world domains and the comparative study of model performance as a function of domain characteristics.

 

Required Reading

Making the Most of Your Data: KDD Cup 2007 “How Many Ratings” Winner’s Report, S. Rosset, C. Perlich,  Y. Liu

 


 

Wednesday April 9

 

S22: Guest Speaker: Robert Bell, AT&T Labs Research

Robert Bell has been a member of the Statistics Research Department at AT&T Labs-Research since 1998.  He previously worked at RAND doing public policy analysis.  His current research interests include machine learning methods, analysis of data from complex samples, and record linkage methods.  He has served on several National Research Council panels advising the Census Bureau and chairs a current panel on coverage measurement for the 2010 census.  He is currently a member of the board of the National Institute of Statistical Sciences and was recently a member of the Committee on National Statistics and chair of the Fellows Committee of the American Statistical Association.

 

 

Required Reading

http://www.wired.com/techbiz/media/magazine/16-03/mf_netflix

http://stat-computing.org/newsletter/v182.pdf (pp. 4-12)

 

 

Recommended Reading

http://www.research.att.com/~volinsky/netflix/

 

 


 

Monday April 14

 

S23: (MEETING IN LAB HH 380/OPTIONAL!) 


 

Wednesday April 16

 

S24: Guest Speaker: Daryl Pregibon, Google, Inc.

Daryl Pregibon is a research scientist at Google, Inc. He is a recognized leader in data mining, the interdisciplinary field that combines statistics, artificial intelligence, and database research. His research interests include analysis of massive data sets, statistical computing, generalized linear models, tree-based methods, and regression diagnostics. During his career, Dr. Pregibon has nurtured successful interactions in fiber and microelectronics manufacturing, network reliability, customer satisfaction, fraud detection, targeted marketing, and regulatory statistics. Over these years, his research contributions changed from mathematical statistics to computational statistics and included such topics as expert systems for data analysis, data visualization, application-specific data structures for statistics, and large-scale data analysis. From 1989 to 2004, he worked at AT&T and served as head of statistics research. He is currently a member of the NAS Committee on National Statistics and the NAS Study Committee on Ballistics, and is a former chair of the NAS Committee on Applied & Theoretical Statistics. He has also held positions on the National Advisory Committee for the Statistical and Applied Mathematical Sciences Institute (SAMSI), Research Triangle Park, and is director of the Association for Computing Machinery (ACM) Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). Other previous academic and professional experiences include: associate editor of Data Mining & Knowledge Discovery; associate editor of Statistics & Computing; and co-founder of the Society for Artificial Intelligence & Statistics (SAIAS). He has authored more than 60 publications and holds four patents. Dr. Pregibon received his Ph.D. in statistics from the University of Toronto and his M.A. in mathematics from Youngstown State University (source: The National Academies).

 

 


 

Mon April 21

 

S25:  Guest Speaker: Steven L. Scott, Capital One

 

Steven L. Scott received his PhD from the Harvard statistics department in 1998.  From 1998 to 2007 he served on the faculty of the Marshall School of Business at the University of Southern California.  Dr. Scott's research focuses on applied Bayesian computation in a diverse set of fields including web traffic modeling, e-commerce, network security, health policy research, and educational testing.  Several of his papers have appeared in the Journal of the American Statistical Association, the premier journal in the field of statistics.  He has had consulting relationships with several companies ranging from AT&T-Bell Labs, to a psychic hotline, to the McKinsey Corporation.  In June of 2007 Dr. Scott left USC to join Capital One, where he now serves as a Director of Statistical Analysis.

 

Recommended Reading

 

Competing on Analytics: The New Science of Winning,  T. Davenport, J. Harris

 

 


Wed April 23

 

S26: Group Presentations

 

 


Monday April 28

 

S27: Group Presentations/Data Mining Competition Winner Announced

 

 

 

 
