
SP08OPIM410672


 
OPIM410/672: Decision Support Systems

MW 1:30-3:00P, JMHH F65

Instructor: Shawndra Hill, JMHH 567, office hours Mondays 3:30-4:30, Fridays 2-5, or by appointment.

Teaching Assistant: Shachi Pandey [shachi.pandey@gmail.com]

Text: Data Mining Techniques, Second Edition, by Michael Berry and Gordon Linoff, Wiley, 2004, ISBN: 0-471-47064-3

Webcafe Page (You will need a Wharton account. If you don't have one, go to http://accounts.wharton.upenn.edu.)

Syllabus

Weka - You'll need to download this software to complete course assignments.

Human Subjects Disclosure: The completion of some of the assignments in this course may result in data of value for research on data mining/machine learning. If the data generated in the class are used in research, no information will be revealed about the identities of individuals or about the specific intellectual content of student work.

Outside Resources

data mining textbooks

datasets

News and Announcements

Optional: Sign up for Lunch with Shawndra Hill at POD

Mondays March 17, April 7, 14 and Wednesdays April 9, 16

Departing Huntsman Hall Walnut Street Lobby at 11:40am sharp (Those who have class are welcome to join at 12)

Session Outline

Wed January 16

S1: Introduction to the Course

Required Reading:

How Verizon Cut Customer Churn, Das M., Financial Express, 10-2003

Reference Reading: (not required before class)

Chapters 1-2

Mining Business Databases, Brachman, R.J., Khabaza, T., Klosgen, W., Piatetsky-Shapiro, G. and Simoudis, E., Communications of the ACM, 1996, 39:11, pp. 42-48

12 IT Skills That Employers Can’t Say No To, Brandel, M., Computerworld, 7-11-2007

Assignments:

Out:

Homework Assignment 1

Personal DM profile (Save it to your computer, rename it with your last name first, and upload it to the profiles directory)

Wed January 23

S2: Introduction to Data Mining

Required Reading:

Chapters 1-2

Reference Reading:

A Golden Vein, The Economist, 1-04

Network-based Marketing: Identifying Likely Adopters via Consumer Networks, Hill, S., Provost, F., Volinsky, C., Statistical Science, 2006, 21:2, pp. 256-276

Assignments:

In:

Homework Assignment 1

Personal DM profile

Out:

Homework Assignment 2

Anonymous Feedback on Proposed Research Ideas

Mon January 28

S3: Introduction to Decision Trees

Required Reading:

Chapters 3,6 (pp 165 - 194)

Reference Reading:

Our Technology And Data, Farecast article

How To Buy Data Mining: A Framework For Avoiding Costly Project Pitfalls In Predictive Analytics, King, E.A., DM Review, October 2005

An Insurance Policy For Low Airfares, Tedeschi, B., NY Times, January 22, 2007

Wed January 30

S4: Decision Trees Continued

Required Reading:

Chapter 6 (pp 165 - 194)

Reference Reading:

Joined-up thinking, The Economist, Apr 4th 2007

Taking Retailers' Cues, Harrah's Taps Into the Science of Gambling, WSJ, 11-22-2004

Assignments:

In:

Homework Assignment 2

Anonymous Feedback on Proposed Research Ideas

Out:

Homework Assignment 3

Mon February 4

S5:  Evaluation in Machine Learning

Required Reading

Chapter 4 (pp 95-108)

Crafting Papers on Machine Learning, P. Langley

The Case Against Accuracy Estimation for Comparing Classifiers, Provost, F., T. Fawcett, and R. Kohavi, In Proceedings of the Fifteenth International Conference on Machine Learning (ICML-98).

Wed February 6

S6: Cost Sensitive Learning

Required Reading

Chapter 4 (pp 95-108)

The Relationship Between Default Prediction And Lending Profits: Integrating ROC Analysis And Loan Pricing, Stein, R., Journal of Banking & Finance, 29 (2005) 1213-1236

Assignments:

In:

Homework Assignment 3 (Extended to Friday Feb 8)

Out:

2 Page Proposal for Group DM Project

Mon February 11

S7: Naive Bayes

Required Reading:

Chapter 8: pp.257-271

Reference Reading:

Learning and Evaluating Classifiers under Sample Selection Bias, Zadrozny, B.

On the Optimality of the Simple Bayesian Classifier under Zero-One Loss, Domingos, P. and Pazzani, M., Machine Learning, 29, 103-130, 1997

What You Need To Know About Bayesian Spam Filtering, Tschabitscher, H.

A Plan For Spam, Zdziarski, J.

The State Of Spam, A Monthly Report, Generated by Symantec Messaging and Web Security, February 2007

Spam And The Ongoing Battle For The Inbox, Goodman, J., G.V. Cormack, and D. Heckerman, Communications of the ACM, February 2007, Vol. 50, No. 2, pp. 25-33

Wed February 13

S8: Association Rules, KNN, Clustering

Required Reading

Chapter 9: Pages 287-315

Chapter 8: pp 257 - 271

Chapter 11: 349-365

Assignments:

In:

2 Page Proposal for Group DM Project

Out:

Sign up for 20-30 minute consulting time to discuss your project

Mon February 18

S9: Weka Demo (MEETING IN LAB HH 380)

Reference Reading:

Weka Tutorial

An Intelligent Assistant for the Knowledge Discovery Process: An Ontology-based Approach, Bernstein, A., Provost, F., Hill, S. IEEE Transactions on Knowledge and Data Engineering 17(4), pp. 503-518, 2005. (PDF)

Assignments:

Out:

Homework Assignment 4

Wed February 20

S10: Genetic Algorithms

Required Reading

Chapter 13

Reference Reading:

Discovering Interesting Patterns For Investment Decision Making With GLOWER – A Genetic Learner Overlaid With Entropy Reduction, Dhar, V., D. Chou, and F. Provost, Data Mining and Knowledge Discovery, Vol. 4, No. 4, October 2000

Assignments:

Out:

First of two DM competition datasets (Friday Feb 22)

Mon February 25

S11: Neural Networks

Required Reading:

Chapter 7

Reference Reading:

The Ultimate Money Machine, Kelley, J., Bloomberg Markets, June 2007 (A must read :))

Assignments:

In:

Homework Assignment 4

Out:

Wed February 27

S12: Data Mining for Business Intelligence

Required Reading

Reference Reading:

Assignments:

In: Group Presentations in PPT form (by Saturday March 1 10am)

Mon March 3

S13: Group Presentations

Wed March 5

S14: Group Presentations

Assignments:

Out: Have a Great Spring Break!

Mon March 17

S15: Weka Lab (MEETING IN LAB HH 380)

Reference Reading:

Weka Tutorial

Assignments:

Out:

Homework Assignment 5

Wed March 19

S16: Relational Learning P1

(come with an open mind)

Recommended Reading

Social Graph-iti, Oct 18th 2007

On Facebook, Scholars Link Up With Data, Stephanie Rosenbloom, December 17, 2007

Friend Accepted, The Economist, Oct 25th 2007

Six Degrees of Messaging, NatureNews, Katharine Sanderson, March 13, 2008

Mon March 24

S17: Relational Learning P2

Required Reading

The New Focus Groups: Online Networks, Emily Steel, WSJ, January 14, 2008

Fun List of Social Networking Sites from Mashable

Data Mining: Staking A Claim On Your Privacy, Cavoukian, A., Ph.D., Commissioner, Information and Privacy Commissioner/Ontario, January 1998, pp. i, ii, iii, 1-22

Fair Information Practice Principles

Online Ads vs. Privacy, Dan Mitchell, NY Times, May 12, 2007

Big Brother Just Wants to Help, The Economist, March 8, 2007

Recommended Reading

Privacy Preserving Data Mining, Rakesh Agrawal, et al., ACM SIGMOD International Conference on Management of Data (SIGMOD), 2000.

k-Anonymity: a model for protecting privacy, Latanya Sweeney, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002.

Mondrian Multidimensional K-Anonymity, Kristen LeFevre, et al., IEEE International Conference on Data Engineering, 2006.

Wed March 26

S18: Harrah's Case

Required Reading

Revisit Chapters 1-2

Assignments:

In: Homework Assignment 5

Monday March 31

S19: Recommendation Systems/Collaborative Filtering

Recommended Reading

Amazon.com Recommendations: Item-to-Item Collaborative Filtering, Linden, G., B. Smith, & J. York, IEEE Computer Society, IEEE Internet Computing, Jan./Feb. 2003, pp. 76-80

Speaking out: Amazon.com's Jeff Bezos, The McGraw-Hill Companies, BusinessWeek Online, August 25, 2003

Netflix Prize Still Awaits a Movie Seer, Katie Hafner, NY Times, June 4, 2007

You Want Innovation? Offer A Prize, Leonhardt, D., NY Times, Economix section, January 31, 2007

MySpace to Discuss Effort to Customize Ads, Brad Stone, NY Times, September 18, 2007

Wed April 2

S20: Weka Lab for DM Competition (MEETING IN LAB HH 380)

Monday April 7

S21: Guest Speaker: Claudia Perlich, IBM Research

Claudia Perlich received her M.Sc. in Computer Science from the University of Colorado at Boulder, her Diplom in Computer Science from Technische Universitaet in Darmstadt, and her Ph.D. in Information Systems from the Stern School of Business, New York University. Her Ph.D. thesis concentrated on probability estimation in multi-relational domains that capture information about multiple entity types and the relationships between them. Her dissertation was recognized as an additional winner of the International SAP Doctoral Support Award Competition, and her submission placed second in the yearly data mining competition in 2003 (KDD-Cup 03).

Claudia joined the Data Analytics Research group as a Research Staff Member in October 2004. She interned during summer 1999 in the Deep Computing for Commerce Research Group under Murray Campbell, working on financial trading behavior in Treasury Bonds. Her research interests are in machine learning for complex real-world domains and the comparative study of model performance as a function of domain characteristics.

Required Reading

Making the Most of Your Data: KDD Cup 2007 “How Many Ratings” Winner’s Report, S. Rosset, C. Perlich, Y. Liu

Wednesday April 9

S22: Guest Speaker: Robert Bell, AT&T Labs Research

Robert Bell has been a member of the Statistics Research Department at AT&T Labs-Research since 1998. He previously worked at RAND doing public policy analysis. His current research interests include machine learning methods, analysis of data from complex samples, and record linkage methods. He has served on several National Research Council panels advising the Census Bureau and chairs a current panel on coverage measurement for the 2010 census. He is currently a member of the board of the National Institute of Statistical Sciences and was recently a member of the Committee on National Statistics and chair of the Fellows Committee of the American Statistical Association.

Required Reading

http://www.wired.com/techbiz/media/magazine/16-03/mf_netflix

http://stat-computing.org/newsletter/v182.pdf (pp. 4-12)

Recommended Reading

http://www.research.att.com/~volinsky/netflix/

Monday April 14

S23: (MEETING IN LAB HH 380/OPTIONAL!)

Wednesday April 16

S24: Guest Speaker: Daryl Pregibon, Google, Inc.

Daryl Pregibon is a research scientist at Google, Inc. He is a recognized leader in data mining, the interdisciplinary field that combines statistics, artificial intelligence, and database research. His research interests include analysis of massive data sets, statistical computing, generalized linear models, tree-based methods, and regression diagnostics. During his career, Dr. Pregibon has nurtured successful interactions in fiber and microelectronics manufacturing, network reliability, customer satisfaction, fraud detection, targeted marketing, and regulatory statistics. Over these years, his research contributions changed from mathematical statistics to computational statistics and included such topics as expert systems for data analysis, data visualization, application-specific data structures for statistics, and large-scale data analysis. From 1989 to 2004, he worked at AT&T and served as head of statistics research. He is currently a member of the NAS Committee on National Statistics and the NAS Study Committee on Ballistics, and is a former chair of the NAS Committee on Applied & Theoretical Statistics. He has also held positions on the National Advisory Committee for the Statistical and Applied Mathematical Sciences Institute (SAMSI), Research Triangle Park, and is director of the Association for Computing Machinery (ACM) Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). Other previous academic and professional experience includes serving as associate editor of Data Mining & Knowledge Discovery, associate editor of Statistics & Computing, and co-founder of the Society for Artificial Intelligence & Statistics (SAIAS). He has authored more than 60 publications and holds four patents. Dr. Pregibon received his Ph.D. in statistics from the University of Toronto and his M.A. in mathematics from Youngstown State University (source: The National Academies).

Mon April 21

S25: Guest Speaker: Steven L. Scott, Capital One

Steven L. Scott received his PhD from the Harvard statistics department in 1998. From 1998 to 2007 he served on the faculty of the Marshall School of Business at the University of Southern California. Dr. Scott's research focuses on applied Bayesian computation in a diverse set of fields including web traffic modeling, e-commerce, network security, health policy research, and educational testing. Several of his papers have appeared in the Journal of the American Statistical Association, the premier journal in the field of statistics. He has had consulting relationships with several companies ranging from AT&T-Bell Labs, to a psychic hotline, to the McKinsey Corporation. In June of 2007, Dr. Scott left USC to join Capital One, where he now serves as a Director of Statistical Analysis.

Recommended Reading

Competing on Analytics: The New Science of Winning, T. Davenport, J. Harris

Wed April 23

S26: Group Presentations

Monday April 28

S27: Group Presentations/Data Mining Competition Winner Announced
