"What is not measured cannot be controlled." (paraphrase of W. Edwards Deming)

"Argument is only possible when you don't have the facts. Get the facts." (paraphrase of Peter F. Drucker)

"Discovery consists of seeing what everybody has seen and thinking what nobody has thought." (Albert von Szent-Györgyi)

## Statistics!   A Cynical View

This sheet is for your convenience in deciding whether to take the second exam. Exam 2 covers a few more types of statistical tests than exam 1, but the concepts are the same. If you are happy with the grade you earned on exam 1, you can skip exam 2 and I will just count the exam 1 grade twice in calculating the course grade. Many people at this point have solid grades, and it would be silly to go through the busy work of taking and grading another exam.
Remember, too, that grades may go down as well as up as a result of the second exam.

### In the future, upgraded versions of these will be available on CD through my classes or from instructionalvideotutorials.com

Paper Syllabus from (online Course)
Horizon Wimba Live classroom support documents at http://www.wimba.com/support/liveclassroom/docs.php
Technical Support for liveclassroom at the UMSL help desk, (314) 516-6034
The participants manual for version 4.3 is available.  A course CD with all my videotutorials was distributed in the first introductory sessions. More videos with new versions of OFFICE will be made available at this website as the semester progresses.

# BA3300 Introduction to Business Statistics

#### COURSE DESCRIPTION & OBJECTIVES:

"This is a second course in a required two-course statistics sequence. The first course (Math 105/1105 or its equivalent), covers the basics of probability and statistics, and BA3300 builds on that material. It is assumed that the student is acquainted with the material in Math 1105, and basic use of a PC-Compatible computer, including navigating on the web, fundamental concepts of spreadsheets, and e-mail. Students will learn techniques used for relational analysis and business forecasting and how to apply them in a business context. Tools include Chi-Square tests of statistical independence; analysis of variance; simple linear regression and correlation; multiple linear regression; and extrapolative techniques such as moving averages and exponential smoothing. Emphasis is placed on problem definition, construction of statistical models, analysis of data, and interpretation of  results."

#### Prerequisites:

BA1800(103), Math 1105(105), familiarity with basic statistics and PC-compatible computers, spreadsheets, email, and use of the internet. Minimum 2.0 Campus GPA. These prerequisites are not waived.

#### Required:

• Recommended Text: Anderson, Sweeney, Williams-Statistics for Business and Economics (8th, 9th or 10th edition as used in Math 1105); handouts and web materials. Here is a link to download the data  and tools related to the textbook. http://websites.swlearning.com/cgi-wadsworth/course_products_wp.pl
• Save your work every day on your virtual drive, disks or flashdrive. The plan is to turn in assignments and exams using the mygateway dropbox.  Most of our calculations will be done in EXCEL, but a calculator may still be useful. You shouldn't need a complex statistical calculator.
• You will need an email account.
• A Notebook. Although a lot of the skills we will develop have to do with spreadsheets, my hope is to reduce emphasis on the mechanics of the calculations so we can focus effectively on what the results mean rather than memorizing a bunch of formulas. A notebook will help as we work through the concepts.
• Note: The Forgetting Curve
• Ode to Procrastination
• You should also have a copy of Microsoft Office on your computer so you can do the EXCEL assignments at home. Try the bookstore for the Academic version Office Pro at a very reasonable price.

## HEY!! In Office 2007, where did the EXCEL menus go?

Sometimes new versions of software seem like this video: microsoft new versions introductions

How to find EXCEL 2003 commands in EXCEL 2007 "Ribbons"
I recommend clicking on the OFFICE icon at the top left of EXCEL 2007, then clicking EXCEL OPTIONS (a button down at the bottom) and fixing whatever needs fixing to get access to what you need, such as the "Developer tab." Change the SaveAs default to the old (1997-2003) EXCEL format so you can still work on the files in older versions of EXCEL. If you have to, you might be able to find file converters at microsoft.com. They are also promising to someday make their file formats open source. Uncommon fonts may also wreak havoc as you try to work on a file across machines. Change the default font theme under "page layout." I know Times New Roman and Arial are uninspiring, but they are more likely to be on your other machine. After I get a feel for what needs to be fixed in the School's setup of EXCEL, I'll produce some guidance and videos on how to fix things, so you can relate my existing videotutorials to EXCEL 2007.
Microsoft has a compatibility pack for installation in older versions of OFFICE so you can read and write the new file formats. You need to install all the updates first. http://www.microsoft.com/downloads/details.aspx?FamilyId=941b3470-3ae9-4aee-8f43-c6bb74cd1466&displaylang=en
If you can't use the link, you can probably find it at the Microsoft Office site or Google it.

Fall, 2007 -- Online Session

Robert J. "Bud" Banis, Ph.D.,C.M.A.
CCB 230 314-516-6136; 636-394-4950
E-mail: ba3300 (at) bud banis .com   (omit spaces). Use this email address for all communications on the course. Don't copy and paste it, as it has extra spaces to thwart automated harvesting by the spammers.

"The Rules" on things like attendance, drop dates, academic dishonesty, and the like.
Rules on cheating are enforced.
See the discussion at collegecheating.com

Tentative GRADING: Online Section

| Component | Weight |
| --- | --- |
| Two Exams | 40% |
| Up to 12 quizzes | 20% |
| 6 exercises | 30% |
| Research report (borderline decisions), due last class 12/5, hard copy at 2nd exam | 10% |
| Total points | 100% |
To help make a decision about whether to take the second exam.

MC exams are closed book, with one page (2 sides) of notes allowed. Questions are multiple choice and short answer.
Quizzes are open online in mygateway, but all exercises and exams are individual effort only. Submissions very similar in form and/or content will result in academic affairs investigations.
Letter grade breaks are  90, 80, 70, 60%. Plus & minus grades are rare. Research Report quality will be used to decide on borderline cases.

# Resources:

Review spreadsheet formula design from BA 1800

## Resources and Statistics video tutorials

Videos and other course materials on course CDs and DVD-- This is different from the CD's that came with the book.
Excellent EXCEL Videotutorials from DataPigTechnologies

## Is EXCEL an adequate statistics package?

http://www.practicalstats.com/Pages/excelstats.html
http://www.cmis.csiro.au/Mary.Barnes/PDF/Statistical%20flaws%20in%20Excel_Woody.pdf
http://www.stat.uni-muenchen.de/~knuesel/elv/accuracy.html
http://www.stat.uni-muenchen.de/~knuesel/elv/excelxp.pdf

-----Some interesting statistics questions to research

## Global warming

This is a fascinating and hotly argued issue that provides some stimulating examples of poor statistical reasoning.
The base issue appears to be the relationship between atmospheric CO2 concentrations and climate change. Correlation is not causality, and a look at the data allows strong questioning of the supposed "causal relationship" of CO2 to climate change.
CO2 science
The petition project and Environmental Effects of Increased Atmospheric Carbon Dioxide

Response to skeptics by advocates of AGHG models--this site purports to refute the skeptics, but much of the refutation appears to be name-calling and the discussions reinforce skepticism if you look at the data--see, for example, the discussion about correlation and causality for CO2 and temperature change:  "There is no proof that CO2 is causing global warming"
The fact of the matter is that you don't "prove" anything--either way--with statistics, especially if it's based on computer models and selected data.
Climatescience.org is a website by people who are making their living in the global warming industry. This is pretty much the official gospel by those who are paid to preach it.
Annotated UN IPCC paper on the "scientific basis" of the AGHG model mostly pointing out that the press reports on confidence levels were highly exaggerated and that the calamitous predictions are based almost solely not on data, but on computer models that have failed to predict even the past. See, for example, the data on storm frequency and severity in the 2007 paper on Environmental Effects of Increased Atmospheric Carbon Dioxide
The UN has a strong incentive to promote the AGHG fears as it's been proposed that the solution is to grant the UN power to tax industries and to regulate "carbon offset" schemes. This tax would help to redistribute wealth by "supporting UN development projects in poor countries."
[UN Committee] "Members agreed that reversing the widening and 'shameful' gap between rich and poor countries 'is the pre-eminent moral and humanitarian challenge of our age.'"

## CPI Calculations

Is the CPI an accurate estimate of inflation?

# Tutors for the Fall Session.

• Guodong Zhang and Vicki Chen are the BA 3300 tutors this  semester, with office hours in CCB213 during these times:

• Guodong:
email address is gzq35 @ umsl.edu

Monday 12-2pm and 4-5pm

Thursday 10:50-12:50pm and 4-5pm

• Vicki:
email address is Vickichen2008 @ hotmail.com

Tuesday 10:30-12:30pm and 3:30-4:30

Wednesday 11:30-1:30pm and 3:30-4:30

in process:
Weeks 1-4

## Overview, review of Z and t distributions, hypothesis testing. The Central Limit Theorem. One sample t-test.

Review of Netscape Navigator/MS Internet Explorer, Windows Explorer and EXCEL Editing, converting and cleaning up real data-Descriptive Statistics, Graphing, t-tests. Comparing two sample means.
Normsdist function:
Be the "Martha Stewart" of Statistics: make your own Z score Table. The "Martha Stewart" table of critical t's (and a more complex version of t-tail probabilities)
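
If you want to see what a homemade Z score table looks like outside of EXCEL, here is a minimal sketch in Python. It assumes that `NormalDist().cdf(z)` from the standard library plays the same role as EXCEL's NORMSDIST (cumulative area to the left of z). The names `z_table` and `z_crit_95` are made up for this illustration, and the sketch covers only Z, not the table of critical t's.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0.0, sigma=1.0)


def z_table(z_values):
    """Return {z: P(Z <= z)}, the cumulative area to the left of each z."""
    return {z: std_normal.cdf(z) for z in z_values}


# Rows for z = 0.0, 0.5, 1.0, ..., 3.0 (an arbitrary grid for illustration)
z_values = [round(0.5 * i, 1) for i in range(7)]
table = z_table(z_values)

for z, p in table.items():
    print(f"z = {z:4.1f}   P(Z <= z) = {p:.4f}")

# Going the other way: the z that leaves 2.5% in the upper tail,
# i.e. the two-tailed 95% critical value (about 1.96).
z_crit_95 = std_normal.inv_cdf(0.975)
print(f"two-tailed 95% critical z = {z_crit_95:.3f}")
```

A finer grid (steps of 0.01) gives the familiar textbook layout; the coarse grid here just keeps the printout short.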

Exercise 1: Sampling and the central limit theorem.
Due Before Saturday Night, September 22
Using the random sampling procedure, collect 20 samples of size n=25 from the airplane empty seat data.
Derive means, standard deviations and 95% confidence intervals for estimates of the population mean for each of the 20 samples.
Compare histograms of the original distribution and the distribution of sample means.
 If you are doing this in EXCEL 2007, you will find the data analysis procedures under the data tab, then data analysis. If you want to write a macro, you'll have to set up to show the developer tab by clicking on the "office" icon on the upper left, then the "excel options" button, and checking the box to show the developer tab. The button and other forms are found under the developer tab / insert / form controls. This used to be a simple toolbar! I'm developing a general impression that Microsoft has made life a lot more complicated with the new version. The out-of-class videotutorials page now has new videos with step-by-step instructions in EXCEL 2007. Look for the pink box.
Turn in the resulting spreadsheet to the mygateway dropbox set up under assignments, set up to be printed to be 1 page (legible) with your name and student number printed at the top of the sheet. You can accomplish this by printing a selected range that leaves out some of the original data.
video on print to 1 page
Submit a file via the Mygateway dropbox. Be sure to click send to submit.
How many gave an erroneous estimate?
What is the expected frequency of errors in estimating the population mean when you use a 95% confidence interval? Was your observed result close to that expectation?
State the three characteristics we expect to see in the distribution of sample means from the Central Limit Theorem. Are your results reasonably consistent with those expectations?
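
The logic of Exercise 1 can be rehearsed in a few lines of Python before you do it in EXCEL. This is only a sketch under stated assumptions: the `population` below is randomly generated stand-in data, NOT the airplane empty seat data, and the critical t of 2.064 (two-tailed 95%, 24 degrees of freedom) is typed in from a standard t table. The function name `sample_ci` is invented for this example.

```python
import random
from statistics import mean, stdev

random.seed(2007)  # fixed seed so the run is repeatable

# Hypothetical stand-in population (NOT the course's empty-seat data)
population = [random.randint(0, 40) for _ in range(1000)]
pop_mean = mean(population)

T_CRIT_24 = 2.064  # two-tailed 95% critical t for df = 24, from a t table


def sample_ci(data, n=25, t_crit=T_CRIT_24):
    """Draw one random sample; return mean, sd, and 95% CI limits."""
    sample = random.sample(data, n)
    m = mean(sample)
    s = stdev(sample)
    half_width = t_crit * s / n ** 0.5
    return m, s, m - half_width, m + half_width


# 20 samples of size 25, as in the exercise
results = [sample_ci(population) for _ in range(20)]

# An interval "misses" when it fails to contain the true population mean
misses = sum(1 for m, s, lo, hi in results if not (lo <= pop_mean <= hi))
sample_means = [m for m, s, lo, hi in results]

print(f"population mean = {pop_mean:.2f}, sd = {stdev(population):.2f}")
print(f"mean of sample means = {mean(sample_means):.2f}, "
      f"sd of sample means = {stdev(sample_means):.2f}")
print(f"intervals that missed the population mean: {misses} of 20")
```

With a 95% interval you expect about 1 miss in 20 on average, but any single run can show 0, 1, 2 or more. The spread of the sample means should come out much narrower than the spread of the original data.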

## Hypothesis testing and sample means h0test.pdf

Caution in the interpretation of hypothesis tests!

Comparison of sample variances through F tests, comparison of sample means, controlled experiments, Ch 9

Exercise 2:
Click here for Detailed Description with questions to guide you in interpretation
Paired t-test on gender and grades. Due before midnight on Friday, October 5.
1) Print your name and student number (STUDENT NUMBER: up to 50 points will be taken off for not using the student number) at the top of the sheet. Use the last 4 digits of your student number in the data in place of A,B,C,D.
2) Print interpretations in text boxes by the procedures they refer to.
See the completed example we did in class on the salaries data.
3) Set up the spreadsheet to be printed to be 1 page, legible. You can accomplish this by printing a selected range that leaves out some of the original data, or set it up to print 1 page wide by 2 long with all the analysis on the first page.
4) Submit the file via the Mygateway dropbox. Title it Exc2 with your name. Be sure to click "add" and then submit.

 There is a set of videos on exactly how to do all this in EXCEL 2007 on the videotutorials page, though you will need to add the text describing the results.

## Variance and Standard Deviation:

Ch 6 Describing data: measures of central tendency; dispers.xls: dispersity. Areas under the normal curve: the Z score table. Applications to normalized scores: the IQ Score examples.

## Central Limit Theorem

Sampling Statistics and confidence limits on estimates of parameters. Ch 7 Sampling from Airplane empty seats data airseats.htm
video samplek.avi  Results show Central Limit Theorem on distribution of sample means.
Joint probabilities and probabilities of multiple errors in the sampling experiment. Normal approximation to binomial distribution.
Evidence for carcinogenicity of Dioxin.
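
The probabilities of multiple errors in the sampling experiment can be worked out with the binomial distribution. Here is a rough sketch, assuming each of the 20 confidence intervals independently misses with probability 0.05. The second part compares an exact binomial probability with its normal approximation (with continuity correction) for a made-up case of n=100, p=0.5. The function names are invented for this illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(k, n, p):
    """Exact probability of exactly k 'errors' in n independent tries."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """Exact probability of k or fewer errors."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)


# 20 intervals at 95% confidence: each misses with probability 0.05
n, p = 20, 0.05
expected_misses = n * p
p_zero = binom_pmf(0, n, p)
p_two_or_more = 1 - binom_cdf(1, n, p)
print(f"expected misses = {expected_misses:.1f}")
print(f"P(no misses) = {p_zero:.4f}")
print(f"P(2 or more misses) = {p_two_or_more:.4f}")

# The approximation works well when n*p and n*(1-p) are both large
exact = binom_cdf(55, 100, 0.5)
approx = normal_approx_cdf(55, 100, 0.5)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

Note that with n=20 and p=0.05 the expected count is only 1, so the normal approximation would be poor there; that is why the comparison uses a larger, more symmetric case.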

## Hypothesis testing and sample means

Example one-sample t-test: Is hospital length of stay < 5 days?
video on 1-sample t-test Beta error & sample size losbeta.xls
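
For the arithmetic behind a one-sample t-test, here is a minimal sketch. The ten lengths of stay are invented numbers, not the course's hospital data, and the critical value of -1.833 (one-tailed 5%, 9 degrees of freedom) is taken from a standard t table rather than computed. The function name `one_sample_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1


# Hypothetical lengths of stay in days (made up for illustration)
stays = [4.1, 3.8, 5.2, 4.5, 3.9, 4.7, 4.0, 4.4, 5.0, 3.6]

# H0: mean stay = 5 days; Ha: mean stay < 5 days (lower-tail test)
t_stat, df = one_sample_t(stays, mu0=5.0)

T_CRIT = -1.833  # one-tailed 5% critical t for df = 9, from a t table
reject_h0 = t_stat < T_CRIT

print(f"t = {t_stat:.3f} with df = {df}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Rejecting H0 here only says the sample is hard to explain if the true mean were 5 days. It does not "prove" the mean is below 5, and it says nothing about the beta error, which depends on sample size and on how far the true mean is from 5.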

## T-test two sample means:

gender and height data from class
video on 2-sample t-test
height data 2 sample t test completed

## paired Sample T test:

Example on salary related to gender: Power of controlling other variables through Paired comparisons
Video on paired t-test
Male & Female Salaries are they significantly different? Results with interpretation: sexsalt.xls
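
To see why pairing is powerful, compare a paired t statistic with an ordinary pooled two-sample t statistic on the same numbers. The salaries below (in thousands) are invented matched pairs, not the class salary data. The assumption is that each male/female pair has been matched on other variables such as job and experience. Function names are made up for this sketch.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(x, y):
    """t statistic on the pair-by-pair differences; df = pairs - 1."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d))), len(d) - 1


def pooled_t(x, y):
    """Ordinary two-sample t with pooled variance; df = nx + ny - 2."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny)), nx + ny - 2


# Hypothetical matched pairs, salaries in $1000s (made up for illustration)
male = [42, 55, 61, 38, 70, 48, 52, 66]
female = [40, 53, 60, 36, 67, 47, 50, 63]

t_paired, df_paired = paired_t(male, female)
t_unpaired, df_unpaired = pooled_t(male, female)

print(f"paired:   t = {t_paired:.3f}, df = {df_paired}")
print(f"unpaired: t = {t_unpaired:.3f}, df = {df_unpaired}")
```

The average difference is the same 2 (thousand) in both tests. Pairing removes the large job-to-job variation from the error term, so the same difference stands out clearly in the paired test and is buried in the noise in the unpaired test.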

Application of confidence intervals when data consists of population proportions Drug Use Survey
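
A confidence interval on a proportion uses the normal approximation. Here is a minimal sketch with invented survey counts (120 "yes" answers out of 400), not the actual Drug Use Survey numbers. The function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin


# Hypothetical survey: 120 of 400 respondents answered "yes"
p_hat, lower, upper = proportion_ci(120, 400)
print(f"estimate = {p_hat:.3f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The approximation is reasonable when both the "yes" and "no" counts are fairly large. With very small counts the interval can be misleading.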

# Exam 1--Wednesday Oct 10-onsite

This will be a paper exam with mostly MC questions.

The multiple-choice exam will have questions of the sort seen on the quizzes, though usually modified so the answers aren't the same.
Use these as a guide for the types of information to be covered but you'd better make sure you understand the information rather than just memorizing these questions!

## Ch 13 Analysis of Variance

Concepts of ANOVA , "explaining variance" Where does it come from and how can you mathematically separate out the sources of variation?
Ho: all means are the same
Ha: at least two differ from each other.
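If Ho is rejected, which ones are different from which?
Least Significant Difference (LSD): good discussion at http://helios.bto.ed.ac.uk/bto/statistics/tress6.html
Basically, LSD = t(alpha,dferror)*sqrt((2*MSE)/n), where t(alpha,dferror) is the two-tailed critical t at the error degrees of freedom, MSE is the mean square error from the ANOVA table, and n is the number of observations in each group.
Also see the Multiple Range test with a studentized range.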
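
Here is a small sketch of the LSD comparison. Everything numeric is assumed for illustration: three groups of n=5, an MSE of 4.0, made-up group means, and a two-tailed 5% critical t of 2.179 for 12 error degrees of freedom read from a standard t table (the value EXCEL's TINV(0.05,12) should give). The function name `lsd` is invented.

```python
from itertools import combinations
from math import sqrt


def lsd(t_crit, mse, n):
    """Least Significant Difference for equal group sizes n."""
    return t_crit * sqrt(2 * mse / n)


# Hypothetical ANOVA results: 3 groups of 5, so error df = 15 - 3 = 12
T_CRIT_12 = 2.179  # two-tailed 5% critical t for df = 12, from a t table
threshold = lsd(T_CRIT_12, mse=4.0, n=5)

# Hypothetical group means (made up for illustration)
group_means = {"A": 20.0, "B": 23.5, "C": 21.0}

# Two means are declared different when they differ by more than the LSD
different = [
    (a, b)
    for a, b in combinations(group_means, 2)
    if abs(group_means[a] - group_means[b]) > threshold
]

print(f"LSD = {threshold:.3f}")
print("pairs that differ:", different)
```

The LSD is only meant to be used after the overall F test rejects Ho. Running many pairwise comparisons without that protection inflates the chance of a false alarm.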

EXCEL model showing partition of Variance by averaging out different effects.  Be sure to set Tools  >Macro >security at  "medium" and enable the macros so the drop down list works.
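
The partition of variance can also be checked by hand. Here is a minimal one-factor sketch showing that the total sum of squares splits exactly into a between-groups piece and a within-groups (error) piece. The three groups of data are invented, and the function name `one_way_anova` is hypothetical. The F critical value mentioned in the comment (about 3.89 for 2 and 12 degrees of freedom at 5%) comes from a standard F table.

```python
from statistics import mean


def one_way_anova(groups):
    """Partition total variation into between-group and within-group parts."""
    all_values = [x for g in groups.values() for x in g]
    grand_mean = mean(all_values)

    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "mse": ss_within / df_within,
        "f": f_stat,
    }


# Hypothetical data: three treatments, five observations each
groups = {
    "A": [18, 20, 22, 19, 21],
    "B": [22, 24, 25, 23, 21],
    "C": [20, 21, 19, 22, 23],
}

anova = one_way_anova(groups)
for name, value in anova.items():
    print(f"{name:>10}: {value:.4f}")

# Compare anova["f"] with the table value, about 3.89 for df = (2, 12) at 5%
```

The MSE printed here is the same quantity that goes into the LSD formula when you follow up a significant F.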

Exercise 3 Store sale prices due Friday night October 19 before midnight

 There is a set of videos on exactly how to do all this in EXCEL 2007 on the videotutorials page, though you will need to add the text describing the results.

## Exam 1: October 10 5:30-6:45 SSB216

Multiple choice on Wednesday October 10
Individual scores are in the mygateway gradebook.
I'll go over all the questions in class online on Monday, October 15

Concepts of ANOVA  gender-height data from F01 correcting for other effects
Golf balls and clubs data cxa15_06.prn
Video with sound 1-Factor ANOVA  2-Factor ANOVA with Replication
Screen captures of golf club example, if you want to see how means could be adjusted arithmetically to do what is done in ANOVA, canceling the treatment effects. golfadj1.jpg
and golfadjf.jpg , showing formulas 2wayanov.xls
completed comparisons showing "manual" calculations golf9.xls
More Discussion

## Ch 14 & 15 Regression

-- Simple Correlation And Linear Regression (pdf regression overview)
See the Varsorc model about detecting effects in the presence of other factors
Regression least squares fit equations and EXCEL equivalent functions
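
As a rough illustration of the least squares fit equations, here is a sketch that computes the slope, intercept and correlation from the sums of squares. In EXCEL the matching functions are SLOPE, INTERCEPT, CORREL and RSQ. The x and y values are invented, and the function name `least_squares` is hypothetical.

```python
from math import sqrt
from statistics import mean


def least_squares(x, y):
    """Return (slope, intercept, r) for the line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

    slope = sxy / sxx                # like EXCEL SLOPE(y_range, x_range)
    intercept = my - slope * mx      # like EXCEL INTERCEPT(y_range, x_range)
    r = sxy / sqrt(sxx * syy)        # like EXCEL CORREL(x_range, y_range)
    return slope, intercept, r


# Hypothetical data (made up for illustration)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r = least_squares(x, y)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"r = {r:.4f}, r squared = {r * r:.4f}")  # r squared is like RSQ
```

A high r only says the points lie close to a line. It says nothing by itself about whether x causes y.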

Is crime related to level of gambling activity? Quadratic models--  Electricity Usage and size of house, kwhuse.xls ,  results kwhuse9.xls
Learning Curves, assembly.dat  results  9 am  how does this relate to learning curve theory
data for paste special/ transpose.

Interaction example from book--the clock auction.

Weeks 9-12

## Examples of Chi-Squared studies:

Note you can save just the analysis in a small file separate from the data by copy/paste into a new sheet. Just be sure to paste in the same cells as there are absolute references that might otherwise be messed up.
Examples for highschool program:
just percents:
--metrostatus and dangerous behaviors
--Increase in Marijuana Use with Age
--Marijuana Use and Suicide
news item highschool girls using steroids
CDC Report trends in Cigarette Smoking Among Adults
YRBSS2001 data with smoking vs. Sex
Smoking vs Sex-2003
who is hitting whom?
Being hit related to sex-2003
Who gets better grades in Highschool?

Grade in school vs. age distributions-2003

Sexual experience and carrot consumption: carrotx2.xls

 EXC grades available until  last drop day
Processing larger datasets (YRBSS data): Weight Multiple Regression
Exercise 4: Due Thursday November 8, before midnight
We'll walk through the actual exercise together in class (each person with their own individualized data) to complete a multiple regression on a larger dataset with about 14,000 records, studying the effects of age, gender, and height on the weight of high school students from the Youth Risk Behavior Surveillance System (YRBSS).
Click here for detail on the assignment. See the Multiple Regression Video for explicit details

See the screen captures on parsing the 2003 data into EXCEL.
Use the 2003 data for this exercise.

# ~~~~~~~~~~~~~~~

 Later, for the original research project due before the second exam at the end of the course, you will use the 2005 data. Here are screen captures and description of parsing for the 2005 data.

## Chi-squared analysis for independence

Crosstabulation of variables can most simply be done by pivot tables in EXCEL. See the videos and an example sheet on the YRBS2001 data. We will use pivot tables in EXCEL to generate the data we need from YRBS2003 for chi-squared analysis.
We will walk through this together in class.

## Exercise 5- Pot vs Sadness

See the screen captures on parsing the 2003 data into EXCEL.
Use the 2003 data for this exercise.

 I don't have videos in EXCEL 2007 for the YRBSS2003 data, but if you are using EXCEL 2007 and haven't figured out the differences from older versions shown in class and the videos, you might want to preview the new videos on pivottables in EXCEL 2007, below. Just remember these new videos involve the YRBSS2005 data, so it would be better to follow the parsing procedure in the old videos.

Due Thursday November 15, before midnight: Marijuana use (amount in the last 30 days) and depression (debilitating sadness).
Get it in early (at the end of the semester, due dates are firm to allow for timely grading).
A printable version of the exercise in WORD
You should be able to drop the whole file in the dropbox, so don't worry about instructions on how to cut it down to a smaller file.
More detail on pivotTables in EXCEL, usepivot.doc

 Crosstabulation can also be done in ACCESS. See the videos on importing the YRBS99 into ACCESS that are on the BA103 site; the procedure is similar to the EXCEL Wizard.
Management of large files using FTP or Winzip.
Crosstabulating variables in ACCESS to preprocess for statistical analysis.
Chi Square can be done on ACCESS crosstabulated data from the YRBSS: Crosstab Query video

Description and videos

## Discussion:

Say something about P values for the aggregate and for each of the departments, and also comment on the mechanism that gives these paradoxical results. The mechanism discussion needn't be highly mathematical or technical, but rather conceptual, and about what it means in real world terms.
You can take a look at some of the references to better understand this, if necessary.
Rainy days
Ch 12. Categorical Analysis Chi Square tests of independence drinking and smoking. Spreadsheet formulas: smkdrkfo.jpg There are, of course, several ways to approach the calculation of the table of expected values. Another approach used in class F02  is shown in smkdrkf2.jpg
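
One common way to build the table of expected values is (row total times column total) divided by the grand total for each cell. Here is a minimal sketch using an invented 2x2 table of counts, not the class drinking and smoking data. The function name `chi_square_independence` is hypothetical, and the critical value of 3.841 (5%, 1 degree of freedom) is from a standard chi-square table.

```python
def chi_square_independence(observed):
    """Return (expected table, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / total
    expected = [
        [r * c / grand_total for c in col_totals] for r in row_totals
    ]

    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df


# Hypothetical counts: rows = smoker / nonsmoker, columns = drinker / nondrinker
observed = [
    [30, 20],
    [20, 30],
]

expected, chi2, df = chi_square_independence(observed)
print("expected:", expected)
print(f"chi-square = {chi2:.3f} with df = {df}")

CHI2_CRIT = 3.841  # 5% critical value for df = 1, from a chi-square table
print("reject independence" if chi2 > CHI2_CRIT else "fail to reject")
```

Because the function works from row and column totals, it handles any number of rows and columns, which mirrors the idea of a sheet that recalculates when you paste in a new table of observed values.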

Copying text out of a PDF: pdftxt.html
and pasting into EXCEL pastechi
Formulas are designed to accommodate new problems by inserting or deleting rows or columns: expchi. Then all you have to do is paste the observed values into the first table, and the whole sheet will recalculate to give the solution.
Father's and son's career paths
This is a problem taken from the old text book, p.965, problem 16.38. Pasting new data from the strangely arranged file that came with the book into a modified sheet is shown in pastenew
Bring your disk to the final exam, as you will have a problem requiring deletion and insertion. If you know how to do this, the problem will take only a few minutes to do.

 Everything is due by the Last Class. Nothing will be accepted late.

## Time series and forecasting. PPt summary and tools in EXCEL

### Stock Market information:

• Market Beta for individual stocks
• NYSE Stock directory
• NASDAQ site
• AMEX
• Yahoo Links on Investment concepts
• BUDweiser and other symbol lookup at YAHOO
• to find history, click on historical prices.
• Stock Market Beta  = volatility relative to the market, a measure of nondiversifiable risk. (definitions from Yahoo) Should be related to Return.
• Estimating Risk Parameters and Costs of Financing-- a chapter from Aswath Damodaran of NYU
• Examples from Class: Budweiser (Beta may be zero) and IBM (Beta >1) compared to S&P 500
### Exercise 7 - will not be done.

• Download 5 years' worth of monthly prices from YAHOO for the S&P 500. There is a download to spreadsheet link at the bottom of the historical prices. Open the file in EXCEL and calculate returns as (P(t) - P(t-1)) / P(t-1), based on adjusted closing prices, where P(t) is this month's price and P(t-1) is the previous month's price.
• Do the same for a stock of your choice (avoid stocks others are doing).
• Re-sort the data by date from oldest to most recent so that the time series charts are in the right direction.
• Chart a time series for the two returns and estimate trend lines using the right-click chart options.
• Produce a scatterchart of the single stock returns vs. the S&P and estimate an intercept and linear slope with chart options.
• Conduct a Simple Regression procedure and see if you get a slope significantly different from zero. Comment on the slope and the P value. The slope is known as Market Beta.
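
The return and beta calculations in these steps can be sketched as follows. The two price series are invented, not real S&P 500 or stock prices, and are assumed to be adjusted closes sorted oldest first. The function names are hypothetical. The sketch estimates only the slope and intercept; it does not test whether the slope is significantly different from zero.

```python
from statistics import mean


def returns(prices):
    """Period returns (P(t) - P(t-1)) / P(t-1); prices must be oldest first."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]


def market_beta(stock_returns, market_returns):
    """Least squares slope and intercept of stock returns on market returns."""
    mm, ms = mean(market_returns), mean(stock_returns)
    sxy = sum(
        (m - mm) * (s - ms) for m, s in zip(market_returns, stock_returns)
    )
    sxx = sum((m - mm) ** 2 for m in market_returns)
    slope = sxy / sxx
    return slope, ms - slope * mm


# Hypothetical adjusted closing prices, oldest first (made up)
market_prices = [100.0, 102.0, 101.0, 105.0, 107.0]
stock_prices = [50.0, 52.0, 51.0, 55.0, 57.0]

market_ret = returns(market_prices)
stock_ret = returns(stock_prices)
beta, alpha = market_beta(stock_ret, market_ret)

print("market returns:", [round(r, 4) for r in market_ret])
print("stock returns: ", [round(r, 4) for r in stock_ret])
print(f"beta (slope) = {beta:.3f}, intercept = {alpha:.4f}")
```

Five prices give only four returns, which is far too few for a real estimate. Five years of monthly data gives about 59 returns.
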
Last Class Wednesday, December 5: Application Reports due, course evaluation.

## Time series and Forecasting

powerpoint from the book

Seasonal data on KWH use Timseasn.xls
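
The course description lists moving averages and exponential smoothing as the extrapolative techniques. Here is a minimal sketch of both on a made-up monthly KWH series (NOT the data in Timseasn.xls). The window of 3 and the smoothing constant of 0.3 are arbitrary choices for illustration, and the function names are hypothetical.

```python
def moving_average(series, window):
    """Average of the most recent `window` values at each point."""
    return [
        sum(series[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Smoothed value = alpha * new value + (1 - alpha) * previous smoothed."""
    smoothed = [series[0]]  # start from the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly KWH use for one year (made up for illustration)
kwh = [900, 850, 700, 600, 650, 950, 1200, 1150, 800, 650, 700, 880]

ma3 = moving_average(kwh, window=3)
es = exponential_smoothing(kwh, alpha=0.3)

print("3-month moving average:", [round(v, 1) for v in ma3])
print("exponential smoothing: ", [round(v, 1) for v in es])
```

Both methods smooth out the bumps, which also means both lag behind a seasonal pattern. A short window or a larger alpha follows the data more closely but smooths less.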

## Think for yourself.

How bad is unemployment right now? See historical and current unemployment statistics from the Current Population Survey Statistics . Here is data converted to EXCEL, unemp.xls

Did the economy take a dive after 9/11 or was the dip something that started even before the current administration came into office?
Historical Nasdaq daily data in EXCEL
Historical Stock index data free from  YAHOO:
Dow-Jones index monthly, 1928 to current djindex.xls
S&P 500 from 1982 to present spindex.xls

## Application reports:

The final project is worth 10% of the grade and will be used to determine grades if you are on a borderline.
The final project is due as hardcopy and a full EXCEL file before the exam on December 10. If you aren't taking the second exam, drop it at my mailbox in SSB 230. To avoid problems with corruption of large files in mygateway, there's no dropbox. Instead, turn in the full EXCEL file with data and calculations intact either on CD ROM or on a USB flashdrive. USB flashdrives can be found at about \$5-10 at places like Office Depot, Walmart, amazon.com, buy.com (free shipping) and such, so many people use them pretty much as we used to use floppy disks.

If you submit the project by Friday, December 7, you can try it by email and I'll put a grade on it.
If you are procrastinating and planning on turning it in after Friday, then I don't want to deal with the possibility of going around about corrupted files while I'm trying to get grades calculated and submitted. If it's not on CD or Flashdrive after Friday, then there will be a severe penalty, and if I can't grade it, there will be no grade for it. This is just a practical matter, as I have to put in all the grades at once.

Refer to the 2005 Data Users Manual at the CDC website
Here is a "cut down" version of the data user's Manual, if you just want the primary data.

Download and parse the ASCII data into EXCEL from the YRBSS 2005 dataset. Here are screen captures and description of parsing.

 There are new videos on downloading, parsing, pivottable generation and saving the result for YRBSS 2005 data in EXCEL 2007 on the videos page. Note that pivottables generated in EXCEL 2007 don't convert to earlier versions, so working across versions is not straightforward... More on this as I study it further.
Do a study of your choice with the appropriate statistical test (probably Chi2 for relationship between two or more variables) to answer a question that is of interest to you.
There are literally millions of studies you could do. At this point you are probably doing original research that has never been explored by anybody else.
Use graphs if appropriate to describe what you found.   If you really have a hot project in mind, consider expanding it a little to make it presentable at the Undergraduate Research Symposium in 2008.

The Multiple Choice Exam will be done without computer access, closed book, one page of notes allowed.
Use quizzes as a guide for the types of information to be covered but you'd better make sure you understand the information rather than just memorizing these questions!
It's up to you whether you take the second exam. If you don't show on Monday, then I'll just double the grade on the first one. Be careful about being too optimistic about raising your grade, as it could go down as well!

## Liveclassroom course schedule

| Ref No / Section | Date / Time | Place |
| --- | --- | --- |
| Onsite Intro | MW 5:30-6:45, Aug 20-22 | SSB216 |
| Online Classes | MW 5:30-6:45 | mygateway/communications/Liveclassroom |
| Exam1 MC | Wed October 10, 5:30-6:45 | SSB 216 |
| Exam2 MC | Monday, Dec 10, 5:30-7:30 | SSB 216 |

Switching sections is not allowed unless there is a good reason and prior permission. Taking an exam at a later time than prescribed may result in a grade penalty. Doing so without permission may result in a zero grade for the exam.

comments or questions to bud_banis (at) umsl.edu

 Bud's Statistics Book sites: heuristicbooks.com statisticsbook.com statisticsbooks.com std-statistics.com winningwithstatistics.com questionnairehints.com questionnairetips.com statisticsisforwinners.com related sites statisticsvideos.com instructionalvideotutorials.com