BA3300 Introduction to Business Statistics

  • "What is not measured cannot be controlled." (paraphrase of  W. Edwards Deming)
  • "Argument is only possible when you don't have the facts. Get the facts." (paraphrase of Peter F. Drucker)
  • "Discovery consists of seeing what everybody has seen and thinking what nobody has thought." —Albert von Szent-Gyorgy

  • Statistics!   A Cynical View

    Spreadsheet to estimate course grade:


    This sheet is for your convenience in deciding whether to take the second exam.  Exam 2 covers a few more types of statistical tests than exam 1, but the concepts are the same. If you are happy with the grade you earned on exam 1, you can skip exam 2 and I will simply double the exam 1 grade when calculating the course grade. Many people at this point have solid grades, and it would be silly to go through the busy work of taking and grading another exam.
    Remember, too, that grades may go down as well as up as a result of the second exam.

    New link to out-of-class FLASH video tutorials

    Videos of Winter 2006 Centra online sessions can be played online. They were converted to FLASH videos, which have more convenient navigation than the original proprietary format.

     Here is a 580MB zip file of the online course videos, which also includes the out-of-class videos. Download the zip file, unzip it, and copy it to CD to make an autorunning CD. Operations videos aren't included in this collection because of CD size limitations--get those from the BA3320 or LOM5320 sites.

    In the future, upgraded versions of these will be available on CD through my classes or from

    Paper Syllabus from (online Course)
    Horizon Wimba Live classroom support documents at
    Technical Support for liveclassroom at the UMSL help desk, (314) 516-6034
    The participants manual for version 4.3 is available.  A course CD with all my videotutorials was distributed in the first introductory sessions. More videos with new versions of OFFICE will be made available at this website as the semester progresses.


    BA3300 Introduction to Business Statistics


    "This is a second course in a required two-course statistics sequence. The first course (Math 105/1105 or its equivalent), covers the basics of probability and statistics, and BA3300 builds on that material. It is assumed that the student is acquainted with the material in Math 1105, and basic use of a PC-Compatible computer, including navigating on the web, fundamental concepts of spreadsheets, and e-mail. Students will learn techniques used for relational analysis and business forecasting and how to apply them in a business context. Tools include Chi-Square tests of statistical independence; analysis of variance; simple linear regression and correlation; multiple linear regression; and extrapolative techniques such as moving averages and exponential smoothing. Emphasis is placed on problem definition, construction of statistical models, analysis of data, and interpretation of  results."


      BA1800(103), Math 1105(105), familiarity with basic statistics and PC-compatible computers, spreadsheets, email, and use of the internet. Minimum 2.0 Campus GPA. These prerequisites are not waived. 


    • Recommended Text: Anderson, Sweeney, Williams-Statistics for Business and Economics (8th, 9th or 10th edition as used in Math 1105); handouts and web materials. Here is a link to download the data  and tools related to the textbook.
    • Save your work every day on your virtual drive, disks or flashdrive. The plan is to turn in assignments and exams using the mygateway dropbox.  Most of our calculations will be done in EXCEL, but a calculator may still be useful. You shouldn't need a complex statistical calculator. 
    • You will need an email account.
    • A Notebook. Although a lot of the skills we will develop have to do with spreadsheets, my hope is to reduce emphasis on the mechanics of the calculations so we can focus on what the results mean rather than memorizing a bunch of formulas. A notebook will help as we work through the concepts.
    • Note: The Forgetting Curve
    • Ode to Procrastination
    • You should also have a copy of Microsoft Office on your computer so you can do the EXCEL assignments at home. Try the bookstore for the Academic version Office Pro at a very reasonable price. 



    HEY!! In Office 2007, where did the EXCEL menus go?

    Sometimes, new versions of software seem like this video: Microsoft new version introductions

    How to find EXCEL 2003 commands in EXCEL 2007 "Ribbons"
    Try downloading the Excel Ribbon mapping workbook.
    I recommend clicking on the OFFICE icon at the top left of EXCEL 2007, click EXCEL OPTIONS (a button down at the bottom) and fix whatever needs fixing to get access to what you need, such as "Developer tab." Change the SaveAs default to old (1997-2003) EXCEL so you can still work on the product files on older versions of EXCEL. In case you have to, you might be able to find file converters at They are also promising to someday make their file formats open source. Uncommon fonts may also wreak havoc as you try to work on a file across machines. Change default font theme under "page layout." I know Times New Roman and Arial are uninspiring, but they are more likely to be on your other machine. After I get a feel for what needs to be fixed in the School's setup of EXCEL, I'll produce some guidance and videos on how to fix things, so you can relate my existing videotutorials to EXCEL 2007.
    Microsoft has a compatibility pack for installation in older versions of OFFICE so you can read and write the new file formats. You need to install all the updates first.
    If you can't use the link, then you can probably find it at the Microsoft Office site or google it.

    Fall, 2007 -- Online Session 

    Robert J. "Bud" Banis, Ph.D.,C.M.A.
    CCB 230 314-516-6136; 636-394-4950 
    E-mail: ba3300 (at) bud banis .com   (omit spaces). Use this email address for all communications on the course. Don't copy and paste it, as it has extra spaces to thwart automated harvesting by spammers.

    Administrative issues:

    Historical Grade Distributions
    "The Rules" on things like attendance, drop dates, academic dishonesty, and the like. 
    Rules on cheating are enforced.
    See the discussion at
    Tentative GRADING: Online Section
    Two exams                               40%
    Up to 12 quizzes                        20%
    6 exercises                             30%
    Research report (borderline decisions)  due last class 12/5; hard copy at 2nd exam
    Total points                           100%
    Spreadsheet for grade estimation
    To help make a decision about whether to take the second exam.

    MC exams are closed book, with one page (2 sides) of notes allowed; questions are multiple choice and short answer.
    Quizzes are open online in mygateway, but all exercises and exams are individual effort only.  Submissions very similar in form and/or content will result in academic affairs investigations.
    Letter grade breaks are 90, 80, 70, 60%. Plus & minus grades are rare. Research Report quality will be used to decide on borderline cases.
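    The arithmetic behind the grade estimate can be sketched in a few lines of Python. This is only an illustration, not the actual spreadsheet: the function name and sample scores are made up, the 10% weight for the research report is taken from the Application reports section, and splitting the 40% exam weight evenly between the two exams is an assumption.

```python
def estimate_course_grade(exam1, exam2, quizzes, exercises, report):
    """Return (percent, letter) from component scores on a 0-100 scale.

    exam2 may be None, meaning the second exam is skipped and the
    exam 1 score is counted twice.
    """
    if exam2 is None:
        exam2 = exam1  # skipping exam 2 doubles the exam 1 grade
    # Assumed weights: exams 40% (20% each), quizzes 20%,
    # exercises 30%, research report 10%.
    percent = (0.20 * exam1 + 0.20 * exam2
               + 0.20 * quizzes + 0.30 * exercises + 0.10 * report)
    # Letter grade breaks at 90, 80, 70, 60%.
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percent >= cutoff:
            return percent, letter
    return percent, "F"


if __name__ == "__main__":
    # Hypothetical scores: compare skipping exam 2 with a weaker exam 2.
    print(estimate_course_grade(85, None, 90, 80, 75))
    print(estimate_course_grade(85, 60, 90, 80, 75))
```

    Running both cases side by side shows how a weaker second exam can pull the estimate down across a letter grade break.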


    Review spreadsheet formula design from BA 1800 

    Resources and Statistics video tutorials

    Videos and other course materials on course CDs and DVD-- This is different from the CDs that came with the book.
    Excellent EXCEL Videotutorials from DataPigTechnologies

    Is EXCEL an adequate statistics package?

    computer labs
    Hours in the computer labs

    -----Some interesting statistics questions to research 

    Interesting links on controversial statistics issues in the news:

    Global warming 

    This is a fascinating and hotly argued issue that provides some stimulating examples of poor statistical reasoning.
    The base issue appears to be the relationship between atmospheric CO2 concentrations and climate change. Correlation is not causality, and a look at the data allows strong questioning of the supposed "causal relationship" of CO2 to climate change.
    CO2 science
    The petition project and Environmental Effects of Increased Atmospheric Carbon Dioxide

    Response to skeptics by advocates of AGHG models--this site purports to refute the skeptics, but much of the refutation appears to be name-calling and the discussions reinforce skepticism if you look at the data--see, for example, the discussion about correlation and causality for CO2 and temperature change:  "There is no proof that CO2 is causing global warming"
    The fact of the matter is that you don't "prove" anything--either way--with statistics, especially if it's based on computer models and selected data. is a website by people who are making their living in the global warming industry. This is pretty much the official gospel by those who are paid to preach it.
    Annotated UN IPCC paper on the "scientific basis" of the AGHG model mostly pointing out that the press reports on confidence levels were highly exaggerated and that the calamitous predictions are based almost solely not on data, but on computer models that have failed to predict even the past. See, for example, the data on storm frequency and severity in the 2007 paper on Environmental Effects of Increased Atmospheric Carbon Dioxide
    The UN has a strong incentive to promote the AGHG fears as it's been proposed that the solution is to grant the UN power to tax industries and to regulate "carbon offset" schemes. This tax would help to redistribute wealth by "supporting UN development projects in poor countries."
    [UN Committee] "Members agreed that reversing the widening and 'shameful' gap between rich and poor countries 'is the pre-eminent moral and humanitarian challenge of our age.'"

    If you want to read more about this, just google united nations carbon tax. Fascinating.

    CPI Calculations

    Is the CPI an accurate estimate of inflation?

    Interpretations of Statistical terminology

    Tutors for the Fall Session.

  • Guodong Zhang and Vicki Chen are the BA 3300 tutors this  semester, with office hours in CCB213 during these times:

  • Guodong:
    email address is gzq35 @

    Monday 12-2pm and 4-5pm

    Thursday 10:50-12:50pm and 4-5pm

  • Vicki:
    email address is Vickichen2008 @

    Tuesday 10:30-12:30pm and 3:30-4:30

    Wednesday 11:30-1:30pm and 3:30-4:30



    Approximate timing

    in process: 
    Weeks 1-4 



    Overview, review of Z and t distributions, hypothesis testing. The Central Limit Theorem. One sample t-test. 

    Review of Netscape Navigator/MS Internet Explorer, Windows Explorer and EXCEL Editing, converting and cleaning up real data-Descriptive Statistics, Graphing, t-tests. Comparing two sample means. 
     Normsdist function:
    Be the "Martha Stewart" of Statistics-- make your own Z score Table. The "Martha Stewart" table of critical t's (and a more complex version of t-tail probabilities)
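    For anyone who wants to check a homemade Z table outside EXCEL, here is a rough Python sketch. It assumes only that NORMSDIST(z) returns the cumulative standard normal probability, which can be written with the error function as 0.5 * (1 + erf(z / sqrt(2))). The function names and table layout are illustrative choices, not part of the course files.

```python
import math


def normsdist(z):
    """Cumulative standard normal probability, like EXCEL's NORMSDIST(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_table(max_z=3.0):
    """Build Z table rows: one row per 0.1 step of z, one column per 0.01 step."""
    rows = []
    for tenth in range(int(round(max_z * 10)) + 1):
        base = tenth / 10.0
        rows.append([round(normsdist(base + hundredth / 100.0), 4)
                     for hundredth in range(10)])
    return rows


if __name__ == "__main__":
    # Print the table: row label is z to one decimal, columns add .00 to .09.
    for tenth, row in enumerate(z_table()):
        print(f"{tenth / 10:.1f}  " + "  ".join(f"{p:.4f}" for p in row))
```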
    Exercise 1: Sampling and the central limit theorem. 
    Due Before Saturday Night, September 22
    Using the random sampling procedure, collect 20 samples of size n=25 from the airplane empty seat data. 
    Derive means, standard deviations and 95% confidence intervals for estimates of the population mean for each of the 20 samples. 
    Compare histograms of the original distribution and the distribution of sample means. 
    If you are doing this in EXCEL 2007, you will find the data analysis procedures under the data tab -- data analysis. If you want to write a macro, you'll have to show the developer tab by clicking the "office" icon on the upper left, then the "excel options" button, and checking the box to show the developer tab. The button and other forms are found under the developer tab / insert / form controls.
    This used to be a simple toolbar! I'm developing a general impression that Microsoft has made life a lot more complicated with the new version.
    The out-of-class videotutorials page now has new videos with step-by-step instructions in EXCEL 2007. Look for the pink box.
    Turn in the resulting spreadsheet to the mygateway dropbox set up under assignments, set up to be printed to be 1 page (legible) with your name and student number printed at the top of the sheet. You can accomplish this by printing a selected range that leaves out some of the original data.
    video on print to 1 page
    Submit a file via the Mygateway dropbox. Be sure to click send to submit.
    How many of your samples had intervals including the true mean? 
    How many gave an erroneous estimate? 
    What is the expected frequency of errors in estimating the population mean when you use a 95% confidence interval? Was your observed result close to that expectation? 
    State the three characteristics we expect to see in the distribution of sample means from the Central Limit Theorem. Are your results reasonably consistent with those expectations?
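    The sampling experiment in Exercise 1 can also be sketched outside EXCEL. The Python below is a rough illustration under stated assumptions: the population is a made-up, right-skewed stand-in for the airplane empty seat data, the function names are hypothetical, and the critical t of 2.064 is hardcoded for n = 25 (24 degrees of freedom), so it is only valid for that sample size.

```python
import random
import statistics

# Two-sided 95% critical t for 24 degrees of freedom (n = 25).
# This constant is only valid for samples of size 25.
T_CRIT_95_DF24 = 2.064


def sample_intervals(population, n_samples=20, n=25, seed=1):
    """Draw n_samples random samples of size n; return (mean, sd, low, high) for each."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        sample = rng.sample(population, n)      # sampling without replacement
        mean = statistics.mean(sample)
        sd = statistics.stdev(sample)           # sample standard deviation (n - 1)
        half_width = T_CRIT_95_DF24 * sd / n ** 0.5
        results.append((mean, sd, mean - half_width, mean + half_width))
    return results


def count_hits(results, true_mean):
    """Count the intervals that contain the true population mean."""
    return sum(1 for _, _, low, high in results if low <= true_mean <= high)


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up, right-skewed stand-in for the empty seat data.
    population = [int(rng.expovariate(1 / 12.0)) for _ in range(1000)]
    true_mean = statistics.mean(population)
    results = sample_intervals(population)
    hits = count_hits(results, true_mean)
    sample_means = [r[0] for r in results]
    print(f"population mean = {true_mean:.2f}")
    print(f"{hits} of {len(results)} intervals include the true mean")
    print(f"sd of population   = {statistics.pstdev(population):.2f}")
    print(f"sd of sample means = {statistics.stdev(sample_means):.2f}")
```

    With a 95% interval, about 1 in 20 samples is expected to miss the true mean, though any single run of 20 can show zero, one, or several misses.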

    Hypothesis testing and sample means h0test.pdf

    Caution in the interpretation of hypothesis tests!

    Comparison of sample variances through F tests, comparison of sample means, controlled experiments, Ch 9 
    Exercise 2:
    Click here for Detailed Description with questions to guide you in interpretation
    Paired t-test on gender and grades. Due before midnight Friday, October 5
    Required (or no grade):
    1) Print your name and student number (STUDENT NUMBER--up to 50 points will be taken off for not using the student number) at the top of the sheet. Use the last 4 digits of your student number in the data in place of A,B,C,D.
    2) Print interpretations in text boxes by the procedures they refer to.
    See the completed example we did in class on the salaries data.
    3) Set up the spreadsheet to print on 1 legible page. You can accomplish this by printing a selected range that leaves out some of the original data, or set it up to print 1 page wide by 2 long with all the analysis on the first page.
    4) Submit the file via the Mygateway dropbox. Title it Exc2 with your name.  Be sure to click "add" and then submit.
    There is a set of videos on exactly how to do all this in EXCEL 2007 on the videotutorials page, though you will need to add the text describing the results.

    Variance and Standard Deviation:

    Ch 6 Describing data: measures of central tendency and dispersion (dispers.xls). Areas under the normal curve--the Z score table. Applications to normalized scores--the IQ Score examples

    Central Limit Theorem

    Sampling Statistics and confidence limits on estimates of parameters. Ch 7 Sampling from Airplane empty seats data airseats.htm
    video samplek.avi  Results show Central Limit Theorem on distribution of sample means. 
    Joint probabilities and probabilities of multiple errors in the sampling experiment. Normal approximation to binomial distribution.
    Evidence for carcinogenicity of Dioxin.

    Hypothesis testing and sample means

    Example one-sample t-test: Is hospital length of stay < 5 days?
    video on 1-sample t-test Beta error & sample size losbeta.xls
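    As a rough companion to the one-sample example, here is a Python sketch of the test statistic t = (sample mean - hypothesized mean) / (s / sqrt(n)). The length-of-stay numbers are invented for illustration, the function name is hypothetical, and the critical value -1.833 is the lower-tail t for alpha = 0.05 with 9 degrees of freedom, so it fits only a sample of 10.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean = mu0."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    return (mean - mu0) / se, n - 1


if __name__ == "__main__":
    # Invented lengths of stay in days (not the course data).
    stays = [4.2, 3.8, 5.1, 4.0, 4.6, 3.5, 4.9, 4.4, 3.9, 4.1]
    t_stat, df = one_sample_t(stays, 5.0)
    t_crit = -1.833  # lower-tail critical t, alpha = 0.05, df = 9
    print(f"t = {t_stat:.3f} with {df} degrees of freedom")
    # Ha: mean < 5 days, so reject Ho only when t falls below the critical value.
    print("reject Ho" if t_stat < t_crit else "fail to reject Ho")
```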

    T-test two sample means:

    gender and height data from class
    video on 2-sample t-test
    height data 2 sample t test completed

    paired Sample T test:

    Example on salary related to gender: Power of controlling other variables through Paired comparisons
    Video on paired t-test
    Male & Female Salaries are they significantly different? Results with interpretation: sexsalt.xls
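    The idea behind the paired comparison can be sketched as a one-sample t-test on the within-pair differences, which removes whatever the two members of a pair have in common. The Python below is an illustration only: the matched salary figures are invented and the function name is hypothetical.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for H0: mean difference = 0."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff / se, n - 1


if __name__ == "__main__":
    # Invented salaries (in $1000s) for pairs matched on job and experience.
    male = [52, 61, 48, 75, 66, 58]
    female = [50, 60, 47, 71, 65, 55]
    t_stat, df = paired_t(male, female)
    print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

    Because each difference is taken within a matched pair, variation between pairs (different jobs, different experience) drops out of the standard error, which is where the extra power comes from.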

    Application of confidence intervals when data consists of population proportions Drug Use Survey


     Exam 1--Wednesday Oct 10-onsite

    Will be the paper exam with mostly MC questions.

    The multiple choice exam will have questions of the sort seen on the quizzes, though usually modified so the answers aren't the same.
    Use these as a guide for the types of information to be covered but you'd better make sure you understand the information rather than just memorizing these questions! 

    Ch 13 Analysis of Variance

     Concepts of ANOVA , "explaining variance" Where does it come from and how can you mathematically separate out the sources of variation?
    Ho: all means are the same
    Ha: at least two differ from each other.
    If Ho is rejected, which ones are different from which? 
    Least Significant Difference (LSD) good discussion at
    basically, LSD= t(alpha,dferror)*sqrt((2*MSE)/n)
    also see the Multiple Range test with a studentized range.
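    A minimal Python sketch of the partition of variance and the LSD formula above follows. The group data are invented, the function names are hypothetical, and the critical t of 2.179 is hardcoded for 12 error degrees of freedom (two-sided, alpha = 0.05), so it fits only this example of 3 groups of 5.

```python
import math
import statistics


def one_way_anova(groups):
    """Return (F, MSE, df_between, df_error) for a one-factor ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Variation explained by differences among the group means.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left over inside the groups (error).
    ss_error = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_error = len(all_values) - len(groups)
    mse = ss_error / df_error
    f_stat = (ss_between / df_between) / mse
    return f_stat, mse, df_between, df_error


def lsd(t_crit, mse, n):
    """Least Significant Difference for two means, each based on n observations."""
    return t_crit * math.sqrt(2.0 * mse / n)


if __name__ == "__main__":
    # Invented sale prices for three stores, 5 observations each.
    stores = [[20, 22, 19, 24, 21], [25, 27, 26, 24, 28], [21, 20, 23, 22, 19]]
    f_stat, mse, df_b, df_e = one_way_anova(stores)
    print(f"F = {f_stat:.2f} with ({df_b}, {df_e}) degrees of freedom")
    # 2.179 is the two-sided 5% critical t for 12 error degrees of freedom.
    print(f"LSD = {lsd(2.179, mse, 5):.2f}")
```

    Any two group means that differ by more than the LSD would be flagged as different, provided the overall F test rejected Ho first.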

    EXCEL model showing partition of Variance by averaging out different effects.  Be sure to set Tools  >Macro >security at  "medium" and enable the macros so the drop down list works.
    Exercise 3 Store sale prices due Friday night October 19 before midnight
    There is a set of videos on exactly how to do all this in EXCEL 2007 on the videotutorials page, though you will need to add the text describing the results.

    Exam 1: October 10 5:30-6:45 SSB216

    Multiple choice on Wednesday October 10
    See distribution and link to answer key above.
    Individual scores are in the mygateway gradebook.
    I'll go over all the questions in class online on Monday, October 15

    Concepts of ANOVA  gender-height data from F01 correcting for other effects
     Golf balls and clubs data cxa15_06.prn
    Video with sound 1-Factor ANOVA  2-Factor ANOVA with Replication 
    Screen captures of golf club example  if you want to see how means could be adjusted arithmetically to do what is done in ANOVA, canceling the treatment effects. golfadj1.jpg
    and golfadjf.jpg , showing formulas 2wayanov.xls
    completed comparisons showing "manual" calculations golf9.xls
    More Discussion

    Ch 14 & 15 Regression

    -- Simple Correlation And Linear Regression (pdf regression overview)
    See the Varsorc model about detecting effects in the presence of other factors 
    Regression least squares fit equations and EXCEL equivalent functions
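    For reference, the least squares fit can be sketched in Python as below. The results should correspond to what EXCEL's SLOPE, INTERCEPT, and RSQ worksheet functions return for the same data. The house size and electricity numbers are invented, and the function name is hypothetical.

```python
import math


def least_squares(x, y):
    """Return (slope, intercept, r_squared) for the line y = intercept + slope * x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                       # compare with SLOPE(y_range, x_range)
    intercept = mean_y - slope * mean_x     # compare with INTERCEPT(y_range, x_range)
    r = sxy / math.sqrt(sxx * syy)          # compare with CORREL(x_range, y_range)
    return slope, intercept, r * r          # r * r: compare with RSQ(y_range, x_range)


if __name__ == "__main__":
    # Invented data: house size (100s of sq ft) and monthly electricity use (kWh).
    size = [12, 15, 18, 22, 25, 30]
    kwh = [700, 820, 910, 1100, 1180, 1400]
    slope, intercept, r2 = least_squares(size, kwh)
    print(f"kWh = {intercept:.1f} + {slope:.1f} * size, R squared = {r2:.3f}")
```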

    Ch 14 & 15 Regression
    Is crime related to level of gambling activity? Quadratic models--  Electricity Usage and size of house, kwhuse.xls ,  results kwhuse9.xls
    Learning Curves, assembly.dat  results  9 am  how does this relate to learning curve theory
    data for paste special/ transpose. 

    Interaction example from book--the clock auction.


    Examples of CHISquared studies:

    Note you can save just the analysis in a small file separate from the data by copy/paste into a new sheet. Just be sure to paste in the same cells as there are absolute references that might otherwise be messed up.
    Examples for highschool program:
    just percents:
        --metrostatus and dangerous behaviors
        --Increase in Marijuana Use with Age
        --Marijuana Use and Grades
        --Marijuana Use and Suicide
    news item highschool girls using steroids
    CDC Report trends in Cigarette Smoking Among Adults
    YRBSS2001 data with smoking vs. Sex
    Smoking vs Sex-2003
    who is hitting whom?
    Being hit related to sex-2003
    Who gets better grades in Highschool?
    Grades by sex-2003 

    Grade in school vs. age distributions-2003

     Sexual experience and carrot consumption: carrotx2.xls

    EXC grades available until  last drop day
    Processing larger datasets 
    YRBSS data: Weight Multiple Regression
    Exercise 4-Due Thursday November 8-before midnight
    We'll walk through the actual exercise together in class (each person with their own individualized data) to complete a multiple regression on a larger dataset with about 14,000 records, studying the effects of age, gender, and height on the weight of high school students from the Youth Risk Behavior Surveillance System (YRBSS).
    Direct link to data page
     Click here for detail on the assignment. See the Multiple Regression Video for explicit details

    See the screen captures on parsing the 2003 data into EXCEL.
    Use the 2003 data for this exercise.


    Later, for the original research project due before the second exam at the end of the course, you will use the 2005 data
    Here are screen captures and description of parsing for the 2005 data.

    Chi-squared analysis for independence

    Crosstabulation of variables can most simply be done with pivot tables in EXCEL. See the videos and an example sheet on the YRBS2001 data. We will use pivot tables in EXCEL to generate the data we need from YRBS2003 for chi2 analysis.
    We will walk through this together in class.

     Exercise 5- Pot vs Sadness

    See the screen captures on parsing the 2003 data into EXCEL.
    Use the 2003 data for this exercise.
    I don't have videos in EXCEL 2007 for the YRBSS2003 data, but if you are using EXCEL 2007 and haven't figured out the differences from older versions shown in class and the videos, you might want to preview the new videos on pivottables in EXCEL 2007, below. Just remember these new videos involve the YRBSS2005 data, so it would be better to follow the parsing procedure in the old videos.

    Due Thursday November 15, before midnight: marijuana use (amount in the last 30 days) and depression (debilitating sadness).
    Get it in early (at the end of the semester, due dates are firm to allow for timely grading).
     a printable version of the exercise in WORD
    You should be able to drop the whole file in the dropbox, so don't worry about instructions on how to cut it down to a smaller file.
     More detail on pivotTables in EXCEL, usepivot.doc


    Crosstabulation can also be done in ACCESS. See the videos on importing the YRBS99 into ACCESS that are on the BA103 site--the procedure is similar to the EXCEL Wizard. Management of large files using FTP or Winzip
    Crosstabulating variables in ACCESS to preprocess for statistical analysis. Chi Square can be done on ACCESS crosstabulated data from the YRBSS.  Crosstab Query video

    Exercise 6: Simpson's Paradox and the UC Berkeley Graduate Admissions Case--Due Thursday November 29--before midnight

    Description and videos


    Say something about the Pvalues for the aggregate and for each of the departments, and also comment on the mechanism that gives these paradoxical results. The mechanism discussion needn't be highly mathematical or technical, but rather conceptual, explaining what it means in real world terms.
    You can take a look at some of the references to better understand this, if necessary.
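    To get a feel for the mechanism before opening the real data, here is a small Python illustration. The counts are entirely hypothetical and are NOT the Berkeley figures; the department names and function names are made up as well.

```python
# Hypothetical (admitted, applicants) counts, NOT the real Berkeley data.
HYPOTHETICAL_ADMISSIONS = {
    "Dept A": {"men": (80, 100), "women": (18, 20)},
    "Dept B": {"men": (4, 20), "women": (30, 100)},
}


def department_rates(data, group):
    """Admission rate for one group within each department."""
    return {dept: counts[group][0] / counts[group][1]
            for dept, counts in data.items()}


def aggregate_rate(data, group):
    """Admission rate for one group with all departments pooled together."""
    admitted = sum(counts[group][0] for counts in data.values())
    applicants = sum(counts[group][1] for counts in data.values())
    return admitted / applicants


if __name__ == "__main__":
    for group in ("men", "women"):
        print(group, department_rates(HYPOTHETICAL_ADMISSIONS, group),
              f"pooled = {aggregate_rate(HYPOTHETICAL_ADMISSIONS, group):.2f}")
```

    In this made-up table, women have the higher admission rate in each department but the lower rate when the departments are pooled, because most of the women applied to the department that admits few applicants of either sex.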
    Rainy days
    Ch 12. Categorical Analysis Chi Square tests of independence drinking and smoking. Spreadsheet formulas: smkdrkfo.jpg There are, of course, several ways to approach the calculation of the table of expected values. Another approach used in class F02  is shown in smkdrkf2.jpg
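    One common way to build the table of expected values is row total times column total divided by the grand total, cell by cell. The Python sketch below follows that approach as an illustration; it is not a copy of the spreadsheet formulas, the smoking and drinking counts are invented, and the function name is hypothetical. The critical value 3.841 applies to 1 degree of freedom at alpha = 0.05.

```python
def chi_square_independence(observed):
    """Return (chi2, df, expected) for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


if __name__ == "__main__":
    # Invented counts: rows = smoker / non-smoker, columns = drinker / non-drinker.
    table = [[40, 10], [60, 90]]
    chi2, df, expected = chi_square_independence(table)
    print(f"chi2 = {chi2:.2f} with {df} degree(s) of freedom")
    # 3.841 is the 5% critical value for chi-square with 1 degree of freedom.
    print("reject independence" if chi2 > 3.841 else "fail to reject independence")
```

    Because the totals are computed from whatever table is passed in, the same function handles tables with more rows or columns without changes.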

    Copying text out of a PDF: pdftxt.html
    and pasting into EXCEL pastechi
    Formulas are designed to accommodate new problems by inserting or deleting rows or columns expchi Then all you have to do is paste the observed values into the first table, and the whole sheet will recalculate to give the solution. 
    Father's and son's career paths
    This is a problem taken from the old text book, p.965, problem 16.38. Pasting new data from the strangely arranged file that came with the book into a modified sheet is shown in pastenew
    Bring your disk to the final exam, as you will have a problem requiring deletion and insertion. If you know how to do this, the problem will take only a few minutes to do. 

    Everything is due by the Last Class. Nothing will be accepted late.

     Time series and forecasting. PPt summary and tools in EXCEL

    Stock Market information:

    • Market Beta for individual stocks
    • NYSE Stock directory
    • NASDAQ site
    • AMEX
    • Yahoo Links on Investment concepts
    • BUDweiser and other symbol lookup at YAHOO
    • to find history, click on historical prices.
    • Stock Market Beta  = volatility relative to the market, a measure of nondiversifiable risk. (definitions from Yahoo) Should be related to Return.
    • Estimating Risk Parameters and Costs of Financing-- a chapter from Aswath Damodaran of NYU
    • Examples from Class: Budweiser (Beta may be zero) and IBM (Beta >1) compared to S&P 500
    • Exercise 7 - will not be done.

    • Download 5 years' worth of monthly prices from YAHOO for the S&P 500. There is a download-to-spreadsheet link at the bottom of the historical prices. Open the file in EXCEL and calculate returns as (P[t] - P[t-1]) / P[t-1], based on adjusted closing prices.
    • Do the same for a stock of your choice (avoid stocks others are doing).
    • Re-sort the data from oldest to most recent so that the time series charts run in the right direction.
    • Chart a time series for the two returns and estimate trend lines using the right-click chart options.
    • Produce a scatter chart of the single stock returns vs. the S&P and estimate an intercept and linear slope with chart options.
    • Conduct a Simple Regression procedure and see if you get a slope significantly different from zero. Comment on the slope and the Pvalue. The slope is known as Market Beta.
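    The return and beta calculations in the steps above can be sketched in Python as follows. This is an illustration under assumptions: the prices are invented rather than downloaded, the function names are hypothetical, and the prices are assumed to be sorted oldest first already.

```python
def simple_returns(prices):
    """Period returns (P[t] - P[t-1]) / P[t-1] for prices sorted oldest first."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def market_beta(stock_returns, market_returns):
    """Slope of the regression of stock returns on market returns."""
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must have the same length")
    n = len(market_returns)
    mean_m = sum(market_returns) / n
    mean_s = sum(stock_returns) / n
    sxy = sum((m - mean_m) * (s - mean_s)
              for m, s in zip(market_returns, stock_returns))
    sxx = sum((m - mean_m) ** 2 for m in market_returns)
    return sxy / sxx


if __name__ == "__main__":
    # Invented adjusted closing prices, oldest first (not real market data).
    index_prices = [1400, 1420, 1390, 1450, 1480, 1460]
    stock_prices = [50.0, 51.5, 49.0, 53.0, 55.0, 53.5]
    beta = market_beta(simple_returns(stock_prices), simple_returns(index_prices))
    print(f"estimated beta = {beta:.2f}")
```

    This gives only the slope; whether it differs significantly from zero still depends on the Pvalue from the regression output.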
    Last Class Wednesday, December 5, Application Reports due, course evaluation.

    Time series and Forecasting

    powerpoint from the book

    Seasonal data on KWH use Timseasn.xls
    Some Really BAD Forecasts

    Think for yourself.

    How bad is unemployment right now? See historical and current unemployment statistics from the Current Population Survey Statistics . Here is data converted to EXCEL, unemp.xls

    Did the economy take a dive after 9/11 or was the dip something that started even before the current administration came into office?
    Historical Nasdaq daily data in EXCEL
    Historical Stock index data free from  YAHOO:
    Dow-Jones index monthly, 1928 to current djindex.xls
    S&P 500 from 1982 to present spindex.xls

    Application reports:

    The final project is worth 10% of the grade and will be used to determine grades if you are on a borderline.
    The final project is due as hardcopy and a full EXCEL file before the exam on December 10. If you aren't taking the second exam, drop it at my mailbox in SSB 230. To avoid problems with corruption of large files in mygateway, there's no dropbox. Instead, turn in the full EXCEL file with data and calculations intact either on CD ROM or on a USB flashdrive. USB flashdrives can be found at about $5-10 at places like Office Depot, Walmart,, (free shipping) and such, so many people use them pretty much as we used to use floppy disks.
    If you aren't taking the second exam, drop it at my mailbox on SSB floor 2.

    If you submit the project by Friday, December 7, you can try it by email and I'll put a grade on it.
    If you are procrastinating and planning on turning it in after Friday, then I don't want to deal with the possibility of going around about corrupted files while I'm trying to get grades calculated and submitted. If it's not on CD or Flashdrive after Friday, then there will be a severe penalty, and if I can't grade it, there will be no grade for it. This is just a practical matter, as I have to put in all the grades at once.

    Refer to the 2005 Data Users Manual at the CDC website
    Here is a "cut down" version of the data user's Manual, if you just want the primary data. 

    Download and parse the ASCII data into EXCEL from the YRBSS 2005 dataset. Here are screen captures and description of parsing.
    There are new videos on downloading, parsing, pivottable generation and saving the result for YRBSS 2005 data in EXCEL 2007 on the videos page. Note that pivottables generated in EXCEL 2007 don't convert to earlier versions, so working across versions is not straightforward... More on this as I study it further.
    Do a study of your choice with the appropriate statistical test (probably Chi2 for relationship between two or more variables) to answer a question that is of interest to you.
    There are literally millions of studies you could do. At this point you are probably doing original research that has never been explored by anybody else.
    Use graphs if appropriate to describe what you found.   If you really have a hot project in mind, consider expanding it a little to make it presentable at the Undergraduate Research Symposium in 2008.


    The Multiple Choice Exam will be done without computer access, closed book, one page of notes allowed.
    Use quizzes as a guide for the types of information to be covered but you'd better make sure you understand the information rather than just memorizing these questions! 
    It's up to you whether you take the second exam. If you don't show on Monday, then I'll just double the grade on the first one. Be careful about being too optimistic about raising your grade, as it could go down as well!

    Liveclassroom course schedule

    Ref No /Section 
    date/ Time  Place
     Onsite Intro MW  5:30-6:45 Aug 20-22  SSB216  
    Online Classes  MW  5:30-6:45 mygateway/communications/Liveclassroom
    Exam1 MC  Wed October 10, 5:30-6:45 SSB 216
    Exam2  MC  Monday, Dec 10   5:30-7:30 SSB 216

    Switching sections is not allowed unless there is a good reason and prior permission. Taking an exam at a later time than prescribed may result in a grade penalty. Doing so without permission may result in a zero grade for the exam.

    last modified August 24,2009
    comments or questions to bud_banis (at)

