How to Help Biologists Understand Biostatistics

Let us be honest with ourselves. Most biostatistics courses teach biologists how to perform statistical analysis, but teach virtually nothing about what can be inferred from the results, what those inferences assume, or what is possible if those assumptions are incorrect. At least, if biologists are aware of those issues then, judging from what is published in scientific journals, they generally ignore them. This situation has been described repeatedly over several decades, but remains largely unchanged, even among premier journals.

Since improvements in some areas of medical research have been attributed to the strongly stereotyped designs and analyses which now prevail there, a few statisticians consider that the way forward is to further constrain biologists' freedom of choice in this respect. The cure for inept or erroneous thinking is not to outlaw judgement and innovation, but to improve the quality of understanding. In any case, the attempt to use only stereotyped designs and analyses seems doomed to failure, partly because of the sheer variety of biological research (most noticeably in ecology) and partly because few statisticians (other than epidemiologists) are happy to propose pre-cooked mechanical procedures for observational (rather than experimental) studies.

Statistical editors have attempted to impose some order upon matters, but the results are at best uneven. Medical journals, for instance, strongly prefer confidence intervals over tests, and (apparently) prefer treatments and outcome variables to be collapsed to binary scales. Ecologists employ virtually every sort of design and analysis, and describe them very differently from clinicians, but exhibit no more awareness of their reasoning or assumptions. Veterinary research remains largely unchanged, and continues to promote premature alopecia among statisticians. Given that the clamour to improve statistical refereeing has proven rather ineffectual, perhaps the answer is to improve the quality of what is being refereed?

Whilst a statistically valid survey of biostatistics courses for biologists is beyond our means, as far as we can judge the following analysis is not grossly in error. At the undergraduate level, most biologists are still provided with a 'watered-down' first-year statistics course - on the basis that their needs are less. Unfortunately, biologists require not less statistical training, but different training. The assumption that biologists can consult statisticians for matters beyond their understanding seldom holds when neither party understands the other's underlying reasoning and they do not share a common terminology.

The solution to these problems is not to dumb down statistics courses, but to approach the subject in another way. For instance, why not broaden the scope of biostatistics lectures, from providing tightly focused sets of 'recipes' to critically examining real problems and comparing differing approaches to them? A rather different problem arises because different biological disciplines employ very different conventions, terminology, and popular designs/analyses. One result is that biological research is progressively compartmentalized. Perhaps what are needed are biostatistics courses which are less subject-specific, and which put more emphasis upon the underlying reasoning of statistical analysis?
Learning to do statistics is much easier when you understand what you are trying to achieve. An alternative approach is to use graphical and analytical software as tools, and to show students how they can be used to explore the properties of statistics.

Biologists are seldom taught how to explore their data, nor how their data deviate from their analytical models, nor to what extent their study design and analysis match - nor even whether their study design has anything in common with how their data were actually gathered. These omissions are rather surprising, as most theoreticians seem all too aware of how poorly real data match the popular theoretical models. Certainly, in the headlong rush to produce and assess summary statistics, so little attention is paid to these fundamental issues that biologists should be forgiven for assuming they are of little importance. Journal editors, in their desire to save expensive print space and to cut down on what may appear to be unneeded discussion, are loath to publish exploratory or diagnostic analyses. Taking this to its logical conclusion, a few journals are notorious for pruning experimental results down to naked P-values and numbered variables.

One way to improve matters would be to accept how poor traditional methods, such as histograms, are for exploring how well real data (or their residuals) 'fit' the popular theoretical models, and to use rank-based methods instead. Another option is to use simulation models to show how violating the assumptions of standard procedures affects their outcome.

Conventional wisdom dictates that biostatistics is best taught by theoretical statisticians with a strong publication record. Whilst this reasoning may hold for teaching statisticians, the consequences when teaching biologists are depressingly predictable and all too familiar. Theoretical statisticians tend to reason in mathematical terms, and explain statistical logic in the same way. They tend to assume that other approaches (such as resampling) are neither reasonable nor possible. Since biologists tend not to have strong mathematical backgrounds, such explanations tend not to work very well - which is one reason the reasoning and assumptions of statistics are covered minimally and superficially. Catering to the felt needs of all concerned, biostatistics lectures concentrate upon enabling biologists to carry out a set of popular conventional analyses. Biologists, assuming that being able to do the analysis means they understand the results, only become aware that something is wrong when they attempt to publish their research - hence this is often the time when they finally consult a statistician. Lastly, and perhaps most worryingly, few biologists are aware of how prevalent statistical misdesign and misanalysis are among published work, or of the implications thereof.

Why do statistical explanations have to be mathematical? And, for that matter, are theoretical statisticians really the most suitable people to teach biologists how to understand statistics? Rather than teaching simplistic practical skills in isolation, why not explore other ways to convey the reasoning underlying what is being done?

Biostatistics teaching is dictated by factors other than trying to convey understanding. One obvious problem in this respect is that, aside from the pre-cooked procedures taught by some clinical/epidemiology courses, data analysis is routinely dealt with separately from study design.
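
As one hedged illustration of the suggestion above, that simulation can show how violating the assumptions of a standard procedure affects its outcome, the following Python sketch repeatedly draws small samples for which the null hypothesis is true, applies a one-sample t-test, and records how often that true hypothesis is rejected at the nominal 5% level. Everything specific in it is our own assumption, chosen purely for illustration: the sample size of 10, the lognormal distribution standing in for skewed biological data, the number of simulations, and all function names. It is a minimal sketch, not part of any particular course or package.

```python
import math
import random

# Two-sided 5% critical value of Student's t with 9 degrees of freedom
# (appropriate for the assumed sample size of n = 10).
T_CRIT_DF9 = 2.262


def one_sample_t(sample, mu0):
    """Return the one-sample t statistic for H0: population mean == mu0."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu0) / math.sqrt(variance / n)


def rejection_rate(draw, true_mean, n=10, n_sims=20000, seed=1):
    """Estimate how often a TRUE null hypothesis is rejected at the nominal 5% level.

    `draw` is a function taking a random generator and returning one observation.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    rejections = 0
    for _ in range(n_sims):
        sample = [draw(rng) for _ in range(n)]
        if abs(one_sample_t(sample, true_mean)) > T_CRIT_DF9:
            rejections += 1
    return rejections / n_sims


def draw_normal(rng):
    """One observation from a standard normal population (assumption met)."""
    return rng.gauss(0.0, 1.0)


def draw_lognormal(rng):
    """One observation from a strongly right-skewed population (assumption violated)."""
    return rng.lognormvariate(0.0, 1.0)


# Population mean of a lognormal(0, 1) distribution is exp(mu + sigma^2 / 2).
LOGNORMAL_MEAN = math.exp(0.5)

normal_rate = rejection_rate(draw_normal, 0.0)
skewed_rate = rejection_rate(draw_lognormal, LOGNORMAL_MEAN)

print(f"Nominal rate:                    0.050")
print(f"Estimated rate, normal data:     {normal_rate:.3f}")
print(f"Estimated rate, lognormal data:  {skewed_rate:.3f}")
```

Comparing the two estimated rates shows directly whether the test keeps its advertised error rate once the normality assumption is dropped. Changing the sample size or the population in `draw_lognormal` lets students explore for themselves when such a departure matters, without any algebra.
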
Again, it is tacitly assumed that biologists will consult a statistician when designing a study, rather than after the data are gathered and conclusions formed. The most common solution to this occurs when a biologist has a paper rejected although all the conventions were followed, and that biologist realises that blindly applying recipes is not the wisest approach. An alternative approach is to apply different analyses to the same data and to compare their outcomes.

Partly because of the parametric normal-approximation paradigm, little attention is given to ensuring assumptions are met, or to discussing how the resulting inference can mislead, or the importance thereof. Almost no attention is paid to which design/analysis options are available where assumptions are not fully met. One consequence is the frequent assumption that nonparametric methods entail no assumptions whatsoever. Another common misconception is that applying a sophisticated analysis to convenience-sampled or arbitrarily selected data makes the results more worthwhile. There are few things as instructive as showing how apparently respectable analyses conceal worthless results, or showing how common these are in published papers!

Modern approaches to statistics, such as Monte Carlo methods, which provide powerful insights into the reasoning and practice of statistics, are regarded as advanced topics and hence relegated to advanced courses. Conversely, old-fashioned, inappropriate, confusing methods, being conventional, 'historical' and uncontroversial, are used for introductory, elementary, basic or foundation courses. It is simple-to-do procedures which are considered most appropriate for students, not simple-to-understand ones, and most especially not anything which is unconventional, or which is simple to understand for anyone uncomfortable with mathematics. Put another way, the fact that a student can recite the formula for a normal probability density, or follow through its algebraic derivation, or feed numbers into a formula and do the arithmetic correctly, does not imply that the student has the remotest grasp of its implications.

One solution to this is to integrate study design and analysis without ending up with a list of prescriptive 'recipes'. Another approach is to use methods such as simulation modelling to explore how statistics behave, without assuming parametric models. Monte Carlo models readily expose underlying statistical reasoning/assumptions and, using modern PCs, can be highly interactive.

Partly owing to the ongoing profound schisms within statistical theory, virtually no attention is paid to critically evaluating results, or to the reasoning underlying study design and analysis. As a result, biologists routinely apply designs and analyses because they are conventional rather than appropriate, and run into difficulties when they encounter statistical referees who believe different conventions are appropriate. A further consequence is that biologists are unable to critically evaluate each other's work, even though this is supposed to be central to the practice of science.

A particularly distressing manifestation of this is the prevalence of 'comparative' studies (for example, comparing species richness in organic versus conventional farms). If convenience sampling is used to select the farms, no statistical analysis is valid other than for those particular farms. Yet such studies continue, and they have a considerable (if misleading) impact. Applied statisticians, for their part, are all too aware that the assumptions of most statistical models are routinely violated but, to avoid alienating themselves by binning study results wholesale, they apply the analyses anyway.

The crucial step in solving this problem lies not in convincing the scientific establishment that things must change, but in teaching postgraduate biologists something they think they already know.

Having said all of this, we must accept that our suggestions do not easily fit into the standard approach of teaching biologists how to do a predefined set of statistical procedures. Here lies our biggest problem - conventional methods are designed to convey and test theoretical or practical knowledge, not understanding or insight. Therefore, whether you are a student or a professional biologist, if you feel any of the above comments make sense, please consider this rather different statistics course for biologists, which is designed to address those issues: http://www.influentialpoints.com/Training/recommended_biostatistics_course_free_download.htm
