Role of Statistics in Scientific Research

What is Statistics? Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations and then used to draw inferences about the process or population being studied; this is called inferential statistics (Dodge, Y.).
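As a minimal illustration of this distinction, the following Python sketch uses simulated, purely hypothetical measurements: it first summarizes the sample descriptively, then uses a one-sample t-test to draw an inference about the underlying population mean. The numbers and null value are illustrative assumptions.

```python
# Descriptive vs. inferential statistics on hypothetical data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=52.0, scale=8.0, size=40)  # hypothetical observations

# Descriptive statistics: summarize the collected data.
print("mean:", np.mean(sample))
print("std. dev.:", np.std(sample, ddof=1))

# Inferential statistics: use the sample to draw a conclusion about the
# population, here a one-sample t-test of H0: population mean = 50.
t_stat, p_value = stats.ttest_1samp(sample, popmean=50.0)
print("t =", t_stat, "p =", p_value)
```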
Why study statistics? What is the importance of statistics in scientific research? Statistics is used in data collection, analysis, interpretation, explanation, and presentation. It guides researchers toward proper characterization, summarization, presentation, and interpretation of research results.
What is the role of statistics in scientific research? Describe the importance of statistics in different fields of study. Statistics plays an important role in determining the current state of per capita income, unemployment, population growth rate, housing, schooling, medical facilities, and more in a country.
Statistics now holds a central position in almost every field, including industry, commerce, trade, physics, chemistry, economics, mathematics, biology, botany, psychology, astronomy, and information technology, so its application is very wide. Specialties have evolved to apply statistical theory and methods to various disciplines, giving rise to different fields of application of statistics.
Some of those are described below. We certainly expect other important research areas to emerge and flourish. We also excuse ourselves for not citing references related to the research discussions; the body of work from which we have drawn inspiration is simply too large for this article.
From personalized health to personalized learning, a common research goal is to identify and develop prevention, intervention, and treatment strategies tailored toward individuals or subgroups of a population.
Identification and validation of such subgroups using high-throughput genetic and genomic data, demographic variables, lifestyles, and other idiosyncratic factors is a challenging task.
It calls for statistical and machine learning methods that explore data heterogeneity, borrow information from individuals with similar characteristics, and integrate domain science. Subgroup analysis requires integrated approaches to subgroup identification, confirmation, and quantification of differential treatment effects, using different types of data that may come from the same or different sources.
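A hedged sketch of one common step in subgroup identification is clustering individuals on a few standardized features. The features, the simulated data, and the choice of k-means below are illustrative assumptions; real analyses would integrate genomic data, covariates, and domain knowledge, and would confirm any subgroups on independent data.

```python
# Illustrative subgroup identification by clustering simulated patients.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
# Two hypothetical features per patient (e.g., a biomarker and age).
features = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
                      rng.normal([3, 3], 1.0, size=(100, 2))])

X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Subgroup sizes; differential treatment effects would then be estimated
# and confirmed within each identified subgroup.
print("subgroup sizes:", np.bincount(labels))
```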
Dynamic treatment regimes are increasingly appreciated as adaptive and personalized intervention strategies, but quantification of their uncertainty requires more study, as does building treatment regimes in the presence of high-dimensional data. Machine learning has established its value in the data-centric world. From business analytics to genomics, machine learning algorithms are increasingly prevalent. Machine learning methods take a variety of forms; some are based on traditional statistical tools as simple as principal component analysis, while others can be ad hoc and are sometimes referred to as black boxes, which raises issues such as implicit bias and lack of interpretability.
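For concreteness, here is a small sketch of principal component analysis, the classical statistical tool mentioned above, computed directly from a singular value decomposition. The data are simulated; in practice the matrix would hold real features from the application.

```python
# Principal component analysis via SVD on simulated data.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))          # 200 hypothetical observations, 5 features
X_centered = X - X.mean(axis=0)        # PCA requires centered data

# SVD gives the principal directions and the variance each one explains.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
explained_variance = S**2 / (X.shape[0] - 1)
scores = X_centered @ Vt.T             # projections onto the principal components

print("explained variance per component:", explained_variance)
print("first two PC scores of first observation:", scores[0, :2])
```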
Algorithmic fairness is now widely recognized as an important concern, as many decisions rely on automatic learning from existing data. One may argue that interpretability is of secondary importance if prediction is the primary interest. However, in many high-stakes cases interpretability cannot be set aside. By promoting fair and interpretable machine learning methods and taking ethics and replicability as important metrics for evaluation, statisticians have much to contribute to data science.
Statistical inference is best justified when carefully collected data and an appropriately chosen model are used to infer and learn about an intrinsic quantity of interest. Such a quantity is ideally defined before the data are examined. In the big data era, however, statistical inference is often made in practice after the model, and sometimes even the quantity of interest, has been chosen in light of the data, leading to postselection inference.
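One simple safeguard against the data snooping that postselection inference risks is sample splitting: select on one half of the data, infer on the held-out half, so the selection step does not bias the reported uncertainty. The sketch below is a hedged illustration on simulated data; the correlation-based selection rule and the simple regression are assumptions made for brevity.

```python
# Sample splitting: select a feature on one half, do inference on the other.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p = 400, 10
X = rng.normal(size=(n, p))
y = 0.5 * X[:, 3] + rng.normal(size=n)     # only feature 3 truly matters

half = n // 2
X_sel, y_sel = X[:half], y[:half]          # selection half
X_inf, y_inf = X[half:], y[half:]          # inference half

# Selection step: pick the feature most correlated with y on the first half.
corrs = [abs(np.corrcoef(X_sel[:, j], y_sel)[0, 1]) for j in range(p)]
j_hat = int(np.argmax(corrs))

# Inference step: simple linear regression on the held-out half only.
res = stats.linregress(X_inf[:, j_hat], y_inf)
print("selected feature:", j_hat)
print("slope:", res.slope, "approx. 95% CI half-width:", 1.96 * res.stderr)
```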
The interpretability of such quantities and the validity of postselection inference have to be carefully examined. We must ensure that postselection inference avoids bias from data snooping, maintains statistical validity without unnecessary efficiency loss, and yields conclusions with a high level of replicability. When data are limited, the emphasis on statistical efficiency, to make the best use of the available data, has naturally been an important focus of statistics research.
We do not think statistical efficiency will become irrelevant in the big data era; inference is often made locally, and the relevant data available for a specific subpopulation remain limited. On the other hand, useful statistical modeling and data analysis must take into account constraints on data storage, communication across sites, and the quality of numerical approximations in the computation.
The need to work with streaming data for real-time actions also calls for a balanced approach. This is where statisticians and computer scientists, as well as experts from related domains, can work together. It is highly important to develop practical, scalable statistical inference for the analysis of real-world massive data. This requires multifaceted strategies. Examples include sparse matrix construction and manipulation, distributed computing and distributed statistical inference and learning, and cloud-based analytic methods.
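As a minimal divide-and-conquer sketch of distributed statistical estimation, each "site" below computes local summaries and only those summaries are combined centrally; the raw data never leave the sites. The sites, sample sizes, and data are simulated assumptions for illustration.

```python
# Divide-and-conquer estimation from per-site summary statistics.
import numpy as np

rng = np.random.default_rng(3)
sites = [rng.normal(loc=10.0, scale=2.0, size=n) for n in (5_000, 8_000, 12_000)]

# Each site sends (n, sum, sum of squares); raw data stay local.
summaries = [(len(x), x.sum(), (x**2).sum()) for x in sites]

n_total = sum(n for n, _, _ in summaries)
grand_sum = sum(s for _, s, _ in summaries)
grand_sumsq = sum(ss for _, _, ss in summaries)

pooled_mean = grand_sum / n_total
pooled_var = (grand_sumsq - n_total * pooled_mean**2) / (n_total - 1)
print("pooled mean:", pooled_mean, "pooled variance:", pooled_var)
```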
A range of statistical methods with attractive theoretical properties have been developed for the analysis of high-dimensional data. However, many of these methods are not readily scalable in real-world settings for analyzing massive data and making statistical inference at scale. Examples of such data include atmospheric data, astronomical data, large-scale biobanks with whole-genome sequencing data, electronic health records, and radiomics.
Statistical and computational methods, software, and at-scale modules that are suitable for cloud-based open-source distributed computing frameworks, such as Hadoop and Spark, need to be developed and deployed for analyzing massive data. In addition, there is a rapidly increasing trend toward cloud-based data sharing and analysis using the federated data ecosystem (Global Alliance for Genomics and Health), where data may be distributed across many databases and computer systems around the world.
Distributed statistical inference will help researchers virtually connect, integrate, and analyze data through software interfaces and efficient communications that allow seamless, authorized data access from different places. Reproducibility and replicability are pivotal for improving rigor and transparency in scientific research, especially when dealing with big data (National Academies of Sciences, Engineering, and Medicine).
For example, a biostatistician may be involved in researching the rate at which HIV is spreading throughout sub-Saharan Africa to help identify the countries that will be hit the hardest.
In medicine, statistical research may take the form of equivalence testing to compare and examine the effectiveness of new drugs for treating depression. Astronomers may utilize statistical models to support research on the expansion of the universe, while an actuary may use statistical models to predict the risk of financial investments or business expansion.
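Returning to the equivalence-testing example in medicine, the sketch below shows a two one-sided tests (TOST) analysis comparing a hypothetical new drug with a standard one on a depression-score change. The equivalence margin, degrees-of-freedom approximation, and all data are illustrative assumptions, not from any real trial.

```python
# Equivalence testing (TOST) on simulated trial data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
new_drug = rng.normal(loc=12.1, scale=3.0, size=60)   # simulated score change
standard = rng.normal(loc=12.0, scale=3.0, size=60)
delta = 1.5                                            # assumed equivalence margin

diff = new_drug.mean() - standard.mean()
se = np.sqrt(new_drug.var(ddof=1) / len(new_drug) +
             standard.var(ddof=1) / len(standard))
df = len(new_drug) + len(standard) - 2                 # simple approximation

# Two one-sided tests: the difference lies above -delta AND below +delta.
p_lower = 1 - stats.t.cdf((diff + delta) / se, df)     # H0: diff <= -delta
p_upper = stats.t.cdf((diff - delta) / se, df)         # H0: diff >= +delta
p_tost = max(p_lower, p_upper)
print("difference:", diff, "TOST p-value:", p_tost)
```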
Mechanics and automotive manufacturers can apply statistics to continually improve the quality of their products by minimizing errors in product performance. Perhaps a more familiar example is the collation of government statistics. For years, governments have gathered enormous datasets and utilized the power of statistics to inform decisions and research improvements on housing, income, unemployment, minimum wage, healthcare, and education services.
Data collection and plant identification. By pre-emptively identifying the statistical test(s) you want to employ to help answer your research question(s), you will hopefully know what sort of data needs to be collected.
Where statistics comes in handy is in helping you identify key aspects you may not have considered in your chosen methods of data collection, such as an additional important variable on which to collect data.
Another pitfall statistics can help you avoid is pseudoreplication. Sample sizes are important because they determine the power of your statistical tests and therefore the confidence and scope of the conclusions you draw from the results. Pseudoreplication first inflates the apparent sample size and, secondly, fails to highlight that some variables may not be independent.
This may mask the true effects of the variables that you wish to examine independently. Sampling bias can also be avoided by considering the statistical test you hope to use: for example, research on the occurrence of domestic violence in households should investigate low-income, middle-income, and high-income neighbourhoods. Without statistical tests there would be no objective way to show whether the data support or contradict the research questions.
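On the point about sample sizes and power above, a brief power calculation of the kind below is one way to guard against under-powered designs before collecting data. The effect size, significance level, and target power are illustrative assumptions.

```python
# Sample-size calculation for a two-sample t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,   # assumed medium effect (Cohen's d)
                                   alpha=0.05,
                                   power=0.8,
                                   alternative='two-sided')
print("independent samples needed per group:", round(n_per_group))
```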
In industry, statisticians design and analyze experiments to improve the safety, reliability and performance of products of all types, ranging from ballpoint pens to home appliances to automobiles. Other industrial settings include the food industry where statistics is used to design tastier, more attractive and more nutritious products. Statisticians are also directly involved with quality control issues in manufacturing to ensure consistent product dependability.
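A minimal quality-control sketch in this spirit computes Shewhart x-bar control limits from subgroup measurements of a product dimension. The subgroup data, target dimension, and process are simulated assumptions, not tied to any particular manufacturing line.

```python
# X-bar control chart limits from simulated subgroup measurements.
import numpy as np

rng = np.random.default_rng(5)
subgroups = rng.normal(loc=25.0, scale=0.2, size=(30, 5))  # 30 samples of 5 parts

xbar = subgroups.mean(axis=1)             # subgroup means
rbar = np.ptp(subgroups, axis=1).mean()   # average subgroup range
A2 = 0.577                                # standard x-bar chart constant for n = 5

center = xbar.mean()
ucl, lcl = center + A2 * rbar, center - A2 * rbar
out_of_control = np.flatnonzero((xbar > ucl) | (xbar < lcl))
print("center:", center, "UCL:", ucl, "LCL:", lcl)
print("out-of-control subgroups:", out_of_control)
```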
Statisticians work with social scientists to survey attitudes and opinions. They explore differences in viewpoints and in opportunities for persons with varying cultural, racial and economic backgrounds. In education, statisticians are involved with the assessment of educational aptitude and achievement and with experiments designed to measure the effectiveness of curricular innovations.