Unit rationale, description and aim

To work effectively with datasets, data scientists need to be able to apply established techniques of statistical analysis to the information they work with. Statistical modelling is the application of statistical analysis techniques to datasets. A statistical model is a mathematical representation of observed data that allows relationships between data to be identified, predictions about future data to be made, and data to be visualised to aid understanding. Statistical modelling techniques fall into two groups: supervised learning, which includes regression and classification models, and unsupervised learning, which includes clustering algorithms and association rules. By exploring case studies and industry-relevant examples, students will have the opportunity to gain an in-depth understanding of the range and application of both supervised and unsupervised statistical data modelling techniques.

The aim of this unit is to facilitate the development of skills required to analyse datasets. 
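The distinction between the two groups of techniques described above can be illustrated with a minimal sketch. It is not part of the unit materials: the data values and the helper names `fit_line` and `two_means` are made up for illustration, and only the Python standard library is assumed. The supervised example learns from labelled pairs (hours, scores) to predict a new value; the unsupervised example receives unlabelled values and groups them.

```python
from statistics import mean


def fit_line(xs, ys):
    """Supervised learning: least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def two_means(values, iterations=20):
    """Unsupervised learning: 1-D k-means with k = 2, started at min and max."""
    centres = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centre.
        labels = [
            0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            for v in values
        ]
        # Move each centre to the mean of the values assigned to it.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centres[k] = mean(members)
    return centres, labels


# Hypothetical labelled data: hours studied (input) and exam score (response).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
intercept, slope = fit_line(hours, scores)
predicted = intercept + slope * 6  # prediction for an unseen input

# Hypothetical unlabelled data: weekly spend for six customers.
spend = [10, 12, 11, 50, 52, 49]
centres, labels = two_means(spend)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
print(f"cluster centres={centres}, labels={labels}")
```

In the first case the model is judged by how well it predicts a known response; in the second there is no response variable, so the result is a grouping that still has to be interpreted.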

Year: 2026
Credit points: 10

Campus offering

No unit offerings are currently available for this unit.

Prerequisites

Nil

Learning outcomes

To successfully complete this unit, you will need to demonstrate that you have achieved the learning outcomes (LO) detailed in the table below.

Each outcome is informed by a number of graduate capabilities (GC) to ensure your work in this, and every unit, is part of a larger goal of graduating from ACU with the attributes of insight, empathy, imagination and impact.

Learning Outcome 01

Develop and implement statistical data model solutions to analyse and interpret complex datasets.
Relevant Graduate Capabilities: GC1, GC2, GC3, GC7, GC8

Learning Outcome 02

Validate and interpret the outcomes of statistical data models.
Relevant Graduate Capabilities: GC1, GC2, GC3, GC7, GC8

Learning Outcome 03

Critically evaluate the assumptions and limitations of different statistical models.
Relevant Graduate Capabilities: GC1, GC2, GC3, GC7, GC8

Learning Outcome 04

Communicate statistical findings effectively to both technical and non-technical audiences.
Relevant Graduate Capabilities: GC1, GC2, GC3, GC11

Content

Topics will include:

  • Foundational Concepts of Statistical Modelling
  • Linear Regression Models and Diagnostics
  • Multivariate Analysis
  • Regularisation Methods
  • Model Selection and Estimation
  • Spatial Modelling
  • Classification Methods
  • Generalised Linear Models
  • Resampling Methods
  • Clustering
  • Non-linear Dimensionality Reduction
  • Applications of Statistical Modelling

Assessment strategy and rationale

The assessment is designed to ensure that students gain the ability to develop statistical data models that are appropriate to the data being used and the hypotheses being explored. Assessment 1 is an opportunity to explore inferential statistical approaches and to develop research hypotheses appropriate to the data being explored. Assessment 2 builds on Assessment 1, exploring approaches to linear and non-linear data and looking at the benefits and limitations of these different approaches on a given dataset. This assessment allows for an investigation into model fitting, quality, bias and variance. Assessment 3 requires students to explain and justify the outcomes of the models developed in Assessments 1 and 2. These assessments scaffold students’ learning during the unit and provide the necessary foundations for their data science project.

To pass the unit, students must demonstrate achievement of every unit learning outcome and obtain a minimum mark of 50%.

Overview of assessments

Assessment Task 1: Hypothesis generation

Students will formulate a hypothesis, select and analyse a real-world dataset using regression modelling, and critically evaluate model assumptions and results.

Weighting

30%

Learning Outcomes LO1, LO2, LO3
Graduate Capabilities GC1, GC2, GC3, GC7, GC8

Assessment Task 2: Case Study

Given a real-world dataset and using various modelling techniques, students will explore approaches to linear and non-linear data and the trade-off between bias and variance.

Weighting

40%

Learning Outcomes LO1, LO2, LO3
Graduate Capabilities GC1, GC2, GC3, GC7, GC8

Assessment Task 3: Report

Students will select a peer-reviewed scientific paper that applies modelling techniques covered in Assessment Tasks 1 and 2, and critically evaluate the methodology, interpretation and outcomes of the study.

Weighting

30%

Learning Outcomes LO3, LO4
Graduate Capabilities GC1, GC2, GC3, GC7, GC8, GC11

Learning and teaching strategy and rationale

The teaching approach within this unit puts the student at the centre of their learning. This is achieved through the integration of interactive learning elements that facilitate problem-solving. Access to fundamental knowledge is provided through engaging resources that enable students to build their understanding in a flexible manner. Students are given the opportunity to extend this knowledge through social learning opportunities that support deeper engagement. These experiences enable students to develop more complex understandings through peer interactions and structured learning activities. This approach allows students to build problem-solving skills that align with vocational practices in computer science.

Representative texts and references

Bruce, P., Bruce, A., & Gedeck, P. (2020). Practical statistics for data scientists: 50+ essential concepts using R and Python. O'Reilly Media.

Denis, D.J. (2020). Univariate, bivariate, and multivariate statistics using R: quantitative tools for data analysis and data science. John Wiley & Sons.

Fan, J., Li, R., Zhang, C.H., & Zou, H. (2020). Statistical foundations of data science. Chapman and Hall/CRC.

James, G., Witten, D., Hastie, T., Tibshirani, R., & Taylor, J. (2023). Statistical learning. In An introduction to statistical learning: With applications in Python (pp. 15-67). Springer International Publishing.

Kim, J.K., & Shao, J. (2021). Statistical methods for handling incomplete data. Chapman and Hall/CRC.

van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579-2605.

McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.

Pebesma, E., & Bivand, R. (2023). Spatial data science: With applications in R. Chapman and Hall/CRC.

Rao, S.J. (2003). Regression Modelling Strategies: With applications to linear models, logistic regression and survival analysis.

Thulin, M. (2024). Modern Statistics with R: from wrangling and exploring data to inference and predictive modelling. CRC Press.
