
PySpark is a powerful open-source framework for distributed computing that allows developers and data scientists to process large amounts of data in parallel across a cluster of computers. PySpark is the Python API for Apache Spark, a big data processing engine, and provides a simple and user-friendly interface for working with large datasets usin...

In R Functions - Part 1, we looked at creating functions, positional and named arguments, return values, and default values. We also considered the function environment. That post covers all the basic knowledge needed to start using functions in your code.

A function is a ‘chunk’ of code that can be called and re-used. It often accepts ‘arguments’ that modify its behaviour or the value it returns.

It’s easy to end up with a block of code that is overly verbose. We’ve all been there: an initial requirement needed two lines of code to handle some simple logic, a little later another line was added to cover an edge case, and before you know it you have 20 lines of copy pasta. Finding the time to refactor these blocks of code is nearly alway...

Lists are a little different from the other data structures you typically encounter in R. They can contain elements of different types, including other lists, which makes them very flexible.

Loops are a fundamental element of computer programming. They facilitate iteration by running a piece of code repeatedly and provide a flexible approach to solving many problems that you may encounter. Loops have gained something of a bad reputation within the R community, largely because poorly implemented loops in R can re...

In statistics we often talk about categorical variables. These are variables with a typically limited and potentially fixed number of possible values. An observation of a categorical variable is based on a qualitative property.

Data often contains information about many different groups, and producing summary or aggregated statistics for these groups is a common task, so being able to perform grouping operations efficiently is a powerful tool. There are many ways to produce summary statistics and aggregations using R; however, one of the most intuitive ways to achiev...

PySpark - Basics

R Functions - Part 2

R Functions - Part 1

Refactoring - Writing readable code

R Basics - Lists

R Basics - Loops

R Basics - Factors

R Basics - Grouping