One limitation of R’s reference classes is that class inheritance
across package namespaces is limited. R6 avoids this problem when the
portable option is enabled.
The problem
Here is an example of the cross-package inheritance problem with
reference classes: Suppose you have ClassA in pkgA, and ClassB in pkgB,
which inherits from ClassA. ClassA has a method foo which calls a
non-exported function fun in pkgA.
If ClassB inherits foo, it will try to call fun. But since ClassB
objects are created in the pkgB namespace (which is an environment)
instead of the pkgA namespace, it won’t be able to find fun.
Something similar happens with R6 when the portable=FALSE option is
used. For example:
library(R6)
# Simulate packages by creating environments
pkgA <- new.env()
pkgB <- new.env()
# Create a function in pkgA but not pkgB
pkgA$fun <- function() 10
ClassA <- R6Class("ClassA",
  portable = FALSE,
  public = list(
    foo = function() fun()
  ),
  parent_env = pkgA
)
# ClassB inherits from ClassA
ClassB <- R6Class("ClassB",
  portable = FALSE,
  inherit = ClassA,
  parent_env = pkgB
)
When we create an instance of ClassA, it works as expected:
a <- ClassA$new()
a$foo()
#> [1] 10
But with ClassB, it can’t find the fun function:
b <- ClassB$new()
b$foo()
#> Error in b$foo() : could not find function "fun"
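One way to see why the lookup fails is to ask where each method searches for names it does not define itself. The following is a minimal sketch that re-creates the two non-portable classes; the helper can_see_fun is a hypothetical name introduced only for this illustration, and the result for ClassB assumes that no function called fun exists in the global environment or in an attached package.

```r
library(R6)

# Simulated packages, as in the example above
pkgA <- new.env()
pkgB <- new.env()
pkgA$fun <- function() 10

ClassA <- R6Class("ClassA",
  portable = FALSE,
  public = list(foo = function() fun()),
  parent_env = pkgA
)
ClassB <- R6Class("ClassB",
  portable = FALSE,
  inherit = ClassA,
  parent_env = pkgB
)

# Hypothetical helper: starting from a method's enclosing environment and
# walking up through its ancestors, is a function named `fun` visible?
can_see_fun <- function(method) {
  exists("fun", envir = environment(method), mode = "function")
}

a <- ClassA$new()
b <- ClassB$new()

can_see_fun(a$foo)  # TRUE: the search passes through pkgA
can_see_fun(b$foo)  # FALSE: the search passes through pkgB instead
```

The inherited foo is the same code in both objects; only the environment chain it is attached to differs.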
Portable R6
R6 supports inheritance across different packages, with the default
portable=TRUE option. In this example, we’ll again simulate different
packages by creating separate parent environments for the classes.
pkgA <- new.env()
pkgB <- new.env()
pkgA$fun <- function() {
  "This function `fun` in pkgA"
}
ClassA <- R6Class("ClassA",
  portable = TRUE, # The default
  public = list(
    foo = function() fun()
  ),
  parent_env = pkgA
)
ClassB <- R6Class("ClassB",
  portable = TRUE,
  inherit = ClassA,
  parent_env = pkgB
)
a <- ClassA$new()
a$foo()
#> [1] "This function `fun` in pkgA"
b <- ClassB$new()
b$foo()
#> [1] "This function `fun` in pkgA"
When a method is inherited from a superclass, that method also gets that class’s environment. In other words, the method “runs in” the superclass’s environment. This makes it possible for inheritance to work across packages.
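This can be checked by inspecting the methods of an instance. The sketch below assumes that, in a portable class, the parent of a method's enclosing environment is the parent_env of the class that defined the method; that is my reading of R6's behavior rather than something stated in this document. The bar method is a hypothetical addition, included only so that ClassB has a method of its own to compare against.

```r
library(R6)

pkgA <- new.env()
pkgB <- new.env()
pkgA$fun <- function() "This function `fun` in pkgA"

ClassA <- R6Class("ClassA",
  public = list(foo = function() fun()),
  parent_env = pkgA
)
ClassB <- R6Class("ClassB",
  inherit = ClassA,
  public = list(bar = function() "defined in ClassB"),  # hypothetical method
  parent_env = pkgB
)

b <- ClassB$new()

# foo was inherited from ClassA, so its lookups should go through pkgA
inherited_via_pkgA <- identical(parent.env(environment(b$foo)), pkgA)

# bar was defined in ClassB, so its lookups should go through pkgB
own_via_pkgB <- identical(parent.env(environment(b$bar)), pkgB)

c(inherited_via_pkgA = inherited_via_pkgA, own_via_pkgB = own_via_pkgB)
```

Under the stated assumption, both checks should come out TRUE.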
When a method is defined in the subclass, that method gets the
subclass’s environment. For example, here ClassC is a subclass of
ClassA, and defines its own foo method which overrides the foo method
from ClassA. It happens that the method looks the same as ClassA’s: it
just calls fun. But this time it finds pkgC$fun instead of pkgA$fun.
This is in contrast to ClassB, which inherited the foo method and
environment from ClassA.
pkgC <- new.env()
pkgC$fun <- function() {
  "This function `fun` in pkgC"
}
ClassC <- R6Class("ClassC",
  portable = TRUE,
  inherit = ClassA,
  public = list(
    foo = function() fun()
  ),
  parent_env = pkgC
)
cc <- ClassC$new()
# This method is defined in ClassC, so finds pkgC$fun
cc$foo()
#> [1] "This function `fun` in pkgC"
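The two lookups can also be seen side by side in a single method. In this sketch, ClassD and pkgD are hypothetical names; the overriding foo calls fun directly and also calls the superclass version through super$foo(), which is a standard R6 feature.

```r
library(R6)

pkgA <- new.env()
pkgD <- new.env()
pkgA$fun <- function() "This function `fun` in pkgA"
pkgD$fun <- function() "This function `fun` in pkgD"

ClassA <- R6Class("ClassA",
  public = list(foo = function() fun()),
  parent_env = pkgA
)

# Hypothetical subclass living in a different simulated package
ClassD <- R6Class("ClassD",
  inherit = ClassA,
  public = list(
    foo = function() {
      c(
        own = fun(),             # resolved via ClassD's environment (pkgD)
        inherited = super$foo()  # ClassA's method, resolved via pkgA
      )
    }
  ),
  parent_env = pkgD
)

d <- ClassD$new()
res <- d$foo()
res[["own"]]        # the pkgD string
res[["inherited"]]  # the pkgA string
```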
Using self
One important difference between non-portable and portable classes is
that with non-portable classes, it’s possible to access members with
just the name of the member, and with portable classes, member access
always requires using self$ or private$. This is a consequence of the
inheritance implementation.
Here’s an example of a non-portable class with two methods: sety, which
sets the private field y using the <<- operator, and getxy, which
returns a vector with the values of fields x and y:
NP <- R6Class("NP",
  portable = FALSE,
  public = list(
    x = 1,
    getxy = function() c(x, y),
    sety = function(value) y <<- value
  ),
  private = list(
    y = NA
  )
)
np <- NP$new()
np$sety(20)
np$getxy()
#> [1] 1 20
If we attempt the same with a portable class, it results in an error:
P <- R6Class("P",
  portable = TRUE,
  public = list(
    x = 1,
    getxy = function() c(x, y),
    sety = function(value) y <<- value
  ),
  private = list(
    y = NA
  )
)
p <- P$new()
# No error, but instead of setting private$y, this sets y in the global
# environment! This is because of the semantics of <<-.
p$sety(20)
y
#> [1] 20
p$getxy()
#> Error in p$getxy() : object 'y' not found
To make this work with a portable class, we need to use self$x and
private$y:
P2 <- R6Class("P2",
  portable = TRUE,
  public = list(
    x = 1,
    getxy = function() c(self$x, private$y),
    sety = function(value) private$y <- value
  ),
  private = list(
    y = NA
  )
)
p2 <- P2$new()
p2$sety(20)
p2$getxy()
#> [1] 1 20
There is a small performance penalty for using self$x as opposed to x.
In most cases, this is negligible, but it can be noticeable in some
situations where there are tens of thousands or more accesses per
second. For more information, see vignette("Performance").
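To get a rough feel for the difference on your own machine, the two access styles can be timed with base R. This is only a sketch: the class names NPCounter and PCounter and the method total are hypothetical, and the timings depend on the R version and hardware, so no expected numbers are shown.

```r
library(R6)

# Non-portable: the field is read by its bare name
NPCounter <- R6Class("NPCounter",
  portable = FALSE,
  public = list(
    x = 1,
    total = function(n) {
      s <- 0
      for (i in seq_len(n)) s <- s + x
      s
    }
  )
)

# Portable: the field is read through self$
PCounter <- R6Class("PCounter",
  portable = TRUE,
  public = list(
    x = 1,
    total = function(n) {
      s <- 0
      for (i in seq_len(n)) s <- s + self$x
      s
    }
  )
)

np <- NPCounter$new()
p <- PCounter$new()

n <- 1e6  # one million field reads per call
time_np <- system.time(np$total(n))
time_p <- system.time(p$total(n))
rbind(non_portable = time_np, portable = time_p)
```

Both methods compute the same sum; only the way the field is looked up differs.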
Potential pitfalls with cross-package inheritance
Inheritance happens when an object is instantiated with MyClass$new().
At that time, members from the superclass get copied to the new object.
This means that when you instantiate an R6 object, it will essentially
save some pieces of the superclass in the object.
Because of the way that packages are built in R, R6’s inheritance behavior could potentially lead to surprising, hard-to-diagnose problems when packages change versions.
Suppose you have two packages, pkgA, containing ClassA, and pkgB,
containing ClassB, and there is code in pkgB that instantiates ClassB
in an object, objB, at build time. This is in contrast to instantiating
ClassB at run time, by calling a function. All the code in the package
is run when a binary package is built, and the resulting objects are
saved in the package. (Generally, if the object can be accessed with
pkgB:::objB, this means it was created at build time.)
When objB is created at package build time, pieces from the superclass,
pkgA::ClassA, are saved inside it. This is fine in and of itself. But
imagine that pkgB was built and installed against pkgA 1.0, and then
you upgrade to pkgA 2.0 without subsequently building and installing
pkgB. Then pkgB::objB will contain some code from pkgA::ClassA 1.0, but
the version of pkgA::ClassA that’s installed will be 2.0. This can
cause problems if objB inherited code which uses parts of pkgA that
have changed, but the problems may not be entirely obvious.
This scenario is entirely possible when installing packages from CRAN. It is very common for a package to be upgraded without upgrading all of its downstream dependencies. As far as I know, R does not have any mechanism to force downstream dependencies to be rebuilt when a package is upgraded on a user’s computer.
If this problem happens, the remedy is to rebuild pkgB against pkgA
2.0. I don’t know if CRAN rebuilds all downstream dependencies when a
package is updated. If it doesn’t, then it’s possible for CRAN to have
incompatible binary builds of pkgA and pkgB, and users would then have
to install pkgB from source, with
install.packages("pkgB", type = "source").
To avoid this problem entirely, objects of ClassB must not be
instantiated at build time. You can either (A) instantiate them only in
functions, or (B) instantiate them at package load time, by adding an
.onLoad function to your package. For example:
ClassB <- R6Class("ClassB",
  inherit = pkgA::ClassA,
  public = list(x = 1)
)
# We'll fill this at load time
objB <- NULL
.onLoad <- function(libname, pkgname) {
  # The namespace is locked after loading; we can still modify objB at this time.
  objB <<- ClassB$new()
}
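The .onLoad code above covers option (B). For option (A), one possible shape is an accessor function that creates the object the first time it is needed and reuses it afterwards. This is a sketch under assumptions: get_objB and .cache are hypothetical names, and a locally defined ClassA stands in for pkgA::ClassA so that the code runs outside a package.

```r
library(R6)

# Stand-in for pkgA::ClassA (assumption, so the sketch is self-contained)
ClassA <- R6Class("ClassA",
  public = list(foo = function() "foo from ClassA")
)

ClassB <- R6Class("ClassB",
  inherit = ClassA,
  public = list(x = 1)
)

# An empty environment used as a cache. Creating it at build time is
# harmless, because it holds nothing copied from the superclass.
.cache <- new.env(parent = emptyenv())

# Hypothetical accessor: instantiates ClassB on first use, at run time
get_objB <- function() {
  if (is.null(.cache$objB)) {
    .cache$objB <- ClassB$new()
  }
  .cache$objB
}

first <- get_objB()
second <- get_objB()
identical(first, second)  # TRUE: the same object is returned both times
```

Because ClassB$new() only runs inside the function, the superclass pieces are copied from whichever version of the superclass is installed when the function is first called.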
You might be wondering why ClassB (the class, not the instance of the
class objB) doesn’t save a copy of pkgA::ClassA inside of it when the
package is built. This is because, for the inherit argument, R6Class
saves the unevaluated expression (pkgA::ClassA), and evaluates it when
$new() is called.
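The effect of this delayed evaluation can be observed without any packages. In the sketch below, rebinding the name ClassA stands in for upgrading pkgA; the version method is a hypothetical member added only to tell the two definitions apart.

```r
library(R6)

# "Version 1.0" of the superclass
ClassA <- R6Class("ClassA",
  public = list(version = function() "1.0")
)

# The inherit expression is stored unevaluated
ClassB <- R6Class("ClassB", inherit = ClassA)

# Instantiated while ClassA still refers to version 1.0
b_old <- ClassB$new()

# Simulate an upgrade by rebinding ClassA to a new generator
ClassA <- R6Class("ClassA",
  public = list(version = function() "2.0")
)

# The inherit expression is evaluated again here, so it sees the new ClassA
b_new <- ClassB$new()

b_old$version()  # "1.0": the method was copied when b_old was created
b_new$version()  # "2.0"
```

The generator ClassB follows the current definition of its superclass, while b_old keeps the pieces that were copied into it. This is the same mismatch that an object created at build time can carry across a package upgrade.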