Package 'tpc' reference manual

Title:	Tiered PC Algorithm
Description:	Constraint-based causal discovery using the PC algorithm while accounting for a partial node ordering, for example a partial temporal ordering when the data were collected in different waves of a cohort study. Andrews RM, Foraita R, Didelez V, Witte J (2021) <arXiv:2108.13395> provide a guide on how to use tpc to analyse cohort data.
Authors:	Janine Witte [aut], Ronja Foraita [cre, ctb], DFG [fnd]
Maintainer:	Ronja Foraita <[email protected]>
License:	GPL (>= 3)
Version:	1.0
Built:	2025-02-24 05:50:37 UTC
Source:	https://github.com/bips-hb/tpc

Simulated Cohort Data

Description

Simulated data based on 'true_cohort', the DAG of a European child-and-youth cohort study with three waves (t0, t1 and t2). See Andrews et al. (2021) <https://arxiv.org/abs/2108.13395> for more information on how the data were generated.

Usage

dat_cohort

Format

A data frame with 5000 observations and 34 variables (10 variables were measured at three time points each, denoted as "_t0", "_t1" and "_t2").

sex: Sex. Factor variable with levels "male" and "female".
country: Country of residence. Factor variable with levels "ITA", "EST", "CYP", "BEL", "SWE", "GER", "HUN" and "ESP".
fto: Genotype of one SNP located in the FTO gene. Factor variable with levels "TT", "AT", "AA".
birth_weight: Birth weight in grams (numeric).
age_t0: Age in years at survey 't0' (numeric).
age_t1: Age in years at survey 't1' (numeric).
age_t2: Age in years at survey 't2' (numeric).
bmi_t0: Body mass index z-score adjusted for sex and age at survey 't0' (numeric).
bmi_t1: Body mass index z-score adjusted for sex and age at survey 't1' (numeric).
bmi_t2: Body mass index z-score adjusted for sex and age at survey 't2' (numeric).
bodyfat_t0: Per cent body fat measured at survey 't0' (numeric).
bodyfat_t1: Per cent body fat measured at survey 't1' (numeric).
bodyfat_t2: Per cent body fat measured at survey 't2' (numeric).
education_t0: Educational level at survey 't0'. Factor variable with levels "low education", "medium education" and "high education".
education_t1: Educational level at survey 't1'. Factor variable with levels "low education", "medium education" and "high education".
education_t2: Educational level at survey 't2'. Factor variable with levels "low education", "medium education" and "high education".
fiber_t0: Fiber intake in log(mg/kcal) at survey 't0' (numeric).
fiber_t1: Fiber intake in log(mg/kcal) at survey 't1' (numeric).
fiber_t2: Fiber intake in log(mg/kcal) at survey 't2' (numeric).
media_devices_t0: Number of audiovisual media in the child's bedroom at survey 't0' (numeric).
media_devices_t1: Number of audiovisual media in the child's bedroom at survey 't1' (numeric).
media_devices_t2: Number of audiovisual media in the child's bedroom at survey 't2' (numeric).
media_time_t0: Use of audiovisual media in log(h/week+1) at survey 't0' (numeric).
media_time_t1: Use of audiovisual media in log(h/week+1) at survey 't1' (numeric).
media_time_t2: Use of audiovisual media in log(h/week+1) at survey 't2' (numeric).
mvpa_t0: Moderate to vigorous physical activity in sqrt(min/day) at survey 't0' (numeric).
mvpa_t1: Moderate to vigorous physical activity in sqrt(min/day) at survey 't1' (numeric).
mvpa_t2: Moderate to vigorous physical activity in sqrt(min/day) at survey 't2' (numeric).
sugar_t0: Square root of sugar intake score at survey 't0' (numeric).
sugar_t1: Square root of sugar intake score at survey 't1' (numeric).
sugar_t2: Square root of sugar intake score at survey 't2' (numeric).
wellbeing_t0: Box-Cox-transformed well-being score at survey 't0' (numeric).
wellbeing_t1: Box-Cox-transformed well-being score at survey 't1' (numeric).
wellbeing_t2: Box-Cox-transformed well-being score at survey 't2' (numeric).
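
The "_t0", "_t1" and "_t2" suffixes can be turned into a tier vector for the `tiers` argument of tpc. The following is a minimal sketch in base R, not the authors' procedure: it uses a hypothetical subset of the column names listed above and assumes that variables without a suffix (such as sex or birth_weight) belong to an earliest tier 0.

```r
# Hypothetical subset of the dat_cohort column names listed above.
vars <- c("sex", "country", "fto", "birth_weight",
          "age_t0", "bmi_t0", "bmi_t1", "bmi_t2", "wellbeing_t2")

# Extract the wave index from the "_t<k>" suffix; NA if there is no suffix.
wave <- suppressWarnings(as.integer(sub(".*_t([0-9]+)$", "\\1", vars)))

# Assumption: suffix-free variables go into tier 0, wave k into tier k + 1.
tiers <- ifelse(is.na(wave), 0L, wave + 1L)
names(tiers) <- vars
tiers
```

Only the relative order of the tier numbers matters, since a smaller number corresponds to an earlier tier.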

References

Andrews RM, Foraita R, Didelez V, Witte J (2021). A practical guide to causal discovery with cohort data. <https://doi.org/10.48550/arXiv.2108.13395>

Simulated Cohort Data - discretized

Description

Data from dat_cohort for which all continuous variables have been categorized into three categories.

Usage

dat_cohort_dis

Format

A data frame with 5000 observations and 34 variables (10 variables were measured at three time points each, denoted as "_t0", "_t1" and "_t2").

sex: Sex. Factor variable with levels "male" and "female".
country: Country of residence. Factor variable with levels "ITA", "EST", "CYP", "BEL", "SWE", "GER", "HUN" and "ESP".
fto: Genotype of one SNP located in the FTO gene. Factor variable with levels "TT", "AT", "AA".
birth_weight: Birth weight in grams (numeric).
age_t0: Age in years at survey 't0' (numeric).
age_t1: Age in years at survey 't1' (numeric).
age_t2: Age in years at survey 't2' (numeric).
bmi_t0: Body mass index z-score adjusted for sex and age at survey 't0' (numeric).
bmi_t1: Body mass index z-score adjusted for sex and age at survey 't1' (numeric).
bmi_t2: Body mass index z-score adjusted for sex and age at survey 't2' (numeric).
bodyfat_t0: Per cent body fat measured at survey 't0' (numeric).
bodyfat_t1: Per cent body fat measured at survey 't1' (numeric).
bodyfat_t2: Per cent body fat measured at survey 't2' (numeric).
education_t0: Educational level at survey 't0'. Factor variable with levels "low education", "medium education" and "high education".
education_t1: Educational level at survey 't1'. Factor variable with levels "low education", "medium education" and "high education".
education_t2: Educational level at survey 't2'. Factor variable with levels "low education", "medium education" and "high education".
fiber_t0: Fiber intake in log(mg/kcal) at survey 't0' (numeric).
fiber_t1: Fiber intake in log(mg/kcal) at survey 't1' (numeric).
fiber_t2: Fiber intake in log(mg/kcal) at survey 't2' (numeric).
media_devices_t0: Number of audiovisual media in the child's bedroom at survey 't0' (numeric).
media_devices_t1: Number of audiovisual media in the child's bedroom at survey 't1' (numeric).
media_devices_t2: Number of audiovisual media in the child's bedroom at survey 't2' (numeric).
media_time_t0: Use of audiovisual media in log(h/week+1) at survey 't0' (numeric).
media_time_t1: Use of audiovisual media in log(h/week+1) at survey 't1' (numeric).
media_time_t2: Use of audiovisual media in log(h/week+1) at survey 't2' (numeric).
mvpa_t0: Moderate to vigorous physical activity in sqrt(min/day) at survey 't0' (numeric).
mvpa_t1: Moderate to vigorous physical activity in sqrt(min/day) at survey 't1' (numeric).
mvpa_t2: Moderate to vigorous physical activity in sqrt(min/day) at survey 't2' (numeric).
sugar_t0: Square root of sugar intake score at survey 't0' (numeric).
sugar_t1: Square root of sugar intake score at survey 't1' (numeric).
sugar_t2: Square root of sugar intake score at survey 't2' (numeric).
wellbeing_t0: Box-Cox-transformed well-being score at survey 't0' (numeric).
wellbeing_t1: Box-Cox-transformed well-being score at survey 't1' (numeric).
wellbeing_t2: Box-Cox-transformed well-being score at survey 't2' (numeric).
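
The manual does not state how the continuous variables were categorized. As an illustration only, the sketch below splits a simulated stand-in variable into three categories at its tertiles; the cut points and the labels "low", "medium" and "high" are assumptions, not the rule used for dat_cohort_dis.

```r
set.seed(1)
x <- rnorm(300)  # stand-in for a continuous variable such as bmi_t0

# Assumption: tertile cut points; the actual rule is not documented here.
breaks <- quantile(x, probs = c(0, 1/3, 2/3, 1))
x_cat <- cut(x, breaks = breaks, include.lowest = TRUE,
             labels = c("low", "medium", "high"))
table(x_cat)
```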

References

Andrews RM, Foraita R, Didelez V, Witte J (2021). A practical guide to causal discovery with cohort data. <https://doi.org/10.48550/arXiv.2108.13395>

Simulated Cohort Data - with missing values

Description

Data from dat_cohort with missing values.

Usage

dat_cohort_mis

Format

A data frame with 5000 observations and 34 variables (10 variables were measured at three time points each, denoted as "_t0", "_t1" and "_t2").

sex: Sex. Factor variable with levels "male" and "female".
country: Country of residence. Factor variable with levels "ITA", "EST", "CYP", "BEL", "SWE", "GER", "HUN" and "ESP".
fto: Genotype of one SNP located in the FTO gene. Ordinal variable with levels "TT", "AT", "AA".
birth_weight: Birth weight in grams (numeric).
age_t0: Age in years at survey 't0' (numeric).
age_t1: Age in years at survey 't1' (numeric).
age_t2: Age in years at survey 't2' (numeric).
bmi_t0: Body mass index z-score adjusted for sex and age at survey 't0' (numeric).
bmi_t1: Body mass index z-score adjusted for sex and age at survey 't1' (numeric).
bmi_t2: Body mass index z-score adjusted for sex and age at survey 't2' (numeric).
bodyfat_t0: Per cent body fat measured at survey 't0' (numeric).
bodyfat_t1: Per cent body fat measured at survey 't1' (numeric).
bodyfat_t2: Per cent body fat measured at survey 't2' (numeric).
education_t0: Educational level at survey 't0'. Ordinal variable with levels "low education", "medium education" and "high education".
education_t1: Educational level at survey 't1'. Ordinal variable with levels "low education", "medium education" and "high education".
education_t2: Educational level at survey 't2'. Ordinal variable with levels "low education", "medium education" and "high education".
fiber_t0: Fiber intake in log(mg/kcal) at survey 't0' (numeric).
fiber_t1: Fiber intake in log(mg/kcal) at survey 't1' (numeric).
fiber_t2: Fiber intake in log(mg/kcal) at survey 't2' (numeric).
media_devices_t0: Number of audiovisual media in the child's bedroom at survey 't0' (numeric).
media_devices_t1: Number of audiovisual media in the child's bedroom at survey 't1' (numeric).
media_devices_t2: Number of audiovisual media in the child's bedroom at survey 't2' (numeric).
media_time_t0: Use of audiovisual media in log(h/week+1) at survey 't0' (numeric).
media_time_t1: Use of audiovisual media in log(h/week+1) at survey 't1' (numeric).
media_time_t2: Use of audiovisual media in log(h/week+1) at survey 't2' (numeric).
mvpa_t0: Moderate to vigorous physical activity in sqrt(min/day) at survey 't0' (numeric).
mvpa_t1: Moderate to vigorous physical activity in sqrt(min/day) at survey 't1' (numeric).
mvpa_t2: Moderate to vigorous physical activity in sqrt(min/day) at survey 't2' (numeric).
sugar_t0: Square root of sugar intake score at survey 't0' (numeric).
sugar_t1: Square root of sugar intake score at survey 't1' (numeric).
sugar_t2: Square root of sugar intake score at survey 't2' (numeric).
wellbeing_t0: Box-Cox-transformed well-being score at survey 't0' (numeric).
wellbeing_t1: Box-Cox-transformed well-being score at survey 't1' (numeric).
wellbeing_t2: Box-Cox-transformed well-being score at survey 't2' (numeric).

References

Andrews RM, Foraita R, Didelez V, Witte J (2021). A practical guide to causal discovery with cohort data. <https://doi.org/10.48550/arXiv.2108.13395>

Simulated Data with a Partial Ordering

Description

A simple graph and corresponding dataset used in the examples illustrating tpc.

Usage

dat_sim

Format

A data frame with 1000 observations and 9 numerical variables simulated by drawing from a multivariate distribution according to the DAG true_sim.

A1: numeric
B1: numeric
C1: numeric
A2: numeric
B2: numeric
C2: numeric
A3: numeric
B3: numeric
C3: numeric

Last Step of tPC Algorithm: Apply Meek's rules

Description

This is a modified version of pcalg::udag2pdagRelaxed. It applies Meek's rules to the partially oriented graph obtained after orienting edges between time points / tiers.

Usage

MeekRules(
  gInput,
  verbose = FALSE,
  unfVect = NULL,
  solve.confl = FALSE,
  rules = rep(TRUE, 4)
)

Arguments

`gInput`	'pcAlgo'-object containing skeleton and conditional independence information.
`verbose`	FALSE: No output; TRUE: Details
`unfVect`	Vector containing numbers that encode ambiguous triples (as returned by `tpc.cons.intern()`). This is needed in the conservative and majority rule PC algorithms.
`solve.confl`	If `TRUE`, the orientation rules work with lists for candidate sets and allow bi-directed edges to resolve conflicting edge orientations. Note that therefore the resulting object is order-independent but might not be a PDAG because bi-directed edges can be present.
`rules`	A vector of length 4 containing `TRUE` or `FALSE` for each rule. `TRUE` in position i means that rule i (Ri) will be applied. By default, all rules are used.

Details

If unfVect = NULL (no ambiguous triples), the four orientation rules are applied to each eligible structure until no more edges can be oriented. Otherwise, unfVect contains the numbers of all ambiguous triples in the graph as determined by tpc.cons.intern(). The orientation rules then take this information into account. For example, if a -> b - c and <a,b,c> is an unambiguous triple and a non-v-structure, then rule 1 implies b -> c. On the other hand, if a -> b - c but <a,b,c> is an ambiguous triple, then the edge b - c is not oriented.
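
The behaviour of rule 1 with and without an ambiguous triple can be sketched in base R. This is an illustration under stated assumptions, not the code used by MeekRules: the function `apply_rule1` is hypothetical, the adjacency matrix uses an assumed row-to-column encoding, and ambiguous triples are passed as plain strings rather than as the numeric codes stored in unfVect.

```r
# Assumed encoding: M[i, j] = 1 and M[j, i] = 0 means i -> j;
# M[i, j] = M[j, i] = 1 means i - j (undirected).
apply_rule1 <- function(M, ambiguous = character(0)) {
  nodes <- rownames(M)
  repeat {
    changed <- FALSE
    for (a in nodes) for (b in nodes) for (cc in nodes) {
      if (length(unique(c(a, b, cc))) < 3) next
      a_to_b   <- M[a, b] == 1 && M[b, a] == 0    # a -> b
      b_und_c  <- M[b, cc] == 1 && M[cc, b] == 1  # b - c
      a_nadj_c <- M[a, cc] == 0 && M[cc, a] == 0  # a and c not adjacent
      keys <- c(paste(a, b, cc, sep = ","), paste(cc, b, a, sep = ","))
      if (a_to_b && b_und_c && a_nadj_c && !any(keys %in% ambiguous)) {
        M[cc, b] <- 0      # orient b -> c
        changed <- TRUE
      }
    }
    if (!changed) break    # stop once no further edge can be oriented
  }
  M
}

nodes <- c("a", "b", "c")
M <- matrix(0, 3, 3, dimnames = list(nodes, nodes))
M["a", "b"] <- 1                     # a -> b
M["b", "c"] <- M["c", "b"] <- 1      # b - c

oriented <- apply_rule1(M)                         # b - c becomes b -> c
kept     <- apply_rule1(M, ambiguous = "a,b,c")    # b - c stays undirected
```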

If solve.confl = FALSE, earlier edge orientations are overwritten by later ones.

If solve.confl = TRUE, both the v-structures and the orientation rules work with lists for the candidate edges and allow bi-directed edges if there are conflicting orientations. For example, two v-structures a -> b <- c and b -> c <- d then yield a -> b <-> c <- d. This option can be used to get an order-independent version of the PC algorithm (see Colombo and Maathuis (2014)).

We denote bi-directed edges, for example between two variables i and j, in the adjacency matrix M of the graph as M[i,j]=2 and M[j,i]=2. Such edges should be interpreted as indications of conflicts in the algorithm, for example due to errors in the conditional independence tests or violations of the faithfulness assumption.
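
As a small illustration of this encoding, the base R sketch below builds the matrix for a -> b <-> c <- d and lists the conflicting edges. The coding of the directed edges (M[i,j]=1 read as i -> j) is an assumption made for this example; only the value 2 for bi-directed edges is taken from the text above.

```r
nodes <- c("a", "b", "c", "d")
M <- matrix(0, 4, 4, dimnames = list(nodes, nodes))

# Assumed row-to-column convention for directed edges: M[i, j] = 1 is i -> j.
M["a", "b"] <- 1                    # a -> b
M["d", "c"] <- 1                    # d -> c
M["b", "c"] <- M["c", "b"] <- 2     # b <-> c, a conflict marker

# List each bi-directed edge once (upper triangle only).
conflicts <- which(M == 2 & t(M) == 2 & upper.tri(M), arr.ind = TRUE)
data.frame(from = nodes[conflicts[, "row"]], to = nodes[conflicts[, "col"]])
```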

Value

An object of class "pcAlgo".

Author(s)

Original code by Markus Kalisch, modifications by Janine Witte.

References

C. Meek (1995). Causal inference and causal explanation with background knowledge. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI-95), pp. 403-411. Morgan Kaufmann Publishers.

D. Colombo and M.H. Maathuis (2014). Order-independent constraint-based causal structure learning. Journal of Machine Learning Research 15:3741-3782.

Examples

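library(pcalg)  # provides skeleton() and gaussCItest()
data(dat_sim)
# estimate the skeleton, then apply Meek's rules to it
sk.fit <- skeleton(suffStat = list(C = cor(dat_sim), n = nrow(dat_sim)),
                   indepTest = gaussCItest, labels = names(dat_sim), alpha = 0.05)
MeekRules(sk.fit)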

PC Algorithm Accounting for a Partial Node Ordering

Description

Like pcalg::pc, but takes into account a user-specified partial ordering of the nodes/variables. This has two effects: 1) The conditional independence between x and y given S is not tested if any variable in S lies in the future of both x and y; 2) edges cannot be oriented from a higher-order to a lower-order node. In addition, the user may specify individual forbidden edges and context variables.

Usage

tpc(
  suffStat,
  indepTest,
  alpha,
  labels,
  p,
  skel.method = c("stable", "stable.parallel"),
  forbEdges = NULL,
  m.max = Inf,
  conservative = FALSE,
  maj.rule = TRUE,
  tiers = NULL,
  context.all = NULL,
  context.tier = NULL,
  verbose = FALSE,
  numCores = NULL,
  cl.type = "PSOCK",
  clusterexport = NULL
)

Arguments

`suffStat`	A list of sufficient statistics, containing all necessary elements for the conditional independence decisions in the function `indepTest`.
`indepTest`	A function for testing conditional independence. It is internally called as `indepTest(x,y,S,suffStat)`, and tests conditional independence of `x` and `y` given `S`. Here, `x` and `y` are variables, and `S` is a (possibly empty) vector of variables (all variables are denoted by their (integer) column positions in the adjacency matrix). `suffStat` is a list, see the argument above. The return value of `indepTest` is the p-value of the test for conditional independence.
`alpha`	Significance level (number in (0,1)) for the individual conditional independence tests.
`labels`	(optional) character vector of variable (or "node") names. Typically preferred to specifying `p`.
`p`	(optional) number of variables (or nodes). May be specified if `labels` are not, in which case `labels` is set to `1:p`.
`skel.method`	Character string specifying method; the default, "stable", provides an order-independent skeleton, see `tpc::tskeleton`.
`forbEdges`	A logical matrix of dimension p*p. If `[i,j]` is TRUE, then the directed edge i->j is forbidden. If both `[i,j]` and `[j,i]` are TRUE, then any type of edge between i and j is forbidden.
`m.max`	Maximal size of the conditioning sets that are considered in the conditional independence tests.
`conservative`	Logical indicating if conservative PC should be used. Defaults to FALSE. See `pcalg::pc` for details.
`maj.rule`	Logical indicating if the majority rule should be used. Defaults to TRUE. See `pcalg::pc` for details.
`tiers`	Numeric vector specifying the tier / time point for each variable. Must be of length 'p', if specified, or have the same length as 'labels', if specified. A smaller number corresponds to an earlier tier / time point.
`context.all`	Numeric or character vector. Specifies the positions or names of global context variables. Global context variables have no incoming edges, i.e. no parents, and are themselves parents of all non-context variables in the graph.
`context.tier`	Numeric or character vector. Specifies the positions or names of tier-specific context variables. Tier-specific context variables have no incoming edges, i.e. no parents, and are themselves parents of all non-context variables in the same tier.
`verbose`	if `TRUE`, detailed output is provided.
`numCores`	The number of CPU cores to be used.
`cl.type`	The cluster type. Default value is `"PSOCK"`. For high-performance clusters use `"MPI"`. See also `parallel::makeCluster`.
`clusterexport`	Character vector. Lists functions to be exported to nodes if numCores > 1.
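
The definitions of `context.all` and `context.tier` above imply a set of edges that must appear in the graph. The base R sketch below computes these implied edges for a toy example; the labels, the tier assignment and the choice of "A1" as context variable are hypothetical, and the code only illustrates the definition, not how tpc handles context variables internally.

```r
lab   <- c("A1", "B1", "C1", "A2", "B2", "C2")
tiers <- c(1, 1, 1, 2, 2, 2)
context.tier <- "A1"           # hypothetical tier-specific context variable
context.all  <- character(0)   # no global context variables in this example

is_context <- lab %in% c(context.all, context.tier)
forced <- matrix(FALSE, length(lab), length(lab), dimnames = list(lab, lab))

# Global context variables point into every non-context variable.
for (v in context.all) forced[v, !is_context] <- TRUE

# Tier-specific context variables point into the non-context variables
# of their own tier only.
for (v in context.tier) {
  forced[v, !is_context & tiers == tiers[lab == v]] <- TRUE
}

which(forced, arr.ind = TRUE)  # row = parent, col = child
```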

Details

See pcalg::pc for further information on the PC algorithm. The PC algorithm is named after its developers Peter Spirtes and Clark Glymour (Spirtes et al., 2000).

Specifying a tier for each variable using the tiers argument has the following effects: 1) In the skeleton and v-structure learning phases, conditional independence testing is restricted such that if x is in tier t(x) and y is in t(y), only those variables are allowed in the conditioning set whose tier is not larger than t(x). 2) Following the v-structure phase, all edges that were found between two tiers are directed towards the higher-order tier. If context variables are specified using context.all and/or context.tier, the corresponding orientations are added in this step.
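
The second effect, orienting all between-tier edges towards the later tier, can be sketched on a plain adjacency matrix. This is an illustration in base R with hypothetical labels and an assumed encoding (M[i,j]=1 and M[j,i]=0 read as i -> j), not the internal code of tpc.

```r
lab   <- c("A1", "B1", "A2", "B2")
tiers <- c(1, 1, 2, 2)

# Undirected skeleton: M[i, j] = M[j, i] = 1 (assumed encoding).
M <- matrix(0, 4, 4, dimnames = list(lab, lab))
M["A1", "B1"] <- M["B1", "A1"] <- 1   # edge within tier 1
M["A1", "A2"] <- M["A2", "A1"] <- 1   # edge between tiers 1 and 2
M["B2", "B1"] <- M["B1", "B2"] <- 1   # edge between tiers 1 and 2

# later_to_earlier[i, j] is TRUE if node i lies in a later tier than node j.
later_to_earlier <- outer(tiers, tiers, ">")

# Drop the direction from the later into the earlier tier, which leaves
# earlier -> later for every between-tier edge.
M[later_to_earlier] <- 0
M
```

Edges within a tier are left undirected by this step.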

Value

An object of class "pcAlgo" (see pcalg::pcAlgo) containing an estimate of the equivalence class of the underlying DAG.

Author(s)

Original code by Markus Kalisch, Martin Maechler, and Diego Colombo. Modifications by Janine Witte (Kalisch et al., 2012).

References

M. Kalisch, M. Maechler, D. Colombo, M.H. Maathuis and P. Buehlmann (2012). Causal Inference Using Graphical Models with the R Package pcalg. Journal of Statistical Software 47(11): 1–26.

P. Spirtes, C. Glymour and R. Scheines (2000). Causation, Prediction, and Search, 2nd edition. The MIT Press. https://philarchive.org/archive/SPICPA-2.

Examples

# load simulated cohort data
library(pcalg)  # provides gaussCItest() and pc()
data(dat_sim)
n <- nrow(dat_sim)
lab <- colnames(dat_sim)

# estimate the CPDAG without taking background information into account
tpc.fit <- tpc(suffStat = list(C = cor(dat_sim), n = n),
               indepTest = gaussCItest, alpha = 0.01, labels = lab)
pc.fit <- pcalg::pc(suffStat = list(C = cor(dat_sim), n = n),
                    indepTest = gaussCItest, alpha = 0.01, labels = lab,
                    maj.rule = TRUE, solve.confl = TRUE)
identical(pc.fit@graph, tpc.fit@graph) # TRUE

# estimate the CPDAG with temporal ordering as background information
tiers <- rep(c(1,2,3), times=c(3,3,3))
tpc.fit2 <- tpc(suffStat = list(C = cor(dat_sim), n = n),
                indepTest = gaussCItest, alpha = 0.01, labels = lab, tiers = tiers)

tpc.fit3 <- tpc(suffStat = list(C = cor(dat_sim), n = n),
                indepTest = gaussCItest, alpha = 0.01, labels = lab, tiers = tiers,
                skel.method = "stable.parallel",
                numCores = 2, clusterexport = c("cor", "ecdf"))

if(requireNamespace("Rgraphviz", quietly = TRUE)){
 data("true_sim")
 oldpar <- par(mfrow = c(1,3))
 plot(true_sim, main = "True DAG")
 plot(tpc.fit, main = "PC estimate")
 plot(tpc.fit2, main = "tPC estimate")
 par(oldpar)
 }

 # require that there is no edge between A1 and A3, and that any edge between A2 and B2
 # or A2 and C2 is directed away from A2
 forb <- matrix(FALSE, nrow=9, ncol=9)
 rownames(forb) <- colnames(forb) <- lab
 forb["A1","A3"] <- forb["A3","A1"] <- TRUE
 forb["B2","A2"] <- TRUE
 forb["C2","A2"] <- TRUE

 tpc.fit3 <- tpc(suffStat = list(C = cor(dat_sim), n = n),
                 indepTest = gaussCItest, alpha = 0.01,labels = lab,
                 forbEdges = forb, tiers = tiers)

 if (requireNamespace("Rgraphviz", quietly = TRUE)) {
 # compare estimated CPDAGs
   data("true_sim")
   oldpar <- par(mfrow = c(1,2))
   plot(tpc.fit2, main = "old tPC estimate")
   plot(tpc.fit3, main = "new tPC estimate")
   par(oldpar)
 }
 # force edge from A1 to all other nodes measured at time 1
 # into the graph (note that the edge from A1 to A2 is then
 # forbidden)
 tpc.fit4 <- tpc(suffStat = list(C = cor(dat_sim), n = n),
                 indepTest = gaussCItest, alpha = 0.01, labels = lab,
                 tiers = tiers, context.tier = "A1")

 if (requireNamespace("Rgraphviz", quietly = TRUE)) {
 # compare estimated CPDAGs
  data("true_sim")
  plot(tpc.fit4, main = "alternative tPC estimate")
 }

 # force edge from A1 to all other nodes into the graph
 tpc.fit5 <- tpc(suffStat = list(C = cor(dat_sim), n = n),
                 indepTest = gaussCItest, alpha = 0.01, labels = lab,
                 tiers = tiers, context.all = "A1")

 if (requireNamespace("Rgraphviz", quietly = TRUE)) {
 # compare estimated CPDAGs
 data("true_sim")
 plot(tpc.fit5, main = "alternative tPC estimate")
 }

Utility for Conservative and Majority Rule in tpc

Description

Like pcalg::pc.cons.intern, but takes into account the user-specified partial node/variable ordering.

Usage

tpc.cons.intern(
  sk,
  suffStat,
  indepTest,
  alpha,
  version.unf = c(NA, NA),
  maj.rule = FALSE,
  forbEdges = NULL,
  tiers = NULL,
  context.all = NULL,
  context.tier = NULL,
  verbose = FALSE
)

Arguments

`sk`	A skeleton object as returned from `pcalg::skeleton`.
`suffStat`	Sufficient statistic: List containing all relevant elements for the conditional independence decisions.
`indepTest`	Pre-defined `function` for testing conditional independence. The function is internally called as `indepTest(x,y,S,suffStat)`, and tests conditional independence of `x` and `y` given `S`. Here, `x` and `y` are variables, and `S` is a (possibly empty) vector of variables (all variables are denoted by their (integer) column positions in the adjacency matrix). The return value of `indepTest` is the p-value of the test for conditional independence.
`alpha`	Significance level for the individual conditional independence tests.
`version.unf`	Vector of length two. If `version.unf[2]==1`, the initial separating set found by the PC/FCI algorithm is added to the set of separating sets; if `version.unf[2]==2`, it is not added. In the latter case, if the set of separating sets is empty, the triple is marked as unambiguous if `version.unf[1]==1`, and as ambiguous if `version.unf[1]==2`.
`maj.rule`	Logical indicating if the triples are checked for ambiguity using the majority rule idea, which is less strict than the standard conservative method.
`forbEdges`	A logical matrix of dimension `p*p`. If `[i,j]` is TRUE, then the directed edge `i -> j` is forbidden. If both `[i,j]` and `[j,i]` are TRUE, then any type of edge between `i` and `j` is forbidden.
`tiers`	Numeric vector specifying the tier / time point for each variable. A smaller number corresponds to an earlier tier / time point.
`context.all`	Numeric or character vector. Specifies the positions or names of global context variables. Global context variables have no incoming edges, i.e. no parents, and are themselves parents of all non-context variables in the graph.
`context.tier`	Numeric or character vector. Specifies the positions or names of tier-specific context variables. Tier-specific context variables have no incoming edges, i.e. no parents, and are themselves parents of all non-context variables in the same tier.
`verbose`	Logical asking for detailed output.

Details

See pcalg::pc.cons.intern for further information on the majority and conservative approaches to learning v-structures.

Specifying a tier for each variable using the tiers argument has the following effects:

1) Only those triples x-y-z are considered as potential v-structures that satisfy t(y)=max(t(x),t(z)). This allows for three constellations: either y is in the same tier as x and both are later than z, or y is in the same tier as z and both are later than x, or all three are in the same tier. Triples where y is earlier than one or both of x and z need not be considered, as y being a collider would be against the partial ordering. Triples where y is later than both x and z will be oriented later in the pc algorithm and are left out here to minimize the number of conditional independence tests.

2) Conditional independence testing is restricted such that if x is in tier t(x) and y is in t(y), only those variables are allowed in the conditioning set whose tier is not larger than t(x).

Context variables specified via context.all or context.tier are not considered as candidate colliders or candidate parents of colliders.
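
The tier condition t(y)=max(t(x),t(z)) from effect 1) can be checked with a one-line helper. The sketch below is base R for illustration only; the function name `candidate_triple` and the example tiers are hypothetical.

```r
# A triple x - y - z is a candidate v-structure only if the middle node y
# lies in the later of the two outer tiers.
candidate_triple <- function(tx, ty, tz) ty == pmax(tx, tz)

triples <- data.frame(
  tx = c(2, 1, 1, 2, 1),
  ty = c(2, 2, 1, 1, 3),
  tz = c(1, 2, 1, 2, 2)
)
triples$candidate <- with(triples, candidate_triple(tx, ty, tz))
triples
```

The first three rows are the three admissible constellations described above. In the fourth row y is earlier than x and z, and in the fifth row y is later than both, so neither is considered at this stage.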

Value

unfTripl: numeric vector of triples coded as numbers (via pcalg::triple2numb) that were marked as ambiguous.
sk: The updated skeleton-object (separating sets might have been updated).

Author(s)

Original code by Markus Kalisch and Diego Colombo. Modifications by Janine Witte.

Cohort Data Structure

Description

The DAG from which the data 'dat_cohort' were simulated. See Andrews et al. (2021) <https://arxiv.org/abs/2108.13395> for more information on how the data were generated.

Usage

true_cohort

Format

A DAG (graphNEL object) with 34 nodes and 128 edges.

References

Andrews RM, Foraita R, Didelez V, Witte J (2021). A practical guide to causal discovery with cohort data. <https://doi.org/10.48550/arXiv.2108.13395>

A DAG with a Partial Ordering

Description

An example DAG from which the data 'dat_sim' were simulated.

Usage

true_sim

Format

A DAG (graphNEL object) with 9 nodes and 7 edges.

Estimate the Skeleton of a DAG while Accounting for a Partial Ordering

Description

Like pcalg::skeleton, but takes a user-specified partial node ordering into account. The conditional independence between x and y given S is not tested if any variable in S lies in the future of both x and y.

Usage

tskeleton(
  suffStat,
  indepTest,
  alpha,
  labels,
  p,
  method = c("stable", "original"),
  m.max = Inf,
  fixedGaps = NULL,
  fixedEdges = NULL,
  NAdelete = TRUE,
  tiers = NULL,
  verbose = FALSE
)

Arguments

`suffStat`	A list of sufficient statistics, containing all necessary elements for the conditional independence decisions in the function `indepTest`.
`indepTest`	Predefined `function` for testing conditional independence. It is internally called as `indepTest(x,y,S,suffStat)`, and tests conditional independence of `x` and `y` given `S`. Here, `x` and `y` are variables, and `S` is a (possibly empty) vector of variables (all variables are denoted by their (integer) column positions in the adjacency matrix). `suffStat` is a list, see the argument above. The return value of `indepTest` is the p-value of the test for conditional independence.
`alpha`	Significance level (number in (0,1)) for the individual conditional independence tests.
`labels`	(optional) character vector of variable (or "node") names. Typically preferred to specifying `p`.
`p`	(optional) number of variables (or nodes). May be specified if `labels` are not, in which case `labels` is set to `1:p`.
`method`	Character string specifying method; the default, "stable" provides an order-independent skeleton, see 'Details' below.
`m.max`	Maximal size of the conditioning sets that are considered in the conditional independence tests.
`fixedGaps`	logical symmetric matrix of dimension `p*p`. If entry `[i,j]` is true, the edge i-j is removed before starting the algorithm. Therefore, this edge is guaranteed to be absent in the resulting graph.
`fixedEdges`	a logical symmetric matrix of dimension `p*p`. If entry `[i,j]` is true, the edge i-j is never considered for removal. Therefore, this edge is guaranteed to be present in the resulting graph.
`NAdelete`	Logical, needed for the case that `indepTest(*)` returns `NA`. If it is true, the corresponding edge is deleted; otherwise it is kept.
`tiers`	Numeric vector specifying the tier / time point for each variable. Must be of length 'p', if specified, or have the same length as 'labels', if specified. A smaller number corresponds to an earlier tier / time point. Conditional independence testing is restricted such that if `x` is in tier `t(x)` and `y` is in `t(y)`, only those variables are allowed in the conditioning set whose tier is not larger than `t(x)`.
`verbose`	if `TRUE`, detailed output is provided.
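
To illustrate the interface expected of `indepTest`, the following is a minimal sketch of a user-written test with the signature `indepTest(x, y, S, suffStat)` that returns a p-value. The function name `fisher_z_test` and the toy variables `v1`, `v2`, `v3` are hypothetical; the sketch assumes Gaussian data and a `suffStat` list with a correlation matrix `C` and a sample size `n`, as in the examples of this page. It is not the implementation used by pcalg.

```r
# Hypothetical custom test (not part of tpc or pcalg): Fisher z-test of the
# partial correlation of x and y given S, based on suffStat$C and suffStat$n.
fisher_z_test <- function(x, y, S, suffStat) {
  idx <- c(x, y, S)
  # invert the correlation sub-matrix; the partial correlation of x and y
  # given S follows from the first two rows/columns of the inverse
  P <- solve(suffStat$C[idx, idx, drop = FALSE])
  r <- -P[1, 2] / sqrt(P[1, 1] * P[2, 2])
  r <- min(max(r, -0.9999999), 0.9999999)  # guard against |r| = 1
  z <- sqrt(suffStat$n - length(S) - 3) * atanh(r)
  2 * pnorm(-abs(z))                       # two-sided p-value
}

# toy chain v1 -> v2 -> v3
set.seed(1)
n  <- 2000
v1 <- rnorm(n)
v2 <- v1 + rnorm(n)
v3 <- v2 + rnorm(n)
suffStat <- list(C = cor(cbind(v1, v2, v3)), n = n)

p_marg <- fisher_z_test(1, 3, integer(0), suffStat)  # v1 and v3, no conditioning
p_cond <- fisher_z_test(1, 3, 2, suffStat)           # v1 and v3 given v2
```

A function of this shape could be supplied as the `indepTest` argument together with the matching `suffStat` list.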

Details

See pcalg::skeleton for further information on the skeleton algorithm.

Value

An object of class "pcAlgo" (see pcalg::pcAlgo) containing an estimate of the skeleton of the underlying DAG, the conditioning sets (sepset) that led to edge removals and several other parameters.

Author(s)

Original code by Markus Kalisch, Martin Maechler, Alain Hauser and Diego Colombo. Modifications by Janine Witte.

Examples

# load simulated data
data("dat_sim")
n <- nrow(dat_sim)
lab <- colnames(dat_sim)

# estimate skeleton without taking background information into account
tskel.fit <- tskeleton(suffStat = list(C = cor(dat_sim), n = n),
                       indepTest = pcalg::gaussCItest, alpha = 0.01, labels = lab)
skel.fit <- pcalg::skeleton(suffStat = list(C = cor(dat_sim), n = n),
                            indepTest = pcalg::gaussCItest, alpha = 0.01, labels = lab)
identical(skel.fit@graph, tskel.fit@graph) # TRUE

# estimate skeleton with temporal ordering as background information
tiers <- rep(c(1,2,3), times=c(3,3,3))
tskel.fit2 <- tskeleton(suffStat = list(C = cor(dat_sim), n = n),
                        indepTest = pcalg::gaussCItest, alpha = 0.01, labels = lab,
                        tiers = tiers)

# in this case, the skeletons estimated with and without
# background knowledge are identical, but fewer conditional
# independence tests were performed when background
# knowledge was taken into account
identical(tskel.fit@graph, tskel.fit2@graph) # TRUE
tskel.fit@n.edgetests
tskel.fit2@n.edgetests
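
When variable names carry a wave suffix such as "_t0", "_t1" and "_t2", a `tiers` vector of the required length can be derived from the labels instead of being typed by hand. The sketch below uses base R only; the label vector is a hypothetical example, and placing variables without a suffix in the earliest tier is an assumption that has to be checked against the study design.

```r
# Hypothetical labels following the "_t0", "_t1", "_t2" naming convention
lab <- c("sex", "birth_weight", "bmi_t0", "bodyfat_t0",
         "bmi_t1", "bodyfat_t1", "bmi_t2")

# Assumption: variables without a wave suffix belong to the earliest tier
tiers <- rep(1, length(lab))
tiers[grepl("_t0$", lab)] <- 2
tiers[grepl("_t1$", lab)] <- 3
tiers[grepl("_t2$", lab)] <- 4

# one tier per label, in the same order as the labels
data.frame(label = lab, tier = tiers)
```

The resulting vector has the same length and order as `lab`, which is what the `tiers` argument requires.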
