shedding python package

shedding.data

shedding.data.flatten_datasets(datasets, loq_fill_value=nan, day_fill_value=nan)

Flatten datasets into a uniform structure for inference.

Parameters:
  • datasets (dict) – Mapping of datasets.

  • loq_fill_value (float) – Fill value to replace loads below the level of quantification (LOQ) as log10 gene copies per mL.

  • day_fill_value (float) – Fill value to replace missing information about days since symptoms began as days.

Returns:

data – Data obtained by flattening datasets, including

  • num_patients (int): Total number of patients across all datasets.

  • num_samples (ndarray[int]<num_patients>): Total number of samples across all datasets.

  • num_samples_by_patient (ndarray[int]<num_patients>): Number of samples for each patient.

  • num_positives_by_patient (ndarray[int]<num_patients>): Number of positive samples for each patient.

  • num_negatives_by_patient (ndarray[int]<num_patients>): Number of negative samples for each patient.

  • idx (ndarray[int]<num_samples>): Index of the patient from whom each sample was collected.

  • load (ndarray[float]<num_samples>): Viral RNA load for each sample as gene copies per mL or 10 ** loq_fill_value if the concentration is below the level of quantification.

  • loadln (ndarray[float]<num_samples>): Natural logarithm of load.

  • loq (ndarray[float]<num_samples>): Level of quantification for each sample as gene copies per mL.

  • loqln (ndarray[float]<num_samples>): Natural logarithm of loq.

  • positive (ndarray[bool]<num_samples>): Indicator for each sample as to whether the RNA load is above the level of quantification.

  • day (ndarray[int]<num_samples>): Day after symptom onset on which the sample was collected or day_fill_value if unavailable.

  • dataset (ndarray[str]<num_samples>): Dataset from which each sample was obtained.

  • patient (ndarray[int]<num_samples>): Patient from which each sample was obtained as reported in the corresponding dataset.

Return type:

dict

shedding.data.load_dataset(name, root=None)

Load a dataset given its name.

Parameters:
  • name (str) – Name of the dataset to load.

  • root (str) – Root directory to load the dataset from.

Returns:

dataset – Loaded dataset as a dictionary.

Return type:

dict

shedding.data.load_datasets(names, root=None)

Load multiple datasets into a dictionary.

Parameters:
  • names (list[str]) – Sequence of datasets to load.

  • root (str) – Root directory to load datasets from.

Returns:

datasets – Mapping of datasets keyed by name.

Return type:

dict

shedding.model

class shedding.model.Model(num_patients, parametrisation='general', inflated=False, temporal=False, priors=None)

Generalised gamma model for viral load in faecal samples.

Parameters:
  • num_patients (int) – Number of patients.

  • parametrisation (Parametrisation) – Parametrisation used by the model (see notes for details).

  • inflated (bool) – Whether there is a “zero-inflated” subpopulation of patients who do not shed RNA.

  • temporal (str) – Whether to include a shedding profile in the fit.

  • priors (dict) – Mapping of parameter names to callable priors.

Notes

The generalised gamma distribution is a versatile distribution for positive continuous data. We use the parametrisation considered by Stacey with shape parameter \(q\), location parameter \(\mu\), and scale parameter \(\sigma\). Then if \(\gamma\) follows a gamma distribution with shape parameter \(a=1/q^2\), the random variable

\[x = \exp\left(\mu + \frac{\sigma}{q}\log\left(q^2\gamma\right)\right)\]

follows a generalised gamma distribution.

The generalised gamma distribution includes the regular gamma distribution (\(\sigma=q\)), Weibull distribution (\(q=1\)), and lognormal distribution (\(q=0\)) as special cases. The model can be restricted to a particular distribution using the parametrisation parameter.

evaluate_log_likelihood(values, data)

Evaluate the log likelihood of the data conditional on all model parameters.

evaluate_marginal_log_likelihood(values, data, n=1000, eps=1e-06, **kwargs)

Evaluate the log likelihood of the observed data marginalised with respect to group-level parameters but conditional on hyperparameters.

Parameters:
  • values (dict) – Posterior sample.

  • data (dict) – Data from which the posterior samples were inferred.

  • n (int or str) – Number of samples to use if simulation is required to evaluate the likelihood or scipy for numerical integration.

  • eps (float) – Value to clip the patient mean at to avoid underflows.

Returns:

likelihood – Marginal log likelihood for each patient.

Return type:

np.ndarray[num_patients]

evaluate_statistic(values, statistic)

Evaluate a statistic of the model.

Parameters:
  • sample (dict) – Parameter values.

  • statistic (str) – Statistic to evaluate.

Returns:

value – Value of the desired statistic given parameter values.

Return type:

float

rvs(values, size=None)

Draw a sample from the posterior predictive distribution.

Parameters:
  • values (dict) – Posterior sample.

  • size (int or tuple[int]) – Size of the sample to draw.

Returns:

sample – Sample drawn from the posterior predictive distribution.

Return type:

ndarray[size]

sample_individual_params(values)

Sample individual-level parameters.

sample_params(values)

Sample shared and individual-level parameters from the hierarchical prior.

sample_shared_params(values)

Sample parameters that are shared amongst individuals.

simulate(values, data, simulation_mode)

Generate simulated data for a posterior sample.

Parameters:
  • sample (dict or list[dict]) – Posterior sample or sequence of posterior samples.

  • data (dict) – Data from which the posterior samples were inferred.

  • simulation_mode (SimulationMode) – Whether to simulate only the lowest level of the hierarchical model (e.g. generate new data from existing groups) or simulate the entire hierarchy (e.g. generate new data from new groups).

Returns:

simulation – Simulated data.

Return type:

dict

class shedding.model.Parametrisation(value)

An enumeration.

class shedding.model.Profile(value)

An enumeration.

evaluate_offset(day, values)

Evaluate the offset from the population-level location parameter due to the shedding profile.

Parameters:

values (dict) – Parameter values for which to evaluate the offset.

class shedding.model.SimulationMode(value)

Enum to indicate whether simulation should be for EXISTING_PATIENTS or NEW_PATIENTS. Given posterior samples, the former can be used to assess the fit of the model to samples collected from new, unknown patients, and the latter can be used to assess the fit to new samples collected from existing patients.

Notes

The NEW_PATIENTS replication mode can be used to simulate data given hyperparameters at the population level.

class shedding.model.Transformation

A bijective transformation \(y = f(x)\) for regular variables \(x\) and transformed variables \(y\). For sampling purposes, the transformation induces a volume compression or expansion that needs to be accounted for, i.e.

\[\log P(y) = \log P(x=f^{-1}(y)) + \log \frac{dx}{dy}.\]
inverse(values)

Transform from the regular space to the transformed space.

jacobianlogdet(values)

Evaluate the logarithm of the Jacobian determinant to account for sampling in the transformed space rather than the regular space.

shedding.model.broadcast_samples(func)

Decorator to broadcast the function across samples provided as the first argument.

shedding.model.evaluate_size(parameters)

Evaluate the size, i.e. number of elements, for a set of parameters.

Parameters:

parameters (dict[str, tuple]) – Mapping from parameter names to shapes.

Returns:

int – Total number of elements in the parameter set.

Return type:

size

shedding.model.from_abc(a, b, c)

Transform from the parametrisation in terms of shape, scale, and exponent to Stacey’s parametrisation.

shedding.model.gengamma_mean(q, mu, sigma, qmin=1e-06)

Evaluate the mean of the generalised gamma distribution.

shedding.model.to_abc(q, mu, sigma)

Transform from Stacey’s parametrisation to the parametrisation in terms of shape, scale, and exponent.

shedding.model.transpose_samples(samples, parameters=None)

Transpose samples from a list of dictionaries to a dictionary of lists and vice versa.

Parameters:
  • samples (list[dict[str,np.ndarray]] or dict[str,list]) – Samples to transpose.

  • parameters (dict[str,tuple]) – Parameter shape definitions to use if samples is an array.

Returns:

samples – Sequence of samples, each represented by a dictionary.

Return type:

list[dict]

shedding.model.values_to_vector(parameters, values, size=None)

Convert a dictionary of values to a vector.

Parameters:
  • values (dict[str, np.ndarray]) – Mapping from parameter names to values.

  • parameters (dict[str, tuple]) – Mapping from parameter names to shapes.

Returns:

vector – Vector of parameters corresponding to values.

Return type:

np.ndarray

shedding.model.vector_to_values(parameters, vector)

Convert a vector to a dictionary of values.

Parameters:
  • vector (np.ndarray) – Vector of parameters corresponding to values.

  • parameters (dict[str, tuple]) – Mapping from parameter names to shapes.

Returns:

values – Mapping from parameter names to values.

Return type:

dict[str, np.ndarray]

shedding.model.write_paramnames_file(parameters, filename, escape=True)

Write parameter names to a file.

shedding.util

shedding.util.broadcast_shapes(*shapes)

Evaluate the shape due to broadcasting any number of shapes against each other.

Parameters:

*shapes (tuples) – The shapes to broadcast.

Returns:

broadcasted_shape – Shape of the array that would be obtained when broadcasting the shapes against each other.

Return type:

tuple

Examples

>>> broadcast_shapes((2, 1), (1, 3))
(2, 3)
shedding.util.dict_to_array(mapping, size=None, fill_value=0, dtype=None)

Convert a key-value mapping to a numpy array.

Parameters:
  • mapping (dict) – Mapping from indices to values.

  • size (int or tuple) – Size of the resulting array or max(mapping) + 1 if omitted.

  • fill_value – Fill value for missing indices.

  • dtype (type) – Data type of the resulting array.

Returns:

x – Data after conversion to an array.

Return type:

ndarray[dtype]<size>

shedding.util.extract_kvps(x, pattern='(.*?)_rep$')

Extract key-value pairs matching a pattern from a dictionary or list of dictionaries.

Parameters:
  • x (dict or list[dict]) – Dictionary or list of dictionaries from which to extract key-value pairs.

  • pattern (str) – Regular expression pattern to match keys against. The first capture group determines the key in the resulting dictionary.

Returns:

x – Dictionary or list of dictionaries after extraction of key-value pairs.

Return type:

dict or list[dict]

Examples

>>> extract_kvps({'a_': 1, 'b': 2}, r'(\w)_')
{'a': 1}
shedding.util.flush_traceback(func)

Decorator to flush the traceback of an exception before the kernel dies.

shedding.util.label_axes(axes, x=0.05, y=0.95, va='top', offset=0, **kwargs)

Attach alphabetical labels to a sequence of axes.

shedding.util.logmeanexp(x, axis=None, **kwargs)

Evaluate the logarithm of the mean of exponentiated values in a numerically stable fashion.

Parameters:
  • x (array_like) – Input array.

  • axis (None or int or tuple[int], optional) – Axis or axes over which the mean is taken. By default axis is None, and all elements are averaged.

  • **kwargs (dict) – Keyword arguments passed to scipy.special.logsumexp.

Returns:

ynp.log(np.mean(np.exp(a))) calculated in a numerically more stable way.

Return type:

ndarray

shedding.util.plot_kde(kde, xmin=None, xmax=None, factor=3, numlin=50, ax=None, **kwargs)

Plot a kernel density estimate.

Parameters:
  • xmin (float) – Lower limit of the kernel density estimate.

  • xmax (float) – Upper limit of the kernel density estimate.

  • factor (float) – Multiplicative factor for the kernel density estimate standard deviation to extend the range of the kernel density estimate.

  • numlin (int) – Number of points at which to evaluate the kernel density estimate.

  • ax – Axes to plot into.

  • **kwargs (dict) – Keyword arguments passed to ax.plot.

shedding.util.plot_replication_summary(data, replicates_by_model, target=None, ax=None, **kwargs)

Plot kernel density estimates of data and a set of replicates.

Parameters:
  • data – Data used to generate replicates.

  • replicates_by_model (dict[str, list] or list) – List of replicates keyed by model name or a list of replicates for a single model.

  • target (callable) – Function to extract summary statistics from the data or replicates.

  • ax – Axes to plot into.

  • **kwargs (dict) – Keyword arguments passed to plot_kde.

shedding.util.qq_plot(y, dist, yerr=None, ax=None, **kwargs)

Generate a Q-Q plot for the empirical data and a theoretical distribution.

Parameters:
  • y (ndarray) – Empirical data.

  • dist – Theoretical distribution.

  • yerr (ndarray) – Errors on empirical data.

  • ax – Axes to plot into.

  • **kwargs (dict) – Keyword arguments passed to ax.errorbar.

shedding.util.reload(module, predicate=None, exclude=None)

Reload modules recursively depth-first if predicate returns True.

Parameters:
  • module – Root module from which to start the depth-first recursion.

  • predicate (callable) – Predicate to determine whether the module should be reloaded.

  • exclude (list) – List of modules to exclude, used to ensure modules are only reloaded once.

shedding.util.replication_percentile_plot(data, replicates, key=None, percentiles=10, ax=None, **kwargs)

Generate a violin plot of replicated percentiles against empirical percentiles.

Parameters:
  • data – Empirical data.

  • replicates (list) – Posterior predictive replicates of the data.

  • key (callable) – Callable to extract summary statistics from the data and replicates.

  • percentiles (ndarray or int) – Percentiles of the summary statistic to plot or an integer number of quantiles.

  • ax – Axes to plot into.

  • **kwargs (dict) – Keyword arguments passed to ax.violinplot.

shedding.util.skip_doctest(obj)

Decorator to skip doctests.

shedding.util.softmax(x, axis=None)

Evaluate the softmax of x in a stable fashion.