Data Governance

PsyCloud separates authoring artifacts, immutable releases, participant sessions, telemetry, exports, and provider connectors. The goal is reproducible research with enough provenance to explain what a participant saw and how a dataset was derived.

What PsyCloud stores

Data class	Examples	Purpose
Account metadata	User email, name, auth tokens, OAuth accounts, workspace memberships	Researcher login, access control, and collaboration.
Workspace/project metadata	Workspaces, projects, studies, connector records	Organizing research and provider configuration.
Authoring state	Study drafts, revisions, IR, design recipe, semantics, asset index	Editing in Studio and preserving authoring history.
Releases	Bundles, manifest locks, runtime version, telemetry schema version, asset manifest	Immutable participant-facing study versions.
Assets	Uploaded images, audio, video, fonts, and metadata	Runtime delivery and reproducibility.
Participant records	Run links, sessions, pseudonymous subject ids, assignment objects, status, timestamps	Study entry, progress, completion, and provenance.
Telemetry	Ordered runtime event envelopes, cursors, errors	Data export, monitoring, replay, and debugging.
Derived data	Datasets, CSV/JSON/snapshot exports, QC summaries	Analysis and audit artifacts.
Recruitment data	Provider bindings, review queues, verdicts, webhooks	Prolific/MTurk lifecycle and payment/review workflows.

Participant identity

Hosted runs use a pseudonymous subjectId and a structured participantRef rather than requiring direct identifying information. External recruitment IDs and completion codes may be stored when a provider workflow needs them. If a study asks participants to type names, emails, health details, or other sensitive information, that data becomes part of the research payload and must be justified by the protocol.
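
If a lab needs to link its own roster to pseudonymous identifiers before data reaches analysis, a keyed hash is one common approach. The sketch below is illustrative only and is not PsyCloud's actual subjectId derivation, which this page does not describe; the function name, the `subj_` prefix, and the `study_key` parameter are assumptions made for the example.

```python
import hashlib
import hmac


def pseudonymous_subject_id(external_id: str, study_key: bytes, length: int = 16) -> str:
    """Derive a stable pseudonym from an external ID with HMAC-SHA256.

    Hypothetical helper: the "subj_" prefix and the truncation length are
    illustrative choices, not a PsyCloud format.
    """
    if not study_key:
        raise ValueError("study_key must be non-empty")
    digest = hmac.new(study_key, external_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "subj_" + digest[:length]
```

The same external ID and key always yield the same pseudonym, while a different key yields an unrelated one, so the key should be kept outside the dataset and stored as carefully as any other secret.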

Retention posture

Data	Current alpha posture
Bundles and release metadata	Retained indefinitely by default because runs depend on immutable release hashes.
Study drafts and revisions	Retained as authoring history unless an operator/self-host admin removes them.
Assets	Retained while referenced; orphan cleanup is configurable for storage backends.
Sessions and telemetry events	Retained by default for reproducibility, export, monitoring, and replay.
Ledger and counterbalance events	Retained as audit state for allocation and state-machine decisions.
Export artifacts	Temporary download artifacts should expire through an object-storage lifecycle policy; keep institutional copies outside PsyCloud as needed.
Auth, verification, OAuth, and reset tokens	Short-lived by configuration.
Webhook receipts and error reports	Operational records with configurable cleanup in production deployments.
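
For the export-artifact row above, an expiry rule on an S3-compatible bucket might look like the following sketch. The dictionary follows the standard S3 lifecycle configuration shape; the `exports/tmp/` prefix, the 7-day window, and the rule ID are assumptions, so substitute whatever prefix your deployment actually writes temporary exports to.

```python
import json


def export_expiry_policy(prefix: str = "exports/tmp/", days: int = 7) -> dict:
    """Build an S3 lifecycle configuration that expires objects under a prefix.

    The prefix and retention window are hypothetical defaults, not values
    taken from PsyCloud.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "expire-temporary-exports",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


# Print the policy as JSON so it can be reviewed or saved to a file.
print(json.dumps(export_expiry_policy(), indent=2))
```

The resulting JSON can be applied with your storage tooling, for example as the `LifecycleConfiguration` argument to boto3's `put_bucket_lifecycle_configuration`.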

Deletion is not yet self-service

Alpha builds do not expose a general researcher-facing "delete run" or "delete session" UI. Hosted deletion is an operator process; self-hosted deletion is an administrator process. Plan consent language and institutional storage around that current limitation.

Export governance

Exports are derived from hosted run/session data. CSV is convenient for analysis, JSON preserves structure, and snapshots package enough provenance to audit the dataset later. Once downloaded, exports should be handled under your lab or institution's data-management plan.

Recommended practice:

Export only the run or session scope you need.
Store downloaded exports in approved institutional storage.
Keep the bundleHash, run id, export version, and analysis script with your analysis outputs.
Remove local copies that are no longer needed, especially when exports contain free-text or provider identifiers.
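
One way to keep the bundleHash, run id, export version, and analysis script together is a small provenance file written next to the analysis outputs. This is a minimal sketch: the `provenance.json` file name and the field names are assumptions, not a PsyCloud format.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def write_provenance(output_dir, bundle_hash, run_id, export_version, script_path) -> Path:
    """Write a provenance.json sidecar next to analysis outputs.

    Field names are illustrative; adjust them to your lab's conventions.
    """
    script = Path(script_path)
    record = {
        "bundleHash": bundle_hash,
        "runId": run_id,
        "exportVersion": export_version,
        "analysisScript": script.name,
        # Hash the script so later edits to it are detectable.
        "analysisScriptSha256": hashlib.sha256(script.read_bytes()).hexdigest(),
        "recordedAt": datetime.now(timezone.utc).isoformat(),
    }
    out = Path(output_dir) / "provenance.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return out
```

Calling `write_provenance("results/", bundle_hash, run_id, export_version, "analysis.py")` at the end of an analysis run leaves a record that can be checked against the export during a later audit.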

Self-hosting controls

Self-hosted deployments control their own database, storage bucket, secrets, allowed origins, email provider, OAuth provider credentials, and recruitment connector credentials. Production deployments should set explicit JWT secrets, explicit CORS origins, remote storage credentials when using S3/MinIO, and a storage lifecycle policy for temporary exports.
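
A deployment script could check these settings before startup. The sketch below is one way to do that; every variable name in it (`JWT_SECRET`, `CORS_ORIGINS`, `STORAGE_BACKEND`, `S3_ACCESS_KEY_ID`, `S3_SECRET_ACCESS_KEY`) and the 32-character minimum are hypothetical, so map them to the names and thresholds your deployment actually uses.

```python
import os
from typing import List, Mapping


def check_production_config(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in a production environment mapping.

    All variable names here are hypothetical placeholders.
    """
    problems = []

    # An explicit, reasonably long JWT secret rather than a built-in default.
    if len(env.get("JWT_SECRET", "")) < 32:
        problems.append("JWT_SECRET must be set to at least 32 characters")

    # Explicit CORS origins: a comma-separated list with no wildcard.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    if not origins:
        problems.append("CORS_ORIGINS must list explicit origins")
    elif "*" in origins:
        problems.append("CORS_ORIGINS must not contain a wildcard")

    # Remote storage needs credentials when S3 or MinIO is selected.
    if env.get("STORAGE_BACKEND", "").lower() in {"s3", "minio"}:
        for name in ("S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY"):
            if not env.get(name):
                problems.append(name + " is required for remote storage")

    return problems


if __name__ == "__main__":
    for problem in check_production_config(os.environ):
        print("config problem:", problem)
```

Returning a list of problems instead of raising on the first one lets an operator fix every misconfiguration in a single pass.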