Data Governance
PsyCloud separates authoring artifacts, immutable releases, participant sessions, telemetry, exports, and provider connectors. The goal is reproducible research with enough provenance to explain what a participant saw and how a dataset was derived.
What PsyCloud stores
| Data class | Examples | Purpose |
|---|---|---|
| Account metadata | User email, name, auth tokens, OAuth accounts, workspace memberships | Researcher login, access control, and collaboration. |
| Workspace/project metadata | Workspaces, projects, studies, connector records | Organizing research and provider configuration. |
| Authoring state | Study drafts, revisions, IR, design recipe, semantics, asset index | Editing in Studio and preserving authoring history. |
| Releases | Bundles, manifest locks, runtime version, telemetry schema version, asset manifest | Immutable participant-facing study versions. |
| Assets | Uploaded images, audio, video, fonts, and metadata | Runtime delivery and reproducibility. |
| Participant records | Run links, sessions, pseudonymous subject ids, assignment objects, status, timestamps | Study entry, progress, completion, and provenance. |
| Telemetry | Ordered runtime event envelopes, cursors, errors | Data export, monitoring, replay, and debugging. |
| Derived data | Datasets, CSV/JSON/snapshot exports, QC summaries | Analysis and audit artifacts. |
| Recruitment data | Provider bindings, review queues, verdicts, webhooks | Prolific/MTurk lifecycle and payment/review workflows. |
Participant identity
Hosted runs use a pseudonymous subjectId and a structured participantRef rather than requiring
direct identifying information. External recruitment IDs and completion codes may be stored when a
provider workflow needs them. If a study asks participants to type names, emails, health details, or
other sensitive information, that data becomes part of the research payload and must be justified by
the protocol.
Retention posture
| Data | Current alpha posture |
|---|---|
| Bundles and release metadata | Retained indefinitely by default because runs depend on immutable release hashes. |
| Study drafts and revisions | Retained as authoring history unless a hosted operator or self-hosted administrator removes them. |
| Assets | Retained while referenced; orphan cleanup is configurable for storage backends. |
| Sessions and telemetry events | Retained by default for reproducibility, export, monitoring, and replay. |
| Ledger and counterbalance events | Retained as audit state for allocation and state-machine decisions. |
| Export artifacts | Temporary download artifacts should expire through an object-storage lifecycle policy; keep institutional copies outside PsyCloud as needed. |
| Auth, verification, OAuth, and reset tokens | Short-lived by configuration. |
| Webhook receipts and error reports | Operational records with configurable cleanup in production deployments. |
Alpha builds do not expose a general researcher-facing "delete run" or "delete session" UI. Hosted deletion is an operator process; self-hosted deletion is an administrator process. Plan consent language and institutional storage around that current limitation.
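For the export-artifact row in the retention table, the following is a minimal sketch of an S3-style lifecycle rule that expires temporary exports. The `exports/tmp/` prefix, the seven-day window, and the rule ID are assumptions chosen for illustration; they are not PsyCloud defaults, and the actual object layout of a deployment should be checked before applying anything like this.

```python
import json


def export_expiry_policy(prefix="exports/tmp/", days=7):
    """Build an S3-style lifecycle configuration that expires objects under `prefix`.

    Both defaults are hypothetical; adjust them to the deployment's real layout.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "expire-temporary-exports",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be reviewed before it is applied.
    print(json.dumps(export_expiry_policy(), indent=2))
```

The returned dictionary has the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. Whether a given MinIO version honors every rule field should be confirmed against its own documentation.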
Export governance
Exports are derived from hosted run/session data. CSV is convenient for analysis, JSON preserves structure, and snapshots package enough provenance to audit the dataset later. Once downloaded, exports should be handled under your lab or institution's data-management plan.
Recommended practice:
- Export only the run or session scope you need.
- Store downloaded exports in approved institutional storage.
- Keep the bundleHash, run id, export version, and analysis script with your analysis outputs.
- Remove local copies that are no longer needed, especially when exports contain free-text or provider identifiers.
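One way to follow the third recommendation is to write a small provenance sidecar next to each downloaded export. The sketch below is an illustration only: the sidecar file name, the JSON field names, and the `write_provenance` helper are hypothetical and not part of any PsyCloud API, and the bundle hash, run id, and export version are assumed to be copied by hand from the export you downloaded.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_provenance(export_path, script_path, bundle_hash, run_id, export_version):
    """Write <export>.provenance.json next to the export and return its path.

    Field names are hypothetical; they only mirror the items listed above.
    """
    export_path, script_path = Path(export_path), Path(script_path)
    record = {
        "bundleHash": bundle_hash,
        "runId": run_id,
        "exportVersion": export_version,
        "exportFile": export_path.name,
        "exportSha256": sha256_of(export_path),
        "analysisScript": script_path.name,
        "analysisScriptSha256": sha256_of(script_path),
    }
    sidecar = export_path.with_name(export_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar
```

Hashing both the export and the analysis script means a later reader can verify that the files stored with the analysis outputs are the ones the record describes.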
Self-hosting controls
Self-hosted deployments control their own database, storage bucket, secrets, allowed origins, email provider, OAuth provider credentials, and recruitment connector credentials. Production deployments should set explicit JWT secrets, explicit CORS origins, remote storage credentials when using S3/MinIO, and a storage lifecycle policy for temporary exports.
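The production settings above can be checked before startup with a small preflight script. This is a sketch under stated assumptions: every environment variable name here (`JWT_SECRET`, `CORS_ORIGINS`, `STORAGE_BACKEND`, and the `S3_*` keys) and the 32-character minimum are hypothetical placeholders, so substitute the names and limits your deployment actually uses.

```python
def check_production_env(env):
    """Return a list of human-readable problems; an empty list means the checks passed.

    All variable names below are hypothetical placeholders, not PsyCloud settings.
    """
    problems = []

    # An explicit, reasonably long JWT secret (the 32-character floor is an assumption).
    secret = env.get("JWT_SECRET", "")
    if len(secret) < 32:
        problems.append("JWT_SECRET must be set explicitly (at least 32 characters)")

    # Explicit CORS origins: a non-empty comma-separated list without a wildcard.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    if not origins:
        problems.append("CORS_ORIGINS must list explicit origins")
    elif "*" in origins:
        problems.append("CORS_ORIGINS must not contain the wildcard '*'")

    # Remote storage credentials are only required for an S3/MinIO backend.
    if env.get("STORAGE_BACKEND", "local") == "s3":
        for key in ("S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY", "S3_BUCKET"):
            if not env.get(key):
                problems.append(f"{key} is required when STORAGE_BACKEND is 's3'")

    return problems


if __name__ == "__main__":
    import os

    for problem in check_production_env(os.environ):
        print("config problem:", problem)
```

Returning a list instead of raising on the first failure lets an administrator see every missing setting in one pass.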