PsyCloud

Data Governance

PsyCloud separates authoring artifacts, immutable releases, participant sessions, telemetry, exports, and provider connectors. The goal is reproducible research with enough provenance to explain what a participant saw and how a dataset was derived.

What PsyCloud stores

Data class | Examples | Purpose
Account metadata | User email, name, auth tokens, OAuth accounts, workspace memberships | Researcher login, access control, and collaboration.
Workspace/project metadata | Workspaces, projects, studies, connector records | Organizing research and provider configuration.
Authoring state | Study drafts, revisions, IR, design recipe, semantics, asset index | Editing in Studio and preserving authoring history.
Releases | Bundles, manifest locks, runtime version, telemetry schema version, asset manifest | Immutable participant-facing study versions.
Assets | Uploaded images, audio, video, fonts, and metadata | Runtime delivery and reproducibility.
Participant records | Run links, sessions, pseudonymous subject ids, assignment objects, status, timestamps | Study entry, progress, completion, and provenance.
Telemetry | Ordered runtime event envelopes, cursors, errors | Data export, monitoring, replay, and debugging.
Derived data | Datasets, CSV/JSON/snapshot exports, QC summaries | Analysis and audit artifacts.
Recruitment data | Provider bindings, review queues, verdicts, webhooks | Prolific/MTurk lifecycle and payment/review workflows.

Participant identity

Hosted runs use a pseudonymous subjectId and a structured participantRef rather than requiring direct identifying information. External recruitment IDs and completion codes may be stored when a provider workflow needs them. If a study asks participants to type names, emails, health details, or other sensitive information, that data becomes part of the research payload and must be justified by the protocol.

Retention posture

Data | Current alpha posture
Bundles and release metadata | Retained indefinitely by default because runs depend on immutable release hashes.
Study drafts and revisions | Retained as authoring history unless an operator/self-host admin removes them.
Assets | Retained while referenced; orphan cleanup is configurable for storage backends.
Sessions and telemetry events | Retained by default for reproducibility, export, monitoring, and replay.
Ledger and counterbalance events | Retained as audit state for allocation and state-machine decisions.
Export artifacts | Temporary download artifacts should expire through object-storage lifecycle policy; keep institutional copies outside PsyCloud as needed.
Auth, verification, OAuth, and reset tokens | Short-lived by configuration.
Webhook receipts and error reports | Operational records with configurable cleanup in production deployments.

Deletion is not yet self-service

Alpha builds do not expose a general researcher-facing "delete run" or "delete session" UI. Hosted deletion is an operator process; self-hosted deletion is an administrator process. Until that changes, write consent language and plan institutional storage on the assumption that researchers cannot delete runs or sessions themselves.

Export governance

Exports are derived from hosted run/session data. CSV is convenient for analysis, JSON preserves structure, and snapshots package enough provenance to audit the dataset later. Once downloaded, exports should be handled under your lab or institution's data-management plan.

Recommended practice:

  • Export only the run or session scope you need.
  • Store downloaded exports in approved institutional storage.
  • Keep the bundleHash, run id, export version, and analysis script with your analysis outputs.
  • Remove local copies that are no longer needed, especially when exports contain free-text or provider identifiers.
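
One way to keep the bundleHash, run id, export version, and analysis script together with a downloaded export is a small sidecar file written on the analysis machine. The sketch below is an illustration only and not part of PsyCloud: the function names, the sidecar filename, and the JSON field names are assumptions, and the values passed in are whatever you recorded when you downloaded the export.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_provenance(export_path, script_path, bundle_hash, run_id, export_version):
    """Write <export file name>.provenance.json next to the export.

    Hypothetical helper: the field names below are illustrative,
    not a PsyCloud schema.
    """
    export_path = Path(export_path)
    script_path = Path(script_path)
    record = {
        "bundleHash": bundle_hash,
        "runId": run_id,
        "exportVersion": export_version,
        "exportFile": export_path.name,
        "exportSha256": sha256_of(export_path),
        "analysisScript": script_path.name,
        "analysisScriptSha256": sha256_of(script_path),
    }
    sidecar = export_path.with_name(export_path.name + ".provenance.json")
    sidecar.write_text(
        json.dumps(record, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return sidecar
```

Hashing both the export and the script lets a later reader confirm that the files stored with the analysis outputs are the ones the record describes.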

Self-hosting controls

Self-hosted deployments control their own database, storage bucket, secrets, allowed origins, email provider, OAuth provider credentials, and recruitment connector credentials. Production deployments should set explicit JWT secrets, explicit CORS origins, remote storage credentials when using S3/MinIO, and a storage lifecycle policy for temporary exports.
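
A self-hosted deployment could check these production settings before startup. The following is a minimal sketch under stated assumptions: every environment variable name in it (JWT_SECRET, CORS_ALLOWED_ORIGINS, STORAGE_BACKEND, and the S3_* names) is hypothetical and chosen for illustration, so substitute the names your deployment actually uses. The storage lifecycle policy for temporary exports is configured on the bucket itself and is not covered by this check.

```python
import os

# Hypothetical variable names, used only for illustration.
S3_VARS = ("S3_ENDPOINT", "S3_BUCKET", "S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY")
PLACEHOLDER_SECRETS = {"", "changeme", "secret", "dev"}


def check_production_config(env):
    """Return a list of problems found in env; an empty list means no problems."""
    problems = []

    # An explicit JWT secret: reject a missing value or an obvious placeholder.
    secret = env.get("JWT_SECRET", "").strip()
    if secret.lower() in PLACEHOLDER_SECRETS:
        problems.append("JWT_SECRET is missing or a placeholder")

    # Explicit CORS origins: a comma-separated list with no wildcard entry.
    raw_origins = env.get("CORS_ALLOWED_ORIGINS", "")
    origins = [item.strip() for item in raw_origins.split(",") if item.strip()]
    if not origins:
        problems.append("CORS_ALLOWED_ORIGINS is empty")
    elif "*" in origins:
        problems.append("CORS_ALLOWED_ORIGINS contains a wildcard")

    # Remote storage credentials are only required for S3/MinIO backends.
    if env.get("STORAGE_BACKEND", "").strip().lower() in {"s3", "minio"}:
        for name in S3_VARS:
            if not env.get(name, "").strip():
                problems.append(f"{name} is required for remote storage")

    return problems


if __name__ == "__main__":
    for problem in check_production_config(os.environ):
        print(f"config problem: {problem}")
```

Returning a list of problems instead of raising on the first one lets an administrator fix every missing setting in a single pass.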
