[feat]: Command to trigger a full re-crawl after the initial index on demand #244

@bygadd

Observation (context_chat main, 5.4.0-beta0)

While reviewing the file-indexing job lifecycle on main, I noticed that the crawl chain is bootstrapped only on a fresh install and then removes itself. I'd like to understand how a full re-crawl is meant to be triggered afterwards.

What I see in the code:

  • SchedulerJob (lib/BackgroundJobs/SchedulerJob.php) enumerates the mounts, adds one StorageCrawlJob per mount, then removes itself ($this->jobList->remove(self::class), ~L48). It is a QueuedJob seeded only by the <install> repair step AppInstallStep. appinfo/info.xml declares only FileSystemListenerJob and RotateLogsJob under <background-jobs>, and there is no <post-migration> step.
  • StorageCrawlJob self-perpetuates while a mount still has files (scheduleAfter(self::class, …), ~L85) and otherwise removes itself.

So on a healthy fresh install the initial crawl runs to completion and then both jobs are gone. My question is about what happens after that:

  1. A new external storage is mounted after the initial crawl has finished — SchedulerJob (which is what enumerates mounts) is no longer scheduled, so is there anything that discovers and crawls the new mount? (FileSystemListenerJob handles live filesystem events, but those don't fire for the pre-existing contents of a freshly-mounted storage.)
  2. An app upgrade — since the seed lives under <install> (not <post-migration>), occ upgrade won't re-run it. If the initial crawl had been interrupted/incomplete before the upgrade, is there a path that resumes a full crawl, or does it rely solely on FileSystemListenerJob from then on?
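
To make the lifecycle and the first gap concrete, here is a toy Python model of the chain as I read it. It is not the app's PHP code: `JobList`, `run_scheduler`, `run_crawl`, `run_until_idle`, the mount names, and the batch size are all hypothetical stand-ins, and the real StorageCrawlJob re-queues itself via scheduleAfter rather than simply staying in the list.

```python
from dataclasses import dataclass, field


@dataclass
class JobList:
    """Toy stand-in for the background job list: a set of (job, argument) pairs."""
    jobs: set = field(default_factory=set)

    def add(self, job, arg=None):
        self.jobs.add((job, arg))

    def remove(self, job, arg=None):
        self.jobs.discard((job, arg))


def run_scheduler(job_list, mounts):
    """Model of SchedulerJob: queue one crawl job per mount, then remove itself."""
    for mount in mounts:
        job_list.add("StorageCrawlJob", mount)
    job_list.remove("SchedulerJob")


def run_crawl(job_list, pending, indexed, mount, batch=2):
    """Model of StorageCrawlJob: index one batch, stay queued only while files remain."""
    indexed.extend(pending[mount][:batch])
    pending[mount] = pending[mount][batch:]
    if not pending[mount]:
        job_list.remove("StorageCrawlJob", mount)


def run_until_idle(job_list, pending, indexed):
    """Run queued jobs until the list is empty, like repeated cron ticks."""
    while job_list.jobs:
        for job, arg in sorted(job_list.jobs, key=str):
            if job == "SchedulerJob":
                run_scheduler(job_list, list(pending))
            else:
                run_crawl(job_list, pending, indexed, arg)


# Fresh install: the <install> step seeds SchedulerJob once.
pending = {"home": ["a.txt", "b.txt", "c.txt"]}
indexed = []
job_list = JobList()
job_list.add("SchedulerJob")
run_until_idle(job_list, pending, indexed)

# Later: a storage with pre-existing files is mounted. Nothing is queued,
# so in this model nothing ever enumerates or crawls the new mount.
pending["external"] = ["old1.pdf", "old2.pdf"]
run_until_idle(job_list, pending, indexed)
```

In this model the initial crawl indexes all three files in `home` and leaves the job list empty, while the two files on `external` stay in `pending` indefinitely. That is the behaviour I am asking about in point 1.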

Question

Is a <post-migration> re-seed of SchedulerJob (mirroring the existing <install> AppInstallStep, similar to how Recognize wires its InstallDeps under both <install> and <post-migration>) intended/desirable — or is mount discovery / re-crawl already handled by a mechanism I've missed? Happy to open a small PR for the <post-migration> re-seed if it'd be useful.
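
As a rough illustration of what I mean by a re-seed, the Python sketch below shows only the property I would want from it: running it on every upgrade should queue SchedulerJob at most once. `reseed_scheduler` and the plain set standing in for the job list are hypothetical; the real step would be a PHP repair step, and I have not checked how cheap a repeated full crawl is on main.

```python
def reseed_scheduler(queued_jobs):
    """Hypothetical post-migration step: queue SchedulerJob unless it is already queued."""
    if "SchedulerJob" in queued_jobs:
        return False  # already queued, nothing to do
    queued_jobs.add("SchedulerJob")
    return True


queued = set()  # state after the crawl chain has removed itself
first = reseed_scheduler(queued)   # queues the job
second = reseed_scheduler(queued)  # no-op on a repeated run
```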

Context: we hit a related "initial indexing never completes" symptom on the 5.3.x line where the crawl chain had self-deleted with a backlog still queued; on main the backend-pull rearchitecture changes this substantially, so I want to confirm the intended behaviour before proposing anything.
