New cache format

DanielHolth · November 12, 2022, 2:45pm

We want to add a <hash>.state.json file to conda’s cache, to replace the current “prepend three fields to the beginning of a 200MB json” format. This lets the cached repodata.json be identical to upstream. <hash> would be the same as in the previous cache, since a few programs expect this. mtime is an addition: check that the mtime of <hash>.json matches the one recorded inside <hash>.state.json, to detect when an older client has touched the cache. The file may also contain arbitrary data to support incremental repodata. It would look like this, e.g.

{
 "_url": "https://repo.anaconda.com/pkgs/main/osx-arm64",
 "_etag": "W/\"cdee7221e6860fafe36bc78789d636be\"",
 "_mod": "Fri, 11 Nov 2022 23:28:04 GMT",
 "_cache_control": "public, max-age=30",
 "mtime": 1668259353.941853
}
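
To make the mtime check concrete, here is a minimal sketch of how a client might write the state file and later verify it. The function names (write_state, state_is_current) and the file layout are assumptions for illustration only, not conda's actual implementation; the header keys simply mirror the example above.

```python
import json
import os
import tempfile
from pathlib import Path


def state_path_for(cache_json: Path) -> Path:
    # Hypothetical helper: <hash>.json -> <hash>.state.json in the same directory.
    return cache_json.with_name(cache_json.stem + ".state.json")


def write_state(cache_json: Path, headers: dict) -> Path:
    """Record the HTTP metadata plus the current mtime of <hash>.json."""
    state = dict(headers)
    state["mtime"] = cache_json.stat().st_mtime
    path = state_path_for(cache_json)
    path.write_text(json.dumps(state, indent=1))
    return path


def state_is_current(cache_json: Path) -> bool:
    """True if the state file exists and its mtime matches <hash>.json."""
    try:
        state = json.loads(state_path_for(cache_json).read_text())
        return state.get("mtime") == cache_json.stat().st_mtime
    except (OSError, ValueError):
        # Missing or unreadable state: treat the cache as not trustworthy.
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "abc123.json"  # "abc123" stands in for <hash>
        cache.write_text('{"packages": {}}')
        write_state(cache, {"_etag": 'W/"cdee7221e6860fafe36bc78789d636be"'})
        print(state_is_current(cache))
        # Simulate another client rewriting the cache without updating the state.
        os.utime(cache, (1, 1))
        print(state_is_current(cache))
```

If the mtimes disagree, the safest reading is that something other than a state-aware client rewrote <hash>.json, so the stored etag / mod should not be trusted for that file.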

Medium-term we should add a command to show the cache, for interested applications, so that no program depends on conda’s specific cache format.

wolfv · November 13, 2022, 1:52pm

That looks very good to me and much better than the current “insert data into json” hack!

DanielHolth · November 13, 2022, 8:20pm

We will have the problem of incremental/non-incremental cache clients overwriting each other. When you switch to using the jlap file to download deltas from the last complete repodata.json, the etag / mod will have to come from the jlap file (since no new request was made for the full json, and we will want to replace the unmodified repodata.json with a patched version). A client that doesn’t expect the state file, and expects _mod and _etag to be in-line, will also think the cache is outdated and download repodata.json again. These clients could be run in “offline - don’t re-download repodata.json” mode to avoid this.
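
A simplified sketch of why an older client would re-download: it looks for _etag and _mod inside the cached JSON itself, and an unmodified upstream repodata.json has neither. The function names here are hypothetical, and parsing the whole document is a simplification for illustration; this is not how any real client necessarily reads the in-line fields.

```python
import json


def legacy_inline_headers(cache_text: str) -> dict:
    """Collect the in-line cache fields an older client expects, if present."""
    data = json.loads(cache_text)
    return {key: data[key] for key in ("_etag", "_mod") if key in data}


def legacy_client_would_refetch(cache_text: str) -> bool:
    """With no in-line _etag or _mod there is nothing to send as a
    conditional request, so the old client treats the cache as outdated."""
    return not legacy_inline_headers(cache_text)


if __name__ == "__main__":
    old_style = '{"_etag": "W/\\"abc\\"", "_mod": "Fri, 11 Nov 2022 23:28:04 GMT", "packages": {}}'
    new_style = '{"packages": {}}'  # identical to upstream, no extra fields
    print(legacy_client_would_refetch(old_style))
    print(legacy_client_would_refetch(new_style))
```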

I assume that the problem will not be noticeable if most users type conda install less frequently than the remote repodata.json updates, since they would have had to download a fresh one either way.

DanielHolth · November 17, 2022, 8:06pm

@wolfv did I hear correctly that mamba has added locking on its caches for concurrent runs? Where does that happen? Oh, looks like “Changed LockFile to be a non-throwing checkable type, no pointers use. · mamba-org/mamba@2b7b230 · GitHub” is a good starting point. It is also exposed to Python in “mamba/__init__.pyi at main · mamba-org/mamba · GitHub”.

DanielHolth · January 10, 2023, 4:14pm

Thanks for writing this CEP, where we are working on the details of the format: “initial cep for repodata state by wolfv · Pull Request #46 · conda-incubator/ceps · GitHub”