Policy / Recommendations for optional dependencies?

ChrisBarker-NOAA · February 24, 2023, 11:51pm

I’ve noticed a number of conda-forge packages list run-time dependencies that are not required to use the package.

I expect that’s because they are required to run the tutorials, demos, maybe some of the tests, etc.

I understand the desire for your users to be able to conda install my_package and immediately run the demos / tutorials, but that means you can get some pretty heavyweight dependencies, like Jupyter or matplotlib, which can then bring in PyQt, and so on.

Anyway – I like to keep my environments lean – so conda packages should have minimal dependencies. But before I start writing PRs, I’d like to know what the community thinks is best practice.

dhirschfeld · February 25, 2023, 12:49pm

Conda’s inability to specify optional dependencies (other than creating separate packages) is a giant wart IMHO.

github.com/mamba-org/boa

[recipe-spec] Allow for arbitrary optional dependencies

opened 11:18AM - 17 Feb 22 UTC

dhirschfeld

### Proposal:

Have a single top-level `requirements` section as there is currently with required keys `host` and `run` and then optionally allow ***any other keys*** to specify named dependencies.

A `requirements.build` key would then be handled the same as currently but test dependencies would now be included as an optional `requirements.test` key. The specification of test dependencies under `requirements.test` would then be consistent with the specification of other dependencies under `requirements` (e.g. `requirements.build` rather than `build.requirements`).

In this way `mamba/boa` could provide support for the [`extras_require`](https://setuptools.pypa.io/en/latest/userguide/dependency_management.html?highlight=extra#optional-dependencies) capability in `setuptools`/`pip` which is very widely used, and very useful. Similar to `setuptools`, having a special-cased test requirements section could be removed in favour of treating them like any other optional dependency:

![image](https://user-images.githubusercontent.com/881019/154468909-ed6dc5a1-5c69-4aa6-b4b7-0a20416da381.png)

To get the benefit of this new capability would require all the requirements to be saved in the package metadata and for `mamba` to allow users to install *any* named dependencies in the package metadata.

Currently, it's very awkward for users to create either a build or test environment as there's no way to tell `mamba` to create an environment with those specific deps. The current recommendation to instead create separate *outputs* for any optional dependencies is awkward to specify and clutters up package repositories with multiple metadata-only packages for every single build.

beckermr · February 25, 2023, 2:53pm

We have been “batteries included” for a while now. Some recipes have multiple outputs that install extra stuff. However, if we remove an optional dependency from a given recipe (say dependency foo for package blah), the kinds of changes we can make that are not breaking are limited. We’ll need to make a blah-base without foo and then make a metapackage called blah that installs blah-base and foo. Otherwise, we’ll be breaking people’s environments. You’ll have to install blah-base yourself to get the leaner environment.
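The compatibility argument can be made concrete with a toy model. The sketch below is not conda's solver; it only computes the transitive closure of run requirements for the packages named above (`blah`, `blah-base`, `foo`), and the dependency tables and the `closure` helper are made up for illustration. It shows that after the split, asking for `blah` still pulls in `foo`, while `blah-base` stays lean.

```python
# Toy model only: conda's real solver also handles versions, pins and channels.
# Package names follow the post; the dependency tables are hypothetical.

def closure(package, run_requirements):
    """Return the set of packages installed when `package` is requested."""
    installed = set()
    stack = [package]
    while stack:
        name = stack.pop()
        if name in installed:
            continue
        installed.add(name)
        stack.extend(run_requirements.get(name, []))
    return installed

# Before the split: blah depends on foo directly.
before = {"blah": ["foo"], "foo": []}

# After the split: blah becomes a metapackage over blah-base plus foo.
after = {"blah": ["blah-base", "foo"], "blah-base": [], "foo": []}

# Existing users who ask for blah still get foo, so nothing breaks for them.
assert "foo" in closure("blah", before)
assert "foo" in closure("blah", after)

# Users who want the lean install have to ask for blah-base explicitly.
assert closure("blah-base", after) == {"blah-base"}
```

The point of the design is that the default name keeps its old meaning, so the leaner option is opt-in rather than a breaking change.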

@dhirschfeld If we are going to propose specs, they are best directed towards conda-incubator/ceps.

ChrisBarker-NOAA · February 25, 2023, 7:43pm

I tend to agree here – but for now, I’d like to discuss what best practices are with the current technology.

Is “We” in this case conda-forge? Is that written down somewhere?

Even if so, what “batteries” should be included – I see two distinct kinds of extra dependencies:

1. Optional features: e.g. dask.distributed – you can use dask without it, but there are some features you can’t use. I can certainly see an argument for including this kind of thing.
2. Other packages used for demos, tutorials, etc. – e.g. matplotlib or Jupyter for a package that is not about plotting – of course those are useful for demoing, etc., but not required for the core functionality.

Personally, it’s (2) that I object to – that tends to bring in a sometimes very large dependency stack that is utterly irrelevant to actually using the package. The assumption that everyone who wants some computational feature is doing interactive data analysis with the full SciPy stack is just plain wrong.

And if this package is not at the top of the requirements stack, it’s VERY hard to avoid bringing that stuff in – you’d essentially have to turn off conda’s dependency resolution. If the extra requirements are pinned at all, that could really create a challenge.
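For either kind of extra, the package itself can avoid a hard run requirement by importing the optional module only where it is actually used. A minimal sketch, assuming hypothetical function names (`require_optional`, `plot_result`, `mean`); it is not taken from any package discussed here:

```python
import importlib

def require_optional(module_name, feature):
    """Import an optional module, or explain which feature needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{feature} needs the optional module '{module_name}'. "
            "Install it separately to enable this feature."
        ) from err

def mean(values):
    # Core functionality: no optional dependencies involved.
    return sum(values) / len(values)

def plot_result(values):
    # matplotlib is imported only when plotting is requested, so it does
    # not have to be a run requirement of the core package.
    plt = require_optional("matplotlib.pyplot", "plot_result()")
    fig, ax = plt.subplots()
    ax.plot(values)
    return fig
```

With this layout, `mean([1, 2, 3])` works in a lean environment, and `plot_result` fails with a clear message only if matplotlib is missing.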

NOTE: MPL eventually addressed some of this within its stack by creating a matplotlib-base package, so you wouldn’t have to bring in all the back-ends if you didn’t want to. I think that should be recommended policy.

Actually, I think recommended policy should be for the “standard” package to be minimal, and either:

- Document that if you want to run the demos, you are going to need these packages. You can provide a demos_requirements.txt file if you want.
- Offer another package that provides every likely related dependency.
  - @dhirschfeld is right – this shouldn’t be necessary, but it is right now.

Honestly, I think folks are including everything not because they really think it’s best, but because they haven’t thought carefully about it. In particular, the full stack is what developers need to work on the documentation and demos, etc. – why wouldn’t they include them?

Anyway, at the end of the day, I’ll have to live with whatever the conda-forge community’s consensus on this is, but I think that should be documented.

In some of my work, I have three (or four!) sets of requirements:

requirements_run
requirements_develop
requirements_test
requirements_docs

(I could probably merge dev and docs without anyone complaining, but …)

It can be a bit annoying, but it does let us build lean environments.

beckermr · February 26, 2023, 12:23am

We do not have a written policy, but it has been discussed amongst the core group for years now.

A written policy is hard in this case since things can be very subtle.

ChrisBarker-NOAA · February 26, 2023, 1:38am

Well, a “policy” is tricky, but recommendations / guidelines would be very helpful.

With a touch more detail than “batteries included”

h-vetinari · February 26, 2023, 9:30pm

On several packages I’m aware of (and co-maintain), optional dependencies get mapped to separate outputs, e.g. pip install foo[extra] has an equivalent conda install foo-extra.
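To illustrate the mapping, here is a small sketch that reads the extras a Python package declares in its core metadata (the `Provides-Extra` field) and derives the matching output names. The sample metadata and the `conda_output_names` helper are made up, and the `foo-extra` naming is just the convention described above, not something conda enforces.

```python
from email import message_from_string

# Made-up core metadata for a hypothetical package "foo" with two extras.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: foo
Version: 1.0
Provides-Extra: plot
Provides-Extra: docs
Requires-Dist: numpy
Requires-Dist: matplotlib; extra == "plot"
Requires-Dist: sphinx; extra == "docs"
"""

def conda_output_names(metadata_text):
    """Map each declared pip extra to a conda-style output name."""
    meta = message_from_string(metadata_text)
    name = meta["Name"]
    extras = meta.get_all("Provides-Extra") or []
    return {f"{name}[{extra}]": f"{name}-{extra}" for extra in extras}

for pip_spec, conda_name in conda_output_names(SAMPLE_METADATA).items():
    print(f"pip install {pip_spec}  ->  conda install {conda_name}")
```

For an installed distribution, `importlib.metadata.metadata("foo")` returns a message object with the same `get_all` method, so the lookup could be pointed at a real package instead of the sample text.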

I’d really love to have better syntax for (pip-)optional dependencies, but that would be quite a large amount of work.

ChrisBarker-NOAA · February 27, 2023, 5:33pm

Exactly – what I’m suggesting is that that be “standard practice” on conda-forge.

However, the cases I’ve seen are actually NOT set up as optional features with pip either – but PyPI is not curated, so folks can do whatever they want.

Also: in the case of [extras], are they ONLY handy dependencies? Or is there actual additional functionality in the package? The cases that bother me the most are dependencies (particularly large ones) that are in no way required for the full functionality of the package – only for demos, tutorials, etc. That’s what I’d really like to see left out.

If a package maintainer really wants to be able to give their users a single thing to install to be able to run all their examples, etc., I’d like to see it become recommended practice to make another meta-package: foo-demos, which would install foo and all the other stuff that might be useful with it.

TL;DR: packages should not include “things that are handy to use along with that package” as a dependency.

Xylem · March 2, 2023, 10:16pm

I’ve been working on a related issue in a repo that is not yet up-to-snuff to be published on conda-forge (but will be soon). See the new_conda_env project here.

I used to install jupyterlab in all my environments until I found out that having jlab in base and ipykernel in each env was enough.