Hosting private conda packages

I recently joined a startup and have been tasked with setting up package management for our internal Python libraries. We work in the biotech and ML space, and a lot of the packages we use are indexed on conda channels.

Our current setup installs our own repositories, which are built with setup.py, directly from GitHub. Initially I thought that we should just use Poetry for all of our package management and, for our own private libraries, have an AWS CodeBuild/CodeArtifact setup to host them. I still think this seems like the best option for package management in Python.

On the other hand, too many of the packages we need are available in the conda ecosystem but not on PyPI. We’ve also noticed that several dependencies clash between pip and conda when we try to use both at the same time, so we might as well lean into the conda ecosystem completely.

To do this, I think a good approach would be to use a private conda channel for our own libraries and conda-forge for any packages we would otherwise install with pip. If for some reason we can’t find a package on conda-forge, there seems to be a fairly easy process for getting it there from PyPI.

My question is the following: Has anyone hosted their own conda channel before?

I’ve seen and tried options from:

The only way to use the AWS and Azure solutions is to mount the files from S3 locally so that the channel works correctly. This just does not seem like the right way to use a conda channel, not to mention that it involves downloading all or most of the files in the bucket in order to index the channel properly.

The Anaconda and Quetz solutions seem like a step up from mounting the S3 buckets, but they don’t allow federated logins, at least not natively, which leaves something like Artifactory or an equivalent tool. Quetz also seems not to be in active development.

I haven’t really found any reports or guides on the standard way of doing this, which I find really surprising, because I can’t be the only one running into this. As far as I can tell, Artifactory is the most enterprise-ready solution available, but I’m curious whether there’s something I haven’t seen, and whether others have run into this problem as well.

Here is what the environment.yml file would look like for what we currently do.

name: package
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - awscli=1.27.134
  - pip
  - python=3.10.11
  - pip:
    - GitHub.com/internal_repo/version/files.tar.gz
    - other pip dependencies
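
For comparison, here is a rough sketch of what an all-conda version of that file might look like once the internal library is published to a private channel. It is written as a small Python helper that renders the file. The channel URL `https://conda.example.internal/acme` and the conda package name `internal_repo` are placeholders invented for illustration, not real endpoints, and listing the private channel first is only one possible ordering.

```python
def render_environment(name, channels, dependencies):
    """Render a minimal environment.yml as text (flat lists only)."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


# Hypothetical private channel URL; replace with wherever the channel is hosted.
PRIVATE_CHANNEL = "https://conda.example.internal/acme"

env_text = render_environment(
    name="package",
    channels=[PRIVATE_CHANNEL, "conda-forge", "bioconda"],
    # The internal library is now an ordinary conda dependency,
    # so the nested pip section is no longer needed.
    dependencies=["awscli=1.27.134", "python=3.10.11", "internal_repo"],
)
print(env_text)
```

The point of the sketch is only that the pip section disappears: everything, including the internal library, is resolved by conda from the listed channels.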

Hi @mbest,

I only have experience hosting private channels on anaconda.org, and those have only been for training purposes. I haven’t tried any of the other approaches either.

There is another service that offers private channel hosting: Cloudsmith provides public & private repositories for Conda.

This is also a great topic. I’ll promote it on Twitter and Mastodon as well. We’ll see if we can get you answers.

Dave C

Hello!

I have experience with both Artifactory and Quetz.

I used Artifactory in the past. It’s not free, but it worked well. It can also be used for other package repositories (such as PyPI).

At my current job, I deployed Quetz. For our use case, it works well. I set up authentication with our internal GitLab server. I log in only to create channels and API keys.
The server is on our internal network and the channels are public, meaning users don’t have to log in or authenticate.
conda-forge is configured as a proxy channel, and we have some local channels for the packages we create internally. Packages are built and uploaded via our GitLab CI pipelines.
One thing I like about Quetz is its plugins. The quetz_repodata_patching plugin is something we need for one channel. I don’t think this is supported by Artifactory.

There isn’t a lot of activity on Quetz, but the project is still active. If you are missing a feature, I’d recommend opening an issue.

We have a lot of pure Python packages internally. To avoid mixing conda and PyPI packages, we create conda packages in our GitLab CI pipeline:

  • python -m build -s to create an sdist
  • grayskull to generate a conda recipe from that sdist
  • conda mambabuild to build the conda package

This works well for simple packages.
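
As a rough sketch of those three steps (not the exact pipeline described above), the commands could be assembled like this. The package name `mypkg`, the version `0.1.0`, and the `dist`/`recipes` directory names are hypothetical. The sketch also assumes that grayskull accepts a path to a local sdist, that its `--output` option sets where the recipe folder is created, and that it names that folder after the package. The script only prints the commands; a CI job would run them in order.

```python
import shlex
from pathlib import Path


def conda_build_steps(project_dir: Path, package: str, version: str) -> list[list[str]]:
    """Return the three CI commands, in order, as argument lists."""
    dist_dir = project_dir / "dist"        # where the sdist is written
    recipe_root = project_dir / "recipes"  # hypothetical recipe location
    # Assumed sdist file name; the real name depends on the build backend.
    sdist = dist_dir / f"{package}-{version}.tar.gz"
    return [
        # 1. Build a source distribution of the project.
        ["python", "-m", "build", "-s", "--outdir", str(dist_dir), str(project_dir)],
        # 2. Generate a conda recipe from the local sdist.
        ["grayskull", "pypi", str(sdist), "--output", str(recipe_root)],
        # 3. Build the conda package from the generated recipe folder.
        ["conda", "mambabuild", str(recipe_root / package)],
    ]


steps = conda_build_steps(Path("."), "mypkg", "0.1.0")
for step in steps:
    # Dry run: print each command instead of executing it.
    print(shlex.join(step))
```

Uploading the resulting package to the channel is left out, since that step depends on the server being used.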

Coming back to the conda server side: if you want something that works out of the box and don’t want to think about it, I’d recommend Artifactory.
Quetz works well for us, but I had to spend more time on the deployment and even made a few contributions upstream.


Hi @mbest,

You can use prefix.dev/channels! It’s free to use, so give it a try.

We at prefix.dev saw the same problems you have. We wanted to make it easy for anyone to start and create their own channels, private or public.

Our CEO is one of the authors of Quetz, and the rest of us worked with or on Quetz. For self-hosting it’s a good option and works pretty well.
That said, dealing with hosting is not everyone’s favorite subject. When we started prefix.dev, we began with a complete Rust port of everything conda, and that is where we saw the opportunity to build a channel hosting server that integrates with the open-source channels.

Currently prefix.dev/channels is our hosting platform for private and public channels. It is much faster, and we’re constantly working on the back-end to make it even faster and more reliable. Your colleagues can log in with GitHub, Google or prefix.dev accounts. We’re open to helping you integrate it in your company so you can focus on actually developing your application or research.

The back-end is built on top of rattler; check it out to see the active development being done there! Alongside this back-end, we’re also building our pixi tool on top of rattler. The goal of this tool is to be the poetry/cargo for conda packages.

Hope this helps!

Ruben


Appreciate the response.

This is a very similar workflow to what I envisioned and used while testing this out, so thanks for describing it in detail. I am still tempted to use Artifactory, as it’s definitely more of an established product. I might give deploying Quetz another try; I agree that the deployment instructions could be better. If I manage to set it up, I will definitely write a blog post about it. I would think that more people are interested in having a setup like this in place, and I find it really surprising that Quetz isn’t mentioned more often.

Hi Ruben, I had taken a look at your products previously, and they look exactly like what I’ve been looking for in the conda ecosystem.
For privacy reasons, I will most likely stick with a self-hosted option; otherwise I would’ve used your tool. I will definitely consider joining your packaging conference!

Thanks for your input.


Nice to hear that you’re considering us! If you ever have second thoughts or would like to discuss more options, let us know. Our response time is fastest through Discord, or you can check our contact info on prefix.dev.