
Nomic Embed: Training a Reproducible Long Context Text Embedder
Feb 26, 2025 · This technical report describes the training of nomic-embed-text-v1, the first fully reproducible, open-source, open-weights, open-data, 8192 context length English text embedding …
Abstract: The Machine Learning (ML) community has witnessed explosive growth, with millions of ML models being published on the Web. Reusing ML model components has become prevalent. …
Chaos as an interpretable benchmark for forecasting and data …
Oct 11, 2021 · We present a curated collection of chaotic dynamical systems for benchmarking and interpreting forecasting and data-driven modelling, which can be re-integrated to generate new …
Evaluating Open-QA Evaluation | OpenReview
Sep 26, 2023 · This study focuses on the evaluation of the Open Question Answering (Open-QA) task, which can directly estimate the factuality of large language models (LLMs). Current automatic …
AutoWS-Bench-101: Benchmarking Automated Weak Supervision with …
Sep 17, 2022 · We introduce AutoWS-Bench-101: a benchmarking framework for automated weak supervision techniques on diverse tasks.
NeuroEvoBench: Benchmarking Evolutionary Optimizers for Deep...
Sep 25, 2023 · Recently, the Deep Learning community has become interested in evolutionary optimization (EO) as a means to address hard optimization problems, e.g. meta-learning through …
NeoRL: A Near Real-World Benchmark for Offline Reinforcement …
Sep 16, 2022 · NeoRL presents conservative datasets for offline RL, highlights the complete pipeline for deploying offline RL in real-world applications, and also benchmarks recent offline RL algorithms on …
Signatory: differentiable computations of the signature and...
Jan 12, 2021 · Signatory is a library for calculating and performing functionality related to the signature and logsignature transforms. The focus is on machine learning, and as such includes features such …
BigBio: A Framework for Data-Centric Biomedical Natural Language...
Sep 16, 2022 · BigBio is a community library of 126+ biomedical NLP datasets, covering 13 tasks and 10 languages.
2.1 Machine Learning Project Licensing. Typically, an ML project is constructed with data, software and models, which are usually governed by different licensing frameworks. …