Snapshot of https://huggingface.co/datasets/fairchem/OMAT24#-train- saved by the user on 2024-10-30 10:38.


Meta Open Materials 2024 (OMat24) Dataset

Overview

We used several datasets in this work and provide open access to all of them to help accelerate research in the community. These include the OMat24 dataset and our modified sAlex dataset. Details on each dataset are given below.

Datasets

OMat24 Dataset

The OMat24 dataset contains a mix of single-point calculations of non-equilibrium structures and structural relaxations. Each structure is labeled with total energy (eV), forces (eV/Å), and stress (eV/Å^3). The dataset is provided as ASE DB-compatible LMDB files.
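Stress labels are stored in eV/Å^3, while many codes and papers report stress in GPa. A minimal sketch of the conversion, using only the exact SI values of the electronvolt and the ångström; the helper name `ev_per_a3_to_gpa` is our own and is not part of fairchem:

```python
# Exact SI values: 1 eV in joules and 1 Å^3 in cubic metres.
EV_TO_J = 1.602176634e-19
A3_TO_M3 = 1e-30
PA_PER_GPA = 1e9

# 1 eV/Å^3 expressed in GPa (about 160.22 GPa).
EV_PER_A3_IN_GPA = EV_TO_J / A3_TO_M3 / PA_PER_GPA


def ev_per_a3_to_gpa(stress):
    """Convert one stress value, or a sequence of components, from eV/Å^3 to GPa."""
    if isinstance(stress, (int, float)):
        return stress * EV_PER_A3_IN_GPA
    return [component * EV_PER_A3_IN_GPA for component in stress]


# Made-up 6-component (Voigt) stress in eV/Å^3, for illustration only.
voigt_stress = [0.01, 0.01, 0.01, 0.0, 0.0, 0.0]
print(ev_per_a3_to_gpa(voigt_stress))  # first three components are about 1.60 GPa
```

If ASE is already installed, dividing a stress array by `ase.units.GPa` gives the same result.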

We provide two splits: train and validation. Each split consists of several sub-datasets based on different input-generation strategies; see the paper for details.

The OMat24 train and validation splits are fully compatible with the Matbench Discovery benchmark test set:

  1. The splits do not contain any structure whose protostructure label is present in the initial or relaxed structures of the WBM dataset.
  2. The splits do not include any structure generated from an Alexandria relaxed structure whose protostructure label is present in the initial or relaxed structures of the WBM dataset.
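The two exclusion rules amount to a set-membership filter. Below is a minimal sketch, assuming each structure carries its own protostructure label and, where applicable, the label of the Alexandria relaxed structure it was generated from. The `Record` fields, the labels, and the function names are hypothetical and do not describe how the OMat24 files store this information:

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Record:
    """Hypothetical per-structure record; field names are illustrative only."""
    structure_id: str
    protostructure: str
    # Label of the Alexandria relaxed structure this entry was generated from, if any.
    parent_protostructure: Optional[str] = None


def is_matbench_compliant(record: Record, wbm_labels: Set[str]) -> bool:
    """Apply both exclusion rules; wbm_labels holds labels from WBM initial and relaxed structures."""
    # Rule 1: the structure's own protostructure must not appear in WBM.
    if record.protostructure in wbm_labels:
        return False
    # Rule 2: it must not derive from an Alexandria structure whose protostructure appears in WBM.
    if record.parent_protostructure is not None and record.parent_protostructure in wbm_labels:
        return False
    return True


def filter_split(records, wbm_labels):
    return [r for r in records if is_matbench_compliant(r, wbm_labels)]


# Toy labels and records, made up for illustration.
wbm = {"AB_cP2_221", "A2B_cF12_225"}
records = [
    Record("s1", "AB_cP2_221"),                   # excluded by rule 1
    Record("s2", "AB2_hP3_191", "A2B_cF12_225"),  # excluded by rule 2
    Record("s3", "AB2_hP3_191", "ABC_oP12_62"),   # kept
]
kept = filter_split(records, wbm)
print([r.structure_id for r in kept])  # → ['s3']
```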
Train

| Sub-dataset | Size (structures) | Download |
|---|---|---|
| rattled-1000 | 11,388,510 | rattled-1000.tar.gz |
| rattled-1000-subsampled | 3,879,741 | rattled-1000-subsampled.tar.gz |
| rattled-500 | 6,922,197 | rattled-500.tar.gz |
| rattled-500-subsampled | 3,975,416 | rattled-500-subsampled.tar.gz |
| rattled-300 | 6,319,139 | rattled-300.tar.gz |
| rattled-300-subsampled | 3,464,007 | rattled-300-subsampled.tar.gz |
| aimd-from-PBE-1000-npt | 21,269,486 | aimd-from-PBE-1000-npt.tar.gz |
| aimd-from-PBE-1000-nvt | 20,256,650 | aimd-from-PBE-1000-nvt.tar.gz |
| aimd-from-PBE-3000-npt | 6,076,290 | aimd-from-PBE-3000-npt.tar.gz |
| aimd-from-PBE-3000-nvt | 7,839,846 | aimd-from-PBE-3000-nvt.tar.gz |
| rattled-relax | 9,433,303 | rattled-relax.tar.gz |
| Total | 100,824,585 | - |
Validation

For training efficiency, models were evaluated on a subset of about 1M structures. We provide that subset below.

| Sub-dataset | Size (structures) | Download |
|---|---|---|
| rattled-1000 | 122,937 | rattled-1000.tar.gz |
| rattled-1000-subsampled | 41,786 | rattled-1000-subsampled.tar.gz |
| rattled-500 | 75,167 | rattled-500.tar.gz |
| rattled-500-subsampled | 43,068 | rattled-500-subsampled.tar.gz |
| rattled-300 | 68,593 | rattled-300.tar.gz |
| rattled-300-subsampled | 37,393 | rattled-300-subsampled.tar.gz |
| aimd-from-PBE-1000-npt | 223,574 | aimd-from-PBE-1000-npt.tar.gz |
| aimd-from-PBE-1000-nvt | 215,589 | aimd-from-PBE-1000-nvt.tar.gz |
| aimd-from-PBE-3000-npt | 65,244 | aimd-from-PBE-3000-npt.tar.gz |
| aimd-from-PBE-3000-nvt | 84,063 | aimd-from-PBE-3000-nvt.tar.gz |
| rattled-relax | 99,968 | rattled-relax.tar.gz |
| Total | 1,077,382 | - |
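Both Total rows follow from the per-sub-dataset sizes, and every validation sub-dataset is roughly 1% the size of its training counterpart. A small script to check this, using only the numbers in the Train and Validation tables:

```python
# Entry counts copied from the Train and Validation tables.
TRAIN = {
    "rattled-1000": 11_388_510,
    "rattled-1000-subsampled": 3_879_741,
    "rattled-500": 6_922_197,
    "rattled-500-subsampled": 3_975_416,
    "rattled-300": 6_319_139,
    "rattled-300-subsampled": 3_464_007,
    "aimd-from-PBE-1000-npt": 21_269_486,
    "aimd-from-PBE-1000-nvt": 20_256_650,
    "aimd-from-PBE-3000-npt": 6_076_290,
    "aimd-from-PBE-3000-nvt": 7_839_846,
    "rattled-relax": 9_433_303,
}
VALIDATION = {
    "rattled-1000": 122_937,
    "rattled-1000-subsampled": 41_786,
    "rattled-500": 75_167,
    "rattled-500-subsampled": 43_068,
    "rattled-300": 68_593,
    "rattled-300-subsampled": 37_393,
    "aimd-from-PBE-1000-npt": 223_574,
    "aimd-from-PBE-1000-nvt": 215_589,
    "aimd-from-PBE-3000-npt": 65_244,
    "aimd-from-PBE-3000-nvt": 84_063,
    "rattled-relax": 99_968,
}

train_total = sum(TRAIN.values())
val_total = sum(VALIDATION.values())
print(f"train total: {train_total:,}")  # → train total: 100,824,585
print(f"val total:   {val_total:,}")    # → val total:   1,077,382

# Fraction of each training sub-dataset's size found in the validation subset.
for name, n_train in TRAIN.items():
    print(f"{name:<25} {VALIDATION[name] / n_train:.2%}")
```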

sAlex Dataset

We also provide the sAlex dataset used to fine-tune our OMat models. sAlex is a subsampled, Matbench Discovery-compliant version of the original Alexandria dataset. It was created by removing structures matched in WBM and by sampling only structures along a trajectory with an energy difference greater than 10 meV/atom. For full details, please see the manuscript.
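The trajectory subsampling step can be pictured as a greedy filter over per-atom energies. This is a minimal sketch under an assumption the text does not spell out: a frame is kept only if its energy differs by more than 10 meV/atom from the last kept frame, and the first frame is always kept. The function name and the toy energies are hypothetical; the manuscript describes the exact procedure.

```python
def subsample_trajectory(energies_per_atom, threshold_ev=0.010):
    """Return indices of frames kept by a greedy energy-difference filter.

    energies_per_atom: per-atom energies (eV/atom) along one trajectory.
    threshold_ev: minimum difference (eV/atom) to the last kept frame;
    0.010 eV/atom corresponds to 10 meV/atom.
    """
    if not energies_per_atom:
        return []
    kept = [0]  # assumption of this sketch: always keep the first frame
    last_kept_energy = energies_per_atom[0]
    for i, energy in enumerate(energies_per_atom[1:], start=1):
        if abs(energy - last_kept_energy) > threshold_ev:
            kept.append(i)
            last_kept_energy = energy
    return kept


# Toy relaxation trajectory (values made up, eV/atom).
trajectory = [-5.000, -5.004, -5.020, -5.025, -5.031, -5.032, -5.045]
print(subsample_trajectory(trajectory))  # → [0, 2, 4, 6]
```

Comparing against the last kept frame, rather than the immediately preceding one, prevents a slow drift of many tiny steps from being kept frame by frame.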

| Dataset | Split | Size (structures) | Download |
|---|---|---|---|
| sAlex | train | 10,447,765 | train.tar.gz |
| sAlex | val | 553,218 | val.tar.gz |

How to read the data

The OMat24 and sAlex datasets can be accessed with the fairchem library. This package can be installed with:

pip install fairchem-core

Dataset files are written as AseLMDBDatabase objects, an implementation of an ASE Database stored in LMDB format. A single *.aselmdb file can be read and queried like any other ASE DB, although this is not recommended because there are many files.

You can also read many DB files at once and access atoms objects using the AseDBDataset class.

For example, to read the rattled-relax sub-dataset:

from fairchem.core.datasets import AseDBDataset

dataset_path = "/path/to/omat24/train/rattled-relax"
config_kwargs = {} # see tutorial on additional configuration

dataset = AseDBDataset(config=dict(src=dataset_path, **config_kwargs))

# atoms objects can be retrieved by index
atoms = dataset.get_atoms(0)

To read more than one sub-dataset, pass a list of sub-dataset paths:

from fairchem.core.datasets import AseDBDataset

config_kwargs = {} # see tutorial on additional configuration
dataset_paths = [
      "/path/to/omat24/train/rattled-relax",
      "/path/to/omat24/train/rattled-1000-subsampled",
      "/path/to/omat24/train/rattled-1000"
]
dataset = AseDBDataset(config=dict(src=dataset_paths, **config_kwargs))

To read the entire OMat24 training or validation split, pass the paths to all of its sub-datasets.
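Building that list by hand is error-prone with eleven sub-datasets. A minimal sketch that assembles it from the sub-dataset names in the tables, assuming (hypothetically) that each archive was extracted to `<split_root>/<sub-dataset name>`; `split_paths` and `OMAT24_SUBDATASETS` are our own names, not fairchem APIs:

```python
from pathlib import Path

# Sub-dataset names from the Train and Validation tables; both splits use the same names.
OMAT24_SUBDATASETS = [
    "rattled-1000",
    "rattled-1000-subsampled",
    "rattled-500",
    "rattled-500-subsampled",
    "rattled-300",
    "rattled-300-subsampled",
    "aimd-from-PBE-1000-npt",
    "aimd-from-PBE-1000-nvt",
    "aimd-from-PBE-3000-npt",
    "aimd-from-PBE-3000-nvt",
    "rattled-relax",
]


def split_paths(split_root):
    """Return paths to every sub-dataset directory under split_root.

    Assumes the hypothetical layout <split_root>/<sub-dataset name>. Raises if a
    sub-dataset is missing, so a partial download is not mistaken for the full split.
    """
    root = Path(split_root)
    paths = [root / name for name in OMAT24_SUBDATASETS]
    missing = [p.name for p in paths if not p.is_dir()]
    if missing:
        raise FileNotFoundError(f"missing sub-datasets under {root}: {missing}")
    return [str(p) for p in paths]


# Usage with fairchem (not executed here):
# from fairchem.core.datasets import AseDBDataset
# dataset = AseDBDataset(config=dict(src=split_paths("/path/to/omat24/train")))
```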

Support

If you run into any issues regarding the datasets, feel free to post your questions or comments on any of the following platforms:

Citation

The OMat24 dataset is licensed under a Creative Commons Attribution 4.0 License. If you use this work, please cite:

@misc{barroso_omat24,
      title={Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models}, 
      author={Luis Barroso-Luque and Muhammed Shuaibi and Xiang Fu and Brandon M. Wood and Misko Dzamba and Meng Gao and Ammar Rizvi and C. Lawrence Zitnick and Zachary W. Ulissi},
      year={2024},
      eprint={2410.12771},
      archivePrefix={arXiv},
      primaryClass={cond-mat.mtrl-sci},
      url={https://arxiv.org/abs/2410.12771}, 
}

### We hope to move our datasets and models to the Hugging Face Hub in the near future to make them more accessible to the community. ###
