This baseline notebook is designed to offer a starting point for competitors. Please note that the approach we've taken is not *THE* solution; it is simply ONE possible approach. Our aim is to assist participants in exploring different ways to preprocess and model the data. Please feel free to fork the notebook and save the model/data for your own exploration.
This notebook was prepared by Virginie Batista and Angèle Syty from the Institut d'Astrophysique de Paris, and Orphée Faucoz from Centre National d’Etudes Spatiales (CNES), with support from Gordon Yip and Tara Tahseen from University College London.
# READ THIS BEFORE YOU PROCEED
This training procedure uses the light dataset produced from this [notebook (Version 5)](https://www.kaggle.com/code/gordonyip/update-calibrating-and-binning-astronomical-data). We applied all the calibration steps EXCEPT Linearity Correction, with Chunksize = 1. The binned dataset is available to download [here](https://www.kaggle.com/datasets/gordonyip/binned-dataset-v3/data). *If you want to carry out all the corrections, you will have to do so yourself.*
**This notebook only provides the model checkpoints; you are welcome to use these checkpoints with your own script and submit to the leaderboard.**
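If you go that route, a minimal sketch of reloading a saved Keras checkpoint in your own inference script is shown below; the file path and the dummy input are placeholders and assumptions, not the actual checkpoint names or array shapes used by this notebook.

```python
import numpy as np
from tensorflow.keras.models import load_model

# Placeholder path: point this at the checkpoint file saved by this notebook.
model_1d = load_model('/kaggle/input/your-checkpoint-dataset/model_1d_cnn.keras')

# Placeholder input: replace with your own preprocessed white light curves,
# shaped (n_planets, n_times, 1) to match the model's expected input.
white_curves = np.zeros((1, model_1d.input_shape[1], 1), dtype=np.float32)
mean_depth_pred = model_1d.predict(white_curves)
```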
The challenge's primary objective is to process these exposures to produce a single, clean spectrum for each exoplanet, summarizing the rp/rs values across all wavelengths.
The exposures are subject to noise, and the images and spectra are not perfect. The jitter noise has a complex signature that the ML model should learn to recognize in order to produce better spectra.
Different techniques are possible, and it is up to the participants' imagination to produce a novel (and hopefully better) solution to this task.
Here is an outline of our baseline approach:
We first fit a 1D CNN to predict the mean value of the transmission spectrum, taking as input the transit white light curve (the total flux of each image as a function of time).
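As an illustration, here is a minimal sketch of this first step; the cube layout, the normalisation, and the network architecture are assumptions made for the example and are not the exact model trained in this notebook.

```python
import numpy as np
import tensorflow as tf

def white_light_curve(cube):
    """Collapse a calibrated cube (n_times, n_wavelengths, n_y) into the
    white light curve: total flux of each exposure, roughly normalised."""
    wlc = cube.sum(axis=(1, 2))
    return wlc / np.median(wlc)

def build_1d_cnn(n_times):
    """Small 1D CNN regressing a single scalar (the mean transit depth)."""
    inputs = tf.keras.Input(shape=(n_times, 1))
    x = tf.keras.layers.Conv1D(32, 5, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling1D(2)(x)
    x = tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    outputs = tf.keras.layers.Dense(1)(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mae")
    return model
```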
For the second part of the baseline, to retrieve the atmospheric features, we make the data lighter by summing the fluxes along the y-axis for each wavelength, resulting in 2D images of dimension (N_times, N_wavelengths). We also cut the signal to remove the out-of-transit portion in order to enhance the transit depth variations between wavelengths. For the same reason, we subtract the mean flux, corresponding to the average transit depth, to keep only the wavelength variations around this mean. We use a 2D CNN to fit the atmospheric features.
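A minimal sketch of this preprocessing is given below; the cube layout (n_times, n_wavelengths, n_y) and the boolean in-transit mask are assumptions about the data format rather than the exact code used in this notebook.

```python
import numpy as np

def preprocess_2d(cube, in_transit):
    """cube: (n_times, n_wavelengths, n_y) calibrated flux.
    in_transit: boolean mask of length n_times selecting in-transit exposures.

    Returns a (n_in_transit, n_wavelengths) image with the average transit
    depth removed, so only wavelength-dependent variations remain."""
    img = cube.sum(axis=2)        # collapse the spatial (y) axis per wavelength
    img = img[in_transit]         # cut the out-of-transit part of the signal
    img = img - img.mean()        # subtract the mean flux (average transit depth)
    return img
```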
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

# Display the schematic of the baseline approach described above
img = mpimg.imread('/kaggle/input/baseline-img/2nd_baseline.png')
plt.figure(figsize=(10, 15))
plt.imshow(img)
plt.axis('off')
plt.show()
# Imports for data handling, model loading, and plotting
import os
import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter

import tensorflow as tf
from tensorflow.keras.models import load_model
from tensorflow.keras.losses import MeanAbsoluteError