Dataset of Tropical Cyclone for Image-to-intensity Regression (TCIR)
Boyo Chen, Buo-Fu Chen and Hsuan-Tien Lin
TCIR collects Tropical Cyclone (TC) data from 4 channels of satellite images. TCIR aims to act as a benchmark dataset to help data scientists fairly evaluate the performance of TC intensity prediction models.
Data Statistics
Region | #TCs | #Frames
Atlantic | 235 | 13707
West Pacific | 379 | 20061
East Pacific | 247 | 13615
Central Pacific | 19 | 1479
Indian Ocean | 75 | 3205
Southern Hemisphere | 330 | 18434
Total | 1285 | 70501
Frames
- 4 channels:
  - IR1: Infrared.
  - WV: Water vapor.
  - VIS: Visible. Note that the visible channel is very unstable because it depends on daylight.
  - PMW: Passive microwave.
- Frame size
  - The tropical cyclone center is placed in the middle of the frame.
  - The frame extends 7 degrees from the center in both latitude and longitude.
  - 201 x 201 data points.
  - Distance between two data points = 14 degrees / 200 = 0.07 degree, or about 8 km at the equator.
  - Resolution: 7/100 degree lat/lon.
- Some values are missing and are filled with NaN. These values should be handled before training, for example by:
  - Interpolation.
  - Replacing them with zeros.
  - etc.
- The original resolution of the PMW channel from CMORPH is 1/4 degree lat/lon. To unify the size of all 4 channels, we upscale the PMW channel by a factor of about 4 using linear interpolation.
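The missing-value options listed above could be sketched as follows. This is a minimal illustration, not the authors' preprocessing: the function names are hypothetical, the input is assumed to be an N x H x W x C array as described for the dataset, and the per-channel mean fill is a crude stand-in for proper spatial interpolation.

```python
import numpy as np

def fill_missing_with_zeros(frames):
    """Return a float32 copy of `frames` with every NaN replaced by 0."""
    filled = np.array(frames, dtype=np.float32, copy=True)
    filled[np.isnan(filled)] = 0.0
    return filled

def fill_missing_with_channel_mean(frames):
    """Replace NaNs with the mean of the valid pixels in the same frame and channel.

    Assumes `frames` has shape (N, H, W, C). A channel with no valid pixel
    at all is filled with 0.
    """
    frames = np.asarray(frames, dtype=np.float32)
    valid = ~np.isnan(frames)
    # Sum and count the valid pixels over the two spatial axes.
    sums = np.where(valid, frames, 0.0).sum(axis=(1, 2), keepdims=True)
    counts = valid.sum(axis=(1, 2), keepdims=True)
    means = sums / np.maximum(counts, 1)  # avoids division by zero
    return np.where(valid, frames, means).astype(np.float32)

# Small synthetic stand-in for the real data (hypothetical values).
demo = np.full((2, 3, 3, 4), 2.0, dtype=np.float32)
demo[0, 0, 0, 0] = np.nan      # one missing pixel
demo[1, :, :, 2] = np.nan      # one fully missing channel
zero_filled = fill_missing_with_zeros(demo)
mean_filled = fill_missing_with_channel_mean(demo)
```

Which strategy works best probably depends on the channel; VIS and PMW may need different treatment from IR1 and WV.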
Sources
Satellite observations comprising TCIR are collected from two open sources:
- GridSat: a long-term dataset of global infrared window brightness temperatures, including three channels: IR1, WV, and VIS. This dataset includes data from most meteorological geostationary satellites every three hours since 1981. The resolution is 7/100 degree lat/lon.
- CMORPH: CMORPH precipitation rates from 2003 to 2016 are included in TCIR. CMORPH provides global precipitation analyses at relatively high spatial and temporal resolution. It uses precipitation estimates derived exclusively from low-orbit microwave satellite observations, and transports their features using spatial propagation information obtained entirely from geostationary satellite IR1 data. The resolution of CMORPH is 0.25 degree every three hours.
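To make the resolution gap between the two sources concrete, the sketch below resamples a 0.25-degree grid onto the 201 x 201 frame grid with separable (bilinear) linear interpolation. It is an assumption-laden illustration rather than the procedure actually used to build TCIR: the function name `upscale_linear` is hypothetical, and the 57 x 57 input size assumes a 14-degree window sampled every 0.25 degree with both edges included.

```python
import numpy as np

def upscale_linear(coarse, out_size=201):
    """Bilinearly resample a square 2-D grid to out_size x out_size.

    Both grids are assumed to cover the same extent, with their first
    and last points aligned. NaNs in the input propagate to the output.
    """
    coarse = np.asarray(coarse, dtype=np.float64)
    n = coarse.shape[0]
    src = np.linspace(0.0, 1.0, n)          # normalized coarse coordinates
    dst = np.linspace(0.0, 1.0, out_size)   # normalized fine coordinates
    # Interpolate along each row first: result has shape (n, out_size).
    rows = np.stack([np.interp(dst, src, row) for row in coarse])
    # Then interpolate along each column: result is (out_size, out_size).
    return np.stack([np.interp(dst, src, col) for col in rows.T], axis=1)

# Synthetic 57 x 57 field (hypothetical): a linear ramp, which linear
# interpolation should reproduce exactly.
coarse = np.add.outer(np.arange(57.0), np.arange(57.0))
fine = upscale_linear(coarse)
```

The scale factor here is 0.25 / 0.07, roughly 3.6, which matches the "about 4 times" figure given for the PMW channel.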
Labels
We use the best tracks from the Joint Typhoon Warning Center (JTWC) for TCs in the western North Pacific (WP), and the best tracks from the revised Atlantic hurricane database (HURDAT2) for TCs in the eastern North Pacific (EP) and the Atlantic Ocean (AL), from 2003 to 2016.
The TC information provided in TCIR includes:
- intensity (i.e., the maximum sustained wind, in knots; the main target we want to learn)
- size (i.e., the mean of the radii of 35-knot wind in the four quadrants, in nmi)
- minimum sea-level pressure
- TC center location
Note that these values are tuned and finalized after the fact using all available observations, so they differ in meaning from real-time estimates. Although the best-track information can be taken as ground truth, it is still an estimate by nature and can contain some inherent noise.
In addition to intensity, we also provide another important TC structure parameter, the size. TC size is closely related to a TC's economic and societal impact. We encourage the community to tackle the TC size prediction task.
Usage
We provide HDF5 files that give easy access to the whole organized dataset.
- Dependencies: Python, pandas (its read_hdf function also requires PyTables), numpy, and an HDF5 package (such as "h5py").
- Link: Here
- The six regions are divided into 2 files:
  - ATLN/EPAC/WPAC (described in the original paper, see "How to cite TCIR")
  - CPAC/IO/SH (note that TCs from SH rotate in the opposite direction compared with the other regions.)
- There are 2 keys in each HDF5 file:
  - matrix: N x 201 x 201 x 4 HDF5 dataset. It can be loaded as a numpy array.
  - info: HDF5 group. It can be loaded with the Python package pandas.
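Because Southern Hemisphere TCs rotate in the opposite sense, one possible preprocessing step is to mirror SH frames so that their rotation matches the other regions. The sketch below is a suggestion only and is not described by the source: the function name is hypothetical, and it assumes axis 1 of the N x 201 x 201 x 4 array is the latitude (row) axis. A mirror along either single spatial axis reverses the apparent rotation direction.

```python
import numpy as np

def mirror_frames(frames):
    """Mirror frames along axis 1 (assumed to be latitude).

    `frames` is expected to have shape (N, H, W, C). The channel order
    and the number of frames are left unchanged.
    """
    return np.flip(np.asarray(frames), axis=1)

# Tiny synthetic example (hypothetical values) to show the effect.
sh_frames = np.arange(2 * 3 * 3 * 4).reshape(2, 3, 3, 4)
mirrored = mirror_frames(sh_frames)
```

Selecting which frames belong to SH would rely on the region information in "info", whose exact column names are not listed here.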
Example: Loading the TCIR dataset with Python.
import numpy as np
import pandas as pd
import h5py
data_path = "TCIR.h5"
# load "info" as a pandas DataFrame
data_info = pd.read_hdf(data_path, key="info", mode='r')
# load "matrix" as a numpy ndarray; this can take a while
with h5py.File(data_path, 'r') as hf:
    data_matrix = hf['matrix'][:]
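Loading the whole "matrix" at once can need a lot of memory. As a hedged alternative, the sketch below reads the frames in batches along the first axis. The helper `iter_batches` is hypothetical; it only assumes an object with a `shape` attribute and slice indexing, which holds for both h5py datasets and numpy arrays. A numpy array stands in for the HDF5 dataset so the snippet runs without the data file.

```python
import numpy as np

def iter_batches(dataset, batch_size):
    """Yield consecutive slices of `dataset` along its first axis.

    Works with any object that has `.shape` and supports slicing,
    such as an h5py dataset or a numpy array.
    """
    total = dataset.shape[0]
    for start in range(0, total, batch_size):
        # For an h5py dataset, only this slice is read from disk.
        yield dataset[start:start + batch_size]

# Stand-in for hf['matrix'] (hypothetical small array).
fake_matrix = np.zeros((5, 201, 201, 4), dtype=np.float32)
batch_sizes = [batch.shape[0] for batch in iter_batches(fake_matrix, 2)]

# With the real file, the same helper would be used inside the context manager:
# with h5py.File("TCIR.h5", "r") as hf:
#     for batch in iter_batches(hf["matrix"], 256):
#         ...  # preprocess or train on `batch`
```

The batch size of 256 in the comment is an arbitrary placeholder and should be chosen to fit the available memory.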
How to cite TCIR?
Please cite the following paper:
Boyo Chen, Buo-Fu Chen, and Hsuan-Tien Lin. Rotation-blended CNNs on a new open dataset for tropical cyclone image-to-intensity regression. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2018.
Last updated at CST 13:08, October 04, 2023
Please feel free to contact me: