{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Batch Prediction\n",
"\n",
"## 1. Download demo data\n",
"\n",
"```\n",
"cd PhaseNet\n",
"wget https://github.com/wayneweiqiang/PhaseNet/releases/download/test_data/test_data.zip\n",
"unzip test_data.zip\n",
"```\n",
"\n",
"## 2. Run batch prediction \n",
"\n",
"PhaseNet currently supports four data formats: mseed, sac, hdf5, and numpy. \n",
"\n",
"- For mseed format:\n",
"```\n",
"python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/mseed.csv --data_dir=test_data/mseed --format=mseed --plot_figure\n",
"```\n",
"\n",
"- For sac format:\n",
"```\n",
"python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/sac.csv --data_dir=test_data/sac --format=sac --plot_figure\n",
"```\n",
"\n",
"- For numpy format:\n",
"```\n",
"python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/npz.csv --data_dir=test_data/npz --format=numpy --plot_figure\n",
"```\n",
"\n",
"- For hdf5 format:\n",
"```\n",
"python phasenet/predict.py --model=model/190703-214543 --hdf5_file=test_data/data.h5 --hdf5_group=data --format=hdf5 --plot_figure\n",
"```\n",
"\n",
"- For a seismic array (used by [QuakeFlow](https://github.com/wayneweiqiang/QuakeFlow)):\n",
"```\n",
"python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/mseed_array.csv --data_dir=test_data/mseed_array --stations=test_data/stations.json --format=mseed_array --amplitude\n",
"```\n",
"```\n",
"python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/mseed2.csv --data_dir=test_data/mseed --stations=test_data/stations.json --format=mseed_array --amplitude\n",
"```\n",
"\n",
"Notes: \n",
"1. Remove the \"--plot_figure\" argument for large datasets, because plotting can be very slow.\n",
"\n",
"Optional arguments:\n",
"```\n",
"usage: predict.py [-h] [--batch_size BATCH_SIZE] [--model_dir MODEL_DIR]\n",
" [--data_dir DATA_DIR] [--data_list DATA_LIST]\n",
" [--hdf5_file HDF5_FILE] [--hdf5_group HDF5_GROUP]\n",
" [--result_dir RESULT_DIR] [--result_fname RESULT_FNAME]\n",
" [--min_p_prob MIN_P_PROB] [--min_s_prob MIN_S_PROB]\n",
" [--mpd MPD] [--amplitude] [--format FORMAT]\n",
" [--s3_url S3_URL] [--stations STATIONS] [--plot_figure]\n",
" [--save_prob]\n",
"\n",
"optional arguments:\n",
" -h, --help show this help message and exit\n",
" --batch_size BATCH_SIZE\n",
" batch size\n",
" --model_dir MODEL_DIR\n",
" Checkpoint directory (default: None)\n",
" --data_dir DATA_DIR Input file directory\n",
" --data_list DATA_LIST\n",
" Input csv file\n",
" --hdf5_file HDF5_FILE\n",
" Input hdf5 file\n",
" --hdf5_group HDF5_GROUP\n",
" data group name in hdf5 file\n",
" --result_dir RESULT_DIR\n",
" Output directory\n",
" --result_fname RESULT_FNAME\n",
" Output file\n",
" --min_p_prob MIN_P_PROB\n",
" Probability threshold for P pick\n",
" --min_s_prob MIN_S_PROB\n",
" Probability threshold for S pick\n",
" --mpd MPD Minimum peak distance\n",
" --amplitude if return amplitude value\n",
" --format FORMAT input format\n",
" --stations STATIONS seismic station info\n",
" --plot_figure If plot figure for test\n",
" --save_prob If save result for test\n",
"```\n",
"\n",
"## 3. Output picks\n",
"- The output picks are saved to \"results/picks.csv\" on default\n",
"\n",
"|file_name |begin_time |station_id|phase_index|phase_time |phase_score|phase_amp |phase_type|\n",
"|-----------------|-----------------------|----------|-----------|-----------------------|-----------|----------------------|----------|\n",
"|2020-10-01T00:00*|2020-10-01T00:00:00.003|CI.BOM..HH|14734 |2020-10-01T00:02:27.343|0.708 |2.4998866231208325e-14|P |\n",
"|2020-10-01T00:00*|2020-10-01T00:00:00.003|CI.BOM..HH|15487 |2020-10-01T00:02:34.873|0.416 |2.4998866231208325e-14|S |\n",
"|2020-10-01T00:00*|2020-10-01T00:00:00.003|CI.COA..HH|319 |2020-10-01T00:00:03.193|0.762 |3.708662269972206e-14 |P |\n",
"\n",
"Notes:\n",
"1. The *phase_index* means which data point is the pick in the original sequence. So *phase_time* = *begin_time* + *phase_index* / *sampling rate*. The default *sampling_rate* is 100Hz \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Read P/S picks\n",
"\n",
"PhaseNet currently outputs two format: **CSV** and **JSON**"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import json\n",
"import os\n",
"PROJECT_ROOT = os.path.realpath(os.path.join(os.path.abspath(''), \"..\"))"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"fname NC.MCV..EH.0361339.npz\n",
"t0 1970-01-01T00:00:00.000\n",
"p_idx [5999, 9015]\n",
"p_prob [0.987, 0.981]\n",
"s_idx [6181, 9205]\n",
"s_prob [0.553, 0.873]\n",
"Name: 1, dtype: object\n",
"fname NN.LHV..EH.0384064.npz\n",
"t0 1970-01-01T00:00:00.000\n",
"p_idx []\n",
"p_prob []\n",
"s_idx []\n",
"s_prob []\n",
"Name: 0, dtype: object\n"
]
}
],
"source": [
"picks_csv = pd.read_csv(os.path.join(PROJECT_ROOT, \"results/picks.csv\"), sep=\"\\t\")\n",
"picks_csv.loc[:, 'p_idx'] = picks_csv[\"p_idx\"].apply(lambda x: x.strip(\"[]\").split(\",\"))\n",
"picks_csv.loc[:, 'p_prob'] = picks_csv[\"p_prob\"].apply(lambda x: x.strip(\"[]\").split(\",\"))\n",
"picks_csv.loc[:, 's_idx'] = picks_csv[\"s_idx\"].apply(lambda x: x.strip(\"[]\").split(\",\"))\n",
"picks_csv.loc[:, 's_prob'] = picks_csv[\"s_prob\"].apply(lambda x: x.strip(\"[]\").split(\",\"))\n",
"print(picks_csv.iloc[1])\n",
"print(picks_csv.iloc[0])"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'id': 'NC.MCV..EH.0361339.npz', 'timestamp': '1970-01-01T00:01:30.150', 'prob': 0.9811667799949646, 'type': 'p'}\n",
"{'id': 'NC.MCV..EH.0361339.npz', 'timestamp': '1970-01-01T00:00:59.990', 'prob': 0.9872905611991882, 'type': 'p'}\n"
]
}
],
"source": [
"with open(os.path.join(PROJECT_ROOT, \"results/picks.json\")) as fp:\n",
" picks_json = json.load(fp) \n",
"print(picks_json[1])\n",
"print(picks_json[0])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.4 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}