{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "pL--_KGdYoBz" }, "source": [ "##### Copyright 2019 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2024-01-11T17:51:26.618914Z", "iopub.status.busy": "2024-01-11T17:51:26.618690Z", "iopub.status.idle": "2024-01-11T17:51:26.622608Z", "shell.execute_reply": "2024-01-11T17:51:26.621922Z" }, "id": "uBDvXpYzYnGj" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "HQzaEQuJiW_d" }, "source": [ "# TFRecord and tf.train.Example\n", "\n", "
See the `tf.io` module."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "y-Hjmee-fbLH"
},
"source": [
"## TFRecord files using `tf.data`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GmehkCCT81Ez"
},
"source": [
"The `tf.data` module also provides tools for reading and writing data in TensorFlow."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1FISEuz8ubu3"
},
"source": [
"### Writing a TFRecord file\n",
"\n",
"The easiest way to get the data into a dataset is to use the `from_tensor_slices` method.\n",
"\n",
"Applied to an array, it returns a dataset of scalars:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-11T17:51:31.196954Z",
"iopub.status.busy": "2024-01-11T17:51:31.196421Z",
"iopub.status.idle": "2024-01-11T17:51:31.205197Z",
"shell.execute_reply": "2024-01-11T17:51:31.204595Z"
},
"id": "mXeaukvwu5_-"
},
"outputs": [
{
"data": {
"text/plain": [
"<_TensorSliceDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tf.data.Dataset.from_tensor_slices(feature1)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f-q0VKyZvcad"
},
"source": [
"Applied to a tuple of arrays, it returns a dataset of tuples:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-11T17:51:31.208406Z",
"iopub.status.busy": "2024-01-11T17:51:31.208040Z",
"iopub.status.idle": "2024-01-11T17:51:31.218691Z",
"shell.execute_reply": "2024-01-11T17:51:31.218100Z"
},
"id": "H5sWyu1kxnvg"
},
"outputs": [
{
"data": {
"text/plain": [
"<_TensorSliceDataset element_spec=(TensorSpec(shape=(), dtype=tf.bool, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None), TensorSpec(shape=(), dtype=tf.string, name=None), TensorSpec(shape=(), dtype=tf.float64, name=None))>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"features_dataset = tf.data.Dataset.from_tensor_slices((feature0, feature1, feature2, feature3))\n",
"features_dataset"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-11T17:51:31.221562Z",
"iopub.status.busy": "2024-01-11T17:51:31.221340Z",
"iopub.status.idle": "2024-01-11T17:51:31.236806Z",
"shell.execute_reply": "2024-01-11T17:51:31.236232Z"
},
"id": "m1C-t71Nywze"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tf.Tensor(False, shape=(), dtype=bool)\n",
"tf.Tensor(1, shape=(), dtype=int64)\n",
"tf.Tensor(b'dog', shape=(), dtype=string)\n",
"tf.Tensor(0.23695644491000212, shape=(), dtype=float64)\n"
]
}
],
"source": [
"# Use `take(1)` to only pull one example from the dataset.\n",
"for f0, f1, f2, f3 in features_dataset.take(1):\n",
"  print(f0)\n",
"  print(f1)\n",
"  print(f2)\n",
"  print(f3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mhIe63awyZYd"
},
"source": [
"Use the `tf.data.Dataset.map` method to apply a function to each element of a `Dataset`.\n",
"\n",
"The mapped function must operate in TensorFlow graph mode: it must operate on and return `tf.Tensors`. A non-tensor function, like `serialize_example`, can be wrapped with `tf.py_function` to make it compatible.\n",
"\n",
"Using `tf.py_function` requires you to specify the shape and type information, which is otherwise unavailable:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-11T17:51:31.240268Z",
"iopub.status.busy": "2024-01-11T17:51:31.239792Z",
"iopub.status.idle": "2024-01-11T17:51:31.243376Z",
"shell.execute_reply": "2024-01-11T17:51:31.242724Z"
},
"id": "apB5KYrJzjPI"
},
"outputs": [],
"source": [
"def tf_serialize_example(f0, f1, f2, f3):\n",
"  tf_string = tf.py_function(\n",
"    serialize_example,\n",
"    (f0, f1, f2, f3),  # Pass these args to the above function.\n",
"    tf.string)         # The return type is `tf.string`.\n",
"  return tf.reshape(tf_string, ())  # The result is a scalar."
]
},
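{
"cell_type": "markdown",
"metadata": {},
"source": [
"The shape handling in the wrapper above can also be sketched in isolation. The following is a minimal, self-contained illustration and not part of the original tutorial: `to_upper` and `tf_to_upper` are hypothetical helpers introduced only for this sketch, and `set_shape` is shown as an assumed alternative to `tf.reshape` for restoring the scalar shape that `tf.py_function` leaves unknown.\n",
"\n",
"```python\n",
"import tensorflow as tf\n",
"\n",
"\n",
"def to_upper(x):\n",
"  # Hypothetical non-tensor function. It runs eagerly inside\n",
"  # tf.py_function, so `.numpy()` is available on its argument.\n",
"  return x.numpy().upper()\n",
"\n",
"\n",
"def tf_to_upper(x):\n",
"  # Wrap the Python function and declare its return type.\n",
"  y = tf.py_function(to_upper, (x,), tf.string)\n",
"  # The static shape is unknown after tf.py_function; declare it as scalar.\n",
"  y.set_shape(())\n",
"  return y\n",
"\n",
"\n",
"ds = tf.data.Dataset.from_tensor_slices([b'cat', b'dog'])\n",
"mapped = ds.map(tf_to_upper)\n",
"\n",
"# Collect the mapped elements as Python bytes objects.\n",
"results = [t.numpy() for t in mapped]\n",
"print(mapped.element_spec)\n",
"print(results)\n",
"```\n",
"\n",
"Here `set_shape` only annotates the static shape, while `tf.reshape` adds an op to the graph. Either should be sufficient when the result is known to be a scalar."
]
},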
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-11T17:51:31.246615Z",
"iopub.status.busy": "2024-01-11T17:51:31.246020Z",
"iopub.status.idle": "2024-01-11T17:51:31.255679Z",
"shell.execute_reply": "2024-01-11T17:51:31.255126Z"
},
"id": "lHFjW4u4Npz9"
},
"outputs": [
{
"data": {
"text/plain": [
"