This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Refine Doc and Example (#3)
yqwang-ms committed Dec 17, 2018
1 parent 94a1680 commit 07c2a6c
Showing 21 changed files with 306 additions and 123 deletions.
5 changes: 2 additions & 3 deletions README.md
@@ -57,9 +57,8 @@ A Framework represents an application with a set of Tasks:
1. A Kubernetes cluster, v1.10 or above, on-cloud or on-premise.

## Quick Start
1. [Build](build/frameworkcontroller)
2. [Run Example](example/run/frameworkcontroller.md)
3. [Framework Example](example/framework)
1. [Run Controller](example/run)
2. [Submit Framework](example/framework)

## Doc
1. [User Manual](doc/user-manual.md)
50 changes: 33 additions & 17 deletions doc/user-manual.md
@@ -10,26 +10,37 @@
- [Best Practice](#BestPractice)

## <a name="FrameworkInterop">Framework Interop</a>
**Supported interoperations with a Framework**

### <a name="SupportedClient">Supported Client</a>
As Framework is actually a [Kubernetes CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#customresourcedefinitions), all [CRD Clients](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#accessing-a-custom-resource) can be used to interoperate with it, such as:
1. [kubectl](https://kubernetes.io/docs/reference/kubectl)
```shell
kubectl create -f {Framework File Path}
# Note this is not Foreground Deletion, see [DELETE Framework] section
kubectl delete framework {FrameworkName}
kubectl get framework {FrameworkName}
kubectl describe framework {FrameworkName}
kubectl get frameworks
kubectl describe frameworks
...
```
2. [Kubernetes Client Library](https://kubernetes.io/docs/reference/using-api/client-libraries)
3. Any HTTP Client

### <a name="SupportedInteroperation">Supported Interoperation</a>
| API Kind | Operations |
|:---- |:---- |
| Framework | [CREATE](#CREATE_Framework) [DELETE](#DELETE_Framework) [GET](#GET_Framework) [LIST](#LIST_Frameworks) [WATCH](#WATCH_Framework) [WATCH_LIST](#WATCH_LIST_Frameworks) |
| [ConfigMap](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#configmap-v1-core) | All operations except for [CREATE](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#create-193) [PUT](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#replace-195) [PATCH](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#patch-194) |
| [Pod](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#pod-v1-core) | All operations except for [CREATE](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#create-55) [PUT](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#replace-57) [PATCH](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#patch-56) |

**Supported clients to execute the interoperations with a Framework**

As Framework is actually a Kubernetes [CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#customresourcedefinitions), all CRD clients can be used to execute the interoperations with a Framework, see them in [Accessing a custom resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#accessing-a-custom-resource).

### <a name="CREATE_Framework">CREATE Framework</a>
#### <a name="CREATE_Framework">CREATE Framework</a>
**Request**

POST /apis/frameworkcontroller.microsoft.com/v1/namespaces/{FrameworkNamespace}/frameworks

Body: [Framework](../pkg/apis/frameworkcontroller/v1/types.go)

Type: application/json
Type: application/json or application/yaml

**Description**

@@ -44,26 +55,32 @@ Create the specified Framework.
| Accepted(202) | [Framework](../pkg/apis/frameworkcontroller/v1/types.go) | Return current Framework. |
| Conflict(409) | [Status](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#status-v1-meta) | The specified Framework already exists. |
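
As an illustration of the CREATE request described above, here is a minimal Python sketch that builds (but does not send) the POST request with a JSON body. The function name `build_create_request`, the Framework name `example`, and the `http://127.0.0.1:8001` address (the default local port of `kubectl proxy`) are assumptions for illustration only; the Framework shown is an incomplete placeholder, and the required spec fields are described in [Framework](../pkg/apis/frameworkcontroller/v1/types.go).

```python
import json
import urllib.request

# Path prefix taken from the request line documented above.
API_GROUP_PATH = "/apis/frameworkcontroller.microsoft.com/v1"


def build_create_request(api_server, namespace, framework):
    """Build (but do not send) the POST request that creates a Framework."""
    url = f"{api_server}{API_GROUP_PATH}/namespaces/{namespace}/frameworks"
    body = json.dumps(framework).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical placeholder: only the identifying fields are shown,
# the spec is intentionally omitted.
framework = {
    "apiVersion": "frameworkcontroller.microsoft.com/v1",
    "kind": "Framework",
    "metadata": {"name": "example"},
}

# Assumes `kubectl proxy` is exposing the API server locally.
request = build_create_request("http://127.0.0.1:8001", "default", framework)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually submit it; against a cluster that requires authentication, the caller would also need to supply credentials.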

### <a name="DELETE_Framework">DELETE Framework</a>
#### <a name="DELETE_Framework">DELETE Framework</a>
**Request**

DELETE /apis/frameworkcontroller.microsoft.com/v1/namespaces/{FrameworkNamespace}/frameworks/{FrameworkName}

Body:

application/json
```json
{
"propagationPolicy": "Foreground"
}
```
application/yaml
```yaml
propagationPolicy: Foreground
```

Type: application/json
Type: application/json or application/yaml

**Description**

Delete the specified Framework.

Notes:
* Should always use and only use the provided body, see [Framework Notes](../pkg/apis/frameworkcontroller/v1/types.go).
* If you need to ensure that at most one instance of a specific Framework (identified by the FrameworkName) is running at any point in time, you should always use, and only use, [Foreground Deletion](https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#foreground-cascading-deletion) as in the provided body, see [Framework Notes](../pkg/apis/frameworkcontroller/v1/types.go). However, `kubectl delete` does not support specifying Foreground Deletion, at least as of [Kubernetes v1.10](https://github.com/kubernetes/kubernetes/issues/66110#issuecomment-413761559), so you may have to use another [Supported Client](#SupportedClient).
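
For clients other than `kubectl`, the Foreground Deletion request can be sketched with a plain HTTP client. The following Python snippet only constructs the DELETE request with the documented body; the function name `build_foreground_delete_request`, the Framework name `example`, and the `kubectl proxy` address are assumptions for illustration.

```python
import json
import urllib.request


def build_foreground_delete_request(api_server, namespace, name):
    """Build (but do not send) a DELETE request that asks for Foreground Deletion."""
    url = (
        f"{api_server}/apis/frameworkcontroller.microsoft.com/v1"
        f"/namespaces/{namespace}/frameworks/{name}"
    )
    # The body documented above: always request the Foreground propagation policy.
    body = json.dumps({"propagationPolicy": "Foreground"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="DELETE",
        headers={"Content-Type": "application/json"},
    )


# Assumes `kubectl proxy` is exposing the API server locally.
request = build_foreground_delete_request("http://127.0.0.1:8001", "default", "example")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```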

**Response**

@@ -73,7 +90,7 @@ Notes:
| OK(200) | [Status](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#status-v1-meta) | The specified Framework is deleted. |
| NotFound(404) | [Status](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#status-v1-meta) | The specified Framework is not found. |

### <a name="GET_Framework">GET Framework</a>
#### <a name="GET_Framework">GET Framework</a>
**Request**

GET /apis/frameworkcontroller.microsoft.com/v1/namespaces/{FrameworkNamespace}/frameworks/{FrameworkName}
@@ -89,7 +106,7 @@ Get the specified Framework.
| OK(200) | [Framework](../pkg/apis/frameworkcontroller/v1/types.go) | Return current Framework. |
| NotFound(404) | [Status](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#status-v1-meta) | The specified Framework is not found. |

### <a name="LIST_Frameworks">LIST Frameworks</a>
#### <a name="LIST_Frameworks">LIST Frameworks</a>
**Request**

GET /apis/frameworkcontroller.microsoft.com/v1/namespaces/{FrameworkNamespace}/frameworks
@@ -107,7 +124,7 @@ Get all Frameworks (in the specified FrameworkNamespace).
|:---- |:---- |:---- |
| OK(200) | [FrameworkList](../pkg/apis/frameworkcontroller/v1/types.go) | Return all Frameworks (in the specified FrameworkNamespace). |

### <a name="WATCH_Framework">WATCH Framework</a>
#### <a name="WATCH_Framework">WATCH Framework</a>
**Request**

GET /apis/frameworkcontroller.microsoft.com/v1/watch/namespaces/{FrameworkNamespace}/frameworks/{FrameworkName}
@@ -125,7 +142,7 @@ Watch the change events of the specified Framework.
| OK(200) | [WatchEvent](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#watchevent-v1-meta) | Streaming the change events of the specified Framework. |
| NotFound(404) | [Status](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#status-v1-meta) | The specified Framework is not found. |
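
A Kubernetes watch response is streamed as one JSON-encoded WatchEvent per line, each carrying a `type` and an `object`. The sketch below shows one way a client might consume such a stream; the helper name `iter_watch_events` and the sample events are hypothetical, and an in-memory buffer stands in for the HTTP response body.

```python
import io
import json


def iter_watch_events(stream):
    """Yield (event_type, object) pairs from a line-delimited JSON watch stream."""
    for line in stream:
        line = line.strip()
        if not line:
            # Skip blank lines between events.
            continue
        event = json.loads(line)
        yield event["type"], event["object"]


# Simulated response body; a real one would come from the WATCH request above.
sample = io.BytesIO(
    b'{"type": "ADDED", "object": {"metadata": {"name": "example"}}}\n'
    b'{"type": "MODIFIED", "object": {"metadata": {"name": "example"}}}\n'
)

events = list(iter_watch_events(sample))
for event_type, obj in events:
    print(event_type, obj["metadata"]["name"])
```

Because the helper only iterates over lines, it would work the same way on a streaming HTTP response object that yields lines as they arrive.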

### <a name="WATCH_LIST_Frameworks">WATCH_LIST Frameworks</a>
#### <a name="WATCH_LIST_Frameworks">WATCH_LIST Frameworks</a>
**Request**

GET /apis/frameworkcontroller.microsoft.com/v1/watch/namespaces/{FrameworkNamespace}/frameworks
@@ -305,8 +322,7 @@ Notes:
## <a name="ControllerExtension">Controller Extension</a>
### <a name="FrameworkBarrier">FrameworkBarrier</a>
1. [Usage](../pkg/barrier/barrier.go)
2. [Build](../build/frameworkbarrier)
3. Example: [FrameworkBarrier Example](../example/framework/extension/frameworkbarrier.yaml), [Tensorflow Example](../example/framework/scenario/tensorflow), [etc](../example/framework/scenario).
2. Example: [FrameworkBarrier Example](../example/framework/extension/frameworkbarrier.yaml), [TensorFlow Example](../example/framework/scenario/tensorflow), [etc](../example/framework/scenario).

## <a name="BestPractice">Best Practice</a>
[Best Practice](../pkg/apis/frameworkcontroller/v1/types.go)
10 changes: 0 additions & 10 deletions example/config/default/frameworkcontroller.yaml
@@ -3,16 +3,6 @@

# This is the default config for frameworkcontroller, so all settings are commented out.

# Setup k8s config:
# kubeApiServerAddress defaults to ${KUBE_APISERVER_ADDRESS}, and kubeConfigFilePath
# defaults to ${KUBECONFIG}, then falls back to ${HOME}/.kube/config.
# If both kubeApiServerAddress and kubeConfigFilePath are still empty after defaulting,
# the controller falls back to the k8s inClusterConfig.
#
# Address should be in format http[s]://host:port
#kubeApiServerAddress: http://10.10.10.10:8080
#
# See https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#config
#kubeConfigFilePath: ""

#workerNumber: 20
11 changes: 11 additions & 0 deletions example/framework/README.md
@@ -0,0 +1,11 @@
# Submit Framework
We provide various Framework examples that can be submitted by various clients:
1. [Framework Supported Client](../../doc/user-manual.md#SupportedClient)
2. Framework Example
   1. [Basic Example](basic)
   2. [FrameworkController Extension Example](extension)
   3. [Real Scenario Example](scenario)

## Next
1. [Framework Interop](../../doc/user-manual.md#FrameworkInterop)
2. [Framework Usage](../../pkg/apis/frameworkcontroller/v1/types.go)
1 change: 0 additions & 1 deletion example/framework/basic/batchfailedpermanent.yaml
@@ -1,4 +1,3 @@
# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
1 change: 0 additions & 1 deletion example/framework/basic/batchfailedtransient.yaml
@@ -1,4 +1,3 @@
# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
1 change: 0 additions & 1 deletion example/framework/basic/batchfailedtransientconflict.yaml
@@ -1,4 +1,3 @@
# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
1 change: 0 additions & 1 deletion example/framework/basic/batchfailedunknown.yaml
@@ -1,4 +1,3 @@
# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
1 change: 0 additions & 1 deletion example/framework/basic/batchstatefulfailed.yaml
@@ -1,4 +1,3 @@
# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
1 change: 0 additions & 1 deletion example/framework/basic/batchsucceeded.yaml
@@ -1,4 +1,3 @@
# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
1 change: 0 additions & 1 deletion example/framework/basic/batchwithservicesucceeded.yaml
@@ -1,4 +1,3 @@
# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
1 change: 0 additions & 1 deletion example/framework/basic/service.yaml
@@ -1,4 +1,3 @@
# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
1 change: 0 additions & 1 deletion example/framework/basic/servicestateful.yaml
@@ -1,4 +1,3 @@
# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
21 changes: 20 additions & 1 deletion example/framework/extension/frameworkbarrier.yaml
@@ -1,6 +1,9 @@
# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
# For the full frameworkbarrier usage, see ./pkg/barrier/barrier.go

############################### Prerequisite ###################################
# See "[PREREQUISITE]" in this file.
################################################################################
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
metadata:
@@ -54,6 +57,19 @@ spec:
volumeMounts:
- name: frameworkbarrier-volume
mountPath: /mnt/frameworkbarrier
# [PREREQUISITE]
# If the k8s cluster enforces authorization, the user needs to create a
# service account, in the same namespace as this Framework, that is
# granted the permissions frameworkbarrier requires.
# For example, if the cluster enforces RBAC:
# kubectl create serviceaccount frameworkbarrier --namespace default
# kubectl create clusterrole frameworkbarrier \
# --verb=get,list,watch \
# --resource=frameworks
# kubectl create clusterrolebinding frameworkbarrier \
# --clusterrole=frameworkbarrier \
# --user=system:serviceaccount:default:frameworkbarrier
serviceAccountName: frameworkbarrier
initContainers:
- name: frameworkbarrier
# Using official image to demonstrate this example.
@@ -97,6 +113,9 @@ spec:
volumeMounts:
- name: frameworkbarrier-volume
mountPath: /mnt/frameworkbarrier
# [PREREQUISITE]
# Same as server TaskRole.
serviceAccountName: frameworkbarrier
initContainers:
- name: frameworkbarrier
image: frameworkcontroller/frameworkbarrier
17 changes: 17 additions & 0 deletions example/framework/scenario/tensorflow/README.md
@@ -0,0 +1,17 @@
# TensorFlow On FrameworkController

## Feature
1. Support both GPU and CPU Distributed Training
2. Automatically clean up PS when the whole FrameworkAttempt is completed
3. No need to adjust existing TensorFlow image
4. No need to set up [Kubernetes DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service) and [Kubernetes Service](https://kubernetes.io/docs/concepts/services-networking/service)
5. [Common Feature](../../../../README.md#Feature)

## Prerequisite
1. See `[PREREQUISITE]` in each specific Framework yaml file.
2. Need to set up [Kubernetes Cluster-Level Logging](https://kubernetes.io/docs/concepts/cluster-administration/logging), if you need to persist and expose the logs of deleted Pods.

## Quick Start
1. [Common Quick Start](../../../../README.md#Quick-Start)
2. [CPU Example](cpu)
3. [GPU Example](gpu)
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
# For the full frameworkbarrier usage, see ./pkg/barrier/barrier.go

############################### Prerequisite ###################################
# See "[PREREQUISITE]" in this file.
################################################################################
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
metadata:
@@ -23,8 +26,15 @@ spec:
pod:
spec:
restartPolicy: Never
# Using hostNetwork to avoid network overhead.
hostNetwork: true
# [PREREQUISITE]
# To disable hostNetwork and so avoid coordinating containerPort usage,
# the user needs to set up the k8s cluster networking model and be
# aware of the potential network overhead.
# For this example, if hostNetwork is disabled, at least 1 node is
# needed; otherwise, at least 3 nodes are needed, since all 3 workers
# are specified with the same containerPort.
# See https://kubernetes.io/docs/concepts/cluster-administration/networking
hostNetwork: false
containers:
- name: tensorflow
# Using official image to demonstrate this example.
@@ -56,6 +66,11 @@ spec:
mountPath: /mnt/frameworkbarrier
- name: data-volume
mountPath: /mnt/data
# [PREREQUISITE]
# User needs to create a service account for frameworkbarrier, if the
# k8s cluster enforces authorization.
# See more in ./example/framework/extension/frameworkbarrier.yaml
serviceAccountName: frameworkbarrier
initContainers:
- name: frameworkbarrier
# Using official image to demonstrate this example.
@@ -74,10 +89,12 @@ spec:
- name: frameworkbarrier-volume
emptyDir: {}
- name: data-volume
# [PREREQUISITE]
# User needs to specify his own data-volume for input data and
# output model and the data-volume must be a distributed shared
# file system, so that data can be "handed off" between Pods,
# such as nfs, cephfs or glusterfs, etc.
# output model.
# The data-volume must be a distributed shared file system, so that
# data can be "handed off" between Pods, such as nfs, cephfs or
# glusterfs, etc.
# See https://kubernetes.io/docs/concepts/storage/volumes.
#
# And then he needs to download and extract the example input data
@@ -103,7 +120,9 @@ spec:
pod:
spec:
restartPolicy: Never
hostNetwork: true
# [PREREQUISITE]
# Same as ps TaskRole.
hostNetwork: false
containers:
- name: tensorflow
image: frameworkcontroller/tensorflow-examples:cpu
@@ -125,6 +144,9 @@ spec:
mountPath: /mnt/frameworkbarrier
- name: data-volume
mountPath: /mnt/data
# [PREREQUISITE]
# Same as ps TaskRole.
serviceAccountName: frameworkbarrier
initContainers:
- name: frameworkbarrier
image: frameworkcontroller/frameworkbarrier
@@ -140,6 +162,8 @@ spec:
- name: frameworkbarrier-volume
emptyDir: {}
- name: data-volume
# [PREREQUISITE]
# Same as ps TaskRole.
#nfs:
# server: {NFS Server Host}
# path: {NFS Shared Directory}