py: add python sdk

- Bumping versions to 0.9.3
- Added autogenerated files from protoc and shim layer
- Added tests for client and TLS
- Added examples and usage for the python sdk in the documentation.

Co-Authored-By: Julie Sheffield <julie@cobaltspeech.com>
cobaltspeech · Feb 6, 2020 · 14761b1
1 parent ae15cd9
Showing 22 changed files with 1,837 additions and 25 deletions.
diff --git a/Makefile b/Makefile
@@ -11,11 +11,12 @@ TOP := $(shell pwd)
 DEPSBIN := ${TOP}/deps/bin
 DEPSGO := ${TOP}/deps/go
 DEPSTMP := ${TOP}/deps/tmp
+DEPSVENV := ${TOP}/deps/venv
 $(shell mkdir -p $(DEPSBIN) $(DEPSGO) $(DEPSTMP))
 
 export PATH := ${DEPSBIN}:${DEPSGO}/bin:$(PATH)
 
-deps: deps-protoc deps-hugo deps-gendoc deps-gengo deps-gengateway deps-dotnet
+deps: deps-protoc deps-hugo deps-gendoc deps-gengo deps-gengateway deps-dotnet deps-py
 
 deps-protoc: ${DEPSBIN}/protoc
 ${DEPSBIN}/protoc:
@@ -49,8 +50,15 @@ ${DEPSBIN}/dotnet:
 		"https://download.visualstudio.microsoft.com/download/pr/d731f991-8e68-4c7c-8ea0-fad5605b077a/49497b5420eecbd905158d86d738af64/dotnet-sdk-3.1.100-linux-x64.tar.gz"
 	cd ${DEPSBIN} && tar -C ./ -xzvf dotnet-sdk-3.1.100-linux-x64.tar.gz
 
+deps-py: ${DEPSVENV}/.done
+${DEPSVENV}/.done:
+	virtualenv -p python3 ${DEPSVENV}
+	source ${DEPSVENV}/bin/activate && pip install grpcio-tools==1.20.0 googleapis-common-protos==1.5.9 && deactivate
+	touch $@
+
 gen: deps
-	@ PROTOINC=${DEPSGO}/pkg/mod/github.com/grpc-ecosystem/grpc-gateway@v1.9.0/third_party/googleapis \
+	@ source ${DEPSVENV}/bin/activate && \
+		PROTOINC=${DEPSGO}/pkg/mod/github.com/grpc-ecosystem/grpc-gateway@v1.9.0/third_party/googleapis \
 		$(MAKE) -C grpc
 	@ pushd docs-src && hugo -d ../docs && popd
 

diff --git a/README.md b/README.md
@@ -21,12 +21,15 @@ repository.
 Code generation has the following dependencies:
   - The protobuf compiler itself (protoc)
   - The protobuf documentation generation plugin (protoc-gen-doc)
+  - The python plugins (grpcio-tools and googleapis-common-protos)
   - The golang plugins (protoc-gen-go and protoc-gen-grpc-gateway)
   - The static website generator (hugo)
 
 A few system dependencies are required:
   - Go >= 1.12
   - git
+  - python3
+  - virtualenv
   - wget
 
 The top level Makefile can set up all other dependencies.
@@ -80,6 +83,7 @@ git checkout -b version-update-v$NEW_VERSION
 
 sed -i 's|grpc/go-juzu v[0-9.]*|grpc/go-juzu v'$NEW_VERSION'|g' grpc/go-juzu/juzupb/gw/go.mod
 sed -i 's|<Version>[0-9.]*</Version>|<Version>'$NEW_VERSION'</Version>|g' grpc/csharp-juzu/juzu.csproj
+sed -i 's|version='\''[0-9.]*'\''|version='\'$NEW_VERSION\''|g' grpc/py-juzu/setup.py
 sed -i 's|^VERSION="[0-9.]*"|VERSION="'$NEW_VERSION'"|g' grpc/Makefile
 
 git commit -m "Update version to v$NEW_VERSION"
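
The `sed` line added to the version-update steps rewrites the `version='...'` string in `grpc/py-juzu/setup.py`. As a rough illustration of what that substitution does, here is a minimal Python sketch of the same pattern. The function name `bump_setup_version` and the sample `setup.py` text are hypothetical; the actual file contents are not shown in this diff.

```python
import re

# Same pattern as the sed expression: version='<digits and dots>'
VERSION_RE = re.compile(r"version='[0-9.]*'")


def bump_setup_version(setup_text, new_version):
    """Return setup_text with every version='...' replaced by new_version."""
    # A function replacement avoids any backslash interpretation in new_version.
    return VERSION_RE.sub(lambda m: "version='{}'".format(new_version), setup_text)


if __name__ == "__main__":
    # Hypothetical setup.py fragment, used only for illustration.
    sample = "setup(\n    name='cobalt-juzu',\n    version='0.9.2',\n)\n"
    print(bump_setup_version(sample, "0.9.3"))
```

Like the `sed` command, this only matches single-quoted version strings, so a `setup.py` that used double quotes would be left unchanged.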

diff --git a/docs-src/content/_index.md b/docs-src/content/_index.md
@@ -4,7 +4,7 @@ title: "Juzu SDK Documentation"
 
 # Juzu API Overview
 
-Juzu is Cobalt's speaker diarization engine. It can be deployed on-prem and accessed over the network or on your local machine via an API. We currently support C# and are adding support for more languages.
+Juzu is Cobalt's speaker diarization engine. It can be deployed on-prem and accessed over the network or on your local machine via an API. We currently support C# and Python, and are adding support for more languages.
 
 Once running, Juzu's API provides a method to which you can stream audio. This audio can either be from a microphone or a file. We recommend uncompressed WAV or lossless compression such as FLAC as the encoding, but we can support other formats as well upon request.
 
@@ -265,7 +265,7 @@ Cubic as well for transcription and aiding the diarization process.  This server
 exports Juzu's functionality over the gRPC protocol.  The
 https://github.com/cobaltspeech/sdk-juzu repository contains the SDK that you
 can use in your application to communicate with the Juzu server. This SDK is
-currently available for C# and we would be happy to talk to you if you need
-support for other languages. Most of the core SDK is generated automatically
-using the gRPC tools, and Cobalt provides a top level package for more
-convenient API calls.
+currently available for C# and Python, and we would be happy to talk to you if
+you need support for other languages. Most of the core SDK is generated
+automatically using the gRPC tools, and Cobalt provides a top level package for
+more convenient API calls.
diff --git a/docs-src/content/using-juzu-sdk/connecting.md b/docs-src/content/using-juzu-sdk/connecting.md
@@ -17,10 +17,26 @@ those to point to your server instance.
 
 The following code snippet connects to the server and queries its version.  It
 uses our recommended default setup, expecting the server to be listening on a
-TLS encrypted connection,  as the demo server does.
+TLS encrypted connection. Connecting to a server that does not use TLS is
+covered in the [Insecure Connection](#insecure-connection) section.
 
 {{%tabs %}}
 
+{{% tab "Python" %}}
+
+```py
+import juzu
+
+serverAddress = "127.0.0.1:2727"
+
+client = juzu.Client(serverAddress)
+
+resp = client.Version()
+print(resp)
+```
+
+{{% /tab %}}
+
 {{% tab "C#" %}}
 
 ``` csharp
@@ -51,6 +67,14 @@ can use:
 
 {{%tabs %}}
 
+{{% tab "Python" %}}
+
+```py
+client = juzu.Client(serverAddress, insecure=True)
+```
+
+{{% /tab %}}
+
 {{% tab "C#" %}}
 
 ``` csharp
@@ -80,8 +104,15 @@ authenticated TLS. This can be done with:
 
 {{%tabs %}}
 
-{{% tab "C#" %}}
+{{% tab "Python" %}}
 
+```py
+client = juzu.Client(serverAddress, clientCertificate=certPem, clientKey=keyPem)
+```
+
+{{% /tab %}}
+
+{{% tab "C#" %}}
 
 #### Authenticating Server Certificate
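
The mutual-TLS snippet above passes `certPem` and `keyPem` without showing where they come from. One plausible way to obtain them is to read PEM files from disk, sketched below. This assumes the client accepts the PEM contents as bytes, which the diff does not confirm; `load_pem` and the file names are hypothetical.

```python
def load_pem(path):
    """Read a PEM-encoded file and return its contents as bytes."""
    with open(path, "rb") as f:
        data = f.read()
    # Cheap sanity check: PEM blocks contain a '-----BEGIN ...-----' line.
    if b"-----BEGIN" not in data:
        raise ValueError("{} does not look like a PEM file".format(path))
    return data


# Hypothetical usage (file names are placeholders; juzu must be installed):
#   certPem = load_pem("client.crt.pem")
#   keyPem = load_pem("client.key.pem")
#   client = juzu.Client(serverAddress, clientCertificate=certPem, clientKey=keyPem)
```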
 

diff --git a/docs-src/content/using-juzu-sdk/installation.md b/docs-src/content/using-juzu-sdk/installation.md
@@ -8,6 +8,15 @@ Instructions for installing the SDK are language specific.
 
 <!--more-->
 
+### Python
+
+The Python SDK depends on Python >= 3.5. You may use pip to perform a system-wide install, or use virtualenv for a local install.
+
+```bash
+pip install --upgrade pip
+pip install "git+https://github.com/cobaltspeech/sdk-juzu#egg=cobalt-juzu&subdirectory=grpc/py-juzu"
+```
+
 ### C\#
 
 The C# SDK utilizes the [NuGet package manager](https://www.nuget.org).  The package is called `Juzu-SDK`, under the owners name of `CobaltSpeech`.

diff --git a/docs-src/content/using-juzu-sdk/streaming.md b/docs-src/content/using-juzu-sdk/streaming.md
@@ -24,6 +24,72 @@ transcription).
 
 {{%tabs %}}
 
+{{% tab "Python" %}}
+
+``` py
+import juzu
+
+serverAddress = '127.0.0.1:2727'
+
+# set insecure=True for connecting to server not using TLS
+client = juzu.Client(serverAddress, insecure=False)
+
+# get list of available models
+modelResp = client.ListModels()
+for model in modelResp.models:
+    print("ID = {}\t Name = {}\t [SampleRate = {} Hz]".format(model.id, model.name, model.attributes.sample_rate))
+
+# use the first available model
+juzuModel = modelResp.models[0]
+
+# Using a Cubic model to transcribe; Cubicsvr must also be
+# running, with its address:port provided in the Juzu server
+# config file. The Cubic models and their IDs on Cubicsvr can
+# be found in cubicsvr.cfg.toml or obtained via sdk-cubic.
+cubicModelID = "1"
+
+cfg = juzu.DiarizationConfig(
+    model_id = juzuModel.id,
+    cubic_model_id = cubicModelID,
+    num_speakers = 2,               # number of speakers expected in the audio file
+    audio_encoding = "WAV",         # supported : "RAW_LINEAR16", "FLAC", "WAV"
+    sample_rate = 16000,            # must match juzu model's expected sample rate
+)
+
+# client.StreamingDiarize takes any binary
+# stream object that has a read(nBytes) method.
+# The method should return nBytes from the stream.
+
+# open audio file stream
+audio = open('test.wav', 'rb')
+
+# helper function to convert protobuf duration objects
+# (which store the time split into integer seconds and
+# integer nanoseconds) into a single floating-point value
+# in seconds
+def protoDurToSec(dur):
+    return float(dur.seconds) + float(dur.nanos) * 1e-9
+
+# defining function to print speaker segments and transcripts to screen
+def handleResults(diarizationResp):
+    for result in diarizationResp.results:
+        for segment in result.segments:
+            print("{start:.3f} - {end:.3f}\t{speaker}:\t{transcript}\n".format(
+                start = protoDurToSec(segment.start_time),
+                end = protoDurToSec(segment.end_time),
+                speaker = segment.speaker_label,
+                transcript = segment.transcript,
+                ))
+
+# sending streaming request to Juzu and
+# waiting for results to return
+for resp in client.StreamingDiarize(cfg, audio):
+    handleResults(resp)
+
+```
+
+{{% /tab %}}
+
 {{% tab "C#" %}}
 
 #### Program.cs
@@ -115,6 +181,123 @@ chosen correctly.
 
 {{%tabs %}}
 
+{{% tab "Python" %}}
+
+This example requires the [pyaudio](http://people.csail.mit.edu/hubert/pyaudio/)
+module to stream audio from a microphone. Instructions for installing pyaudio
+for different systems are available at the link. On most platforms, this is
+simply `pip install pyaudio`.
+
+``` py
+import juzu
+import pyaudio
+import threading
+
+serverAddress = '127.0.0.1:2727'
+
+# set insecure=True for connecting to server not using TLS
+client = juzu.Client(serverAddress, insecure=True)
+
+# get list of available models
+modelResp = client.ListModels()
+
+# use the first available model
+juzuModel = modelResp.models[0]
+
+# creating diarization config to transcribe + diarize
+# audio stream from microphone
+cfg = juzu.DiarizationConfig(
+    model_id = juzuModel.id,
+    cubic_model_id = "1",
+    num_speakers = 2,
+    audio_encoding = "RAW_LINEAR16",
+    sample_rate = juzuModel.attributes.sample_rate,
+)
+
+# client.StreamingDiarize takes any binary stream object that has a read(nBytes)
+# method. The method should return nBytes from the stream. So pyaudio is a suitable
+# library to use here for streaming audio from the microphone. Other libraries or
+# modules may also be used as long as they have the read method or have been wrapped
+# to do so.
+
+# defining class to wrap around the microphone stream from pyaudio
+class MicStream(object):
+
+    def __init__(self, sampleRate):
+
+        self._p = pyaudio.PyAudio()
+        # opening mic stream, recording 16 bit little endian integer samples, mono channel
+        self._stream = self._p.open(format=pyaudio.paInt16, channels=1, rate=sampleRate, input=True)
+        self._stopped = False
+
+    def __del__(self):
+        self._stream.close()
+        self._p.terminate()
+
+    # StreamingDiarize requires a read(nBytes) method
+    # that returns a list of nBytes from the stream. An
+    # empty list signals the end of stream.
+    def read(self, nBytes):
+        # if stream is stopped, return empty list to
+        # signal end of stream to Juzu
+        if self._stopped:
+            return []
+        return self._stream.read(nBytes)
+
+    def pause(self):
+        self._stream.stop_stream()
+
+    def resume(self):
+        self._stream.start_stream()
+
+    def stop(self):
+        self._stopped = True
+
+
+audio = MicStream(juzuModel.attributes.sample_rate)
+
+# helper function to convert protobuf duration objects
+# (which store the time split into integer seconds and
+# integer nanoseconds) into a single floating-point value
+# in seconds
+def protoDurToSec(dur):
+    return float(dur.seconds) + float(dur.nanos) * 1e-9
+
+# starting thread to send streaming request to juzu
+# and process results once they come back after the
+# stream ends.
+def streamToJuzu(cfg, audio):
+    try:
+        for resp in client.StreamingDiarize(cfg, audio):
+            for result in resp.results:
+                for segment in result.segments:
+                    print("{start:.3f} - {end:.3f}\t{speaker}:\t{transcript}\n".format(
+                        start = protoDurToSec(segment.start_time),
+                        end = protoDurToSec(segment.end_time),
+                        speaker = segment.speaker_label,
+                        transcript = segment.transcript,
+                        ))
+    except Exception as ex:
+        print("[error]: streaming diarization failed: {}".format(ex))
+
+streamThread = threading.Thread(target=streamToJuzu, args=(cfg,audio))
+streamThread.daemon = True
+streamThread.start()
+
+# waiting for user to end mic stream
+print("\nStreaming audio to Juzu server ...\n")
+k = input("-- Press Enter key to stop stream --")
+
+print("\nStopping Stream ...")
+audio.stop()
+
+print("Waiting for results ...")
+streamThread.join()
+
+```
+
+{{% /tab %}}
+
 {{% tab "C#" %}}
 
 We do not currently have example C# code for streaming from a microphone. Simply
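
Both Python examples rely on `client.StreamingDiarize` accepting any object with a `read(nBytes)` method. As an illustration of that contract, the sketch below wraps an arbitrary iterable of byte chunks (for example, audio arriving from a queue or socket) behind `read`. `ChunkedStream` is a hypothetical helper, not part of the SDK, and it assumes that an empty `bytes` result is treated as end of stream in the same way as the empty list mentioned in the microphone example.

```python
class ChunkedStream(object):
    """Expose an iterable of bytes chunks through a read(nBytes) method."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = b""
        self._stopped = False

    def stop(self):
        # After stop(), read() reports end of stream immediately.
        self._stopped = True

    def read(self, nBytes):
        if self._stopped:
            return b""
        # Pull chunks until nBytes are buffered or the source runs dry.
        while len(self._buffer) < nBytes:
            try:
                self._buffer += next(self._chunks)
            except StopIteration:
                break
        data, self._buffer = self._buffer[:nBytes], self._buffer[nBytes:]
        # Empty bytes here means the source is exhausted.
        return data


if __name__ == "__main__":
    stream = ChunkedStream([b"abc", b"de", b"fgh"])
    print(stream.read(4), stream.read(4), stream.read(4))
```

An instance could then be passed where the examples pass `audio`, provided the server is configured for the encoding and sample rate of the bytes being supplied.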

diff --git a/docs/index.html b/docs/index.html
@@ -160,7 +160,7 @@
 
 <h1 id="juzu-api-overview">Juzu API Overview</h1>
 
-<p>Juzu is Cobalt&rsquo;s speaker diarization engine. It can be deployed on-prem and accessed over the network or on your local machine via an API. We currently support C# and are adding support for more languages.</p>
+<p>Juzu is Cobalt&rsquo;s speaker diarization engine. It can be deployed on-prem and accessed over the network or on your local machine via an API. We currently support C# and Python, and are adding support for more languages.</p>
 
 <p>Once running, Juzu&rsquo;s API provides a method to which you can stream audio. This audio can either be from a microphone or a file. We recommend uncompressed WAV or lossless compression such as FLAC as the encoding, but we can support other formats as well upon request.</p>
 
@@ -417,10 +417,10 @@ <h2 id="obtaining-juzu">Obtaining Juzu</h2>
 exports Juzu&rsquo;s functionality over the gRPC protocol.  The
 <a href="https://github.com/cobaltspeech/sdk-juzu">https://github.com/cobaltspeech/sdk-juzu</a> repository contains the SDK that you
 can use in your application to communicate with the Juzu server. This SDK is
-currently available for C# and we would be happy to talk to you if you need
-support for other languages. Most of the core SDK is generated automatically
-using the gRPC tools, and Cobalt provides a top level package for more
-convenient API calls.</p>
+currently available for C# and Python, and we would be happy to talk to you if
+you need support for other languages. Most of the core SDK is generated
+automatically using the gRPC tools, and Cobalt provides a top level package for
+more convenient API calls.</p>