TS Information Technology and Talent Cultivation: serverless


7/26/2017

Chatbot - Quickly building a face recognition application on LINE

Celebrity and image analysis in a conversation with the LINE chatbot

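A chatbot, as a human-machine interface that offers people all kinds of integrated services, is one of the easiest applications to build. Face recognition, meanwhile, has long been a topic where artificial intelligence and data analysis meet. So adding photo or face recognition to a LINE chatbot seemed like fun.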

Add Xiaoshan (小姍) as a friend via the LINE QR code to test face recognition

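In the past, when writing experimental image-processing programs, I would usually reach for OpenCV first. OpenCV is certainly a good tool, but if your goal is not to improve the algorithms, or to build a more advanced face recognition method, OpenCV is overly complex.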

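At the end of 2016, AWS announced another cloud service: Rekognition. It provides APIs for recognizing images, plus several application-level APIs: "compare faces", "recognize celebrities", and "detect explicit (adult) content". (See the documentation here.)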

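One of the simplest ways to use these APIs is to have AWS Lambda call them inside AWS, and then integrate with the outside world (that is, the chatbot) through API Gateway. In other words, this still follows what the public cloud vendors (AWS, Google, or Azure alike) call the serverless direction of the future. Although these vendors are really just trying to make it harder for customers to leave the public cloud, it is undeniable that these APIs are useful and cheap to start with.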

Quickly building face recognition on LINE takes a few steps:

(1) Have some understanding of serverless design concepts

See here and here.

(2) Have a basic understanding of how to apply for and build a LINE chatbot, and of AWS Lambda.

See here and here.

(3) Handle the image id in the LINE webhook event.

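In the webhook Lambda program, pick out the image id. When LINE delivers messages to the chatbot, they come in different types; the one to handle here is the image type. LINE does not actually send the image file to the webhook. It sends an image id, and with this id you can fetch the image from a URL: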

https://api.line.me/v2/bot/message/<id>/content

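Fetching this image, of course, requires your LINE token (the channel access token, sent as an "Authorization: Bearer <token>" header).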
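A minimal sketch of step (3), assuming the standard LINE Messaging API webhook body: an "events" list in which image messages have type "message" and carry message.type == "image" plus message.id. The helper names extract_image_ids and content_request are hypothetical. The URL follows the one above; newer LINE documentation serves content from api-data.line.me, so check the current docs before relying on it.

```python
import json

# Content URL as given in the text above
CONTENT_URL = 'https://api.line.me/v2/bot/message/{}/content'


def extract_image_ids(body):
    """Return the message ids of all image messages in a LINE webhook body (str or dict)."""
    if isinstance(body, str):
        body = json.loads(body)
    ids = []
    for event in body.get('events', []):
        message = event.get('message', {})
        # Only message events of type "image" carry an id we can download
        if event.get('type') == 'message' and message.get('type') == 'image':
            ids.append(message['id'])
    return ids


def content_request(image_id, channel_access_token):
    """Build the URL and the Bearer-token headers used to download one image."""
    url = CONTENT_URL.format(image_id)
    headers = {'Authorization': 'Bearer {}'.format(channel_access_token)}
    return url, headers
```

Text, sticker, and follow events are simply skipped, so the webhook can call extract_image_ids on every request and only fetch images when the list is non-empty.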

(4) Read the image URL and get the bytes

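Using Python as an example, first fetch the URL with requests. Remember to set stream to True, because the code then reads the raw response data (the image bytes) directly into a bytearray through r.raw. Reference code: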

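import requests

# imageId comes from the webhook event; channel_access_token is your LINE channel access token
headers = {'Authorization': 'Bearer {}'.format(channel_access_token)}
imageUrl = 'https://api.line.me/v2/bot/message/{}/content'.format(imageId)

# stream=True so that r.raw gives access to the unread response body
r = requests.get(imageUrl, headers=headers, stream=True)

bArray = None
with r.raw as data:
    f = data.read()
    bArray = bytearray(f)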

(5) Use the various AWS Rekognition services.

Once you have the bytearray, the rest is simple.

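Using Python as an example, you can use boto3 (preferably version 1.4.4). Get a Rekognition client object and call its methods directly, as in the example below. Set the Image parameter to { 'Bytes': your_byte_array } to get the analysis result.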

import boto3

# The region comes from your AWS configuration (or pass region_name=...)
rclient = boto3.client('rekognition')
response = rclient.recognize_celebrities(
    Image={'Bytes': bArray}
)

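Note that the analysis result, response, is a dictionary containing various labels and technical values (for example, confidence levels). All the labels are in English, so you have to translate them into Chinese yourself.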
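A minimal sketch of post-processing the response, assuming the documented RecognizeCelebrities shape (CelebrityFaces entries with Name and MatchConfidence, plus UnrecognizedFaces) and the DetectFaces FaceDetail shape (AgeRange with Low/High, Emotions with Type/Confidence; DetectFaces must be called with Attributes=['ALL'] to return these). The EMOTION_ZH table and the 80% threshold are hypothetical choices, and the Chinese wording is only an illustration of translating labels yourself.

```python
# Hypothetical translation table for Rekognition emotion types (extend as needed)
EMOTION_ZH = {
    'HAPPY': '開心', 'SAD': '難過', 'ANGRY': '生氣', 'CONFUSED': '困惑',
    'DISGUSTED': '厭惡', 'SURPRISED': '驚訝', 'CALM': '平靜',
}


def summarize_celebrities(response, min_confidence=80.0):
    """Return ([(name, confidence), ...] above the threshold, number of unknown faces)."""
    names = []
    for face in response.get('CelebrityFaces', []):
        confidence = face.get('MatchConfidence', 0.0)
        if confidence >= min_confidence:
            names.append((face['Name'], confidence))
    # Most confident match first
    names.sort(key=lambda item: item[1], reverse=True)
    return names, len(response.get('UnrecognizedFaces', []))


def describe_face(face_detail):
    """Turn one DetectFaces FaceDetail into a short Traditional Chinese description."""
    age = face_detail['AgeRange']
    text = '年紀約 {}-{} 歲'.format(age['Low'], age['High'])
    emotions = face_detail.get('Emotions', [])
    if emotions:
        # Report only the emotion Rekognition is most confident about
        top = max(emotions, key=lambda e: e['Confidence'])
        text += '，心情：' + EMOTION_ZH.get(top['Type'], top['Type'])
    return text
```

Unknown emotion types fall back to the English label instead of failing, which keeps the bot replying even if Rekognition adds new types.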

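The names returned by celebrity recognition in the example are all in English. You can use the English Wikipedia API to search for the English name, find the corresponding Chinese page, and then take the Chinese name from it.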

For the English Wikipedia API, see here.
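One possible way to do this lookup, sketched under assumptions: this uses the MediaWiki langlinks query (action=query, prop=langlinks, lllang=zh) against English Wikipedia with the default JSON format, which may differ from the author's own method. The helper names are hypothetical, and the returned Chinese title may still need Simplified/Traditional conversion.

```python
from urllib.parse import urlencode

API = 'https://en.wikipedia.org/w/api.php'


def langlinks_url(english_name, lang='zh'):
    """Build an English-Wikipedia API query for the page's link in another language."""
    params = {
        'action': 'query',
        'prop': 'langlinks',
        'titles': english_name,
        'lllang': lang,
        'redirects': 1,  # follow redirects such as alternative spellings
        'format': 'json',
    }
    return API + '?' + urlencode(params)


def extract_title(data, lang='zh'):
    """Return the linked title in `lang` from a formatversion=1 JSON response, or None."""
    pages = data.get('query', {}).get('pages', {})
    for page in pages.values():
        for link in page.get('langlinks', []):
            if link.get('lang') == lang:
                return link.get('*')
    return None
```

Fetch langlinks_url(name) with any HTTP client, decode the JSON, and pass it to extract_title; a None result means no Chinese page is linked, so the bot can fall back to the English name.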

(6) Considerations for using S3

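If you have read the AWS documentation, you will notice that the recognition APIs can all take an image stored in S3 as the source. So why doesn't the example use S3?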

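In fact, you can store the LINE image in S3 first and then analyze it. However, that adds the time to put the image into S3 and read it back out. S3 also costs money! If an image is analyzed only once, storing it in S3 is not worth it, and storing it in Rekognition is even more expensive. If the image will be reused, though, it probably still needs to be stored in S3.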
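The trade-off above maps directly onto the two forms of Rekognition's Image argument: raw bytes, or an S3Object with Bucket and Name. A minimal sketch; the helper name rekognition_image and the bucket/key values in the usage are hypothetical.

```python
def rekognition_image(data=None, bucket=None, key=None):
    """Build the Image argument for Rekognition calls from raw bytes or an S3 object."""
    if data is not None and bucket is None and key is None:
        # Analyse-once case: send the bytes directly, nothing is stored
        return {'Bytes': bytes(data)}
    if data is None and bucket is not None and key is not None:
        # Reuse case: the image already lives in S3
        return {'S3Object': {'Bucket': bucket, 'Name': key}}
    raise ValueError('pass either data, or both bucket and key')
```

For example, rclient.recognize_celebrities(Image=rekognition_image(bArray)) for the one-shot case, or rekognition_image(bucket='my-bucket', key='line/123.jpg') when images are kept for reuse.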

Current results

Add Xiaoshan (小姍) as a friend on LINE to try the current integration of LINE and AWS face recognition.

Add Xiaoshan as a friend: LINE ID @opn2514f


The image below shows different expressions of Trump being recognized as different ages and different moods.

7/12/2017

Serverless design for IoT - An example leveraging AWS and GrovePi

AWS announced its IoT service around 2015 and gradually released related services (for example, the IoT Button) for those who need to handle the huge and growing volume of responses from "The Things". It is, of course, an area where cloud providers want to offer a solution.

To demonstrate the benefits of serverless design and the power of the AWS cloud, I built an example project combining serverless design, AWS IoT, Raspberry Pi, the Grove sensor system, and GrovePi. It reports indoor (office) air quality, so I can check it before entering the office. That way I have an excuse to work from somewhere else? :)

In this example, a GrovePi is mounted on a Raspberry Pi (B+) to control Grove's Air Quality Sensor, HCHO sensor, and dust sensor. Being a software engineer, I assembled all of this inside a paper box. See the picture below.

The RPi and GrovePi are inside the box; the 3 sensors are outside.

Reminder: to use AWS services, the most important thing is to read the official documentation. AWS has many different services, and there are too many out-of-date articles on people's blogs. That doesn't mean the authors were wrong; the articles are just out of date. The same applies to Raspberry Pi and all other third-party open-source libraries: read the official documentation (or official wiki/blog) to get the overall picture.

The full implementation and design concerns

(see all the project details on GitHub)

(1) Grove's 3 sensors + GrovePi + Raspberry Pi

The hardware part. Check GrovePi's official website for how to put everything together.

GrovePi might be the easiest way to program the Grove system from a Raspberry Pi if you have more than two devices on one machine. However, if you have only one sensor, just use the RPi's GPIO.

(2) AWS IoT service

Although we didn't program anything inside the hardware for AWS, we still need to set things up in the AWS IoT service. And of course, it is better to read at least the tutorial first.

Screenshot of AWS IoT Tutorial

AWS IoT pricing is metered per message, in 512-byte increments. At the time of writing, it costs about $5 per million messages, which means about $5 per 500MB! This is much more expensive than running your own EC2 instance to handle device messages. However, if you don't need to send all monitoring data to AWS every few seconds, and you only need to track state changes (maybe a few times per day), then "Device Shadows" are the best fit for you.
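The arithmetic behind that comparison, as a small sketch. The $5 per million messages and 512-byte unit are taken from the text above and may not match current pricing; the function name and the example scenarios are hypothetical.

```python
import math


def iot_message_cost(payload_bytes, messages, price_per_million=5.0, unit_bytes=512):
    """Estimate AWS IoT messaging cost in USD, billing each message in 512-byte units."""
    # A 600-byte message counts as two 512-byte units
    units_per_message = max(1, math.ceil(payload_bytes / unit_bytes))
    return units_per_message * messages * price_per_million / 1_000_000
```

With these assumed prices, one 512-byte reading every 5 seconds for 30 days is 518,400 messages, about $2.59 per device per month, while a few shadow updates per day costs a tiny fraction of a cent. Across thousands of devices that difference is what favors state-change monitoring.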

In this example, we register a "Thing" named InDoorSensor1, and the most important part is to give it the default Shadow state below:

{
  "desired": {
    "welcome": "aws-iot",
    "air quality": 43,
    "action": "wait"
  },
  "reported": {
    "welcome": "aws-iot",
    "action": "wait",
    "air quality": 43
  }
}

The device keeps its Shadow in AWS in sync, and if the desired state changes to "do", it will (a) do a one-time air sensor data collection, (b) update the air quality in the Shadow object, and (c) change back to the "wait" state. In short, the Shadow and the device keep the state (wait or do) in sync, and this state synchronization is the main function AWS provides.
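A sketch of the device-side loop described above, assuming AWSIoTPythonSDK's shadow handler methods shadowRegisterDeltaCallback and shadowUpdate(payload, callback, timeout) with callbacks of the form (payload, responseStatus, token). Resetting both desired and reported to "wait" is a design assumption that stops the delta from firing again; read_air_quality stands in for the actual GrovePi reading code, which is not shown here.

```python
import json


def handle_delta(payload, read_air_quality):
    """Return the shadow update (JSON string) for a delta payload, or None if nothing to do."""
    state = json.loads(payload).get('state', {})
    if state.get('action') != 'do':
        return None
    quality = read_air_quality()  # (a) one-time sensor collection
    update = {
        'state': {
            # (b) publish the new reading and (c) go back to "wait";
            # resetting desired as well keeps the delta from firing again
            'reported': {'action': 'wait', 'air quality': quality},
            'desired': {'action': 'wait'},
        }
    }
    return json.dumps(update)


def register(shadow_handler, read_air_quality, timeout=5):
    """Wire handle_delta into a shadow handler from AWSIoTPythonSDK (assumed API)."""
    def on_update(payload, response_status, token):
        print('shadow update:', response_status)

    def on_delta(payload, response_status, token):
        update = handle_delta(payload, read_air_quality)
        if update is not None:
            shadow_handler.shadowUpdate(update, on_update, timeout)

    shadow_handler.shadowRegisterDeltaCallback(on_delta)
```

Keeping handle_delta free of SDK calls means the wait/do logic can be tested on a laptop without a Raspberry Pi or AWS credentials.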

(3) AWS IoT Python SDK + GrovePi Python library

AWS provides a few device SDKs. In this project, we use Python for AWS cloud access (both notifications and Shadow state changes).

On the Raspberry Pi B+, you need to:

(a) install AWSIoTPythonSDK
# pip install AWSIoTPythonSDK (also see here)

(b) consider the protocol (MQTT vs. WebSocket). In some environments, the MQTT port (8883) might be blocked. The AWS SDK provides MQTT over WebSocket, which lets you reach the broker on port 443.

(c) certificates: please read the AWS IoT certificate documentation if you have no prior experience with it.
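A sketch of how the protocol choice in (b) and the certificates in (c) fit together, assuming AWSIoTPythonSDK's AWSIoTMQTTShadowClient(clientId, useWebsocket=...), configureEndpoint(host, port) and configureCredentials(CAFilePath, KeyPath, CertificatePath). The file names are placeholders, and WebSocket mode is assumed to sign requests with IAM credentials from the environment rather than the device certificate.

```python
def connection_settings(use_websocket):
    """Port and credential files for AWS IoT, depending on transport (paths are placeholders)."""
    if use_websocket:
        # MQTT over WebSocket: port 443, server CA only; IAM credentials sign the request
        return {'port': 443, 'credentials': ('root-CA.crt',)}
    # Plain MQTT over TLS: port 8883, mutual auth with the device certificate
    return {'port': 8883,
            'credentials': ('root-CA.crt', 'device.private.key', 'device.cert.pem')}


def connect_shadow_client(client_id, endpoint, use_websocket):
    """Create and connect a shadow client (needs AWSIoTPythonSDK installed on the device)."""
    from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTShadowClient
    settings = connection_settings(use_websocket)
    client = AWSIoTMQTTShadowClient(client_id, useWebsocket=use_websocket)
    client.configureEndpoint(endpoint, settings['port'])
    client.configureCredentials(*settings['credentials'])
    client.connect()
    return client
```

A handler for the Thing can then be obtained with client.createShadowHandlerWithName('InDoorSensor1', True).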

If you use a Raspberry Pi 2 or 3 (Model B), it is easy to install Node.js/npm and all the other fancy stuff.

(4) IoT Shadow

The Shadow is an identity object for an IoT device. It allows a client to change the state of an object and have that state synced to the IoT device. In certain scenarios, this means the programmer doesn't need to handle network errors or offline cases. However, you still need to fully understand what exactly the "desired" and "reported" states mean.
It is possible to edit the Shadow state directly from the AWS admin console for testing purposes (you won't want to do so if you have thousands of IoT devices).
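To make "desired" versus "reported" concrete: the device is notified of a delta, which roughly consists of the desired fields whose reported value differs. The function below is a simplified model for illustration only; the real service also tracks versions, metadata, and null deletions.

```python
def compute_delta(desired, reported):
    """Simplified shadow delta: desired fields whose reported value differs."""
    delta = {}
    for key, want in desired.items():
        have = reported.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            # Compare nested objects field by field
            nested = compute_delta(want, have)
            if nested:
                delta[key] = nested
        elif want != have:
            delta[key] = want
    return delta
```

With the default Shadow state above the delta is empty; setting desired "action" to "do" (for example from the console) yields a delta of {'action': 'do'}, which is exactly what wakes the device up.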

(5) Lambda, API Gateway

Normally, an application does NOT access a specific IoT device directly. It accesses a service, and that service provides information or allows meaningful user actions.
In this case, the Lambda service is simply a Python program that can (1) retrieve the current state and the current air quality value, and (2) update the state to "do". As always, the Lambda sits behind an API Gateway, which means that, potentially, any other application could use this API to access the necessary (filtered) information.
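A sketch of that Lambda, assuming boto3's iot-data client (get_thing_shadow returns the document in a streaming payload; update_thing_shadow takes a JSON payload). Depending on region and setup, the client may need endpoint_url set to your account's IoT data endpoint. The request format {'op': 'get' | 'do'} is hypothetical, and the optional iot_data argument exists only so the handler can be tested without AWS.

```python
import json

THING_NAME = 'InDoorSensor1'


def read_state(iot_data):
    """Return (action, air quality) from the thing's reported shadow state."""
    response = iot_data.get_thing_shadow(thingName=THING_NAME)
    shadow = json.loads(response['payload'].read())
    reported = shadow.get('state', {}).get('reported', {})
    return reported.get('action'), reported.get('air quality')


def request_measurement(iot_data):
    """Set the desired action to "do" so the device takes one reading."""
    payload = json.dumps({'state': {'desired': {'action': 'do'}}})
    iot_data.update_thing_shadow(thingName=THING_NAME, payload=payload.encode('utf-8'))


def lambda_handler(event, context, iot_data=None):
    """API Gateway entry point; the event format is a hypothetical {'op': 'get' | 'do'}."""
    if iot_data is None:
        import boto3
        iot_data = boto3.client('iot-data')
    if event.get('op') == 'do':
        request_measurement(iot_data)
        return {'action': 'do'}
    action, quality = read_state(iot_data)
    return {'action': action, 'air quality': quality}
```

Callers only ever see the filtered result (action and air quality), never the full shadow document or the device itself.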

See the hand-drawn activity diagram:

Next Project

Hopefully, I will have more budget to buy a Raspberry Pi 3 and a CO2 sensor, and then gather data to draw graphs with D3. I am also thinking of using LINE to send air quality information to my colleagues or neighbors.

6/20/2017

Serverless design for LINE AI Chatbot

Chatbots are one of the interesting applications in the AI area. They create opportunities for enterprises to serve customers at very low cost, or even to generate new revenue.
In the past few years, major instant messaging providers have allowed developers to hook into their services. This means that as long as you have a simple message-processing and response system, you can quickly interact with all kinds of message channels.

Normally, a software developer starts by building a system on a server box, whether Linux or Windows. Recently, that server might be a VM in a public cloud such as AWS, Azure, Linode, or DigitalOcean. However, a serverless design model might be a better choice.

Why Serverless?

Firstly, a serverless system is easy to scale in and out. That doesn't mean you can't scale in and out with traditional VMs in a public cloud or your own datacenter. It means that Lambda functions, whichever provider runs them, are decoupled from the underlying server environment. Ideally, you can go from one Lambda function to a few thousand instances of the same function without the "traditional questions", for example: should I shut down VMs outside peak hours? Should I write a script to check whether the current VMs are close to overloading?

Secondly, a serverless system is easy to plug into, which means that during the design phase the developer is forced to decouple functions into small modules (bricks). The developer is also forced NOT to rely on a specific environment; Docker is one solution to that, but pure Lambda functions produce a much more environment-independent structure.

Furthermore, it also helps to define the boundaries of subsystems and eases future maintenance.

The Design Concerns

(1) IM independent

LINE occupies a huge market in Taiwan: more than 90% of mobile users have a LINE account. Most remarkably, many elderly people who had never used the Internet before have LINE accounts! However, this design doesn't use any LINE-specific methods. We've tried the same engine with Yahoo Messenger and it works there too.
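One way to keep the engine IM-independent is a thin adapter per messenger that converts the provider's event into a neutral message object. A minimal sketch; the Message fields, from_line, and the echo engine are hypothetical, and the LINE field names (source.userId, message.text, replyToken) follow the LINE webhook format.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """IM-independent message passed to the chatbot engine (hypothetical format)."""
    channel: str
    user_id: str
    text: str
    reply_token: str = ''


def from_line(event):
    """Adapter: convert one LINE webhook text event into a Message."""
    return Message(channel='line',
                   user_id=event['source'].get('userId', ''),
                   text=event['message'].get('text', ''),
                   reply_token=event.get('replyToken', ''))


def engine(message):
    """The engine sees only Message objects, never LINE-specific fields."""
    return 'echo: ' + message.text
```

Supporting another messenger then means writing one more adapter (for example a hypothetical from_yahoo) while engine stays untouched.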

(2) AWS Lambda

-- (2.1) try NOT to use context

AWS Lambda has standard invoke parameters (event, context). The event is the user input passed when the Lambda function is invoked. The context is what the developer might need to understand the 'environment context'. The major design concern here is to avoid using context whenever possible, because it makes it hard to move your Lambda to another public cloud environment. If you really need the ARN or identity, try to confine that environment dependency to just one Lambda.
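A sketch of confining context to one small adapter, assuming the AWS Lambda Python context object's aws_request_id attribute. The business logic in handle sees only a plain dict, so it could be reused on another provider; the greeting logic itself is hypothetical.

```python
def handle(event):
    """Provider-independent business logic: plain dict in, plain dict out."""
    name = event.get('name', 'guest')
    return {'reply': 'Hello, {}'.format(name)}


def request_id(context):
    """The only place that touches the AWS-specific context object."""
    # aws_request_id exists on the AWS Lambda context; None when run locally with context=None
    return getattr(context, 'aws_request_id', None)


def lambda_handler(event, context):
    """Thin AWS adapter around handle()."""
    result = handle(event)
    result['request_id'] = request_id(context)
    return result
```

Moving to another cloud would then mean replacing lambda_handler and request_id only.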

-- (2.2) async invoke

AWS Lambda can be invoked in 3 ways: Event, RequestResponse, and DryRun. "Event" is an asynchronous call. Keep any IM message-receiver Lambda as simple as possible so it can respond to the IM webhook quickly, and hand everything else to other Lambdas via "Event" invocations, because most IM providers (LINE, Facebook) allow only a very short timeout for webhooks. DO NOT put the HTTP webhook and the response to the IM in one synchronous call stack.

Of course, see the details in the AWS documentation: here.
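A sketch of the receiver pattern, assuming boto3's Lambda invoke with InvocationType='Event' (AWS queues the call and returns immediately, typically with status 202). The worker function name chatbot-worker is hypothetical, and the optional lambda_client argument exists only for testing.

```python
import json

WORKER_FUNCTION = 'chatbot-worker'  # hypothetical name of the slow-path Lambda


def webhook_handler(event, context, lambda_client=None):
    """Receive the IM webhook, hand the work off asynchronously, and return at once."""
    if lambda_client is None:
        import boto3
        lambda_client = boto3.client('lambda')
    body = event.get('body', '{}')
    # InvocationType='Event' queues the call without waiting for the worker to finish
    lambda_client.invoke(FunctionName=WORKER_FUNCTION,
                         InvocationType='Event',
                         Payload=json.dumps({'body': body}).encode('utf-8'))
    return {'statusCode': 200, 'body': 'OK'}
```

The worker then does the slow parts (knowledge-base search, image analysis) and replies through the IM's push or reply API on its own schedule.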

-- (2.3) timeout/memory

AWS Lambda lets you configure the timeout and memory size, and AWS CloudWatch shows a Lambda's resource consumption. It is fine to use more memory or set a longer running time, but the developer should know WHY.
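One concrete "why": Lambda compute is billed in GB-seconds, so memory size and duration multiply. A rough sketch; the default prices are assumptions based on long-standing published rates and may not match your region today, and the free tier and billing-duration rounding are ignored.

```python
def lambda_compute_cost(memory_mb, avg_seconds, invocations,
                        price_per_gb_second=0.00001667,
                        price_per_million_requests=0.20):
    """Return (GB-seconds, estimated USD) for a batch of invocations.

    Prices are assumed defaults; free tier and duration rounding are ignored.
    """
    gb_seconds = memory_mb / 1024 * avg_seconds * invocations
    cost = (gb_seconds * price_per_gb_second
            + invocations / 1_000_000 * price_per_million_requests)
    return gb_seconds, cost
```

Doubling memory doubles the GB-seconds for the same duration. Because Lambda also allocates CPU in proportion to memory, a larger setting can still pay off if it shortens the run time enough, which is exactly what CloudWatch metrics let you check.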

-- (2.4) quick testing

It is necessary to have your own development server to test your Lambda functions and to trigger a deployment script that uploads them to AWS. If you don't actually use "context", it is very simple to add a quick test to every Lambda handler:

# At the end of your Lambda Python script:
if __name__ == '__main__':
    event = {'param1': 'test'}
    print(lambda_handler(event, None))

Of course, developers also need a proper test framework (for example, unittest).
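The same idea with unittest, as a minimal sketch. The lambda_handler here is a hypothetical stand-in for your real handler; passing None as the context works only because the handler never reads it.

```python
import unittest


def lambda_handler(event, context):
    """Example handler under test (stands in for your real one)."""
    return {'echo': event.get('param1')}


class LambdaHandlerTest(unittest.TestCase):
    def test_echo_without_context(self):
        # context is None: fine because the handler never touches it
        self.assertEqual(lambda_handler({'param1': 'test'}, None), {'echo': 'test'})

    def test_missing_param(self):
        self.assertEqual(lambda_handler({}, None), {'echo': None})
```

Run it locally with python -m unittest before the deployment script uploads anything.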

-- (2.5) deployment

As always, a developer should have a semi-automatic way to deploy. Here is a very simple deployment script that (a) zips the Python files, (b) uploads the zip to S3, (c) creates the Lambda function, and (d) updates the function code from the S3 zip file (for later deployments).

(a) zip -r lottery.zip lambda_lottery.py lottery60.py
(b) aws --profile ailine s3 cp lottery.zip s3://bucket/
(c) aws --profile ailine lambda create-function --function-name lottery --runtime python3.6 --role "arn:aws:" --handler lambda_lottery.lambda_handler --timeout 10 --code "S3Bucket=bucket,S3Key=lottery.zip"
(d) aws --profile ailine lambda update-function-code --function-name lottery --s3-bucket bucket --s3-key lottery.zip
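The same steps (a), (b), and (d) can be sketched in Python, assuming boto3's s3 upload_file(Filename, Bucket, Key) and lambda update_function_code(FunctionName, S3Bucket, S3Key). The clients are passed in so the script can be tested without AWS; step (c) remains a one-time CLI call.

```python
import os
import zipfile


def build_zip(zip_path, source_files):
    """(a) Put the handler and its modules at the root of a deployment zip."""
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for path in source_files:
            zf.write(path, arcname=os.path.basename(path))
    return zip_path


def deploy(zip_path, bucket, function_name, s3_client, lambda_client):
    """(b) Upload to S3, then (d) point the existing function at the new code."""
    key = os.path.basename(zip_path)
    s3_client.upload_file(zip_path, bucket, key)
    return lambda_client.update_function_code(FunctionName=function_name,
                                              S3Bucket=bucket, S3Key=key)
```

Usage, with real clients: deploy(build_zip('lottery.zip', ['lambda_lottery.py', 'lottery60.py']), 'bucket', 'lottery', boto3.client('s3'), boto3.client('lambda')).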

-- (2.6) scheduled (cron) Lambda

A chatbot might need to run scheduled tasks to respond to users, for example sending a regular morning call. Triggering a scheduled Lambda is probably one of the most cloud-provider-dependent parts of the chatbot design.
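On AWS, one way to do this is a CloudWatch Events rule with a cron expression targeting the Lambda. A sketch assuming boto3's events put_rule/put_targets and lambda add_permission; the rule name, function ARN, and Taiwan (UTC+8) morning time are hypothetical. AWS cron expressions run in UTC and need "?" in one of the two day fields.

```python
def utc_cron(local_hour, local_minute=0, utc_offset_hours=8):
    """Daily AWS cron expression for a local time (default: Taiwan, UTC+8)."""
    utc_hour = (local_hour - utc_offset_hours) % 24
    # Fields: minutes hours day-of-month month day-of-week year
    return 'cron({} {} * * ? *)'.format(local_minute, utc_hour)


def schedule_lambda(events, lambda_client, rule_name, function_name, function_arn, expression):
    """Create a scheduled rule, target the Lambda, and allow CloudWatch Events to invoke it."""
    rule = events.put_rule(Name=rule_name, ScheduleExpression=expression, State='ENABLED')
    events.put_targets(Rule=rule_name, Targets=[{'Id': '1', 'Arn': function_arn}])
    lambda_client.add_permission(FunctionName=function_name,
                                 StatementId=rule_name + '-invoke',
                                 Action='lambda:InvokeFunction',
                                 Principal='events.amazonaws.com',
                                 SourceArn=rule['RuleArn'])
    return rule['RuleArn']
```

For a 7:00 Taiwan morning call, utc_cron(7) gives 'cron(0 23 * * ? *)', i.e. 23:00 UTC on the previous day, which is fine for a rule that fires every day.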

(3) AWS API Gateway

AWS API Gateway is another major cloud-provider-dependent piece; however, it is not hard to switch to another provider or to run our own lab test environment. The major concerns with API Gateway are (a) converting the IM provider's HTTP request into a given format, which becomes the Lambda input, and (b) security: how to make sure only the IM provider's system can access this API Gateway.
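For LINE, concern (b) can be handled by checking the X-Line-Signature header, which LINE sets to the Base64-encoded HMAC-SHA256 of the raw request body keyed with the channel secret. A minimal sketch; it assumes the API Gateway mapping passes the raw body through unchanged, because re-serialized JSON would no longer match the signature.

```python
import base64
import hashlib
import hmac


def line_signature(channel_secret, body):
    """Base64-encoded HMAC-SHA256 of the raw request body, keyed by the channel secret."""
    digest = hmac.new(channel_secret.encode('utf-8'), body, hashlib.sha256).digest()
    return base64.b64encode(digest).decode('ascii')


def is_from_line(channel_secret, body, header_signature):
    """Compare against the X-Line-Signature header in constant time."""
    expected = line_signature(channel_secret, body).encode('ascii')
    return hmac.compare_digest(expected, header_signature.encode('utf-8'))
```

Requests that fail the check can be rejected before any other Lambda is invoked.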

(4) AWS dynamodb

The chatbot uses DynamoDB to store user information and message logs. It is also easy to substitute a local JSON-format NoSQL store.
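A sketch of the message log, assuming a boto3 DynamoDB Table resource (put_item(Item=...)) on a hypothetical table keyed by user_id and timestamp. The boto3 resource layer rejects Python floats (it wants Decimal), so the timestamp is stored as integer milliseconds. LocalTable is a hypothetical in-memory stand-in for lab tests.

```python
import time


def log_item(user_id, text, timestamp_ms=None):
    """Build a message-log item; integer timestamps avoid boto3's float restriction."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return {'user_id': user_id, 'timestamp': timestamp_ms, 'text': text}


def save_message(table, user_id, text, timestamp_ms=None):
    """Write one message; `table` is a boto3 Table or anything with put_item(Item=...)."""
    item = log_item(user_id, text, timestamp_ms)
    table.put_item(Item=item)
    return item


class LocalTable:
    """In-memory stand-in for lab tests, holding JSON-friendly dicts."""
    def __init__(self):
        self.items = []

    def put_item(self, Item):
        self.items.append(dict(Item))
```

In production, table = boto3.resource('dynamodb').Table('chatbot-messages') (name hypothetical) drops in without changing save_message.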

(5) AWS elasticsearch

The chatbot uses AWS Elasticsearch to store its knowledge base. It is easy to set up a developer Elasticsearch server for lab tests before deployment. The real concern in the public cloud might be the future budget :)
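A sketch of a knowledge-base lookup, assuming documents with hypothetical "question" and "answer" fields and the standard _search response shape (hits.hits[]._score/_source, sorted by score). The query body would be POSTed as JSON to /<index>/_search on either the lab server or the AWS domain, which is what makes lab testing easy.

```python
def kb_query(question, size=1):
    """Full-text match of the user's question against a hypothetical 'question' field."""
    return {'size': size, 'query': {'match': {'question': question}}}


def best_answer(search_response, min_score=0.0):
    """Pick the top hit's 'answer' from an Elasticsearch _search response, or None."""
    hits = search_response.get('hits', {}).get('hits', [])
    if not hits or (hits[0].get('_score') or 0.0) < min_score:
        return None
    return hits[0]['_source'].get('answer')
```

The min_score cut-off lets the bot say it doesn't know instead of returning a weak match; a suitable value has to be tuned on real questions.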

(6) AWS S3

The chatbot still needs some static content (HTML or JS), and S3 is the easiest way to serve public static content. It is also where the latest Lambda code is uploaded.
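A sketch of uploading static files so browsers render them correctly, assuming boto3's s3 put_object(Bucket, Key, Body, ContentType). Public read access is assumed to come from a bucket policy and is not set here; the bucket and key names are hypothetical.

```python
import mimetypes


def upload_static(s3_client, bucket, local_path, key):
    """Upload one static file with a guessed Content-Type; return that type."""
    content_type, _ = mimetypes.guess_type(local_path)
    content_type = content_type or 'application/octet-stream'
    with open(local_path, 'rb') as f:
        s3_client.put_object(Bucket=bucket, Key=key, Body=f.read(),
                             ContentType=content_type)
    return content_type
```

Without an explicit ContentType, a browser may download an HTML page instead of displaying it.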

The Implementation

See: GitHub repository

Take a look?

This chatbot understands and speaks only Traditional Chinese, since she is a Taiwanese robot :). You need a LINE account to chat with her.

LINE QR code for the chatbot Xiaoshan (小姍)

Add Xiaoshan as a friend
