Experiments – Devthon
Decoding Innovation

Pose Estimation Benchmarks on intelligent edge
https://13.233.195.217/pose-estimation-benchmarks-on-intelligent-edge/
Wed, 21 Aug 2019 09:04:27 +0000

Image for post
Photo by Emile Guillemot on Unsplash

Benchmarks on Google Coral, Movidius Neural Compute Stick, Raspberry Pi and others

Introduction

In an earlier article, we covered running PoseNet on Movidius. We saw that we were able to achieve 30FPS with acceptable accuracy. In this article we are going to evaluate PoseNet on the following mix of hardware:

  1. Raspberry Pi 3B
  2. Movidius NCS + RPi 3B
  3. Ryzen 3
  4. GTX1030 + Ryzen 3
  5. Movidius NCS + Ryzen 3
  6. Google Coral + RPi 3B
  7. Google Coral + Ryzen 3
  8. GTX1080 + i7 7th Gen

This is a comparison of PoseNet’s performance across hardware, to help decide which hardware to use for a specific use case and whether optimizations can help. It also gives a glimpse into hardware capabilities in the wild. The hardware ranges from baseline prototyping platforms, through accelerators tailored for the edge, to production-grade CPUs and GPUs.

Hardware Choices

  1. Raspberry Pi: The board of choice for prototyping, although low powered, gives a good initial understanding of what to expect and what to choose for production. It may not be able to run the DNN models, but it sure is fun.
  2. Movidius NCS + RPi 3B: Movidius Neural Compute Stick is a promising candidate if the model is to be run on the edge. NCS has Vision Processing Units (VPU) which are optimized to run deep neural networks.
  3. Ryzen 3: AMD’s quad-core CPUs are not a conventional choice for neural networks, but it is worth checking how the networks perform on the platform.
  4. GTX1030 + Ryzen 3: Adding an Nvidia GPU to the rig (granted, it is comparatively old but it is cheap) allows us to benchmark what is possible on older cuDNN versions and GPUs.
  5. Movidius NCS + Ryzen 3: A desktop system allows for better and faster interfacing with the NCS. This setup is preferable while prototyping your edge application: a high-performance CPU allows rapid application development, while the NCS lets you run your models on your development machine.
  6. Google Coral + RPi 3B: Google’s answer to on-edge ML is their Coral board, which has TPUs. Tensor Processing Units are used by Google’s gigantic AI systems, and Coral puts their compute power into a small form factor. It has native support for Raspberry Pi too.
  7. Google Coral + Ryzen 3: As mentioned in the Movidius NCS + Ryzen 3 entry, it will be insightful to see how Coral interfaces with a Ryzen 3 based computer.
  8. GTX1080 + i7 7th Gen: Top of the line system with GTX1080 and Intel i7 CPU. This is the highest performing combination in the list.

Repositories and models used:

  1. PoseNet (tfjs version)
    • Based on MobileNetV1_050
    • Based on MobileNetV1_075
    • Based on MobileNetV1_100
  2. PoseNet (Google Coral version)
  3. Movidius versions of PoseNet (see our previous blog post)

Comparing Edge Compute Units

Google Coral’s PoseNet repository provides a model based on MobileNet 0.75 which is optimized specifically for Coral. At the time of writing, the details of the optimizations have not been provided and it is not possible to generate models for MobileNet 0.50 and 1.00.

Image for post
Google Coral vs Intel Movidius

The optimized Coral model delivers an exceptional 77 FPS on the Ryzen 3 system. However, the same model gives only ~9 FPS when running on the Raspberry Pi.

Movidius also performs differently on the RPi and the Ryzen 3 system, and is generally faster on the Ryzen 3.

Comparing Desktop CPUs and GPUs

The results align with expectations when comparing the CPU with the GTX 1030 and GTX 1080. The high-end GPU outperforms the other candidates by a huge margin. However, the competition between Ryzen 3 and GTX 1030 is close.

Image for post
Ryzen vs GTX 1030 vs GTX 1080

Final Thoughts

The following chart shows frames per second for a standard video input:

Image for post
Frames per second

Google Coral, when paired with a desktop computer, outperforms every other platform, including the GTX1080.

Other noteworthy results are:

  1. When paired with Raspberry Pi 3, Coral gives ~9FPS. The reason behind the result is not yet explained but is being looked into.
  2. GTX1080 performs almost equally regardless of the model size.
  3. Movidius NCS performs better than GTX1030.
  4. Raspberry Pi is not able to run the models at all.

Different hardware gives a different flavor of performance, and there is scope for model optimization (quantization for example). It may not always be necessary to go with a high-end GPU such as GTX 1080 if your use case allows for a good trade-off between accuracy and speed/latency.
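
To make the quantization idea mentioned above concrete, here is a minimal sketch of 8-bit affine quantization in plain Python. It is not taken from any of the toolchains we benchmarked; the function names `quantize` and `dequantize` and the sample weights are made up for illustration.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers using an affine (scale + zero point) scheme."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must include 0
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for constant input
    zero_point = round(qmin - lo / scale)
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]  # hypothetical layer weights
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error)
```

Each weight now fits in one byte instead of four, at the cost of a rounding error of at most about one quantization step (`scale`). That is the accuracy versus speed trade-off discussed above, in miniature.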

Our analysis shows that choosing the right hardware, coupled with a well-optimized neural network, is essential and may require in-depth comparative analysis.

Real Time Human Pose Estimation on the edge with Movidius NCS and OpenVINO
https://13.233.195.217/real-time-human-pose-estimation-on-the-edge-with-movidius-ncs-and-openvino/
Thu, 08 Aug 2019 09:00:19 +0000

Image for post
Movidius+Raspberrypi

An approach towards low cost computing on the edge for vision based AI applications

Introduction

Pose estimation is a computer vision approach to detect various important parts of a human body in an image or video. It gives pixel locations of where the eyes, elbows, arms, legs, etc. are for one or more human bodies in an image. In other words, the algorithm gives the locations of the “joints” of a body. Pose is a broader subject, however; here we focus only on human body pose estimation. None of the algorithms is perfect, and all depend heavily on their training data.

How is it useful?

Human pose detection on the edge can be used to read body language and body movement in real-time at the same location as the person/s. This enables numerous applications in Security, Retail, Healthcare, Geriatric care, Fitness, Sports domains. Coupled with Augmented/Mixed Reality, we can transpose a human into a virtual world thus opening up newer opportunities and experiences in Fashion retail, Entertainment, Advertising and Gaming. Along with gesture recognition you can interact with the virtual world.

What is Movidius NCS?

If you have not heard of Intel’s Neural Compute Stick, it is a small device that plugs in via USB port and runs deep neural networks. Think of it as a USB graphics card that is optimised to run certain deep learning frameworks and models. Being a USB device, it can be run on an edge computing device such as a Raspberry Pi. It is low powered and comparatively small. These points make it a very good choice to run machine learning models on edge. If you are looking for something more embedded you can look at the VPUs from Intel.

OpenVINO provided OpenPose Model

OpenVINO provides a set of pre-trained models which can be run on Movidius NCS without having to go through the conversion process. One of the pre-trained models is human-pose-estimation. It is a multi-person model, based on MobileNet V1 and trained using the Caffe framework.

This model is a larger architecture based on OpenPose. Its complexity is 15 GFLOPs, with 42.8% average precision on the COCO dataset. This high complexity is a bottleneck that makes the model unusable on the edge for real-time detection: during our benchmarks, it gave 2 FPS on Movidius NCS 1. However, its accuracy was higher than PoseNet’s.

Tensorflow JS Posenet Model

Google has released a freely available, pre-trained model for pose estimation in the browser, called PoseNet. You can refer to this blog post to know more about the model and its architecture.

In brief, the model is based on MobileNet V1 and is trained to detect single-person or multi-person poses. The model is optimised to run on Tensorflow JS which means it is light enough to run in a web browser.

Here is an overview of what we are going to do:

  1. Convert Tensorflow JS model to a normal Tensorflow model
  2. Install OpenVINO
  3. Convert Tensorflow model to OpenVINO supported format
  4. Run the model on Movidius NCS

Convert tfjs to Tensorflow

There are three ways to get a .pb file:

  1. Download the files generated by us: click here to download
  2. Convert it yourself using tfjs-converter
  3. Use this repo, which downloads and converts the tfjs models for you

The simplest way is to download the ones we have given. That way you don’t have to install extra stuff on your computer and worry about the process of conversion.

As you will notice, there are 3 important files:

  1. model-mobilenet_v1_050.pb
  2. model-mobilenet_v1_075.pb
  3. model-mobilenet_v1_100.pb

These files refer to different versions of MobileNet on which the pose estimator has been trained. To simplify, 050 is the fastest but the least accurate, and 075 is more accurate but slower than 050. Lastly, 100 is the slowest but the most accurate of the three.

Which one should you choose? Keep reading, we are going to evaluate which model gives the best trade-off of accuracy and speed soon!

Install OpenVINO

To be able to run the model on Movidius NCS, we are going to use Intel’s distribution of OpenVINO toolkit. OpenVINO can be installed on Linux, Windows & Raspbian OS. You can follow the official instructions to install the toolkit. We have installed the toolkit on Ubuntu 16.04 to convert the model, and used Raspbian to run the model.

Step 1:

Install OpenVINO toolkit on your Linux machine. Keep in mind that you won’t be able to convert a tensorflow model to OpenVINO supported format on a Raspberry Pi, so this installation is a must (or install it on Windows).

Step 2:

Install the OpenVINO toolkit on Raspbian. The Raspbian installation of the toolkit only includes the inference engine, which means you cannot convert your TensorFlow (or Caffe, MXNet) models to the Intermediate Representation supported by OpenVINO; you will only be able to run inference on already converted models.

Next, we are going to:

  1. Convert tensorflow model to Intermediate Representation on a Linux machine
  2. Run inference on Raspberry Pi

Convert Tensorflow Model to OpenVINO Intermediate Representation

Intermediate Representation (IR) of a model is a file format recognised by OpenVINO toolkit, which is optimised to run on edge computing devices such as Movidius NCS.

Run the following command in your terminal: https://medium.com/media/02650741c065001542f2b78f7be5bcfe

This will give you two files: model-mobilenet_v1_075.mapping and model-mobilenet_v1_075.xml. These files are necessary to run inference on Movidius NCS.

You can point --input_model at the other versions of PoseNet (050 and 100) to get their Intermediate Representations.
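
The conversion command itself is only linked above. As a rough sketch, and not the exact command we ran, the following Python snippet assembles a Model Optimizer invocation for each PoseNet variant. The `mo_tf.py` location, the FP16 data type and the `ir` output folder are assumptions; depending on the model you may also need to pass an input shape.

```python
import shlex

# Assumed install location of the Model Optimizer; adjust to your OpenVINO setup.
MO_SCRIPT = "/opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py"


def build_mo_command(variant, output_dir="ir"):
    """Return the argument list for converting one PoseNet .pb file to IR."""
    return [
        "python3", MO_SCRIPT,
        "--input_model", f"model-mobilenet_v1_{variant}.pb",
        "--data_type", "FP16",       # assumption: half precision for the NCS
        "--output_dir", output_dir,  # hypothetical output folder
    ]


for variant in ("050", "075", "100"):
    print(shlex.join(build_mo_command(variant)))
```

The snippet only prints the commands; you could pass the same list to `subprocess.run` once the paths match your installation.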

Transfer the two files to your Raspberry Pi and continue to the next step!

Running Inference on Raspberry Pi

Assuming you have installed the OpenVINO toolkit on your Raspberry Pi and have transferred the .mapping and .xml files, it is time to clone the repository.

The repository contains code to run benchmarks on Movidius. To keep things simple and to measure inference alone, the code does not perform any image post-processing. You can add an OpenCV layer to render the key points on top of your input image.
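
To give an idea of what such a benchmark boils down to, here is a minimal FPS-measurement sketch. It is not the code from the repository: `measure_fps` and `fake_infer` are hypothetical names, and the sleeping function merely stands in for a real inference call.

```python
import time


def measure_fps(infer, frames, warmup=5):
    """Time `infer` over `frames` and return frames per second.

    The first `warmup` frames are run but not timed.
    """
    for frame in frames[:warmup]:
        infer(frame)
    timed = frames[warmup:]
    start = time.perf_counter()
    for frame in timed:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed if elapsed > 0 else float("inf")


def fake_infer(frame):
    """Stand-in for a real inference call; sleeps about 10 ms per frame."""
    time.sleep(0.01)


fps = measure_fps(fake_infer, frames=list(range(25)))
print(f"{fps:.1f} FPS")
```

Skipping a few warm-up frames keeps one-time costs, such as loading the model onto the device, out of the measurement.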

Make sure your Movidius NCS is attached to the Raspberry Pi. Download an image of a person from the Internet and save it. Let’s call the downloaded image’s location $IMAGE_PATH. Next, move your model-mobilenet_v1_075.xml and model-mobilenet_v1_075.mapping files to the repository’s root.

Execute the following command in your terminal to run inference on Raspberry Pi: https://medium.com/media/4a6c0470c9df1ec1bc8a80b649b6f573

Results

Image for post
FPS comparison of different mobilenet models

The smallest model is the fastest, at 42 frames per second! Check out the videos to see how accurate each of them is:

Image for post
Posenet50 at 30 FPS
Image for post
Posenet75 at 30FPS
Image for post
Posenet100 at 12FPS

We recommend the 075 version, because 30 FPS is smooth enough for the human eye to perceive as real time, and the accuracy is acceptable for many use cases. However, you might want to consider another version depending upon your use case.

References:

  1. Real-time Human Pose Estimation in the Browser with TensorFlow.js
  2. OpenVINO Documentation
  3. Download converted Tensorflow JS Models
  4. GitHub Repository to run inference on RPi
  5. posenet-python GitHub repository
  6. Tfjs-converter
  7. Tensorflow Pose Estimation
  8. Wikipedia — Pose
  9. OpenVINO pre-trained models
Car or Not a Car
https://13.233.195.217/car-or-not-a-car/
Wed, 03 Jul 2019 08:55:44 +0000

Lessons from Fine Tuning a Convolutional Binary Classifier
Image for post
Taken in a village near Jaipur (Rajasthan, India) by Sanjay Kattimani http://sanjay-explores.blogspot.com

Fine tuning has been shown to be very effective in certain types of neural net based tasks such as image classification. Depending upon the dataset used to train the original model, the fine-tuned model can achieve a higher degree of accuracy with comparatively less data. Therefore, we have chosen to fine tune ResNet50 pre-trained on the ImageNet dataset provided by Google.

We are going to explore ways to train a neural network to detect cars, and optimise the model to achieve high accuracy. In technical terms, we are going to train a binary classifier which performs well under real-world conditions.

There are two possible approaches to train such a network:

  1. Train from scratch
  2. Fine-tune an existing network

To train from scratch, we need a lot of data: millions of positive and negative examples. The process doesn’t end at data acquisition. One has to spend a lot of time cleaning the data and making sure it contains enough examples of the real-world situations that the model is going to encounter in practice. Whether this is feasible depends directly on the background knowledge and time available.

Basic Setup

The following setup is used throughout the exploration:

  1. Datasets
    a. Stanford Cars for car images
    b. Caltech256 for non-car images
  2. Base Network
    ResNet50 (arXiv), pre-trained on ImageNet
  3. Framework and APIs
    a. TensorFlow
    b. TF Keras API
  4. Hardware
    a. Intel i7 6th gen
    b. Nvidia GTX1080 with 8GB VRAM
    c. System RAM 16GB DDR4

Experiment 1

To start with a simple approach, we take ResNet50 without the top layer and add a fully connected (dense) layer on top of it. The dense layer contains 32 neurons with sigmoid activation. This gives approximately 65,000 trainable parameters, which is plenty for the task at hand.

Image for post
Model Architecture for experiment 1

We then add the final output layer having a single neuron with sigmoid activation. This layer has a single neuron because we are performing binary classification. The neuron will output real values ranging from 0 to 1.
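
The parameter count quoted above can be checked with a few lines of arithmetic. This sketch assumes the headless ResNet50 output is pooled to a 2048-dimensional vector before the dense layer, which is not stated explicitly above; `dense_params` is a helper name made up for the example.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


RESNET50_FEATURES = 2048  # assumption: pooled output size of headless ResNet50

hidden = dense_params(RESNET50_FEATURES, 32)  # 32-neuron dense layer
output = dense_params(32, 1)                  # single sigmoid output neuron
total = hidden + output
print(hidden, output, total)  # 65568 33 65601
```

Under that assumption the two trainable layers add up to 65,601 parameters, which matches the figure of roughly 65,000.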

Data Preparation

We are randomly sampling 50% of images as the training dataset, 30% as validation and 20% as test sets. Although there is a huge gap between the number of car and non-car images in the training set, it should not skew our process too much because the datasets are comparatively clean and reliable.
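
A random 50/30/20 split of this kind can be sketched as follows. The helper `split_dataset`, the fixed seed and the file names are illustrative assumptions, not our actual data pipeline.

```python
import random


def split_dataset(items, train=0.5, val=0.3, seed=42):
    """Shuffle items and split them into train, validation and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # remainder (about 20%) becomes the test set
    )


paths = [f"img_{i:04d}.jpg" for i in range(1000)]  # hypothetical file names
train_set, val_set, test_set = split_dataset(paths)
print(len(train_set), len(val_set), len(test_set))  # 500 300 200
```

Note that a purely random split like this says nothing about how individual car models are distributed across the three sets.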

Image for post

Hyper-parameters

Image for post

Results

As a trial run, we trained for one epoch. The graphs below illustrate that the model starts at high accuracy, and reaches near-perfect performance within the first epoch. The loss goes down as well.

Image for post
Epoch Accuracy for Experiment 1
Image for post
Epoch Loss for Experiment 1

However, validation accuracy does not seem very good compared to the training round, and neither does validation loss.

Image for post
Validation Accuracy for Experiment 1
Image for post
Validation Loss for Experiment 1

So, we ran for 4 epochs and were left with the following results:

Image for post
Accuracy and Loss for four epochs
Image for post
Validation accuracy and validation loss for four epochs
Image for post

The model performs relatively well, except for the high degree of separation between training and validation losses.

Experiment 2

We decided to keep the model architecture the same as in the first experiment: the same ResNet50 without the top layer, with a fully connected (dense) layer of 32 neurons with sigmoid activation on top of it.

Image for post
Model Architecture for experiment 2

Data Preparation

This is where the problem lay in the previous experiment. The train/validation/test data splits were random. Our hypothesis was that the randomness had put too many images of some cars and too few of others into the training set, causing the model to be biased.

So, we took the splits as given by the Cars dataset and added 3000 more images by scraping the good old Web.

Image for post

Hyper-parameters

Image for post

Results

These results signify substantial improvement in the validation accuracy when compared to the previous experiment.

Image for post
Epoch Accuracy for experiment 2
Image for post
Epoch Loss for experiment 2

Even though the accuracy matches fairly well, there is a big difference between the training loss and the validation loss.

Image for post
Validation Accuracy for experiment 2
Image for post
Validation Loss for experiment 2
Image for post

This network seems more stable than the previous one. The only change from the previous experiment is the new data splits.

Experiment 3

Here we add a dropout layer, which gives each neuron a 30% chance of being dropped during a training pass. Dropout is known to regularize models and to reduce overfitting caused by neurons becoming dependent on one another.
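
As a rough illustration of what a dropout layer does, here is a plain-Python sketch of the common "inverted" formulation. It is a toy stand-in for the framework layer, and the function name and sample input are made up.

```python
import random


def dropout(activations, rate=0.3, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate` during training."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    # Survivors are scaled by 1/keep so the expected sum stays the same.
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)
out = dropout([1.0] * 10000, rate=0.3, rng=rng)
dropped_fraction = out.count(0.0) / len(out)
print(round(dropped_fraction, 2))
```

At inference time (`training=False`) the values pass through unchanged, so dropout only affects training.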

Image for post
Model Architecture for experiment 3

Since we have a comparatively huge pre-trained network and smaller trainable network, we could add more dense layers to see the effects. We did that and the model ended up achieving saturation in fewer epochs. No other improvements were observed.

Data Preparation

Image for post

Just like in experiment 2, the default train/validation splits are taken.

Hyper-parameters

Image for post

Here, we have run the model on a single learning rate but the value can be experimented with. We will talk about the effects of batch size on this network in the results section.

Results

The results here are with the batch size of 32. As seen, in 3 epochs the network seems to saturate (although it might be a bit premature to judge this).

Image for post
Epoch accuracy for experiment 3
Image for post
Epoch Loss for experiment 3

At the same time validation accuracy and loss also seem to be performing well.

Image for post
Validation Accuracy for experiment 3
Image for post
Validation Loss for experiment 3

So, we increased the batch size to 128, hoping it would help the network find a better local minimum and thereby give better overall performance. Here is what happened:

Image for post
Epoch Accuracy and Loss for batch size of 128
Image for post
Validation Accuracy and Loss for batch size of 128

The model now performs reasonably well on both training and validation sets. The losses between training and validation runs are not too far apart either.

Image for post

Model Drawbacks

Obviously, the model is not one hundred percent accurate. It still misclassifies some images.

Conclusion

When we ran this model on the testing dataset, it failed on only 7 images across the car and non-car sets. This is a very high level of accuracy and brings the model closer to production usage.

In conclusion, we can safely assert that dataset splits are crucial. Rigorous evaluations and experimentation with various hyper-parameters give us a better idea of the network. We should also think about modifying the original architecture based on the evidence provided by the various hyper-parameters.

How to port a custom Tensorflow Model to Movidius NCS?
https://13.233.195.217/how-to-port-a-custom-tensorflow-model-to-movidius-ncs/
Mon, 13 May 2019 08:38:57 +0000

Image for post
Photo by David Clode on Unsplash

Movidius Neural Compute Stick

Intel makes a device which can be plugged into a Raspberry Pi 3 (among other supported boards) to run neural networks efficiently. The device is called the Neural Compute Stick (NCS) and attaches over a USB port. Intel provides a toolchain which can be used to port Tensorflow and Caffe models to an NCS supported format.

Choosing a network

We wanted to port our own custom trained model to NCS. The journey has been, well, insightful and interesting. The task we had at hand was of object detection. We wanted to use Tensorflow because of the familiarity we had with the framework. We decided to use InceptionV3 as our base network and retrain it.

Training the network

There are various ways to retrain a model to bias it towards the objects you want to detect. We used the retrain.py script provided by Tensorflow, which uses Tensorflow Hub to ease the task.

The script takes InceptionV3, adds a few layers such as Placeholder and softmax along with some variables. It then retrains the model. You can read the comments in the script to know the details of the training phases (it involves creating bottleneck files, has options to distort images etc).

Porting InceptionV3

To port this retrained model, we need to use NCSDK which provides a compiler, a checker and a few other tools.

The retraining script outputs various files: checkpoints, labels, graph, weights etc. We piped the final protobuf file (.pb extension) to the NCSDK’s compiler:

mvNCCompile retrained_graph.pb -in=Placeholder -on=final_result -o retrained.graph

and…

Toolkit Error: Stage Details Not Supported: VarHandleOp.

Not very useful, but it was something. If you want to know more about this issue follow our conversation on the ncsforum.

The first suspects were unsupported layers or variables that were added during the retraining phase. But it was not clear which layer(s) it could have been. While exploring the NCS forums, somebody suggested that specifying a different input layer worked for them. Taking the hint, we tried various input layers, and the following compilation worked:

mvNCCompile model.pb -in=input/BottleneckInputPlaceholder -on=final_result -o retrained.graph

Success was inspiring for a moment there, but soon enough the limitations were obvious. The input tensor dimensions were (1, 1, 2) which was trouble.

However, it was clear after the result that it was surely a matter of unsupported layers that was causing the NCS compiler to fail.

We modified the script to give the input layer a specific, concrete name — so to speak — and tried to compile. However, this time we had changed the NCSDK’s code to enable debug messages as suggested by a nice fellow on the forums. Here is what the compilation said:

Toolkit Error: Stage Details Not Supported: Top Not Found module_apply_default/hub_input/Sub

Well, that is more information than the previous error message (VarHandleOp not supported). It looked like something being inserted by Tensorflow Hub APIs was causing the problem.

We thought that maybe some variables needed to be frozen and the graph had to be transformed for deployment. So we used graph_transforms from the Tensorflow tools:

transform_graph \
  --in_graph=retrained.pb \
  --out_graph=optimized_retrained.pb \
  --inputs='Placeholder' \
  --outputs='final_result' \
  --transforms='
    strip_unused_nodes(type=float, shape="1,299,299,3")
    remove_nodes(op=Identity, op=CheckNumerics)
    fold_constants(ignore_errors=true)
    fold_batch_norms
    fold_old_batch_norms'

We took the optimized graph, and compiled it. With no luck, of course!

You should know, that up to this point, we had not ported any other tensorflow models, not even the default ones successfully. Because we had not even tried that! Our bad, and so we did just that.

The Movidius guide says it supports many pretrained Tensorflow models out of the box with some commands (see the link for details). And sure enough, InceptionV3 was converted to an NCS supported graph. It was not a retrained model, but it was a good sign.

Then we decided to follow the example given by the guide to port MNIST model. We studied it and performed all the instructions. The resulting graph was indeed supported by the NCSDK, and we could compile it.

We decided that we need to stop using Tensorflow Hub, and write the process from scratch. In the process we discovered Tensorflow for Poets which does not use hub in the process. This saved us the effort of writing the training script from scratch.

You can study the retraining script, which closely resembles Tensorflow’s retrain script. But it differs in some crucial aspects. It does not add the same layers, and it does not use Tensorflow Hub anywhere in the process. This gives us much better control over the training process.

As a cursory check, we tried to retrain on InceptionV3 with this new script and port it with NCSDK. You know what? It failed. Yes, it failed!

We do not give up just like that, oh no. What we did instead, was to retrain on MobileNet with the new script. This time, it worked. We could compile the graph file:

python -m scripts.retrain \
  --bottleneck_dir=tf_files/bottlenecks \
  --how_many_training_steps=4000 \
  --model_dir=tf_files/models/ \
  --summaries_dir=tf_files/training_summaries/tmp \
  --output_graph=tf_files/retrained.pb \
  --output_labels=tf_files/tmp.txt \
  --image_dir=training_data/

transform_graph \
  --in_graph=retrained.pb \
  --out_graph=optimized_retrained.pb \
  --inputs='input' \
  --outputs='final_result' \
  --transforms='
    strip_unused_nodes(type=float, shape="1,224,224,3")
    remove_nodes(op=Identity, op=CheckNumerics, op=PlaceholderWithDefault)
    fold_batch_norms
    fold_old_batch_norms'

We took the optimized graph and compiled it:

mvNCCompile optimized_retrained.pb -in=input -on=final_result -o retrained.graph

The final generated graph is usable on the Movidius stick.

We tried using MobileNet with the Tensorflow Hub script, but the resulting graph could not be ported with NCSDK.

We used various versions of MobileNet as base networks and measured performance. Here are some of the data:

Image for post

Concluding Remarks:

  • If you want to use pretrained InceptionV3 on Movidius, you can easily use it without any fuss.
  • We could not port custom trained InceptionV3 model to run on Movidius stick.
  • Retraining using Tensorflow Hub seems to make the models incompatible with NCSDK.
  • MobileNet retrained without Tensorflow Hub is a good option to run the models on Movidius.
  • There are clear performance observations which can help decide which model suits your needs.
  • We suggest you go through our discussion on ncsforum to get some technical idea of the process.
Digital meter reading using CV&ML
https://13.233.195.217/digital-meter-reading-using-cvml/
Thu, 18 Apr 2019 08:30:33 +0000

Image for post
Photo by Diana Feil on Unsplash

Some of the legacy digital or analog meters used in the field, such as old water or gas pipeline meters, are too difficult or too costly to replace. Monitoring pressure continuously is important for preventing incidents from escalating. We were looking for non-intrusive solutions and decided to experiment with CV+ML techniques, starting with digital meter reading. This blog post is part of a series of experiments which we will cover in subsequent posts.

Our tool set included: Raspberry Pi, PiCam and Movidius NCS.

Approaches considered

  1. Tesseract OCR
  2. Attention OCR
  3. MNIST

There are a few options to approach this problem. They can be separated into two broad categories: optical character recognition and deep learning. Many recent libraries have started using deep learning based models as part of the overall OCR approach. We evaluated two OCR approaches and one deep learning approach.

Evaluation results

Tesseract OCR

Straight up, we gave the library an image of a real-world meter’s screen to get a quick idea of how it works. Of course, it could read almost nothing, given the background conditions and the 7 segment LED characters.

Then, we tried giving an image of machine printed text taken by a phone’s camera. It produced such garbage, that we knew we had to preprocess the image. And so, we did that. The result was much better, but it still missed a few things. With better preprocessing, we managed to get 100% accuracy.
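
To give a flavour of what preprocessing for OCR can look like, here is a minimal binarization sketch in plain Python. It is only an illustration of one common step (thresholding a grayscale image), not the exact preprocessing we applied; the tiny image and the mean-based threshold are assumptions.

```python
def binarize(gray, threshold=None):
    """Turn a grayscale image (rows of 0-255 ints) into black/white pixels.

    If no threshold is given, the mean intensity is used.
    """
    pixels = [p for row in gray for p in row]
    if threshold is None:
        threshold = sum(pixels) / len(pixels)
    return [[255 if p > threshold else 0 for p in row] for row in gray]


# Tiny hypothetical image: dark text strokes (low values) on a grey background.
image = [
    [200, 190, 60, 195],
    [198, 55, 50, 205],
    [202, 197, 58, 199],
]
for row in binarize(image):
    print(row)
```

After thresholding, the strokes become pure black on a pure white background, which is the kind of input OCR engines generally handle best.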

With the experience gained, we gave it 7 segment LED images of individual digits, and it was practically hopeless. We could have modified the algorithms to read the LEDs, but that is a project of its own.

Below are the results of some of the images:

Image for post
Image for post
Image for post
Image for post

Attention OCR

It gave us errors that we were too lazy to handle. It is not a good thing to see an error as the first output!

MNIST

The only approach left was the deep learning based model trained on MNIST Database. Since we had previous experience with it, we started out simply — by giving it images of handwritten digits. Which, of course, it recognized perfectly.

Encouraged, we gave it an image with multiple 7 segment LED digits. It failed, because the model had never seen 7 segment digits! The database, of course, consists of handwritten digits.

Our first hypothesis was that the model was being tricked by the cuts between the segments, causing it to think these were different digits altogether.

So we prepared some images which had simple vertical and horizontal lines without those nasty cuts. Well, it at least recognized something tangible, but it was not accurate at all.

Image for post
Image for post
Image for post

Custom trained MNIST (small dataset)

We then retrained the model with a dataset of 1152 images (digits 0 to 9). The dataset was very small, and consisted of terribly biased data.

Even with this small training step, we could see the model improving!

Image for post

Evaluation results

Considering the evaluation results, we concluded that we can train MNIST to do much better digit recognition, for the task we have.

Redesigning bookmarks for more than pages
https://13.233.195.217/redesgining-bookmarks-for-more-than-pages-2/
Sun, 17 Apr 2016 14:58:50 +0000


Paperlux, a design studio in Hamburg, Germany, is reinventing the bookmark as we know it. This redesign was made for Arjo Wiggins, a company that produces paper products.



Conventional bookmarks only allow you to mark a page of a book that you’re reading, but most often you have to browse through the page to get back to the line you were reading.


These new bookmarks can also mark the lines where you paused reading so that you can get back real quick.

Now, this is some cool design hacking that solves a small but obvious problem.

[via Fubiz]

Originally Published on November 6, 2013

]]>
https://13.233.195.217/redesgining-bookmarks-for-more-than-pages-2/feed/ 0
Hack a Fabric into a women’s purse https://13.233.195.217/hack-a-fabric-into-a-womens-purse/ https://13.233.195.217/hack-a-fabric-into-a-womens-purse/#respond Sun, 17 Apr 2016 14:55:08 +0000 https://lappamtest.wordpress.com/2016/04/17/hack-a-fabric-into-a-womens-purse/


Here’s something for fabric lovers to get started with: that old denim or fabric which is difficult to throw away. You can turn any piece of cloth into something useful, as shown in this tutorial.


The quick tutorial on wikiHow was created by multiple collaborators and can be found here. There are also tips on how to make it look prettier.

An interesting hack would be a men’s wallet using denim. Any takers?

[via wikiHow]

Originally Published on November 3, 2013

]]>
https://13.233.195.217/hack-a-fabric-into-a-womens-purse/feed/ 0
Dancing Drop: Oscillation of a water drop in an acoustic field https://13.233.195.217/dancing-drop-oscillation-of-a-water-drop-in-an-acoustic-field-2/ https://13.233.195.217/dancing-drop-oscillation-of-a-water-drop-in-an-acoustic-field-2/#respond Sun, 17 Apr 2016 14:45:16 +0000 https://lappamtest.wordpress.com/2016/04/17/dancing-drop-oscillation-of-a-water-drop-in-an-acoustic-field/

Water and sound are great friends, as seen here in this project by two students who created star patterns using a drop of water and an ultrasonic acoustic field.

The way water behaves in an acoustic field is not entirely new, as seen here. It is also known that water can be levitated using an ultrasonic transducer.


In the following project, the drop was flattened into a levitating disc by applying an ultrasonic standing wave of increased field strength.


Star-shaped oscillations were created from the flattened disc of water by modulating the strength of the field so that the modulation matches the resonant frequency of the drop.

The number of points on the ‘star drop’ is the same as the harmonic being matched.
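
To picture that relationship, here is a small sketch. It assumes the outline of the drop can be modelled as a circle with a cosine ripple, r(theta) = radius * (1 + amplitude * cos(n * theta)); that model, the function names and the default values are illustrative assumptions and are not taken from the paper. Under that assumption, the harmonic n gives a star with n points.

```python
import math


def star_outline(n_mode, radius=1.0, amplitude=0.2, samples=360):
    """Sample r(theta) = radius * (1 + amplitude * cos(n_mode * theta)).

    This ripple-on-a-circle shape is an assumed stand-in for the drop outline.
    """
    return [
        radius * (1.0 + amplitude * math.cos(n_mode * 2.0 * math.pi * k / samples))
        for k in range(samples)
    ]


def count_points(radii):
    """Count the local maxima of the radius around the closed outline."""
    m = len(radii)
    return sum(
        1
        for k in range(m)
        if radii[k] > radii[k - 1] and radii[k] >= radii[(k + 1) % m]
    )


# Each harmonic n should produce a star with n points.
points_per_mode = {n: count_points(star_outline(n)) for n in (3, 4, 5, 6)}
```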




You can find more details about this project under the title ‘Shape oscillation of a levitated drop in an acoustic field’ on arXiv.

[via arXiv]

Originally Published on October 27, 2013

]]>
https://13.233.195.217/dancing-drop-oscillation-of-a-water-drop-in-an-acoustic-field-2/feed/ 0
Google Datasets to tinker around Data processing and Visualizations https://13.233.195.217/google-datasets-to-tinker-around-data-processing-and-visaulizations/ https://13.233.195.217/google-datasets-to-tinker-around-data-processing-and-visaulizations/#respond Fri, 15 Apr 2016 18:17:25 +0000 https://lappamtest.wordpress.com/2016/04/15/google-datasets-to-tinker-around-data-processing-and-visaulizations/


For the benefit of the community, Google has released various datasets built up over years of data collection and scaling, along with training corpora drawn from public web pages.

Some of them are:

  • Co-occurrence of words for word n-gram model training
  • Job queue traces from Google clusters
  • 800M documents annotated with Freebase entities
  • 40M disambiguated mentions in 10M web pages linked to Wikipedia entities
  • Human-judged corpus of binary relations about Wikipedia public figures
  • Wikipedia Infobox edit history (39M updates of attributes of 1.8M entities)
  • Triples of (phrase, URL of a Wikipedia entity, number of times phrase appears in the page at the URL)

These data sets are interesting, especially for those interested in n-grams for computational linguistics and probability. This is a real delight for anyone looking into getting started with data and visualizations. All the data you can get!
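
If n-grams are new to you, here is a minimal sketch of the idea on a toy sentence. The sentence and the function names are made up for illustration and have nothing to do with the actual Google data: an n-gram is simply a run of n consecutive words, and counting them gives a rough estimate of how likely one word is to follow another.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probability(tokens, first, second):
    """Estimate P(second | first) from raw counts (maximum likelihood)."""
    pair_counts = Counter(ngrams(tokens, 2))
    # Only count positions that actually have a following word.
    first_counts = Counter(tokens[:-1])
    if first_counts[first] == 0:
        return 0.0
    return pair_counts[(first, second)] / first_counts[first]


# Toy sentence, invented for this example.
toy = "the drop floats and the drop spins and the field hums".split()

bigrams = ngrams(toy, 2)
p_drop_after_the = bigram_probability(toy, "the", "drop")
```

On real corpora the same counting is done over billions of words, which is where released datasets like these come in handy.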

You’re invited to tinker around and collaborate on projects related to large-scale data processing, data-driven approaches or visualizations. Come up with cool ideas and collaborate with information designers during this weekend at Devthon.

The aggregated links to the data sets can be found here. More links can be found in the Hackernews discussion.

[via Daily Learnings]

Originally Published on September 30, 2013

]]>
https://13.233.195.217/google-datasets-to-tinker-around-data-processing-and-visaulizations/feed/ 0
Tinkering with Spatial Data to solve a Geography puzzle https://13.233.195.217/tinkering-with-spatial-data-to-solve-a-geography-puzzle/ https://13.233.195.217/tinkering-with-spatial-data-to-solve-a-geography-puzzle/#respond Fri, 15 Apr 2016 13:57:57 +0000 https://lappamtest.wordpress.com/2016/04/15/tinkering-with-spatial-data-to-solve-a-geography-puzzle/


Todd Schneider, who recently went on a road trip, used the R project to work on an interesting spatial challenge: finding the most “concave” state in the US. The R project is a free environment for statistics and graphics.

The problem statement was to find two points such that a) both points are in the same state, and b) a straight line connecting them crosses the largest number of states.

He reduced the number of sample points along each state’s boundary, keeping just enough data to arrive at the solution.
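
The idea can be sketched in a few lines. This is not Todd's R script: it is a brute-force Python illustration on a made-up map of three toy "states", where the polygon coordinates, the candidate points (standing in for thinned boundary samples) and all function names are assumptions. It walks along the line between two points, checks which state each sample falls in, and keeps the pair that touches the most states.

```python
from itertools import combinations


def point_in_polygon(pt, polygon):
    """Ray-casting test: is pt inside the polygon (a list of (x, y) vertices)?"""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def states_crossed(p, q, states, steps=200):
    """Return the set of states hit by evenly spaced samples on segment p-q."""
    found = set()
    for k in range(1, steps):
        t = k / steps
        pt = (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
        for name, polygon in states.items():
            if point_in_polygon(pt, polygon):
                found.add(name)
                break
    return found


def best_pair(states, candidates, steps=200):
    """Try every pair of candidate points within each state; keep the best."""
    best = (0, None, None, None)
    for name, points in candidates.items():
        for p, q in combinations(points, 2):
            count = len(states_crossed(p, q, states, steps))
            if count > best[0]:
                best = (count, name, p, q)
    return best


# Hypothetical toy map: state A is U-shaped and wraps around states B and C.
toy_states = {
    "A": [(0, 0), (6, 0), (6, 4), (4, 4), (4, 2), (2, 2), (2, 4), (0, 4)],
    "B": [(2, 2), (3, 2), (3, 4), (2, 4)],
    "C": [(3, 2), (4, 2), (4, 4), (3, 4)],
}
toy_candidates = {
    "A": [(1, 0.5), (5, 0.5), (1, 3), (5, 3)],
    "B": [(2.2, 2.5), (2.8, 3.5)],
    "C": [(3.2, 2.5), (3.8, 3.5)],
}

result = best_pair(toy_states, toy_candidates)
```

On the toy map, the line between the two arms of the U-shaped state passes through both of its neighbours, which is exactly what makes a state "concave" in this sense. The cost grows with the square of the number of candidate points per state, which is why thinning the boundary samples matters.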

This project can also be extended to determine the most concave state in other countries.

The data used was from the GADM database and can be downloaded freely. The R script is also available as a GitHub Gist.

An animation of the calculation method is shown below.

[via Rap Genius Engineering Team]

Originally Published on September 27, 2013

]]>
https://13.233.195.217/tinkering-with-spatial-data-to-solve-a-geography-puzzle/feed/ 0