Alibaba Cloud Machine Learning Platform for AI: Image Classification by Caffe


By Garvin Li

The Image Classification by TensorFlow section introduces how to use the TensorFlow deep learning framework to classify CIFAR-10 images. This section introduces another deep learning framework: Caffe. With Caffe, you can complete image classification model training by editing configuration files.

Make sure that you have already read the Deep Learning section and activated deep learning in Alibaba Cloud Machine Learning Platform for AI (PAI).


This experiment uses the open-source CIFAR-10 dataset, containing 60,000 images with pixel dimensions 32 x 32. These images are classified into 10 categories: airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks. The following figure shows the dataset.

The dataset has already been stored in the public dataset in Alibaba Cloud Machine Learning Platform for AI in JPG format. Machine learning users can directly enter the following paths in the Data Source Path field of deep learning components:

  • Testing data: oss://
  • Training data: oss://

Enter the path, as shown in the following figure:

Format Conversion

The Caffe deep learning framework currently supports only certain input formats. Therefore, you must first use the format conversion component to convert the JPG images.

  • OSS Path Storing Images and Table Files: set this parameter to the path of the public dataset predefined in Alibaba Cloud Machine Learning Platform for AI.
  • Output OSS Path: user-defined OSS path.

After format conversion, the following files are generated in the output OSS path: one set of training data and one set of testing data.

Record the corresponding paths for editing the Net file. The following is an example of the data paths:

  • Training data file list: bucket/cifar/train/data_file_list.txt
  • Training mean file: bucket/cifar/train/data_mean.binaryproto
  • Testing data file list: bucket/cifar/test/data_file_list.txt
  • Testing mean file: bucket/cifar/test/data_mean.binaryproto

Caffe Configuration Files

Enter the preceding paths in the Net file, as follows:

Edit the Solver file:

Run the Experiment

  1. Upload the Solver and Net files to OSS, drag and drop the Caffe component to the canvas, and connect the component to the data source.
  2. Set the parameters in the Caffe component, as shown in the following figure. Set the Solver OSS Path to the OSS path of the uploaded Solver file and then click Run.
  3. View the generated image classification model file in the model storage path on OSS. You can use the generated models to classify images.

  4. To view the corresponding log, refer to Logview in Image Classification by TensorFlow.


How ML differs from Statistics

Classical statistics in university undergraduate or even graduate courses starts with descriptive statistics, moves into distribution fitting, and then goes all the way to complex multivariate analysis, essentially covering hypothesis testing, correlation, regression, factor analysis, and principal component analysis.
Statistics assumes a lot of a-priori knowledge about the data and its properties and does not necessarily involve much trial and error or tinkering.

New-age machine learning looks at a wide array of techniques and algorithms that themselves learn from the data. Deep learning, supervised learning, and reinforcement learning cover very interesting algorithms that learn from wide arrays of data: data becomes the input and the model becomes the output. This happens with little human intervention (except in supervised learning, where labels are provided). That is the real beauty of ML over conventional statistics. Although new-age ML (covering CNNs, deep learning, and reinforcement learning) draws a lot from statistics, cognitive biology, neuroscience, mathematics, and control theory, most ML applications are very new and have had a large technical and business impact.

In reinforcement learning, classical optimization functions are used, and the behaviorism pioneered in psychology by Skinner comes into play in terms of "reward and punishment". The behavior of an RL algorithm is thus shaped in the same way a child's behaviour is shaped by parents. Dynamic programming from classical optimization (operations research) is also used, along with Bellman's optimality conditions and MDPs (Markov Decision Processes).
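To make the dynamic-programming side concrete, here is a minimal sketch of value iteration on a tiny, hypothetical 2-state MDP (my own toy example, not from the article): the inner `max` over actions is exactly Bellman's optimality condition.

```python
GAMMA = 0.9  # discount factor

# Hypothetical toy MDP. states: 0 and 1; actions: "stay" and "move".
# transitions[state][action] = (next_state, reward) -- deterministic for brevity.
transitions = {
    0: {"stay": (0, 0.0), "move": (1, 1.0)},
    1: {"stay": (1, 2.0), "move": (0, 0.0)},
}

def value_iteration(tol=1e-9):
    """Apply the Bellman optimality backup until the values converge."""
    V = {0: 0.0, 1: 0.0}
    while True:
        delta = 0.0
        for s in V:
            # Bellman optimality: value of the best action from state s
            best = max(r + GAMMA * V[s2] for (s2, r) in transitions[s].values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration()
# Staying in state 1 earns 2.0 forever, so V[1] converges to 2 / (1 - 0.9) = 20.
```

The same backup, applied to sampled transitions instead of a known model, is the core of the temporal-difference methods mentioned below.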

RL ensures that you can start "learning" with minimal domain or problem knowledge. The algorithm has the power to learn and arrive at its parameters through error conditioning and reward optimization. Multiple algorithms, such as temporal-difference learning, deep learning, and actor-critic methods (e.g., A3C), ensure that RL algorithms have the power to create truly domain-independent ways to learn in many new domains without requiring domain knowledge.

The ML tribe (a collection of AI scientists, data analysts, ML practitioners, students, professors, and industry professionals) is significantly different from old-school statistics in many ways. Statistics assumes a lot of knowledge about the system; statistical thinking is in many ways top-down, a-priori thinking. ML thinking (the broad umbrella of algorithms in RL and deep learning) is inherently a-posteriori, does not assume much, and is bottom-up. In many ways, as Richard Dawkins puts it, "Darwinian thinking is a mindless, purposeless bottom-up process involving R&D, trial and error, and tinkering all the way." ML resembles our own biological evolution, and just like biological organisms, ML algorithms are also evolving. The big advantage is that the evolution of ML algorithms is much faster than gradual, slow biological evolution.

ML works a lot like biological processes seen elsewhere in nature. ML does not always try to optimize in the classical sense (finding the best possible solution in a large solution space); instead it follows a process of sophisticated tinkering that finds one sub-optimal solution and then moves ahead. This process ensures continuity in learning, and learning becomes in many ways autonomous.
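That tinkering process can be sketched as a simple random hill climb (my own illustrative example, not from the article): accept any change that improves the current solution and move on, rather than exhaustively searching the whole solution space.

```python
import random

def tinker(score, candidate, perturb, steps=1000, seed=0):
    """Repeatedly perturb the candidate, keeping any change that scores better."""
    rng = random.Random(seed)
    best = candidate
    for _ in range(steps):
        trial = perturb(best, rng)
        if score(trial) > score(best):
            best = trial  # keep the (possibly sub-optimal) improvement, move ahead
    return best

# Example: maximize -(x - 3)^2, starting far from the optimum at x = 3.
result = tinker(score=lambda x: -(x - 3.0) ** 2,
                candidate=-10.0,
                perturb=lambda x, rng: x + rng.uniform(-0.5, 0.5))
```

Each accepted step is only a local improvement, yet the sequence of sub-optimal solutions steadily approaches the optimum, which is the point the paragraph above makes.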

Statistics used to need a lot of careful sampling, and sometimes meticulously planned data cleaning would precede a rigorous statistical analysis. ML works with existing data and tries to draw inferences from it.

ML v/s Statistics

One family of ML algorithms performs Bayesian inference using basic Bayes probability coupled with state-space generators such as Monte Carlo simulation, so that you can create simulated data where real data is non-existent or inaccurate. In this way, ML algorithms build a kind of robustness against data-quality problems.
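A minimal sketch of that idea (my own toy example with made-up rates, not from the article): instead of applying Bayes' theorem analytically, simulate the state space and read the posterior off the simulated data.

```python
import random

# Hypothetical diagnostic-test scenario (all numbers are assumptions).
P_DISEASE = 0.01      # prior probability of disease
P_POS_IF_SICK = 0.99  # test sensitivity
P_POS_IF_WELL = 0.05  # false-positive rate

def posterior_by_simulation(n=200_000, seed=42):
    """Monte Carlo estimate of P(sick | positive test) from simulated patients."""
    rng = random.Random(seed)
    positives = sick_and_positive = 0
    for _ in range(n):
        sick = rng.random() < P_DISEASE
        positive = rng.random() < (P_POS_IF_SICK if sick else P_POS_IF_WELL)
        if positive:
            positives += 1
            sick_and_positive += sick
    return sick_and_positive / positives

estimate = posterior_by_simulation()
# Bayes' theorem gives the exact value: 0.01*0.99 / (0.01*0.99 + 0.99*0.05) = 1/6
```

The simulated estimate converges on the analytic posterior as the sample count grows, which is what makes the approach robust when clean real data is unavailable.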

Video Classification with Deep Learning

Problem Statement

Imagine that you have a tremendous number of videos and you would like to classify them based on what occurs inside, and of course you don't want to hire people to sit in front of the computer and do the job for you 🙂 That is an option, but a highly expensive and error-prone one.


Challenges

  • There is both spatial and temporal content to consider. Yes, a video consists of lots of images viewed one after the other, and each frame has a meaning, but the order is important too. Would it be meaningful to reorder the images and view the resulting video? Probably not!
  • Who will process that many frames? Do we need to process each and every frame to make assumptions about the content of a video? What if you only watch every 10th frame?
  • The training effort will be huge! Video classification is not a simple task. Apart from labeling training data, finding the architecture and hyperparameters of an optimal neural network will demand a vast amount of resources.

Attacking the Problem

Ok, we are clear about the problem and challenges. It is now time to think about what can be done. What I will list below is by no means an exhaustive list but will give you enough perspective.

  1. Create a deep neural network with neurons processing each and every pixel of frames as separate features.
  2. Choose a Convolutional Neural Network (CNN) to decrease the number of features to be processed. Nearby pixels do not carry independent characteristics after all.
  3. Utilize a Recurrent Neural Network (RNN) to capture the order between frames for better classification.
  4. Construct a hybrid of a CNN and RNN.

What I Will Demonstrate

I will go with the 4th option above. I do not want to go into the never-ending training and testing cycles of a huge network trying to process every pixel. That is where a CNN comes in. Moreover, I also do not want to lose the temporal information hidden inside the videos.

In terms of the technology stack, I preferred TensorFlow. We will not interact with TensorFlow directly, though, as that would require many, many lines of code. That is where Keras comes into the picture.

We will also not build the CNN part from scratch but instead do some transfer learning with an Inception v3 CNN.

Okay, here are the steps we will follow:

  1. Extraction of image frames from videos
  2. Training the top layer of an Inception v3 CNN with the input images
  3. Extraction of a sequence of images from videos with a constant size and equally spaced
  4. Training an LSTM RNN for classifying videos based on the image frames

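Steps 1 and 3 above both come down to choosing which frame indices to keep. Here is a minimal sketch of that selection logic (my own helper names; the actual frame decoding would be done with a video library such as OpenCV):

```python
def every_nth(total_frames, step=25):
    """Indices kept when sampling every `step`-th frame (used for the CNN input)."""
    return list(range(0, total_frames, step))

def equally_spaced(total_frames, seq_len=80):
    """A fixed-length sequence of frame indices, spread evenly over the video
    (used for the RNN input, which needs a constant sequence length)."""
    if total_frames < seq_len:
        raise ValueError("video is shorter than the requested sequence")
    return [i * total_frames // seq_len for i in range(seq_len)]
```

For example, a 400-frame video yields an 80-index sequence starting at frame 0 with a constant stride of 5, so videos of different lengths all produce sequences of the same shape.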

I obtained the UCF101 dataset, which has 13,320 videos assigned to 101 action categories.

Example Videos

Folders were used for assigning the video files to their respective categories. Each distinct subject within the videos is assigned to a group:

Folder Names Represent Categories

File naming convention for videos is as follows:

  • Sample Name
  • Category
  • Group Number
  • Index number within the Group

Considerations for Training/Test Data Split

  • A group cannot span across datasets
    Videos within the same group were recorded for the same subject. For example, if the category is YoYo, the same person was recorded within the same group. Therefore, using videos from the same group for both datasets will give a high accuracy but that will not be realistic.
  • Regular Expression for Extracting the Group and Index Numbers
  • Folder Name as the Category Name
  • Groups to be Shuffled During Assignment
  • Same Group Assignment for both CNN and RNN Training
    If a group is assigned to one dataset during CNN training and another dataset during RNN training, then the results will not be healthy because one of the networks would have seen that input in the training but it will be used for validation in the other network.
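The considerations above can be sketched in a few lines (assuming UCF101-style file names such as "v_YoYo_g01_c04.avi", i.e. prefix, category, group number, clip index; the helper names are my own):

```python
import random
import re

# Regular expression for extracting the category, group, and index numbers.
NAME_RE = re.compile(r"v_(?P<category>\w+)_g(?P<group>\d+)_c(?P<index>\d+)")

def parse_name(filename):
    """Split a video file name into (category, group number, clip index)."""
    m = NAME_RE.match(filename)
    return m["category"], int(m["group"]), int(m["index"])

def split_groups(group_ids, train_ratio=0.95, seed=1):
    """Shuffle the groups, then assign each WHOLE group to one dataset,
    so that no group spans both training and validation."""
    groups = sorted(set(group_ids))
    random.Random(seed).shuffle(groups)
    cut = int(len(groups) * train_ratio)
    return set(groups[:cut]), set(groups[cut:])
```

Reusing the same `seed` for both the CNN and RNN runs keeps the group assignment identical across the two trainings, which is the last consideration in the list above.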

Inception v3

I will use the Inception v3 CNN for transfer learning, so it is worth giving a short introduction for those new to the idea.

  • What is an inception network?
    An inception network consists of multiple inception blocks chained together.
  • What is an inception block?
    A single inception block tries to find the perfect combination of CONV blocks with different sizes in addition to a MAX POOL layer.

Inception Block

Inception Network

How I Fed Images into Inception v3

  • Every 25th frame is chosen for input to decrease the amount of data.
  • Extracted frames are written into a directory structure starting with the type of dataset.
  • 95% of the input data goes for training and the rest for validation.
  • All input images are first passed through the CNN up to the top layer. The last layer's output dimension is 6 x 8 x 2048:

mixed10 (Concatenate) (None, 6, 8, 2048) 0 activation_86[0][0] mixed9_1[0][0] concatenate_2[0][0] activation_94[0][0]

  • Data shape for the training data features and labels:
    ((85807, 6, 8, 2048), (85807, 101))
  • Data shape for the validation data features and labels:
    ((7734, 6, 8, 2048), (7734, 101))
  • Training the top part with a Dense layer of 256 neurons, a Dropout layer with 0.5 probability, and a softmax layer:
    model = models.Sequential()
    model.add(layers.Dense(256, activation='relu', input_dim=6 * 8 * 2048))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(101, activation='softmax'))
  • Epoch size of 30 and batch size of 64 were used.
  • Accuracy and loss graphs for training and validation data:

Training and Validation Charts
  • The main objective here is to train the dense layer and not to achieve the highest accuracies possible.
  • Extracting the dense layer out of the trained network and concatenating that to the Inception v3 CNN as the top layer:
    model2 = models.Sequential()
  • Viewing the summary of the final CNN:
    Layer (type)                 Output Shape          Param #
    =================================================================
    inception_v3 (Model)         (None, 6, 8, 2048)    21802784
    _________________________________________________________________
    flatten_1 (Flatten)          (None, 98304)         0
    _________________________________________________________________
    dense_1 (Dense)              (None, 256)           25166080
    =================================================================
    Total params: 46,968,864
    Trainable params: 46,934,432
    Non-trainable params: 34,432

Custom RNN

Enough details about the CNN. Let's turn our attention to the RNN network now.

  • A fixed sequence length (80) is used.
  • Sequences are extracted equally spaced over each video.
  • Each image is first fed into the CNN to have the feature length of 256.
  • Data shape for the training data features and labels:
    ((11283, 80, 256), (11283, 101))
  • Data shape for the validation data features and labels:
    ((1008, 80, 256), (1008, 101))
  • The architecture has an LSTM of 200 neurons and a softmax layer:
    model = Sequential()
    model.add(LSTM(200, input_shape=(80, 256)))
    model.add(Dense(101, activation='softmax'))
  • The network can learn the features from the training set very easily, so there is no need to add more layers and further complicate the architecture.
  • Epoch size of 30 and batch size of 64 were used.
  • Accuracy and loss graphs for training and validation data:

Training and Validation Charts
  • The network could achieve 100% accuracy on the training set very quickly but got stuck at around 72–73% on the validation set.

My Thoughts

  • Accuracy values for the attempted solutions on the UCF101 dataset are around 70–75%.
  • This suggests that the training data is not rich enough for the network to grasp the essential features of the videos and predict the ones in the validation set.
  • Using different architectures with various neuron sizes also did not improve the validation accuracy, which again suggests that the above statement holds.
  • As mentioned earlier, the frames within a video are usually very similar in content, so increasing the RNN sequence length did not add value either.
  • Although it is a very labor-intensive task, acquiring rich video content in sufficient quantity should give satisfying results with this hybrid architecture.

You Prefer to Watch Instead of Reading?

Well, here is my YouTube video with a live explanation of this study:

Technologies you need to know about for Artificial Intelligence

Artificial intelligence (AI) is perhaps the secret ingredient to every major advancement in the fourth industrial revolution.

From virtual assistants like Apple’s Siri to Google’s self-driving vehicles, to even biometrics and speech recognition programs, there’s no end to the application of AI.

Based on mimicking the human thought process, this disruptive technology has successfully infiltrated every facet of our lives. According to a recent publication from Brooklyn University, AI will play an important role in the future of most sectors, from Economics to Politics, and even to Crime.

This disruptive technology, however, is still evolving despite massive adoption across industries. Most AI technologies are still based on algorithms that respond to pre-set user behaviours, which limits their true potential. Therefore, there's a need to improve the algorithms that make up the neural network of every AI technology out there. No doubt, this drive has spurred the innovation and subsequent adoption of technologies and software that are transforming the landscape of AI.

To help you navigate this landscape better, I've collated the prominent AI technologies with the most potential to effect change. You will not only become familiar with their nuances but also explore real-life use cases of these technologies.

Machine Learning

DDI Editor’s Pick — Machine Learning Foundations: A Case Study Approach

Machine learning is a dominant aspect of AI that focuses on the ability of machines to learn and make accurate decisions from large amounts of input data. It draws on the vast amounts of data gleaned from myriad IoT devices and uses them to perform tasks such as visual perception and speech recognition, which are considered to require human-level intelligence.

Unlike other AI technologies, machine learning utilises a combination of algorithms and data. Although much emphasis is placed on its data usage, its uniqueness lies in the ability to learn patterns and automatically create new and dynamic data. Furthermore, machine learning creates a feedback loop which enables it to produce more models without requiring additional resources.

It's no surprise that mega-corporations such as Coca-Cola and Heineken have taken advantage of this technology to improve their operations, advertising, customer service, and marketing. For instance, Coca-Cola leveraged machine learning to launch Cherry Sprite: by collecting vast amounts of data from their soda dispensers, the company was able to identify a vast market for Cherry Sprite!

Natural Language Processing

By 2025, the global AI market is projected to hit a record high of $60 billion. Do you know the interesting part? A large percentage of this figure is expected to be derived from Natural Language Processing (NLP) technologies. From Amazon’s Alexa to Google’s Assistant, this speech-to-text technology is fast-becoming a constant in every aspect of society. It’s an advanced form of AI that helps machines to understand and, perhaps, even communicate with human speech. Let’s use Amazon’s Alexa as a case study: Alexa’s designers were able to take NLP a step further. Alexa uses a multi-layer communication system that spans across audio cues, screen, Alexa’s voice, and apps. In addition to this, it utilises the Hidden Markov Models (HMM) to understand human language and the context in which it’s used.

Therefore, NLP technology is able to break down human language into parts of speech via a sequence of coded grammar regulations in order to understand the context of the language. Besides its massive application as a virtual assistant, this piece of technology is also used in data mining and fraud detection.


Biometrics

Passwords are highly vulnerable. In fact, they are often regarded as the weakest security link in an organisation. For this reason, biometrics was developed to ensure a natural interaction between machines and humans by utilising fail-proof authentication criteria such as DNA, fingerprints, dental structure, and facial structure.

More so, it provides a faster form of identification than magnetic strips or passwords. It's no surprise that a survey of 4,000 customers revealed a 52% preference for biometric methods over traditional security protocols. Companies like Samsung and Apple have therefore taken advantage of this AI technology to garner more subscribers for their products. No doubt, biometric systems are of indispensable value in various sectors. For instance, government agencies use biometric systems in voter registration, ePassports, national IDs, and border control. In addition, they offer a safe and more efficient method of identifying citizens without requesting physical ID tags at all times.

Business Decision Management Framework

Companies take advantage of the vast repertoire of data at their disposal to make informed decisions and connect more with their target audience. No doubt, these decisions become more accurate when infused with AI. The effect of AI on decision management is most felt in eCommerce, insurance, and financial market trading.

This business decision management framework incorporates the design, building, and management of automated systems for better decision making. Companies are able to use it to manage their supplier, employee, and customer interactions in a bid to boost operational decisions. More so, mega-corporations like Amazon have taken this further by offering AI-inspired decision management services via the Amazon Web Services (AWS) Partner Network to companies and individuals alike. Frameworks such as the AWS Partner Network enable businesses to connect with their target audience via up-to-date technical, business, and market support.

Robotic Process Automation

Here's another AI technology currently revolutionising the workforce of most industries. In fact, it is commonplace for companies to employ the technology in areas where human labour is considered expensive or inefficient. Robotic Process Automation (RPA) is a non-intrusive technology which leverages existing infrastructure without disrupting the system.

Concisely, this technology focuses on reducing cost without undermining efficiency or productivity. RPA robots can mimic many human user actions, such as moving between applications, filling in forms, copying and pasting data, and extracting data from documents. It's no surprise that mega-corporations like PwC and IBM have integrated RPA to reduce cost while improving scalability, control, and quality. In fact, according to IBM's analysis report, companies that use RPA for accounts payable process invoices 43% faster than non-RPA users.

Furthermore, such companies saw a 40% cost reduction in their operations. Mind you, RPA technology relies solely on algorithms and is therefore incapable of creating new experiences from its operations. This mode of operation is different from machine learning or biometric applications such as Apple's Face ID, which utilise a combination of algorithms and data to create a dynamic feedback loop.


As mentioned earlier, AI is constantly evolving due to an influx of new and disruptive technologies. These technologies are not only expanding the landscape of AI but also increasing our understanding of how the brain works. With enough research and innovation, we will learn how to improve the neural networks at the core of every AI technology.

FPGA Research and Development in Nepal

FPGAs (Field Programmable Gate Arrays) are a reconfigurable chip technology that has dominated the electronic hardware design market since the 1990s. FPGA technology allows hardware engineers to design, test, and implement different logic designs, architectures, and processing systems. The global FPGA market is becoming a multi-billion-dollar industry, according to MarketsandMarkets. The main players in the FPGA market are Xilinx and Altera (acquired by Intel); aside from these two bulls, there are smaller players too, such as Lattice Semiconductor and Microsemi.

PYNQ, an open-source FPGA platform from Xilinx that allows implementing designs in Python (Source:

Digitronix Nepal has been running its FPGA Research and Development Initiative since 2015, and has worked on electronic hardware design, research, and development since 2013.

Why is Digitronix Nepal initiating FPGA R&D in Nepal?

Nepal, a developing nation, is largely a technology-consuming market rather than a design and development center. However, there are now many software development companies in Nepal representing the country in the global arena. Electronic design and automation is a wide market globally, but Nepal has not been able to harness this opportunity even a bit. Digitronix Nepal believes that electronic hardware design can generate many opportunities for Nepalese engineers and professionals, which is why we have worked in the electronics hardware design field since 2013. As for FPGA research and development, we at Digitronix Nepal believe that FPGAs are an electronic hardware design platform where designs and intellectual property (so-called IPs) can be marketed, so we do not need to manufacture hardware ourselves.

StartUp Scene at New Business Age Magazine, May 2017 (Click for more)

What has happened in the FPGA Research and Development Initiative so far?

Digitronix Nepal has signed MoUs (Memorandums of Understanding) with Nepal's top engineering colleges, including IOE Pulchowk Campus, Kathmandu Engineering College, Himalaya College of Engineering, Kathford Int'l College of Engineering and Management, Sagarmatha Engineering College, National College of Engineering, and Kantipur Engineering College, to create FPGA Research and Development centers at the respective colleges. Digitronix Nepal also facilitates access to state-of-the-art resources: FPGAs and software tools.

MoU between Digitronix Nepal and National College of Engineering

The hardware and software are being utilized in those Research and Development Centers to implement new design methods, develop systems, and research new ideas with FPGAs. Digitronix Nepal has also assisted those centers with technology transfer for this state-of-the-art design environment.

Digitronix Nepal has collaborated with different engineering colleges of Nepal to organize seminars on FPGA technology, FPGA design competitions, and interaction programs for enhancing skills and knowledge in engineering courses and professional companies. Some snapshots of these events are presented below:

News on First FPGA Design Competition, 2016 at Republica
News on the Second All Nepal FPGA Design Competition 2017 at Saptahik, Kantipur

Who benefits from this initiative?

This initiative provides a chance to gain skills in the latest market-leading technology, skills that can be marketed globally. Engineering faculty and students of electronics, computer, and electrical engineering are gaining global skills here in Nepal. Enthusiasts from those engineering streams who wish to pursue a career in FPGA, VLSI (Very Large Scale Integration) design, or ASIC (Application-Specific Integrated Circuit) design, and those who wish to pursue further study in electronic engineering, computer engineering, computer science, or embedded system design, can gain huge benefits and better opportunities than they currently have.

So what are the opportunities in the FPGA R&D field?

Globally, there are many opportunities in the FPGA design field, and FPGA design skills also apply heavily to VLSI and ASIC design, so the overall area of opportunity is huge: FPGA design, VLSI design and verification, and ASIC design. You can visit Electronics Weekly, Indeed, and many other job portals and freelancing sites (Upwork, Freelancer, Fiverr, etc.).

In Nepal, Digitronix Nepal is offering internships and job opportunities in FPGA research and development. The career objective is implementing computer vision algorithms with neural networks and machine learning on FPGAs, and the design and implementation of real-time video processing.

Internship Offering on Machine Learning and Neural Networks at Digitronix Nepal (for more click here)

So what are Digitronix Nepal's services?

Digitronix Nepal is currently working on FPGA-based IP (Intellectual Property) design for real-time image and video processing. We offer services in FPGA design based on RTL design (Verilog/VHDL), intellectual property (IP) design for image and video processing, design and verification, IP migration, PCIe-based design support, PCB design, etc.

Our Projects can be viewed at: Digitronix Nepal’s Project

Digitronix Nepal also provides offline and online training. We have already provided training to faculty at Kantipur Engineering College and Khwopa College of Engineering, and to students of Kathmandu Engineering College, Khwopa College of Engineering, Himalaya College of Engineering, etc.

As for online training, we provide six courses on FPGA design and development.

Digitronix Nepal’s Online Courses at Udemy: Course Link

Thank You for reading this Article!

Feedback, comments, and suggestions are warmly welcomed at: or on Digitronix Nepal's Facebook Page.

Artificial Intelligence and the Music of Herbert von Karajan (Keynote at Deutsche Telekom)

Michael Schuld (Deutsche Telekom), Matthias Röder (Karajan Institut), Michael Hagspihl (Deutsche Telekom)

However, human perception of the world is restricted: on the one hand, our body's capacity to absorb and process data is limited by the bandwidth of our senses; on the other, by the computational power of our brain's neural networks. By this limitation, we are ultimately limited as beings. We simply cannot grow beyond our hardware.

But what if we could broaden our perception of the world, if we were able to internalize the experiences and memories of others as if they were our own? Imagine if, in a short space of time, we could absorb the essence of a life or an important experience that another person took ten years, or an entire lifetime, to experience and acquire as knowledge.

Exactly this is what music does for us! It compresses human experience in the highest form, into pure emotions, moods, and feelings. It does so through abstract structures, the musical works written down by composers, which are then played and brought to life by musicians. Experiences codified at the level of the musical work are thus enriched by the musicians with their own experiences and feelings, and then decoded, perceived, and processed by us, the audience. What happens there is something like a compressed world experience of one person, which can then be taken up by many people.

But what would happen if we took this to another level, so that this compressed world experience were possible not only from one person to others, but from all people to all others? We would then come closer to the goal of understanding the totality of the musical world, and grow as a species far beyond what we as individuals can experience, just as Google has set itself the goal of making the knowledge of humanity entirely accessible.

[Here I would ask you to please close your eyes and listen to the following music. Then, after a few seconds, open your eyes and look at the beautiful visualizations of Stephen Malinowski’s MusicEyes while listening to the music.]

Together with scientists from the KUG Graz, the University Mozarteum, the Johannes Kepler University Linz, MIT, and Stanford, we investigate this question at the Karajan Institute. Our goal is not just to do basic music research, but to develop practical software applications for our customers that will allow us to grow the market for our products. We run data science for the classical music business! The musical interpretations of Herbert von Karajan are connected note by note with the symbolic musical notation of the composers. We measure the time that elapses between each played note, how the sound quality changes between the notes, and much more. Because Herbert von Karajan recorded many works of music history more than once, each piece of music has a multi-dimensional matrix of data containing all the important information about how this music is played.
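A toy sketch of the simplest such measurement (my own example with invented numbers, not the Institute's code): given note-onset timestamps from two recordings of the same passage, compute the inter-onset intervals, i.e. the time that elapses between each played note.

```python
def inter_onset_intervals(onsets_seconds):
    """Time gaps between consecutive note onsets, in seconds."""
    return [b - a for a, b in zip(onsets_seconds, onsets_seconds[1:])]

# Hypothetical onset times (seconds) for the same four notes in two recordings:
recording_a = [0.00, 0.48, 0.97, 1.50]
recording_b = [0.00, 0.45, 0.90, 1.38]

intervals = [inter_onset_intervals(r) for r in (recording_a, recording_b)]
# Each row is one recording; comparing rows shows how the pacing of the
# interpretation differs, note by note.
```

Stacking such rows across every note and every recording is what yields the multi-dimensional "how is this played" matrix described above.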

Now imagine that we are doing this not just for one work, but for all works in the recorded history of music. Imagine further that we not only read the interpretations of Herbert von Karajan, but also those of all other conductors that ever recorded music. The result is an overall picture of music making of the last 100 years, which now allows us to program business applications and new products in education, visualization and composition.

The goal of the whole undertaking, and so I come back to my initial question, is now to use machine learning on this data for systems of creative musical intelligence. Because we know how a particular note sequence was played in the history of music, we can also build systems that generate a “human” interpretation of previously unknown sequences.

Currently, over 300,000 works have been added to the Peachnote/Petrucci library, a database of musical works. Today, technologists and researchers can support the creative process of composers and musicians, for example with auto-completion of melodies or harmonic sequences in a variety of styles, or with a VR application in which you yourself conduct the music of Karajan and the music follows your instructions exactly. Some music startups, for instance, use this data for a piano accompanist that extends the user's musical fantasies in the style of Mozart.

But the applications are also moving into education, where virtual assistants are built to help young musicians practice, by listening, accompanying, and pointing out undesirable developments.

All these examples, and there is much more to discover, have one thing in common: they expand the creative potential of our society by giving each and every one of us, our colleagues and employees, the opportunity to grow beyond our own creative limits.

What is necessary for this? From my point of view, it takes a fresh look at the data streams that arise in our companies and in our interactions with customers and business partners. Let us explore the possibilities beyond proven methods and standards. At the Karajan Institute, we did this by putting managers, technologists, and musicians in one room and exploring the theoretical options without constraints. Unless you already do, set up your teams as creatively as possible, give leeway where standard operating procedures otherwise set the tone, and then pursue the resulting ideas and potentials with all your business acumen. For us in classical music, this worked out wonderfully, and I see no reason why it should not work in other parts of the economy as well.

Matthias Röder is a specialist in data science and machine learning in the field of music and media. After studying music at the Mozarteum University Salzburg, he received his doctorate in 2010 from Harvard University in the USA. There he worked intensively on questions of artificial intelligence and creativity and founded the world's first Digital Musicology Research Group. Since 2012 he has been CEO of the Herbert von Karajan Institute, where he continues the legacy of the century's most prominent conductor using the latest technologies. With over 300 million records sold, Herbert von Karajan has outsold the Rolling Stones, Madonna, and Michael Jackson.


In previous posts we've discussed interpreting residual plots when evaluating linear models. But what is a residual? A residual is the distance between a given data point and the regression line. In other words, a residual is our prediction error for that data point. A smaller residual indicates a better fit to that data point.

A simple plot will demonstrate:

# Plot stopping distance against speed, then add the fitted regression line
plot(dist ~ speed, data = cars)
abline(lm(dist ~ speed, data = cars))

The yellow highlights are the residuals; the shorter lines indicate smaller residuals. Points below the regression line have a negative residual, while those above the line have a positive residual. For a least-squares fit with an intercept, the sum (and therefore the mean) of all residuals is always zero.

We can evaluate how well the model fits overall with the Root Mean Squared Error (RMSE), the standard deviation of the residuals (because the residuals average to zero, the RMSE equals their population standard deviation). We'll cover that topic next.
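To make the definitions concrete, here is a small self-contained sketch, in Python with NumPy rather than the R used above; the data points are made up to stand in for R's built-in `cars` dataset. It fits a least-squares line, computes the residuals, and confirms that they sum to zero:

```python
import numpy as np

# Illustrative data standing in for R's `cars` dataset
# (speed vs. stopping distance); not the real values.
speed = np.array([4.0, 7.0, 10.0, 15.0, 20.0, 25.0])
dist = np.array([2.0, 13.0, 26.0, 54.0, 85.0, 105.0])

# Fit dist ~ speed by ordinary least squares (degree-1 polynomial).
slope, intercept = np.polyfit(speed, dist, 1)
predicted = slope * speed + intercept

# A residual is the vertical distance from each point to the fitted line.
residuals = dist - predicted

# With an intercept in the model, the residuals sum (and average) to zero.
print(residuals.sum())   # ~ 0, up to floating-point error

# The RMSE summarises the typical size of a residual.
rmse = np.sqrt(np.mean(residuals ** 2))
print(rmse)
```

Because the residuals average to zero, the RMSE computed here is exactly their population standard deviation.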

Google Digital News Initiative Grant for Developing MorphL: AI-Driven UI

MorphL is a machine learning platform that empowers digital publishers to optimize engagement and conversion rates by means of predicting how users will interact with various UI elements.

The platform will record various UI micro-metrics and automatically test different variations to identify the combination that produces the best results. In effect, it's like having a 24/7 in-house R&D department that keeps the application's UI relevant and engaging, allowing digital publishers to focus on what they do best.

MorphL introduces a shift in the mindset and work process of digital publishers: the intrinsic ability for an application to assess how a particular UI element is impacting the engagement/conversion rate and automatically adapt to user behavior.

The Digital News Initiative (DNI) is a partnership between Google and publishers in Europe to support high-quality journalism through technology and innovation. Since 2016, the DNI Innovation Fund evaluated more than 3,000 applications, carried out 748 interviews and offered more than €73m in funding to 359 projects in 29 European countries.

The €50,000 grant from the Google DNI Fund confirms the platform's potential to impact the future of UI development in the digital publishing space, and signals that we're entering a new era of UI development, one that, like many other aspects of our lives, will be shaped by AI.

The project will be developed by our team at Appticles (a multi-channel mobile publishing platform) in partnership with PressOne (an independent Romanian digital news publication). We'll post updates on our progress right here on Medium, and you can also keep in touch by following us on Twitter and Facebook or starring us on GitHub.

Deep learning frameworks and vectorization approaches for sentiment analysis

Introduction and Background

Data preparation

import re

def clean_tweet(tweet_raw):
    # Remove user names such as "@handle " and "@_handle "
    tweet_clean = re.sub(r'@.*? ', '', tweet_raw)
    tweet_clean = re.sub(r'@\_.*? ', '', tweet_clean)
    # Remove URLs (the original pattern was garbled; matching up to the
    # next whitespace is a reasonable reconstruction)
    tweet_clean = re.sub(r'http://\S+', '', tweet_clean)
    # Strip quote characters
    tweet_clean = tweet_clean.replace('"', '')
    tweet_clean = tweet_clean.replace("'", '')
    # Collapse double spaces into single spaces
    tweet_clean = tweet_clean.replace('  ', ' ')
    # Decode HTML entities for angle brackets
    tweet_clean = tweet_clean.replace('&lt;', '<')
    tweet_clean = tweet_clean.replace('&gt;', '>')
    # Trim leading spaces and normalise case
    tweet_clean = re.sub(r'^ +', '', tweet_clean)
    tweet_clean = tweet_clean.lower()
    return tweet_clean
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 52, 4, 125, 171, 15, 8, 2453, 466, 28, 2785, 2, 3, 71, 21, 92, 121, 135, 26, 8, 199]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 62, 128, 92, 60, 6, 31, 62, 6, 63, 29, 127, 127, 30, 6, 63, 62, 62, 92, 6, 90, 62, 93, 6, 129, 132, 6, 30, 28, 132, 63, 128, 59, 63, 6, 127, 32, 125, 129, 6, 125, 31, 6, 18, 125, 129, 78, 6, 128, 6, 28, 125, 27, 6, 63, 62, 6, 129, 131, 59, 28, 6, 90, 131, 92, 6, 31, 62, 92, 128, 60, 28, 31, 6, 64, 128, 31, 28, 6, 129, 132, 6, 90, 93, 128, 127, 92, 27, 63]
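The two integer arrays above appear to be zero-padded token sequences: the first a word-level encoding (indices into a word vocabulary) and the second a character-level encoding, each left-padded with zeros to a fixed length. A minimal sketch of how such sequences can be produced follows; the tiny vocabulary, the sample sentence, and the length of 20 are illustrative assumptions, and a real pipeline would typically use something like Keras's Tokenizer and pad_sequences:

```python
def pad_left(seq, maxlen):
    """Left-pad an integer sequence with zeros to a fixed length."""
    return [0] * (maxlen - len(seq)) + seq[-maxlen:]

def char_encode(text, char_index):
    """Map each character to its integer index (0 is reserved for padding)."""
    return [char_index[c] for c in text if c in char_index]

# Hypothetical tiny character vocabulary; a real one is built from the corpus.
chars = sorted(set("great movie loved it"))
char_index = {c: i + 1 for i, c in enumerate(chars)}

encoded = char_encode("loved it", char_index)
padded = pad_left(encoded, 20)
print(padded)   # 12 leading zeros followed by the 8 character indices
```

Left-padding with zeros lets variable-length tweets share one fixed tensor shape in a batch, with index 0 reserved to mean "no character."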

Model Preparation

# Keras 2 API; older Keras used Convolution1D(..., border_mode='same')
model.add(Conv1D(64, 3, padding='same'))
model.add(Conv1D(32, 3, padding='same'))
model.add(Conv1D(16, 3, padding='same'))
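For intuition, 'same' padding zero-pads the input so that the output has the same number of timesteps as the input. A naive NumPy sketch of what one such layer computes follows; the shapes and random weights are illustrative, and a real Keras layer also adds a bias term and usually an activation function:

```python
import numpy as np

def conv1d_same(x, kernels):
    """Naive 1-D convolution with 'same' zero padding (odd kernel size).

    x: array of shape (steps, in_channels)
    kernels: array of shape (kernel_size, in_channels, filters)
    Returns shape (steps, filters): the timestep count is preserved,
    which is what 'same' padding means.
    """
    k = kernels.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))        # zero-pad along time only
    out = np.zeros((x.shape[0], kernels.shape[2]))
    for t in range(x.shape[0]):
        window = xp[t:t + k]                    # (kernel_size, in_channels)
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1]))
    return out

x = np.random.randn(10, 8)     # 10 timesteps, 8 input channels
w = np.random.randn(3, 8, 64)  # kernel size 3, 64 filters
y = conv1d_same(x, w)
print(y.shape)                 # (10, 64)
```

Stacking such layers with shrinking filter counts, as in the snippet above, progressively compresses the channel dimension while keeping the sequence length fixed.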



Character Level
Word Level (BoW)


You are irrational. Read this to know why.

In the 1970s, two psychologists proved, once and for all, that humans are not rational creatures. Daniel Kahneman and Amos Tversky discovered “cognitive biases,” showing that humans systematically make choices that defy clear logic. Of these biases, one in particular — confirmation bias — might be especially difficult to overcome, according to a new study that sheds light on how human brains can trick us into getting things wrong.

What is confirmation bias?

Confirmation bias refers to the tendency of human beings to search for and favor information that confirms their beliefs while simultaneously ignoring or devaluing information that contradicts those beliefs. It is not natural for people to formulate a hypothesis and then test various ways to prove it false. Instead, it is far more likely that they will form one hypothesis, assume it is true, and only seek out and believe information that supports it. Most people don't want new information; they want new ways to validate old information.

How is it related to machine learning and data analytics?

In predictive modeling and big data analytics, confirmation bias can drive an analyst towards seeking evidence that favors an initial hypothesis. For example, the analyst might frame survey questions in such a way that all answers support a particular point of view. Interpretation of information can also hold a bias. Two analysts can review the same data, but select different aspects of the data to support each of their individual preferred outcomes. Because people tend to remember information that reinforces the way they already think, memory also plays a part in confirmation bias.

Let me put it this way, my dear readers: say you traveled to Frankfurt and you think there are a lot of green Mercedes CLS cars on the road. Every time you see a green Mercedes CLS, you feel your idea is confirmed. Every time you see any other car, it doesn't devalue your belief. So even if the green Mercedes CLS were a below-average combination of car and color, you would never realize it. To your eyes, they're everywhere. You saw three this week. You don't remember seeing a single white Mercedes CLS on the street because you weren't looking for them.

What does confirmation bias look like in real life?

Scientific American described an experiment showing how confirmation bias affects individual political beliefs. The double-blind experiment was conducted by Drew Westen at Emory University during the election season of 2004, with George W. Bush running on the Republican side and John Kerry on the Democratic side. In his study, Westen took MRI scans of 30 men, half of whom identified as strong Republicans and half as strong Democrats. They listened to statements from both candidates during the study and had to state their thoughts on the candidates' statements. The study showed that the Republicans were more critical of Kerry's comments and the Democrats were more critical of Bush's comments.

However, can we rely on this study alone to prove the correlation? The study found that different emotional responses were triggered in the brain depending on which candidate the subject favored. But it is not enough to prove a correlation between politics and confirmation bias, because it sampled only a small group of people, and everyone holds different sets of beliefs. Were all of the participants as strongly opinionated as they claimed? Westen also did not report what kinds of statements the subjects listened to; were one candidate's statements more persuasive than the other's? He conducted the experiment in a systematic manner, but the number of subjects is too small for the study to serve as evidence that confirmation bias operates in politics. The experiment could be improved by increasing and varying the number of subjects who take part.

Thanks for reading. If you loved this article, feel free to hit that follow button so we can stay in touch.