BI platform with AI and computer vision for a fashion retailer
Itransition delivered a BI platform for predictive analytics and helped the customer increase their buyer conversion rate by 8 percent, as well as reduce infrastructure costs by 50 percent.
Challenge
Our customer is an established ecommerce company providing best-selling brands of clothing, accessories, and home goods. The company ships these products worldwide and has served over 20 million registered customers.
With more than 200,000 people using their website and mobile app daily, the retailer collects and processes large amounts of data to understand their customers’ needs. To automate and simplify these processes and make informed decisions, the company decided to build a single platform that would collect user behavior data from their website and mobile app, then sort and analyze it.
The retailer planned to use the platform to build predictive user behavior models to forecast buyer conversion rates, product interest, and future sales. Apart from that, the company wanted to increase their conversion rate while reducing their spending on infrastructure management.
One of our past customers recommended Itransition as a software vendor with vast expertise in ecommerce and, particularly, retail BI. Taking into consideration our track record of delivered business intelligence services, the retailer chose to approach our team for their BI platform development.
Solution
Having delivered several successful retail BI solutions operating around the globe, our team used its expertise to build a centralized BI platform. The solution gathers and analyzes data in near real time and provides accurate customer data for further website and mobile app personalization. The delivered solution processes clickstream data, mobile data, server events, and data on email campaign engagement.
Platform Load |
---|
10 TB of data processed |
8 million trackable events on the website and in the mobile application |
3.5 million emails in the system |
30 thousand events per minute on the website (on average) |
* Disclaimer: According to the Non-Disclosure Agreement, we cannot reveal the screenshots of the real system. Here we provide similar screenshots created to present an idea of the solution developed by Itransition
Solution architecture
Architecture layer | What it does |
---|---|
Event Tracking Layer | Tracking events from different sources (web, mobile, server, etc.) |
Event Collecting Layer | Collecting both tracked events and operational data from the backend systems (ecommerce, CRM, etc.) |
Event Processing Layer | Loading, normalizing, filtering, validating, and transforming collected data |
Data Storage Layer | Storing data that is optimal for statistical analysis and machine learning |
Data Consumption Layer | Building data marts with view and integration APIs to access the data |
Integration Layer | Gathering data from third-party sources via connectors, adaptors, and ETL jobs |
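To make the Event Processing Layer more concrete, here is a minimal sketch of the normalize, validate, and filter steps applied to raw tracked events. The field names (`user_id`, `type`, `ts`) and the set of allowed event types are assumptions made for illustration; the case study does not describe the real event schema.

```python
from datetime import datetime, timezone

# Hypothetical schema: the real event types and fields are not given in the case study.
ALLOWED_TYPES = {"page_view", "click", "add_to_cart", "purchase", "email_open"}
REQUIRED_FIELDS = ("user_id", "type", "ts")


def normalize(raw):
    """Lowercase keys, tidy the event type, and convert epoch seconds to ISO 8601 UTC."""
    event = {key.strip().lower(): value for key, value in raw.items()}
    if isinstance(event.get("type"), str):
        event["type"] = event["type"].strip().lower()
    if isinstance(event.get("ts"), (int, float)):
        event["ts"] = datetime.fromtimestamp(event["ts"], tz=timezone.utc).isoformat()
    return event


def is_valid(event):
    """An event is valid if every required field is present and its type is known."""
    has_fields = all(event.get(field) not in (None, "") for field in REQUIRED_FIELDS)
    return has_fields and event["type"] in ALLOWED_TYPES


def process(raw_events):
    """Normalize every raw event and keep only the valid ones."""
    processed = []
    for raw in raw_events:
        event = normalize(raw)
        if is_valid(event):
            processed.append(event)
    return processed


events = process([
    {"User_ID": "u1", "Type": " Click ", "ts": 0},     # kept after normalization
    {"user_id": "u2", "type": "unknown", "ts": 1},     # dropped: unknown event type
    {"type": "click", "ts": 2},                        # dropped: no user_id
])
```

In a streaming setup the same three functions would run per message rather than over a list, but the order of the steps stays the same.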
The platform supports two major user roles: the retailer’s marketing team and subscribed members.
The marketing team can:
- Get data on user activities in every channel (website, mobile application, emails, surveys, etc.)
- Review more than 100 types of custom reports on the website usage and views, product orders, etc.
- Create ad-hoc queries based on the collected data when needed
Subscribed members can:
- Enjoy their own personalized versions of the website, the web application, emails, and push notifications, complete with banners, site categories, and product types tailored to their preferences based on previous views and purchases
AI-powered recommendation engine
To help our customer provide a personalized experience to their online visitors and improve the accuracy of recommendations, we developed a recommendation engine using collaborative filtering. The collaborative filtering algorithm operates on implicit user feedback such as purchases, views, clicks, and other metrics coming from the website, mobile app, or emails.
We chose this methodology as it scales easily to process terabytes of data, so we could run it on more than ten machines at a time.
Our team opted for the alternating least squares (ALS) algorithm, initially used during the Netflix Prize challenge, as it met the project’s scalability and performance criteria. We also used a random forest regressor to predict product scores, calculated as the combination of item clicks, add-to-cart button clicks, and purchases.
Given more than 20 million platform users and 9 million SKUs, we selected Apache Spark as the main ETL platform and Spark MLlib, with its implementation of the ALS algorithm, to run ML pipelines in production.
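The idea behind ALS on implicit feedback can be sketched on a toy matrix. The example below is a small, dense NumPy version written for illustration only; it is not the production pipeline, which ran on Spark MLlib (whose ALS estimator offers an `implicitPrefs=True` mode for this kind of data). The interaction counts and the hyperparameters `factors`, `alpha`, and `reg` are assumed values.

```python
import numpy as np


def implicit_als(R, factors=2, alpha=40.0, reg=0.1, iterations=10, seed=0):
    """Factorize an interaction-count matrix R (users x items) with implicit-feedback ALS.

    Returns user factors X, item factors Y, and the loss before and after each sweep.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)   # binary preference: did the user interact at all?
    C = 1.0 + alpha * R         # confidence grows with the number of interactions
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_items, factors))
    ridge = reg * np.eye(factors)

    def loss():
        err = P - X @ Y.T
        return float((C * err ** 2).sum() + reg * ((X ** 2).sum() + (Y ** 2).sum()))

    history = [loss()]
    for _ in range(iterations):
        # Fix the item factors and solve a small weighted ridge regression per user.
        for u in range(n_users):
            YtC = Y.T * C[u]
            X[u] = np.linalg.solve(YtC @ Y + ridge, YtC @ P[u])
        # Fix the user factors and solve the same kind of problem per item.
        for i in range(n_items):
            XtC = X.T * C[:, i]
            Y[i] = np.linalg.solve(XtC @ X + ridge, XtC @ P[:, i])
        history.append(loss())
    return X, Y, history


# Hypothetical interaction counts (views, clicks, purchases) for 4 users and 4 items.
R = np.array([
    [5, 3, 0, 0],
    [4, 0, 0, 0],
    [0, 0, 4, 5],
    [0, 0, 3, 0],
], dtype=float)

X, Y, history = implicit_als(R)
scores = X @ Y.T  # predicted preference for every user-item pair
```

For each user, the items they have not interacted with yet and that have the highest values in their row of `scores` are the recommendation candidates. Because each half-step solves independent per-user or per-item problems, the work parallelizes across machines, which is what makes the approach practical at the scale described above.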
Computer vision for product image recognition
Our team also applied computer vision for a number of internal tasks:
- Automated product attribute detection in product images, including color, clothing type, patterns, neckline, sleeve length, etc.
- Automated color detection based on pre-filtering to remove irrelevant image parts, such as background and skin. We used edge detection, adaptive thresholding, and clustering in the CIELAB color space to reduce the number of color tones used in basic colors.
- Multi-attribute image classification. We used a ready-made ResNet-50 convolutional neural network (CNN) built with Keras and TensorFlow, which was pre-trained on ImageNet and then trained on a set of over 100,000 product images.
- Image similarity search developed with the same architecture as the multi-attribute classification. We added an additional embedding layer to the CNN to enable the nearest neighbor search (NNS).
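As an illustration of the color detection item in the list above, the sketch below converts an sRGB pixel to CIELAB and maps it to the closest entry in a small palette of basic colors. It covers only the color-space step: the pre-filtering (edge detection, adaptive thresholding) and the clustering are left out, and nearest-palette matching stands in for them. The palette itself is a hypothetical one chosen for the example.

```python
import math

# D65 reference white used in the XYZ -> Lab step.
WHITE = (0.95047, 1.0, 1.08883)

# Hypothetical palette of basic colors in sRGB; the real palette is not given in the case study.
BASIC_COLORS = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (D65)."""
    def linear(channel):
        # Undo the sRGB gamma curve.
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(channel) for channel in rgb)
    # Linear RGB -> CIE XYZ using the sRGB primaries.
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        # Cube root with the standard linear segment near zero.
        return t ** (1 / 3) if t > 216 / 24389 else (24389 / 27 * t + 16) / 116

    fx, fy, fz = (f(value / white) for value, white in zip((x, y, z), WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def nearest_basic_color(rgb, palette=BASIC_COLORS):
    """Return the palette name with the smallest Euclidean distance in Lab space."""
    lab = srgb_to_lab(rgb)
    return min(palette, key=lambda name: math.dist(lab, srgb_to_lab(palette[name])))


dominant = nearest_basic_color((200, 30, 40))  # a crimson tone from a product photo
```

Distances are measured in CIELAB rather than RGB because equal distances in Lab correspond more closely to equal perceived color differences, so many similar tones collapse onto the same basic color.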
QA & Testing
Whenever we develop retail BI solutions, we always prioritize quality assurance so that the final solution meets the quality and performance requirements. Therefore, our dedicated QA engineers performed ongoing performance testing of the deliverables throughout this two-year project.
The performance testing allowed the testing team to detect multiple stability issues and several critical defects in the components built on top of Apache Hive and Apache Storm, such as a memory leak causing out-of-memory (OOM) errors. Itransition eliminated all the issues and defects by moving the platform to a new technology stack, which helped improve the overall performance of the solution.
To ensure a stable, predictable, and timely delivery process during the retail BI development, our team applied continuous integration and delivery (CI/CD) practices with continuous code review and quality assurance.
Technologies & tools
One of the project’s key goals was to make the solution painlessly scalable while reducing the overall infrastructure costs. Initially, the solution was based on the Apache stack, including Kafka, Storm, and Hive Streaming. Together with the customer, we decided to host the solution on Amazon Web Services (AWS) and build it with a serverless architecture.
This approach allowed the development team to make the solution easily scalable and fault-tolerant, as well as ensure the auto-scaling of the resources in place, minimize their idle time, and, as a result, reduce the infrastructure costs.
Itransition developed the data collection layer based on the Hortonworks Data Platform (HDP).
We also optimized the costs associated with Amazon DynamoDB. With the application’s traffic fluctuating substantially during the day, it was very difficult to forecast and control it well enough to use the provisioned capacity mode effectively. Therefore, we switched to Amazon DynamoDB on-demand pricing, which has no planned capacity boundaries. This allowed the customer to avoid situations where the database capacity was under- or over-provisioned. The pay-per-request pricing helped the customer cut database management costs by almost 50 percent.
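The trade-off between the two capacity modes can be shown with simple arithmetic. The sketch below compares a table provisioned for its busiest hour against pay-per-request billing. All numbers are assumptions: the unit prices are illustrative placeholders rather than current AWS pricing, the traffic profile is invented, and each write is assumed to consume one capacity unit.

```python
# Illustrative placeholder prices, NOT current AWS pricing.
PRICE_PER_PROVISIONED_UNIT_HOUR = 0.00065   # USD per write capacity unit per hour
PRICE_PER_MILLION_ON_DEMAND_WRITES = 1.25   # USD per million write requests


def provisioned_daily_cost(writes_per_second_by_hour):
    """Provisioned mode without auto scaling: capacity is sized for the busiest hour."""
    peak = max(writes_per_second_by_hour)
    return peak * PRICE_PER_PROVISIONED_UNIT_HOUR * len(writes_per_second_by_hour)


def on_demand_daily_cost(writes_per_second_by_hour):
    """On-demand mode: pay only for the requests actually made."""
    total_writes = sum(rate * 3600 for rate in writes_per_second_by_hour)
    return total_writes / 1_000_000 * PRICE_PER_MILLION_ON_DEMAND_WRITES


# Hypothetical spiky traffic: 22 quiet hours and 2 peak hours (writes per second).
traffic = [20] * 22 + [2000] * 2

provisioned = provisioned_daily_cost(traffic)
on_demand = on_demand_daily_cost(traffic)
savings = 1 - on_demand / provisioned
```

With this invented profile the on-demand bill is the lower one. With flatter, more predictable traffic the comparison reverses, which is why the choice depends on how uneven the load is.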
Another challenge our team faced during this retail BI project was reducing the costs associated with training dockerized deep learning (DL) models. For example, training DL models on graphics processing unit (GPU) instances with SageMaker, the AWS ML platform, is expensive. SageMaker did not support Amazon EC2 Spot Instances, which offer spare computing capacity available in the AWS Cloud. With EC2, it’s possible to rent virtual servers (instances) for the needed amount of time and then request an On-Demand Instance if required. To work around this limitation, our team developed a custom framework for building TensorFlow-based models, dockerizing them, and deploying them to EC2 Spot Instances. As a result, we were able to save around 50 percent of these costs.
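The custom framework itself is not described here, but the core step of running a dockerized training job on Spot capacity can be sketched. The function below only builds the keyword arguments for boto3's EC2 `run_instances` call and does not contact AWS. The function name, the image URI, the AMI ID, the default instance type, and the startup script are all hypothetical, and the script assumes an AMI with Docker and GPU container support already installed.

```python
def spot_training_request(image_uri, ami_id, instance_type="p3.2xlarge", max_price=None):
    """Build keyword arguments for boto3's EC2 run_instances call (nothing is launched here)."""
    # Startup script: pull the training image, run it on the GPUs, then power off.
    user_data = "\n".join([
        "#!/bin/bash",
        f"docker pull {image_uri}",
        f"docker run --gpus all {image_uri}",
        "shutdown -h now",
    ])
    spot_options = {
        "SpotInstanceType": "one-time",
        "InstanceInterruptionBehavior": "terminate",
    }
    if max_price is not None:
        # The EC2 API expects the maximum hourly price as a string.
        spot_options["MaxPrice"] = str(max_price)
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceMarketOptions": {"MarketType": "spot", "SpotOptions": spot_options},
        # Terminate (rather than stop) when the script powers the machine off.
        "InstanceInitiatedShutdownBehavior": "terminate",
        "UserData": user_data,
    }


# Hypothetical image and AMI identifiers, for illustration only.
params = spot_training_request("registry.example.com/trainer:latest", "ami-00000000")
# To launch for real: boto3.client("ec2").run_instances(**params)
```

Because a Spot Instance can be reclaimed at any time, a training job run this way would also need to save checkpoints to durable storage so that it can resume after an interruption.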
Itransition also integrated the platform with third-party solutions and tools, including:
- Salesforce Marketing Cloud to manage email campaigns
- LiveIntent to support real-time advertising
- SurveyGizmo to create surveys and make informed decisions based on the results
- Evergage to collect and analyze user behavior in real time
Results
Itransition delivered a retail-specific BI platform for data collection and analysis, helping the customer understand online user behavior better and increase sales through AI-powered personalization.
- The visitors-to-buyers conversion rate increased by 8 percent owing to personalized communication
- The volume of the collected user data, required for building accurate predictive user behavior models, increased by 15 percent
- Monthly infrastructure costs dropped by 50 percent