From mass surveillance to fashion advice - can consumer AI benefit from surveillance research?
We've recently published a preprint of our paper answering that question.
In the last 3 years, there's been a flood of research papers in AI and Machine Learning published by Chinese universities and companies. Strong incentives from government and industry, combined with the challenges of scale (almost 1.4Bn citizens) push the frontier of research in China forward at an impressive pace.
Although the topics are quite diverse, one category of papers is overrepresented compared to the output of Western research institutions. These papers cover various subjects, but they share a common potential application: mass surveillance. [10],[9],[2],[6],[1],[5],[8]
One of the most popular applications of AI is Person Re-Identification - linking photos/videos from CCTV cameras to the identities of citizens. As you can imagine, the task is quite challenging, due to various factors:
- cameras have different quality
- angles, lighting & visibility conditions vary
- people can be partially occluded
- people change clothes, put on hats, can wear sunglasses etc.
- people travel between different areas, so location information is of limited usability
- human body is capable of many different poses and movements
- much more variety is present in the real world compared to synthetic datasets
Tracking the progress made by China is scary yet fascinating - but can their state-sponsored research be re-purposed to directly benefit the end-user?
In our paper [7] we propose an approach adapted from mass surveillance which, with some modifications, outperforms all prior research in fashion visual search.
The problem of fashion retrieval / visual search sounds simple - given a user-made photo of a clothing item, automatically pick the most similar clothes from a store's assortment. The user may take a photo of a friend, a photo of an item in a store, or upload a photo found on the Internet.
We decided to build a visual search product back in 2019, after having updated our visually-similar recommendation model (the problem of visually-similar recommendations is simpler, because it only considers stock photos with little variation). When analyzing the state-of-the-art methods & models, we noticed that the problem of fashion retrieval is very similar to Person Re-Identification for mass surveillance.
The fundamental problem being solved in both areas is that of representation learning - we want to encode images with vectors (called embeddings), such that:
- identical or very similar objects have very similar vectors
- very different objects have very different vectors
The 2 conditions above must hold despite changes in angle, lighting, object deformation, occlusion, crop & other confounding factors. There is more to representation learning in general than just images, but our problem is in the visual domain.
Intuitively, representation learning should "distill the essence of visual identity and similarity" of objects, and disregard all modifications & transformations of input, which do not change similarity or identity.
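The two conditions above can be made concrete with cosine similarity, a standard way to compare embeddings. The sketch below is purely illustrative - the 4-dimensional vectors are made up, whereas a real encoder would be a trained deep network producing hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings (in practice these come from a trained encoder;
# the values here are made up for illustration).
sweater_photo_1 = np.array([0.9, 0.1, 0.0, 0.4])
sweater_photo_2 = np.array([0.8, 0.2, 0.1, 0.5])   # same sweater, different photo
red_dress       = np.array([0.0, 0.9, 0.8, 0.1])   # a different item

same_item = cosine_similarity(sweater_photo_1, sweater_photo_2)
diff_item = cosine_similarity(sweater_photo_1, red_dress)

print(f"same item: {same_item:.2f}, different item: {diff_item:.2f}")
```

A well-trained encoder makes the first score approach 1.0 and keeps the second one low, which is exactly the property both fashion retrieval and Person ReID depend on.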
Some examples of transformations which do not change visual identity/similarity:
- facial expressions - while they can deform the face, they do not change a person's identity
- clothes deformability - a crumpled sweater is still the same sweater
- lighting, brightness, contrast, etc. - objects remain the same, while they look different
- view angles, rotations, focal length, image resolution - they change the photo, but have no effect on the objects themselves
(Image source: facial expressions from the BU-3DFE dataset)
There are a lot more real-world transformations which can confuse ML models, but are naturally disregarded by people when evaluating "identity" or "similarity", e.g. weather conditions, mechanical transformations etc. Visual representation learning aims to be resistant to these transformations.
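A common way to train embeddings that ignore such transformations - widely used in Person ReID, though the exact losses and training setup in our paper may differ - is metric learning with a triplet loss: pull together two views of the same item, push apart views of different items. A minimal sketch with hypothetical 2-D embeddings:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Penalize triplets where the positive (same item) is not closer
    to the anchor than the negative (different item) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical embeddings of three photo crops.
anchor        = np.array([0.6, 0.8])     # a sweater, stock photo
positive      = np.array([0.64, 0.77])   # same sweater, crumpled, user photo
negative      = np.array([0.9, -0.43])   # a clearly different item
hard_negative = np.array([0.5, 0.87])    # a different item that looks similar

loss_easy = triplet_loss(anchor, positive, negative)       # well separated -> 0
loss_hard = triplet_loss(anchor, positive, hard_negative)  # violates margin -> > 0
print(f"easy: {loss_easy:.3f}, hard: {loss_hard:.3f}")
```

During training, only the "hard" triplets produce gradient, which is why ReID pipelines put so much effort into hard-example mining.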
When it comes to fashion, we're interested in vector representations of clothes, where the same "fashion item" gets the same or similar vectors, regardless of where, when and how the photo was taken. In contrast, for mass surveillance we'd be interested in vector representations of people, where the same person gets the same or similar vectors, regardless of where, when and how the photo was taken, what the person was wearing and what pose they were photographed in.
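In both cases, once every catalog photo (or CCTV crop) is embedded, answering a query reduces to nearest-neighbor search in the embedding space. A minimal sketch with random stand-in embeddings - a real system would use a trained encoder and, at our catalog sizes, an approximate-nearest-neighbor index rather than a brute-force scan:

```python
import numpy as np

# Stand-in embeddings: a small catalog of 1000 items, 128-dim each.
# In a real system both catalog and query come from the same trained encoder.
rng = np.random.default_rng(0)
catalog = rng.normal(size=(1000, 128))
catalog /= np.linalg.norm(catalog, axis=1, keepdims=True)  # L2-normalize rows

# Simulate a user photo of item 42: the same embedding plus a little noise.
query = catalog[42] + 0.05 * rng.normal(size=128)
query /= np.linalg.norm(query)

# With unit vectors, cosine similarity is just a dot product.
scores = catalog @ query
top5 = np.argsort(scores)[::-1][:5]
print("top-5 catalog items:", top5)
```

L2-normalizing the embeddings is what lets a plain dot product act as cosine similarity here, and it is also the form most ANN libraries expect for cosine-based search.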
The two use-cases sound so much alike that it's quite surprising how little intellectual cross-pollination has happened between these areas until now.
In our paper [7], we identify the similarities and differences between fashion and mass surveillance in depth. Then we successfully transfer the latest research from Person Re-Identification to fashion retrieval for visual search.
While the Person ReID models require some adjustments to work well on fashion datasets, the final results are quite extraordinary. Our best approach outperforms all prior published research in fashion retrieval and establishes new state-of-the-art results on two commonly used datasets - DeepFashion and Street2Shop. The best model described in the paper is the foundation of our Visual Search product at Synerise, trained on our massive proprietary datasets.
What's especially worth noting is that our strong baseline model is much simpler than some of the recent fashion-specific approaches. The simplicity is apparent with regard to architecture, training procedure and computational resources required. This should serve as a reminder that good foundations, proper abstractions and picking the right problem to solve are often key to unlocking significant progress in research. As unlikely as it sounds, fashion and surveillance have a lot in common when viewed through the framework of representation learning.
Here are some example results of our best model:
For more details & nice pictures, check out our paper with the appendix: [7].
Jacek Dąbrowski / Jarek Krolewski
[1] Dong, C. et al. 2019. DeepMEF: A Deep Model Ensemble Framework for Video Based Multi-modal Person Identification. Proceedings of the 27th ACM International Conference on Multimedia (Nice, France, Oct. 2019), 2531–2534.
[2] Guo, Y. et al. 2019. Multi-Scale Convolutional Recurrent Neural Network with Ensemble Method for Weakly Labeled Sound Event Detection. 2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW) (Sep. 2019), 1–5.
[3] Kuang, Z. et al. 2019. Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid. arXiv:1908.11754 [cs]. (Aug. 2019).
[4] Kucer, M. and Murray, N. 2019. A Detect-Then-Retrieve Model for Multi-Domain Fashion Item Retrieval. CVPR Workshops. 10.
[5] Nie, J. et al. 2019. Understanding personality of portrait by social embedding visual features. Multimedia Tools and Applications. 78, 1 (Jan. 2019), 727–746.
[6] Song, W. et al. 2019. Partial Attribute-Driven Video Person Re-Identification. 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI) (Nov. 2019), 539–546.
[7] Wieczorek, M. et al. 2020. A Strong Baseline for Fashion Retrieval with Person Re-Identification Models. arXiv:2003.04094 [cs]. (Mar. 2020).
[8] Wu, L. et al. 2019. A Neural Influence Diffusion Model for Social Recommendation. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (Paris, France, Jul. 2019), 235–244.
[9] Zhang, X. et al. 2019. TVV: Real-Time Visual Identity and Tracking with Edge Computing. Proceedings of the 2019 International Conference on Embedded Wireless Systems and Networks (Beijing, China, Mar. 2019), 419–424.
[10] Zhang, Z. et al. 2018. Billion-Scale Network Embedding with Iterative Random Projection. 2018 IEEE International Conference on Data Mining (ICDM) (Nov. 2018), 787–796.