Longest Common Subsequence (LCS)
Last Updated: 02 Dec, 2024
Given two strings, s1 and s2, the task is to find the length of their Longest Common Subsequence. If there is no common subsequence, return 0.
A subsequence is a string generated from the original string by deleting 0 or more characters without changing the relative order of the remaining characters. For example, the subsequences of “ABC” are “”, “A”, “B”, “C”, “AB”, “AC”, “BC” and “ABC”.
In general, a string of length n has 2^n subsequences, since each character is either kept or deleted (some of them may be identical strings when the original has repeated characters).
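The 2^n count can be seen directly: each subset of positions, encoded as an n-bit mask, selects exactly one subsequence. A minimal Python sketch (the helper name all_subsequences is illustrative, not from the article):

```python
def all_subsequences(s):
    # Hypothetical helper for illustration: one subsequence per bitmask of len(s) bits.
    n = len(s)
    result = []
    for mask in range(1 << n):  # 2**n masks in total
        # Bit i of mask set -> keep s[i]; scanning i upward preserves relative order.
        result.append("".join(s[i] for i in range(n) if (mask >> i) & 1))
    return result


subs = all_subsequences("ABC")
print(len(subs))                                # 8
print(sorted(subs, key=lambda t: (len(t), t)))  # ['', 'A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
print(len(set(all_subsequences("AA"))))         # 3 distinct strings out of 4 subsequences
```

The last line shows the caveat about repeated characters: "AA" has 4 subsequences by position, but only 3 distinct strings.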
Examples:
Input: s1 = “ABC”, s2 = “ACD”
Output: 2
Explanation: The longest subsequence which is present in both strings is “AC”.
Input: s1 = “AGGTAB”, s2 = “GXTXAYB”
Output: 4
Explanation: The longest common subsequence is “GTAB”.
Input: s1 = “ABC”, s2 = “CBA”
Output: 1
Explanation: There are three longest common subsequences, each of length 1: “A”, “B” and “C”.
[Naive Approach] Using Recursion – O(2^(m + n)) Time and O(m + n) Space
Let m and n be the lengths of s1 and s2. The idea is to compare the last characters of s1 and s2. Two cases arise:
- Match: Make one recursive call for the remaining prefixes (of lengths m-1 and n-1) and add 1 to the result.
- Do not match: Make two recursive calls, one for lengths m-1 and n and one for lengths m and n-1, and take the maximum of the two results.
Base case: If either string becomes empty, return 0.
For example, consider the input strings s1 = “ABX” and s2 = “ACX”.
LCS(“ABX”, “ACX”) = 1 + LCS(“AB”, “AC”) [Last characters match]
LCS(“AB”, “AC”) = max(LCS(“A”, “AC”), LCS(“AB”, “A”)) [Last characters do not match]
LCS(“A”, “AC”) = max(LCS(“”, “AC”), LCS(“A”, “A”)) = max(0, 1 + LCS(“”, “”)) = 1
LCS(“AB”, “A”) = max(LCS(“A”, “A”), LCS(“AB”, “”)) = max(1 + LCS(“”, “”), 0) = 1
So the overall result is 1 + 1 = 2.
Below is the implementation of the recursive approach:
C++
// A Naive recursive implementation of LCS problem
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
// Returns length of LCS for s1[0..m-1], s2[0..n-1]
int lcs(string &s1, string &s2, int m, int n) {
// Base case: If either string is empty, the length of LCS is 0
if (m == 0 || n == 0)
return 0;
// If the last characters of both substrings match
if (s1[m - 1] == s2[n - 1])
// Include this character in LCS and recur for remaining substrings
return 1 + lcs(s1, s2, m - 1, n - 1);
else
// If the last characters do not match
// Recur for two cases:
// 1. Exclude the last character of s1
// 2. Exclude the last character of s2
// Take the maximum of these two recursive calls
return max(lcs(s1, s2, m, n - 1), lcs(s1, s2, m - 1, n));
}
int main() {
string s1 = "AGGTAB";
string s2 = "GXTXAYB";
int m = s1.size();
int n = s2.size();
cout << lcs(s1, s2, m, n) << endl;
return 0;
}
C
// A Naive recursive implementation of LCS problem
#include <stdio.h>
#include <string.h>
int max(int x, int y) {
return x > y ? x : y;
}
// Returns length of LCS for s1[0..m-1], s2[0..n-1]
int lcs(char *s1, char *s2, int m, int n) {
// Base case: If either string is empty, the length of LCS is 0
if (m == 0 || n == 0)
return 0;
// If the last characters of both substrings match
if (s1[m - 1] == s2[n - 1])
// Include this character in LCS and recur for remaining substrings
return 1 + lcs(s1, s2, m - 1, n - 1);
else
// If the last characters do not match
// Recur for two cases:
// 1. Exclude the last character of S1
// 2. Exclude the last character of S2
// Take the maximum of these two recursive calls
return max(lcs(s1, s2, m, n - 1), lcs(s1, s2, m - 1, n));
}
int main() {
char s1[] = "AGGTAB";
char s2[] = "GXTXAYB";
int m = strlen(s1);
int n = strlen(s2);
printf("%d\n", lcs(s1, s2, m, n));
return 0;
}
Java
// A Naive recursive implementation of LCS problem
class GfG {
// Returns length of LCS for s1[0..m-1], s2[0..n-1]
static int lcs(String s1, String s2, int m, int n) {
// Base case: If either string is empty, the length of LCS is 0
if (m == 0 || n == 0)
return 0;
// If the last characters of both substrings match
if (s1.charAt(m - 1) == s2.charAt(n - 1))
// Include this character in LCS and recur for remaining substrings
return 1 + lcs(s1, s2, m - 1, n - 1);
else
// If the last characters do not match
// Recur for two cases:
// 1. Exclude the last character of S1
// 2. Exclude the last character of S2
// Take the maximum of these two recursive calls
return Math.max(lcs(s1, s2, m, n - 1), lcs(s1, s2, m - 1, n));
}
public static void main(String[] args) {
String s1 = "AGGTAB";
String s2 = "GXTXAYB";
int m = s1.length();
int n = s2.length();
System.out.println(lcs(s1, s2, m, n));
}
}
Python
# A Naive recursive implementation of LCS problem
# Returns length of LCS for s1[0..m-1], s2[0..n-1]
def lcs(s1, s2, m, n):
# Base case: If either string is empty, the length of LCS is 0
if m == 0 or n == 0:
return 0
# If the last characters of both substrings match
if s1[m - 1] == s2[n - 1]:
# Include this character in LCS and recur for remaining substrings
return 1 + lcs(s1, s2, m - 1, n - 1)
else:
# If the last characters do not match
# Recur for two cases:
# 1. Exclude the last character of S1
# 2. Exclude the last character of S2
# Take the maximum of these two recursive calls
return max(lcs(s1, s2, m, n - 1), lcs(s1, s2, m - 1, n))
if __name__ == "__main__":
s1 = "AGGTAB"
s2 = "GXTXAYB"
m = len(s1)
n = len(s2)
print(lcs(s1, s2, m, n))
C#
// A Naive recursive implementation of LCS problem
using System;
class GfG {
// Returns length of LCS for s1[0..m-1], s2[0..n-1]
static int lcs(string s1, string s2, int m, int n) {
// Base case: If either string is empty, the length of LCS is 0
if (m == 0 || n == 0)
return 0;
// If the last characters of both substrings match
if (s1[m - 1] == s2[n - 1])
// Include this character in LCS and recur for remaining substrings
return 1 + lcs(s1, s2, m - 1, n - 1);
else
// If the last characters do not match
// Recur for two cases:
// 1. Exclude the last character of S1
// 2. Exclude the last character of S2
// Take the maximum of these two recursive calls
return Math.Max(lcs(s1, s2, m, n - 1), lcs(s1, s2, m - 1, n));
}
static void Main() {
string s1 = "AGGTAB";
string s2 = "GXTXAYB";
int m = s1.Length;
int n = s2.Length;
Console.WriteLine(lcs(s1, s2, m, n));
}
}
JavaScript
// A Naive recursive implementation of LCS problem
// Returns length of LCS for s1[0..m-1], s2[0..n-1]
function lcs(s1, s2, m, n) {
// Base case: If either string is empty, the length of LCS is 0
if (m === 0 || n === 0)
return 0;
// If the last characters of both substrings match
if (s1[m - 1] === s2[n - 1])
// Include this character in LCS and recur for remaining substrings
return 1 + lcs(s1, s2, m - 1, n - 1);
else
// If the last characters do not match
// Recur for two cases:
// 1. Exclude the last character of S1
// 2. Exclude the last character of S2
// Take the maximum of these two recursive calls
return Math.max(lcs(s1, s2, m, n - 1), lcs(s1, s2, m - 1, n));
}
// driver code
let s1 = "AGGTAB";
let s2 = "GXTXAYB";
let m = s1.length;
let n = s2.length;
console.log(lcs(s1, s2, m, n));
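Time Complexity: O(2^(m + n)) in the worst case, where m and n are the lengths of strings s1 and s2. When the last characters never match, every call branches into two, and the recursion can be up to m + n levels deep.
Auxiliary Space: O(m + n), for the recursion stack, since each call reduces m + n by at least 1.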
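To check the bound empirically, the Python sketch below (count_calls is an illustrative name, not part of the article) counts how many times the naive recurrence is invoked. With no matching characters, the count obeys calls(m, n) = 1 + calls(m, n-1) + calls(m-1, n), with 1 call on each base case, which solves to 2·C(m+n, m) - 1. For m = n this grows roughly like 4^n (up to a polynomial factor), so the running time is exponential in m + n, not just in min(m, n).

```python
import math


def count_calls(s1, s2):
    # Hypothetical helper: counts invocations of the naive LCS recurrence.
    calls = 0

    def lcs(m, n):
        nonlocal calls
        calls += 1
        if m == 0 or n == 0:
            return 0
        if s1[m - 1] == s2[n - 1]:
            return 1 + lcs(m - 1, n - 1)
        return max(lcs(m, n - 1), lcs(m - 1, n))

    lcs(len(s1), len(s2))
    return calls


# Worst case: no character of s1 appears in s2, so every non-base call branches twice.
m, n = 10, 10
print(count_calls("A" * m, "B" * n))  # 369511
print(2 * math.comb(m + n, m) - 1)    # 369511, versus 2 ** min(m, n) = 1024
```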
[Better Approach] Using Memoization – O(m * n) Time and O(m * n) Space
If we apply the above recursive approach to the strings “AXYT” and “AYZX”, we get a recursion tree in which the subproblem L(“AXY”, “AYZ”) is computed more than once: it is reached both from L(“AXY”, “AYZX”) and from L(“AXYT”, “AYZ”). The full tree contains many such overlapping subproblems, so we can optimize the solution using either memoization or tabulation.
[Figure: Overlapping Subproblems in Longest Common Subsequence]
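The repeated subproblem can be confirmed with a small Python sketch (visit_counts is an illustrative name) that runs the naive recursion and tallies every (m, n) state it visits. State (3, 3) corresponds to L(“AXY”, “AYZ”).

```python
from collections import Counter


def visit_counts(s1, s2):
    # Hypothetical helper: naive LCS recursion that records how often each state is visited.
    visits = Counter()

    def lcs(m, n):
        visits[(m, n)] += 1
        if m == 0 or n == 0:
            return 0
        if s1[m - 1] == s2[n - 1]:
            return 1 + lcs(m - 1, n - 1)
        return max(lcs(m, n - 1), lcs(m - 1, n))

    return lcs(len(s1), len(s2)), visits


length, visits = visit_counts("AXYT", "AYZX")
print(length)          # 2
print(visits[(3, 3)])  # 2 -> L("AXY", "AYZ") is solved twice
```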
- Only two parameters change in the recursive solution, and they range from 0 to m and from 0 to n. So we create a 2D array of size (m+1) x (n+1).
- We initialize every entry of this array to -1 to indicate that nothing has been computed yet.
- We then modify the recursive solution to look up the table first and make recursive calls only if the stored value is -1. This way each subproblem is computed at most once.
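In Python, the explicit memo table can alternatively be delegated to the standard library's functools.lru_cache, which caches results keyed by the (m, n) arguments. This is a swapped-in technique rather than the article's implementation, and lcs_length and solve are illustrative names. The recursion is still up to m + n levels deep, so very long strings may exceed Python's default recursion limit.

```python
from functools import lru_cache


def lcs_length(s1: str, s2: str) -> int:
    # Illustrative top-down LCS: lru_cache stands in for the (m+1) x (n+1) memo table.

    @lru_cache(maxsize=None)  # unbounded cache: one entry per (m, n) pair
    def solve(m: int, n: int) -> int:
        if m == 0 or n == 0:  # base case: an empty prefix
            return 0
        if s1[m - 1] == s2[n - 1]:  # last characters match
            return 1 + solve(m - 1, n - 1)
        return max(solve(m, n - 1), solve(m - 1, n))

    return solve(len(s1), len(s2))


print(lcs_length("AGGTAB", "GXTXAYB"))  # 4
```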
Below is the implementation of the above approach:
C++
// C++ implementation of Top-Down DP
// of LCS problem
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
// Returns length of LCS for s1[0..m-1], s2[0..n-1]
int lcs(string &s1, string &s2, int m, int n, vector<vector<int>> &memo) {
// Base Case
if (m == 0 || n == 0)
return 0;
// Already exists in the memo table
if (memo[m][n] != -1)
return memo[m][n];
// Match
if (s1[m - 1] == s2[n - 1])
return memo[m][n] = 1 + lcs(s1, s2, m - 1, n - 1, memo);
// Do not match
return memo[m][n] = max(lcs(s1, s2, m, n - 1, memo), lcs(s1, s2, m - 1, n, memo));
}
int main() {
string s1 = "AGGTAB";
string s2 = "GXTXAYB";
int m = s1.length();
int n = s2.length();
vector<vector<int>> memo(m + 1, vector<int>(n + 1, -1));
cout << lcs(s1, s2, m, n, memo) << endl;
return 0;
}
C
// C implementation of Top-Down DP
// of LCS problem
#include <stdio.h>
#include <string.h>
// Define a maximum size for the strings
#define MAX 1000
// Function to find the maximum of two integers
int max(int a, int b) {
return (a > b) ? a : b;
}
// Returns length of LCS for s1[0..m-1], s2[0..n-1]
int lcs(const char *s1, const char *s2, int m, int n, int memo[MAX][MAX]) {
// Base Case
if (m == 0 || n == 0) {
return 0;
}
// Already exists in the memo table
if (memo[m][n] != -1) {
return memo[m][n];
}
// Match
if (s1[m - 1] == s2[n - 1]) {
return memo[m][n] = 1 + lcs(s1, s2, m - 1, n - 1, memo);
}
// Do not match
return memo[m][n] = max(lcs(s1, s2, m, n - 1, memo), lcs(s1, s2, m - 1, n, memo));
}
int main() {
const char *s1 = "AGGTAB";
const char *s2 = "GXTXAYB";
int m = strlen(s1);
int n = strlen(s2);
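// Create the memo table. It is declared static because a
// 1000 x 1000 int array (about 4 MB) can overflow the stack.
// Note: m and n must both be less than MAX.
static int memo[MAX][MAX];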
for (int i = 0; i <= m; i++) {
for (int j = 0; j <= n; j++) {
// Initialize memo table with -1
memo[i][j] = -1;
}
}
printf("%d\n", lcs(s1, s2, m, n, memo));
return 0;
}
Java
// Java implementation of Top-Down DP of LCS problem
import java.util.Arrays;
class GfG {
// Returns length of LCS for s1[0..m-1], s2[0..n-1]
static int lcs(String s1, String s2, int m, int n,
int[][] memo) {
// Base Case
if (m == 0 || n == 0)
return 0;
// Already exists in the memo table
if (memo[m][n] != -1)
return memo[m][n];
// Match
if (s1.charAt(m - 1) == s2.charAt(n - 1)) {
return memo[m][n]
= 1 + lcs(s1, s2, m - 1, n - 1, memo);
}
// Do not match
return memo[m][n]
= Math.max(lcs(s1, s2, m, n - 1, memo),
lcs(s1, s2, m - 1, n, memo));
}
public static void main(String[] args) {
String s1 = "AGGTAB";
String s2 = "GXTXAYB";
int m = s1.length();
int n = s2.length();
int[][] memo = new int[m + 1][n + 1];
// Initialize the memo table with -1
for (int i = 0; i <= m; i++) {
Arrays.fill(memo[i], -1);
}
System.out.println(lcs(s1, s2, m, n, memo));
}
}
Python
# Python implementation of Top-Down DP of LCS problem
# Returns length of LCS for s1[0..m-1], s2[0..n-1]
def lcs(s1, s2, m, n, memo):
# Base Case
if m == 0 or n == 0:
return 0
# Already exists in the memo table
if memo[m][n] != -1:
return memo[m][n]
# Match
if s1[m - 1] == s2[n - 1]:
memo[m][n] = 1 + lcs(s1, s2, m - 1, n - 1, memo)
return memo[m][n]
# Do not match
memo[m][n] = max(lcs(s1, s2, m, n - 1, memo),
lcs(s1, s2, m - 1, n, memo))
return memo[m][n]
if __name__ == "__main__":
s1 = "AGGTAB"
s2 = "GXTXAYB"
m = len(s1)
n = len(s2)
memo = [[-1 for _ in range(n + 1)] for _ in range(m + 1)]
print(lcs(s1, s2, m, n, memo))
C#
// C# implementation of Top-Down DP of LCS problem
using System;
class GfG {
// Returns length of LCS for s1[0..m-1], s2[0..n-1]
static int lcs(string s1, string s2, int m,
int n, int[, ] memo) {
// Base Case
if (m == 0 || n == 0)
return 0;
// Already exists in the memo table
if (memo[m, n] != -1)
return memo[m, n];
// Match
if (s1[m - 1] == s2[n - 1]) {
return memo[m, n]
= 1 + lcs(s1, s2, m - 1, n - 1, memo);
}
// Do not match
return memo[m, n]
= Math.Max(lcs(s1, s2, m, n - 1, memo),
lcs(s1, s2, m - 1, n, memo));
}
public static void Main() {
string s1 = "AGGTAB";
string s2 = "GXTXAYB";
int m = s1.Length;
int n = s2.Length;
int[, ] memo = new int[m + 1, n + 1];
// Initialize memo array with -1
for (int i = 0; i <= m; i++) {
for (int j = 0; j <= n; j++) {
memo[i, j] = -1;
}
}
Console.WriteLine(lcs(s1, s2, m, n, memo));
}
}
JavaScript
// A Top-Down DP implementation of LCS problem
// Returns length of LCS for s1[0..m-1], s2[0..n-1]
function lcs(s1, s2, m, n, memo) {
// Base Case
if (m === 0 || n === 0)
return 0;
// Already exists in the memo table
if (memo[m][n] !== -1)
return memo[m][n];
// Match
if (s1[m - 1] === s2[n - 1]) {
memo[m][n] = 1 + lcs(s1, s2, m - 1, n - 1, memo);
return memo[m][n];
}
// Do not match
memo[m][n] = Math.max(lcs(s1, s2, m, n - 1, memo),
lcs(s1, s2, m - 1, n, memo));
return memo[m][n];
}
// driver code
const s1 = "AGGTAB";
const s2 = "GXTXAYB";
const m = s1.length;
const n = s2.length;
const memo = Array.from({length : m + 1},
() => Array(n + 1).fill(-1));
console.log(lcs(s1, s2, m, n, memo));
Time Complexity: O(m * n), where m and n are the lengths of strings s1 and s2.
Auxiliary Space: O(m * n) for the memo table, plus O(m + n) for the recursion stack.
[Expected Approach 1] Using Bottom-Up DP (Tabulation) – O(m * n) Time and O(m * n) Space
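Only two parameters change in the recursive solution, and they range from 0 to m and from 0 to n. So we create a 2D dp array of size (m+1) x (n+1), where dp[i][j] holds the length of the LCS of s1[0..i-1] and s2[0..j-1].
- We first fill the known entries: dp[i][j] = 0 whenever i is 0 or j is 0.
- Then we fill the remaining entries row by row using the recursive formula: dp[i][j] = dp[i-1][j-1] + 1 if s1[i-1] == s2[j-1], otherwise dp[i][j] = max(dp[i-1][j], dp[i][j-1]).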
For example, say the strings are S1 = “AXTY” and S2 = “AYZX”. The table is filled one row at a time, left to right, and the answer is read from dp[4][4].
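The filled table for this example can be reproduced with a short Python sketch (lcs_table is an illustrative name) that follows the two steps above and prints every row, labelled with the corresponding character of S1:

```python
def lcs_table(s1, s2):
    # Illustrative helper: builds the full (m+1) x (n+1) bottom-up LCS table.
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1  # match: extend the diagonal
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # best of top and left
    return dp


S1, S2 = "AXTY", "AYZX"
print(" ", "-", *S2)  # column headers; "-" marks the empty prefix
for label, row in zip("-" + S1, lcs_table(S1, S2)):
    print(label, *row)
#   - A Y Z X
# - 0 0 0 0 0
# A 0 1 1 1 1
# X 0 1 1 1 2
# T 0 1 1 1 2
# Y 0 1 2 2 2
```

The bottom-right cell gives an LCS length of 2 (for example “AX” or “AY”).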
Below is the implementation of the above approach:
C++
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
// Returns length of LCS for s1[0..m-1], s2[0..n-1]
int lcs(string &s1, string &s2) {
int m = s1.size();
int n = s2.size();
// Initializing a matrix of size (m+1)*(n+1)
vector<vector<int>> dp(m + 1, vector<int>(n + 1, 0));
// Building dp[m+1][n+1] in bottom-up fashion
for (int i = 1; i <= m; ++i) {
for (int j = 1; j <= n; ++j) {
if (s1[i - 1] == s2[j - 1])
dp[i][j] = dp[i - 1][j - 1] + 1;
else
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
}
}
// dp[m][n] contains length of LCS for s1[0..m-1]
// and s2[0..n-1]
return dp[m][n];
}
int main() {
string s1 = "AGGTAB";
string s2 = "GXTXAYB";
cout << lcs(s1, s2) << endl;
return 0;
}
C
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int max(int x, int y);
// Function to find length of LCS for s1[0..m-1], s2[0..n-1]
int lcs(const char *S1, const char *S2) {
int m = strlen(S1);
int n = strlen(S2);
// Initializing a matrix of size (m+1)*(n+1)
int dp[m + 1][n + 1];
// Building dp[m+1][n+1] in bottom-up fashion
for (int i = 0; i <= m; i++) {
for (int j = 0; j <= n; j++) {
if (i == 0 || j == 0)
dp[i][j] = 0;
else if (S1[i - 1] == S2[j - 1])
dp[i][j] = dp[i - 1][j - 1] + 1;
else
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
}
}
return dp[m][n];
}
int max(int x, int y) {
return (x > y) ? x : y;
}
int main() {
const char *S1 = "AGGTAB";
const char *S2 = "GXTXAYB";
printf("Length of LCS is %d\n", lcs(S1, S2));
return 0;
}
Java
import java.util.Arrays;
class GfG {
// Returns length of LCS for s1[0..m-1], s2[0..n-1]
static int lcs(String S1, String S2) {
int m = S1.length();
int n = S2.length();
// Initializing a matrix of size (m+1)*(n+1)
int[][] dp = new int[m + 1][n + 1];
// Building dp[m+1][n+1] in bottom-up fashion
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (S1.charAt(i - 1) == S2.charAt(j - 1)) {
dp[i][j] = dp[i - 1][j - 1] + 1;
}
else {
dp[i][j] = Math.max(dp[i - 1][j],
dp[i][j - 1]);
}
}
}
// dp[m][n] contains length of LCS for S1[0..m-1]
// and S2[0..n-1]
return dp[m][n];
}
public static void main(String[] args)
{
String S1 = "AGGTAB";
String S2 = "GXTXAYB";
System.out.println("Length of LCS is "
+ lcs(S1, S2));
}
}
Python
def get_lcs_length(S1, S2):
m = len(S1)
n = len(S2)
# Initializing a matrix of size (m+1)*(n+1)
dp = [[0] * (n + 1) for x in range(m + 1)]
# Building dp[m+1][n+1] in bottom-up fashion
for i in range(1, m + 1):
for j in range(1, n + 1):
if S1[i - 1] == S2[j - 1]:
dp[i][j] = dp[i - 1][j - 1] + 1
else:
dp[i][j] = max(dp[i - 1][j],
dp[i][j - 1])
# dp[m][n] contains length of LCS for S1[0..m-1]
# and S2[0..n-1]
return dp[m][n]
if __name__ == "__main__":
S1 = "AGGTAB"
S2 = "GXTXAYB"
print("Length of LCS is", get_lcs_length(S1, S2))
C#
using System;
class Gfg {
// Returns length of LCS for S1[0..m-1], S2[0..n-1]
static int GetLCSLength(string S1, string S2) {
int m = S1.Length;
int n = S2.Length;
// Initializing a matrix of size (m+1)*(n+1)
int[, ] dp = new int[m + 1, n + 1];
// Building dp[m+1][n+1] in bottom-up fashion
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (S1[i - 1] == S2[j - 1]) {
dp[i, j] = dp[i - 1, j - 1] + 1;
}
else {
dp[i, j] = Math.Max(dp[i - 1, j],
dp[i, j - 1]);
}
}
}
// dp[m, n] contains length of LCS for S1[0..m-1]
// and S2[0..n-1]
return dp[m, n];
}
static void Main() {
string S1 = "AGGTAB";
string S2 = "GXTXAYB";
Console.WriteLine("Length of LCS is "
+ GetLCSLength(S1, S2));
}
}
JavaScript
function getLcsLength(S1, S2) {
const m = S1.length;
const n = S2.length;
// Initializing a matrix of size (m+1)*(n+1)
const dp = Array.from({length : m + 1},
() => Array(n + 1).fill(0));
// Building dp[m+1][n+1] in bottom-up fashion
for (let i = 1; i <= m; i++) {
for (let j = 1; j <= n; j++) {
if (S1[i - 1] === S2[j - 1]) {
dp[i][j] = dp[i - 1][j - 1] + 1;
}
else {
dp[i][j]
= Math.max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
// dp[m][n] contains length of LCS for
// S1[0..m-1] and S2[0..n-1]
return dp[m][n];
}
const S1 = "AGGTAB";
const S2 = "GXTXAYB";
console.log("Length of LCS is", getLcsLength(S1, S2));
Time Complexity: O(m * n), which is much better than the exponential worst-case time of the naive recursive implementation.
Auxiliary Space: O(m * n), because the algorithm uses an array of size (m+1)*(n+1) to store the LCS lengths of all pairs of prefixes.
[Expected Approach 2] Using Bottom-Up DP (Space-Optimization):
One important observation about the tabulated implementation above is that, in each iteration of the outer loop, we only need the values of the previous row (plus the already computed part of the current row). So there is no need to store all rows of the DP matrix: we can store just two rows at a time, and we can optimize further to use only one array.
Please refer to this post: A Space Optimized Solution of LCS
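One way to use a single array is sketched below in Python (lcs_length_1d is an illustrative name, and this is not necessarily the approach of the post referenced above). It keeps one row of length n + 1 and saves the top-left value in a variable before that cell is overwritten.

```python
def lcs_length_1d(s1, s2):
    # Illustrative space-optimized bottom-up LCS: a single row of length n + 1.
    n = len(s2)
    row = [0] * (n + 1)  # before row i is processed, row[j] == dp[i-1][j]
    for i in range(1, len(s1) + 1):
        prev_diag = 0  # dp[i-1][0]
        for j in range(1, n + 1):
            above = row[j]  # dp[i-1][j], saved before it is overwritten
            if s1[i - 1] == s2[j - 1]:
                row[j] = prev_diag + 1
            else:
                row[j] = max(above, row[j - 1])  # row[j-1] already holds dp[i][j-1]
            prev_diag = above  # becomes dp[i-1][j-1] for the next column
    return row[n]


print(lcs_length_1d("AGGTAB", "GXTXAYB"))  # 4
```

Since the LCS length is symmetric in its arguments, passing the shorter string as s2 reduces the extra space to O(min(m, n)).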
Applications of LCS
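LCS is used to implement diff utilities, which find the differences between two data sources (for example, two versions of a file): the lines that belong to an LCS are unchanged, and the remaining lines are reported as deletions or insertions. It is also widely used by revision control systems such as Git to handle multiple changes made to a revision-controlled collection of files.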
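To illustrate how a diff can be derived from an LCS, here is a minimal Python sketch (lcs_diff is an illustrative name). It builds a suffix variant of the DP table, where dp[i][j] is the LCS length of a[i:] and b[j:], so the diff can be emitted in forward order. Production tools generally use more efficient algorithms than this quadratic table; Git's default diff, for instance, is based on Myers' algorithm.

```python
def lcs_diff(a, b):
    # Hypothetical helper: minimal LCS-based diff of two lists of lines.
    m, n = len(a), len(b)
    # Suffix form of the table: dp[i][j] = LCS length of a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk forward: matched lines are unchanged, the rest are deletions or insertions.
    out = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append("  " + a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            out.append("- " + a[i])  # skipping a[i] still leaves a maximum-length LCS
            i += 1
        else:
            out.append("+ " + b[j])
            j += 1
    out.extend("- " + line for line in a[i:])
    out.extend("+ " + line for line in b[j:])
    return out


old = ["a", "b", "c", "d"]
new = ["a", "c", "d", "e"]
print("\n".join(lcs_diff(old, new)))
```

For this input the unchanged lines a, c and d are printed with two leading spaces, b is reported as a deletion ("- b") and e as an insertion ("+ e"). The number of unchanged lines always equals the LCS length.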
Problems based on LCS