Creating a technology platform from scratch presents a substantial challenge for solution architects, and it differs significantly from merely enhancing existing systems. Each decision profoundly influences not just the infrastructure, but also the software components, coding style, data structures, and the overall functionality of the application. This process demands careful planning of both the platform's construction and its operation. Functional and non-functional business requirements directly impact choices regarding the technology stack, the environment, and other key factors. In this post, I outline ten critical decisions that must be made when designing the combined software and infrastructure architecture, along with their impact on solution design, in the hope of aiding the decision-making process.
Note: This checklist presents a range of examples and product names to illustrate the various aspects of building technology platforms from scratch. The lists provided are neither complete nor exhaustive. They are intended to serve as relatable references, allowing readers to connect the content with familiar concepts and tools.
Ten Essential Choices for Building Platforms from Scratch
1. Technology Stack Essentials: Choosing the right set of technologies (programming languages, frameworks, databases, etc.) is crucial. This decision impacts not just the initial development but also long-term maintenance and scalability.
- Programming Languages: Java, C#, Python, JavaScript, Ruby, etc.
- Frameworks: React, Angular, Django, .NET, Spring Boot, etc.
- Databases: MySQL, PostgreSQL, MongoDB, Cassandra, Oracle, etc.
- Web Servers: Apache, Nginx, IIS, Azure Web Apps, etc.
- DevOps Tools: Docker, Kubernetes, Jenkins, GitHub, GitLab CI/CD, etc.
- Front-End Technologies: HTML, CSS, Vue.js, Svelte, etc.
- Mobile Development Frameworks: React Native, Flutter, Xamarin, etc.
- API Technologies: REST, GraphQL, gRPC, etc.
- Monitoring and Logging Tools: Prometheus, Grafana, ELK Stack, Splunk, Azure Monitor / Application Insights, etc.
- Security Tools: OWASP ZAP, Fortify, SonarQube, etc.
- Infrastructure as Code (IaC): Terraform, Ansible, Azure Resource Manager (ARM) templates and Bicep, AWS CloudFormation, etc.
- Project Management: Office tools, Jira, etc.
- Project Requirements - The primary goal of any technology stack is to meet the project's functional and non-functional requirements. For instance, if the project requires heavy data processing, a technology known for its performance (like C# or Java) might be preferable. Similarly, for a web application, JavaScript frameworks like React or Angular are common choices for the front end. Often, multiple technologies are needed in different layers, depending on the platform.
- Team Expertise, Community Support - The skill set of the development team greatly influences the choice of technology. Using a technology stack that the team is familiar with can accelerate development and reduce the learning curve. However, it's also important to balance team expertise with the project needs.
- Long-Term Sustainability - Choosing technologies that are likely to be supported and updated in the long term is crucial for the maintainability and scalability of the project. This includes not only the core technology but also libraries and tools.
Metrics: Development speed, performance benchmarks, scalability potential, and compatibility with existing systems.
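One way to make the trade-off between project requirements, team expertise, and long-term sustainability explicit is a weighted scoring matrix. The sketch below is illustrative only: the criteria weights, the candidate names (`stack_a`, `stack_b`), and the 1-5 scores are hypothetical placeholders, not recommendations.

```python
# Weighted decision matrix for comparing candidate technology stacks.
# All weights, candidate names, and scores are hypothetical examples.

CRITERIA_WEIGHTS = {
    "project_fit": 0.40,     # how well the stack meets the requirements
    "team_expertise": 0.35,  # familiarity of the current team
    "sustainability": 0.25,  # expected long-term support and updates
}

CANDIDATES = {
    "stack_a": {"project_fit": 4, "team_expertise": 5, "sustainability": 3},
    "stack_b": {"project_fit": 5, "team_expertise": 2, "sustainability": 4},
}


def weighted_score(scores, weights):
    """Return the weighted sum of per-criterion scores (1-5 scale)."""
    return sum(weights[name] * scores[name] for name in weights)


def rank_candidates(candidates, weights):
    """Return candidate names sorted from highest to lowest weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


if __name__ == "__main__":
    for name in rank_candidates(CANDIDATES, CRITERIA_WEIGHTS):
        print(name, round(weighted_score(CANDIDATES[name], CRITERIA_WEIGHTS), 2))
```

With these example numbers the stack the team already knows ranks first even though the other stack fits the requirements slightly better, which is exactly the balance described above. Changing the weights changes the outcome, so the weights themselves deserve discussion.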
2. Architecture Design: Deciding on the architecture (monolithic, microservices, serverless, etc.) that best suits the project's needs. This includes considerations for scalability, maintainability, and the ability to integrate with other systems.
- Monolithic Architecture: A single-tiered software application.
- Client-Server Architecture: A model that separates the client (user interface) and server (data storage and processing).
- N-Tier Architecture: A multi-tiered software architecture that separates the presentation, logic, and data layers.
- Layered Architecture: Similar to n-tier, but with more emphasis on the separation of concerns in each layer.
- Peer-to-Peer (P2P) Architecture: A decentralized architecture where each node or peer shares resources and services with other peers directly.
- Service-Oriented Architecture (SOA): An approach where application components provide services to other components via a communication protocol.
- Microservices Architecture: An approach where a single application is composed of many loosely coupled and independently deployable smaller services.
- Serverless Architecture: Cloud-provider managed service executions.
- Event-Driven Architecture: Systems that react to events.
- Scalability - The ability of an architecture to scale up or down based on the application's demand is critical in a dynamic environment. This includes handling increased user load, data volume, and transaction frequency. Microservices architecture, for example, is highly scalable because each service can be scaled independently based on demand, whereas a monolithic application has to be scaled as a whole. Microservices have drawbacks of their own, however, particularly for transaction management (atomicity across services) and batch processing.
- Maintainability - The ease with which an application can be updated, tested, and managed over its lifecycle is vital. A maintainable architecture can adapt to new requirements and technologies without significant rework. For example, a layered architecture, with its separation of concerns, simplifies updates and maintenance. In contrast, a monolithic architecture can become cumbersome to maintain as the application grows.
- Development Complexity - The complexity of development and management within a project's architecture needs to be in harmony with the team's expertise and the project's timeline. Architectures that are too complex may cause delays and escalate costs. However, if the team possesses the necessary expertise, a complex architecture can paradoxically simplify development. This is achieved by narrowing the developers' focus to specific application logic, streamlining their tasks and potentially improving efficiency.
- Deployment Strategies - The choice of architecture influences how the application will be deployed, updated, and managed in production. This affects the speed of deployment, downtime, and resource utilization. For example, event-driven and serverless architectures can offer more dynamic and flexible deployment strategies, as they can respond in real time to various events. In contrast, traditional client-server or monolithic architectures might have more straightforward but less flexible deployment strategies.
Metrics: Response time, system throughput, resource utilization, and ease of updates or rollbacks.
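To make the event-driven style more concrete, here is a minimal in-process publish/subscribe sketch. It is an assumption-laden toy: the `EventBus` class, the `order_placed` event name, and the billing and shipping handlers are all hypothetical, and a production system would normally use a message broker rather than an in-memory dictionary.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a callable to be invoked for every event of this type."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Deliver the payload to each subscriber; return how many were called."""
        handlers = self._handlers.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


# Hypothetical usage: two independent components react to the same event.
audit_log = []
bus = EventBus()
bus.subscribe("order_placed", lambda order: audit_log.append(("billing", order["id"])))
bus.subscribe("order_placed", lambda order: audit_log.append(("shipping", order["id"])))
delivered = bus.publish("order_placed", {"id": 42})
```

The publisher does not know which components consume the event, which is what lets each consumer be deployed, scaled, or replaced separately.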
3. Cloud vs On-Premises: Determining whether to use cloud services or to host the platform on-premises. This decision involves cost, scalability, security, and compliance considerations.
- Cloud-Based Solutions: AWS, Azure, Google Cloud Platform, etc.
- On-Premises Solutions: Traditional data centre hosting or private clouds.
- Hybrid Solutions: A mix of cloud and on-premises infrastructure.
- Cost - In choosing between cloud and on-premises infrastructure, cost is a major factor, encompassing initial investment, ongoing operational expenses, and potential savings. Cloud solutions typically offer a flexible, pay-as-you-go model that minimizes upfront costs, whereas on-premises solutions demand a higher initial investment but can be more economical for large-scale operations in the long run. Cloud environments also provide various deployment options that can help in further cost reduction. In hybrid models, which combine cloud and on-premises elements, a thorough cost-benefit analysis is necessary, factoring in traffic/bandwidth costs and data management across various environments. Crucially, it's important to select not just the type of environment but also the right mix of services and licensing types. Different licenses offer varying levels of security, compliance, performance, and features, and choosing appropriately is essential to optimize costs while meeting specific feature and requirement needs. This decision-making process is complex and depends on factors like the application's nature, the locations of services, and the types of data involved.
- Scalability - Cloud-based solutions like AWS, Azure, or Google Cloud Platform offer almost unlimited scalability, allowing businesses to quickly adjust resources in response to varying demands. On-premises solutions, however, may be limited by the physical capacity of the data centre and the speed of acquiring additional resources.
- Control Over Infrastructure - Some organizations require or prefer having complete control over their IT infrastructure for various reasons, including security, customization, or specific performance requirements. On-premises solutions provide full control over the hardware and software environment, which is essential for certain regulatory or complex environments. Cloud solutions, while offering less physical control, provide extensive management tools and integrations.
- Compliance Requirements - Adherence to industry regulations and standards can dictate where and how data is stored and processed. Organizations with strict data residency or privacy requirements may opt for on-premises solutions to ensure compliance. Cloud providers, however, are increasingly offering region-specific services and certifications to meet various compliance demands.
Metrics: Total cost of ownership (TCO), latency, uptime/availability, and resource scalability.
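The cost argument above (low upfront cost in the cloud versus a higher initial investment on-premises that may pay off over time) can be sketched as a simple break-even calculation. This is a deliberately simplified model with hypothetical figures; it ignores hardware refresh cycles, staffing, bandwidth, licensing, and discounts, all of which matter in a real TCO analysis.

```python
def cumulative_cost(upfront, monthly, months):
    """Total spend after the given number of months."""
    return upfront + monthly * months


def break_even_month(cloud_monthly, onprem_upfront, onprem_monthly, horizon_months=120):
    """Return the first month where on-premises is no more expensive than cloud.

    Returns None if that never happens within the horizon.
    """
    for month in range(1, horizon_months + 1):
        cloud = cumulative_cost(0, cloud_monthly, month)
        onprem = cumulative_cost(onprem_upfront, onprem_monthly, month)
        if onprem <= cloud:
            return month
    return None


if __name__ == "__main__":
    # Hypothetical figures, in any single currency.
    print(break_even_month(cloud_monthly=10_000, onprem_upfront=240_000, onprem_monthly=4_000))
```

With these example figures the on-premises option catches up after 40 months. If the on-premises running cost is not lower than the cloud bill, the function returns None, meaning the upfront investment is never recovered within the horizon.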
4. Data Storage and Management: Choosing the right data storage solutions (relational databases, NoSQL databases, data lakes, etc.) and establishing data management practices, including data security and privacy.
- Relational Databases (SQL): MySQL, PostgreSQL, Microsoft SQL Server, etc.
- NoSQL Databases: MongoDB, Cassandra, Redis, etc.
- Data Lakes: Apache Hadoop, Azure Data Lake, etc.
- Data Warehousing: Amazon Redshift, Google BigQuery, Snowflake, etc.
- Cloud Object Storage: Amazon S3, Azure Blob Storage, Google Cloud Storage, etc.
- File Storage: NFS, SMB/CIFS, Amazon EFS, etc.
- Block Storage: Amazon EBS, Azure Disk Storage, etc.
- Data type - The nature of the data being stored (structured, semi-structured, or unstructured) dictates the most suitable type of data storage solution. Relational databases like MySQL or PostgreSQL are ideal for structured data with clear relationships, while NoSQL databases like MongoDB or Cassandra are better suited for unstructured or semi-structured data. Object storage is usually the better choice for binary data and large files, while block storage suits low-latency volumes attached to servers, such as database disks.
- Volume - The amount of data to be stored and managed significantly impacts the choice of storage solution. High-volume data storage demands solutions that can handle scalability and performance efficiently. Data lakes and cloud object storage solutions like Amazon S3 are designed to handle vast amounts of data, making them suitable for big data applications.
- Velocity - The rate at which data is generated, processed, and accessed is vital, particularly for applications that demand real-time processing. For handling high-velocity data, such as in streaming or online transaction processing systems, solutions like in-memory data stores (e.g., Redis) or data warehouses with near-real-time ingestion (e.g., Amazon Redshift) can be a good fit. However, it's important to note that making data available faster often incurs higher costs, so these options should be used judiciously.
- Security - Security is a critical aspect of data storage, and encrypted databases and secure cloud storage offer strong protection. For extremely sensitive data, on-premises solutions may be preferable to maintain full control. However, employing encryption involves complexities such as cryptographic key management and adherence to best practices. Therefore, deciding how and where to use encryption can be more challenging than it initially appears.
- Considerations Related to Application Design and Business Logic - The design of business logic directly affects data organization and management. Code requirements vary, with some sections needing fast access to certain data elements and others managing well with slower access. Additionally, the capacity to handle encrypted data differs across code sections. This relationship between code and data is critical, especially in building a platform from scratch. It's essential to align code writing with data organization, accessibility, and security needs, ensuring an efficient and secure system that meets the specific requirements of both the business logic and data management.
Metrics: Query response time, storage capacity, data integrity, ease of use, and compliance adherence.
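The point that some code paths need fast access to certain data while others can tolerate slower access is often handled with a cache-aside pattern. The sketch below is a minimal, hypothetical version: `CacheAside` and its parameters are names I made up for illustration, a plain dictionary stands in for the fast store, and in practice a tool such as Redis would typically play that role.

```python
import time


class CacheAside:
    """Cache-aside reads: try the fast cache first, fall back to the slow store."""

    def __init__(self, load_from_store, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_store   # callable that reads the slow store
        self._ttl = ttl_seconds        # how long a cached value stays fresh
        self._clock = clock            # injectable clock, useful for testing
        self._entries = {}             # key -> (expires_at, value)
        self.store_reads = 0           # counts trips to the slow store

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # cache hit: no store access needed
        value = self._load(key)        # miss or expired: read the slow store
        self.store_reads += 1
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    slow_store = {"user:1": "Ada"}     # stand-in for a database
    cache = CacheAside(slow_store.__getitem__, ttl_seconds=10)
    print(cache.get("user:1"), cache.get("user:1"), cache.store_reads)
```

The time-to-live is the knob that trades freshness against load on the slower store, which ties back to the cost remark in the Velocity item.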
5. Security Strategy: Developing a comprehensive security strategy that includes data encryption, secure access controls, and protection against cyber threats.
- Data Encryption: TLS/SSL for data in transit, AES for data at rest.
- Access Control: OAuth, OpenID Connect, SAML, etc.
- Security Tools: Firewalls, Anti-Virus, Intrusion Detection Systems (IDS), etc.
- Cryptographic Models (Hardware/Software, FIPS Compliance)
- Secret Management: Tools and practices for managing sensitive information like API keys, data keys, credentials, and certificates throughout their entire life cycle.
- Identity and Access Management (IAM): Microsoft Active Directory, AWS IAM, Okta, Google Cloud Identity, etc.
- Security Information and Event Management (SIEM)
- Environments: Production, UAT, Test, Development, etc.
- Data Segmentation: Cryptographic key management, Key Rotation, etc.
- Disaster Recovery: Backup, restore, and automation.
- Risk vs. Cost - Balancing the level of security with the associated costs is crucial. Over-investing in security measures for low-risk areas can be uneconomical, while under-investing in high-risk areas can lead to significant vulnerabilities.
- Insurability - The ability to insure against cyber threats can influence the security measures adopted. Insurance providers often have specific requirements for security practices to provide coverage. Maintaining certain standards in data encryption and access control can be a prerequisite for obtaining cyber insurance, thereby influencing the security strategy.
- Threat Landscape - Understanding the current and evolving cyber threat landscape for the specific business helps in designing a security strategy that effectively counters specific threats faced by the organization. For example, an organization facing threats from sophisticated phishing attacks would prioritize advanced email filtering and user training, while one threatened by DDoS attacks would focus on robust network defences.
- Regulatory Requirements - Compliance with internal standards, as well as legal and regulatory requirements is mandatory. This includes data protection laws, industry standards, and other regulations. For example, organizations handling sensitive personal data must comply with GDPR, which would require specific measures in data encryption, access control, and data processing.
- Data Sensitivity - The required security level should match the sensitivity of the data being safeguarded. More sensitive data demands stricter security protocols. Highly sensitive information, like financial records or personal identifiers, needs robust encryption, stringent access controls, and regular security audits. This impacts the selection of technology stack, data organization, and development priorities.
Metrics: Time to detect and respond to incidents, compliance audit results, and vulnerability assessment outcomes.
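The risk-versus-cost balance can be roughed out with the standard annualized loss expectancy calculation (expected loss per incident multiplied by the expected number of incidents per year). The sketch below applies it to decide whether a control pays for itself. All figures are hypothetical, and real risk assessments involve far more uncertainty than two point estimates.

```python
def annualized_loss_expectancy(single_loss, annual_rate):
    """Expected yearly loss: loss per incident times incidents per year."""
    return single_loss * annual_rate


def control_is_worthwhile(single_loss, rate_before, rate_after, annual_control_cost):
    """True when the expected yearly loss reduction exceeds the control's cost."""
    reduction = (
        annualized_loss_expectancy(single_loss, rate_before)
        - annualized_loss_expectancy(single_loss, rate_after)
    )
    return reduction > annual_control_cost


if __name__ == "__main__":
    # Hypothetical: a 200,000 incident expected once every two years,
    # reduced to once every ten years by a control costing 30,000 per year.
    print(control_is_worthwhile(200_000, 0.5, 0.1, 30_000))
```

The same control at three times the price would no longer be justified by these numbers alone, which is the over-investment case described in the Risk vs. Cost item.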
6. Compliance and Regulations: Ensuring the platform complies with relevant laws and regulations, especially if it involves sensitive data or operates in highly regulated industries.
- Data Protection: GDPR, HIPAA, CCPA, etc.
- Industry-Specific Standards: PCI DSS for payment processing, ISO/IEC standards, etc.
- Auditing and Reporting Tools: For compliance monitoring and reporting.
- Industry-Specific Regulations - Different industries are subject to specific regulations that govern how certain types of data are handled and protected. Compliance with these regulations is mandatory to avoid legal repercussions and maintain industry standards.
- Data Privacy Laws - Data privacy laws like GDPR, HIPAA, and CCPA set standards for how personal and sensitive data should be collected, stored, processed, and shared. Compliance with these laws is essential to protect user privacy and avoid hefty fines.
- Geographic Considerations - The geographic location of a platform's operations and its user base plays a critical role in determining the legal and regulatory frameworks that apply. Different countries and regions have unique legal and regulatory requirements. Additionally, there may be a need to store user data in specific geographic regions, which influences the choice of cloud service environments.
Metrics: Compliance audit results, number of compliance incidents, and cost of compliance management.
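Data residency requirements often end up as an explicit policy in code or configuration. The following sketch shows one possible shape for it, assuming a hypothetical policy table and made-up region names (`eu-west`, `us-east`, and so on) rather than any real provider's identifiers. It fails closed, refusing to pick a region when no rule applies.

```python
# Hypothetical policy: jurisdiction -> storage regions where data may reside.
RESIDENCY_POLICY = {
    "EU": ["eu-west", "eu-central"],
    "US": ["us-east", "us-west"],
}


class ResidencyError(Exception):
    """Raised when no compliant storage region can be determined."""


def pick_storage_region(jurisdiction, available_regions):
    """Return the first available region permitted for the jurisdiction.

    Fails closed: an unknown jurisdiction, or no permitted region being
    available, raises ResidencyError instead of falling back to a default.
    """
    allowed = RESIDENCY_POLICY.get(jurisdiction)
    if not allowed:
        raise ResidencyError(f"no residency policy for {jurisdiction!r}")
    for region in allowed:
        if region in available_regions:
            return region
    raise ResidencyError(f"no permitted region available for {jurisdiction!r}")


if __name__ == "__main__":
    print(pick_storage_region("EU", {"us-east", "eu-central"}))
```

Raising an error rather than defaulting to some region is a design choice: a failed write is usually easier to recover from than data stored in the wrong jurisdiction.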
7. Integration with Existing Systems: Planning for integration with existing systems and infrastructure, including legacy systems, which can be critical for data flow and operational continuity.
- API Integration: REST, GraphQL, SOAP, etc.
- Middleware: ESB (Enterprise Service Bus), Apache Kafka, RabbitMQ, etc.
- Data Integration Tools: ETL tools, data synchronization, etc.
- Cloud Integration Services: Azure Logic Apps, Azure API Management, etc.
- Integration Platform as a Service (iPaaS): Zapier, etc.
- Hybrid Integration Platforms: TIBCO Cloud Integration, IBM Integration Services, etc.
- Compatibility with legacy systems - Ensuring new systems can communicate and work effectively with older, existing systems is crucial to maintain operational continuity and data integrity. For example, when integrating with legacy systems that may not support modern protocols, middleware solutions like an Enterprise Service Bus (ESB) can be used to bridge the gap, allowing for smooth data flow between old and new systems.
- Existing Infrastructure - The chosen integration solution needs to be compatible with the existing technological infrastructure's capabilities and constraints. In environments predominantly utilizing cloud-based services, leveraging cloud integration services or Integration Platform as a Service (iPaaS) solutions is often more efficient and cost-effective than developing on-premises solutions. However, this approach might be constrained by security and compliance considerations, potentially requiring supplementary technologies or innovative integration methods.
- Business processes - The integration strategy should support and enhance the organization's business processes. The goal is to improve efficiency and workflow without causing disruptions. If the business relies heavily on real-time data processing, using tools like Apache Kafka for stream processing in integration can be crucial. For batch processing of data, traditional ETL tools might be more appropriate.
Metrics: Integration time, data consistency across systems, and impact on existing system performance.
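Much of the work of bridging legacy and modern systems is format translation. As a small illustration, the sketch below converts a fixed-width legacy record into JSON. The record layout, field names, and sample data are entirely hypothetical; a real integration would take the layout from the legacy system's documentation and would usually run inside middleware rather than a standalone script.

```python
import json

# Hypothetical fixed-width layout of a legacy record: (field, start, end).
LEGACY_LAYOUT = [
    ("customer_id", 0, 6),
    ("name", 6, 26),
    ("balance_cents", 26, 36),
]


def parse_legacy_record(line):
    """Slice a fixed-width line into a dict, trimming the padding."""
    record = {field: line[start:end].strip() for field, start, end in LEGACY_LAYOUT}
    record["balance_cents"] = int(record["balance_cents"])  # numeric field
    return record


def to_json_message(line):
    """Translate one legacy record into a JSON string for a modern API."""
    return json.dumps(parse_legacy_record(line), sort_keys=True)


if __name__ == "__main__":
    sample = "000123" + "Ada Lovelace".ljust(20) + "0000012500"
    print(to_json_message(sample))
```

Keeping the layout in a single table means a change in the legacy format touches one place, which helps with the data consistency metric listed above.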
8. User Experience (UX) and Accessibility: Ensuring the platform is user-friendly and accessible to all users, including those with disabilities, which can significantly impact adoption and satisfaction.
- Front-End Frameworks: React, Angular, Vue.js, etc.
- Accessibility Standards: WCAG, ADA compliance, etc.
- UX Design Tools: Sketch, Adobe XD, Figma, etc.
- CRM Tools: Dynamics 365, etc.
- Target Audience - Understanding the needs, preferences, and behaviors of the target audience is crucial in designing a user-friendly interface. Different audiences may have different expectations and ways of interacting with the platform. A platform for tech-savvy young adults might focus on a modern, minimalistic design using advanced front-end frameworks, while a platform for older adults might prioritize simplicity and clarity in navigation, influencing the type of UI technology being used.
- Accessibility Requirements - Compliance with standards like WCAG and ADA involves designing interfaces that are usable by people with various disabilities, which might include features like screen reader compatibility, keyboard navigation, and contrast adjustment options.
- Platform/Device Compatibility - Employing responsive design techniques, such as a framework like Bootstrap or CSS layout features like Flexbox, is key for ensuring that the platform's UI adapts effectively to various screen sizes, from desktop monitors to mobile phones. However, not all applications or features within an application need this adaptability. Certain use cases may involve complex data unsuitable for a mobile interface, or vice versa. Therefore, these specific requirements will influence the choice of technologies used to develop different features of the platform.
Metrics: User satisfaction scores, time on task, accessibility audit results, and user engagement metrics.
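One accessibility check that is easy to automate is color contrast. The sketch below follows the relative luminance and contrast ratio definitions from WCAG 2.x, and tests against the 4.5:1 threshold that level AA sets for normal-size text. The function names are my own, and a real audit covers far more than contrast.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color as defined by WCAG 2.x."""
    r, g, b = (_linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (none) to 21 (maximum)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(foreground, background):
    """True when the pair meets the WCAG AA threshold for normal text."""
    return contrast_ratio(foreground, background) >= 4.5


if __name__ == "__main__":
    print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
```

Black on white yields the maximum ratio of 21, while light gray text on a white background fails the check. A check like this can run in a build pipeline against the design system's color palette.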
9. Scalability and Performance Optimization: Planning for future growth in terms of user base and data volume, and ensuring the platform can scale efficiently without compromising performance.
- Load Balancing: Hardware load balancers, Nginx, HAProxy, etc.
- Caching: Redis, Memcached, Varnish, etc.
- Content Delivery Networks (CDNs): Akamai, Cloudflare, Amazon CloudFront, etc.
- Anticipated Growth in User Base and Data Volume - Planning for future growth is crucial to ensure that the system can handle increasing users and data without performance issues. However, investing in processing at scale in the early stages can be challenging or expensive, especially for companies with limited budgets or expertise. Organizations must forecast the timeline for needing these technologies and decide whether to invest immediately or later. Personally, if the need is expected within two years, I would recommend investing sooner rather than later.
- Performance Requirements - Maintaining a high level of performance is essential for user satisfaction and engagement. Performance optimization is about ensuring that the system remains responsive and efficient under varying loads. This might require additional investment in monitoring tools.
- Infrastructure Elasticity - The ability of the infrastructure to dynamically scale resources up or down based on real-time demand is a key aspect of effective scalability. This ensures that the system can handle peak loads efficiently while optimizing resource usage during off-peak times. The organization might want to utilize cloud-based auto-scaling services for this elasticity, along with Content Delivery Networks (CDNs) such as Amazon CloudFront to absorb traffic peaks.
Metrics: Load handling capacity, response times under varying loads, and resource utilization efficiency.
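Forecasting when scale-related investment becomes necessary can start from a simple compound growth projection. The sketch below estimates how many months remain before the user base reaches the capacity of the current setup. The growth rate and capacity figures are hypothetical, and steady month-over-month growth is a strong simplifying assumption.

```python
def months_until_capacity(current_users, capacity, monthly_growth_rate, max_months=600):
    """Months of compound growth before the user count reaches capacity.

    Returns 0 if capacity is already reached, and None if it is not reached
    within max_months (for example when growth is zero or negative).
    """
    if current_users >= capacity:
        return 0
    if monthly_growth_rate <= 0:
        return None
    users = float(current_users)
    for month in range(1, max_months + 1):
        users *= 1 + monthly_growth_rate
        if users >= capacity:
            return month
    return None


if __name__ == "__main__":
    # Hypothetical: 10,000 users today, the current setup handles 50,000,
    # and the user base grows 10% per month.
    print(months_until_capacity(10_000, 50_000, 0.10))
```

With these example numbers capacity is reached in 17 months, which falls inside the two-year window mentioned above and would therefore argue for investing early.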
10. Vendor and Technology Partnerships: Deciding on vendors and technology partners, including open source vs. proprietary solutions, which can affect the cost, support, and evolution of the platform.
- Open Source Solutions: Linux, Apache, MySQL/MariaDB, Python, etc.
- Proprietary Solutions: Microsoft, Oracle, IBM, etc.
- Partnerships and Alliances: Technology partner programs, joint ventures, etc.
- Vendor Reputation - The reputation of a vendor or technology partner is key in assessing their reliability, service quality, and product performance. Reputable vendors are typically more reliable in providing stable and effective solutions. However, they may not always offer the most cutting-edge technology, which might be available from smaller, innovative startups. Balancing between actual needs and perceived needs is crucial in this context.
- Support and Maintenance Services - Proprietary solutions often come with comprehensive support and maintenance packages, ensuring prompt assistance and regular updates. Open source solutions might rely more on community support, which can vary in responsiveness and expertise.
- Alignment with Project Goals - An open-source solution like Linux or Apache may be more suitable for a project prioritizing flexibility and customization, while proprietary solutions may be preferred for projects requiring specific, high-end features and dedicated support.
Metrics: Vendor response time, cost-effectiveness, service level agreements (SLAs), and historical uptime/reliability.
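When comparing vendor SLAs, it helps to translate availability percentages into the downtime they actually permit, and to see how dependencies on several vendors compound. The sketch below does both. It assumes that failures are independent and that every component must be up for the platform to work, which is a simplification.

```python
def allowed_downtime_minutes(availability_percent, period_days=30):
    """Downtime permitted by an availability target over the given period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


def composite_availability(percentages):
    """Combined availability of components that must all be up.

    Assumes independent failures (serial dependency).
    """
    result = 1.0
    for percent in percentages:
        result *= percent / 100
    return result * 100


if __name__ == "__main__":
    print(round(allowed_downtime_minutes(99.9), 1))         # minutes per 30 days
    print(round(composite_availability([99.9, 99.9]), 4))   # two chained services
```

A 99.9% target allows roughly 43 minutes of downtime in a 30-day month, and chaining two such services lowers the combined figure to about 99.8%, so the SLA of the weakest partner effectively caps what the platform can promise.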
Key takeaways
Building technology platforms from scratch involves choices that are remarkably complex and interconnected. To put it succinctly:
1. Emphasizing a Holistic Approach: Every decision, from the technology stack to architectural design, is intricately linked. Choices in one area significantly affect others. For example, selecting a programming language influences not only development but also aspects like maintenance, scalability, and team recruitment.
2. Strategic Planning is Crucial: It's essential to plan with a vision that's both forward-looking and aligned with business evolution. Balancing innovation with practicality, these plans should also cater to specific budgets, business needs, industry standards, and regulatory requirements.
3. Recognizing the Role of Non-Technical Factors: Factors such as team expertise and vendor relationships are integral to architectural decisions. The most theoretically optimal design may still falter if it doesn't align with the capabilities and context of the team and partners involved.