System Design Architecture StackOverflow

Shivam Sinha
2 min read · Feb 21, 2022

Stack Overflow is a question-and-answer website for professional and enthusiast programmers. The previous post covered the low-level design (LLD); this post covers the high-level design (HLD) of Stack Overflow.

Functional Requirements

  1. Post questions, answers, upvotes, and downvotes.
  2. Follow Users.
  3. Home Timeline: To show relevant questions to users.
  4. Search Questions.

Non-Functional Requirements

  1. High Availability
  2. Low Latency (Real Time)

Estimations:

  1. Total Users: 20M
  2. Total Questions: 25M
  3. Total Answers: 40M
  4. Total Votes: 200M

Questions/day: 5K

Answers/day: 7K

Votes/day: 36K

Active Users/day: 5000
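
To get a feel for the write load, the daily numbers above can be converted into average per-second rates. This is a rough sketch that assumes traffic is spread evenly over the day (real traffic will have peaks); the variable names are only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Daily write volumes taken from the estimations above.
daily_writes = {"questions": 5_000, "answers": 7_000, "votes": 36_000}


def per_second(count_per_day: int) -> float:
    """Average rate per second, assuming an even spread over the day."""
    return count_per_day / SECONDS_PER_DAY


rates = {name: per_second(count) for name, count in daily_writes.items()}
total_writes_per_second = sum(rates.values())

for name, rate in rates.items():
    print(f"{name}: {rate:.3f} writes/s")
print(f"total: {total_writes_per_second:.3f} writes/s")
```

Under this assumption the average is below one write per second, so the write path is not the hard part of the design; availability and read latency matter more.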

Storage Estimations:

Size of each question and answer: 30KB

Size of each vote: 20 Bytes

User metadata (views, time spent, etc.): 10KB per active user

Storage/day: 30KB*(5000+7000) + 20Bytes*(36000) + 10KB*(5000) = 360MB + 0.72MB + 50MB ≈ 410MB
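
The same estimate can be checked with a few lines of Python. This sketch assumes decimal units (1KB = 1,000 bytes, 1MB = 1,000,000 bytes), and the yearly figure is only an extrapolation of the daily number, not a value from the estimations above.

```python
# Assumption: decimal units, so 1 KB = 1,000 bytes and 1 MB = 1,000,000 bytes.
KB = 1_000
MB = 1_000_000

post_size = 30 * KB           # each question or answer
vote_size = 20                # bytes per vote
user_metadata_size = 10 * KB  # per active user per day

posts_bytes = post_size * (5_000 + 7_000)    # questions + answers per day
votes_bytes = vote_size * 36_000             # votes per day
metadata_bytes = user_metadata_size * 5_000  # active users per day

storage_per_day_bytes = posts_bytes + votes_bytes + metadata_bytes
storage_per_day_mb = storage_per_day_bytes / MB

# Simple extrapolation: 365 days at the same daily volume.
storage_per_year_gb = storage_per_day_mb * 365 / 1_000

print(f"per day:  {storage_per_day_mb:.2f} MB")
print(f"per year: {storage_per_year_gb:.1f} GB")
```

Questions and answers dominate the total; votes contribute less than 1MB per day.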

Database:

[Figure: Database Design]

Services:

  1. User App / Web App: Lets users add, modify, and delete questions and answers, and vote on them.
  2. Question Service: A pool of API servers that handles question-related requests, such as adding or modifying a question.
  3. Answer Service.
  4. Vote Service: Handles upvotes and downvotes.
  5. User Activity Service: Publishes user activity events through a pub-sub system such as Kafka and stores them in an HDFS data store.
  6. Log Service: Subscribes to the user-activity topics and publishes messages for the Notification Service. It also stores all logs in the HDFS data store, where they can later be used for recommending questions to users, analytics, etc.
  7. Load Balancer: Sits in front of the API servers of each service (Question Service, Answer Service, and so on) and distributes requests among them.
  8. API Gateway: Forwards traffic to the appropriate load balancer based on the request path (path-based API gateway).
  9. Home Timeline Service.
  10. Cache Populator Service: A cron job that populates the cache with, say, the top 200 questions per user. It can also run on demand to fetch more questions, for example once a user has viewed 100 of the 200 cached questions. It uses the logs saved by the User Activity Service, together with machine learning or other recommendation algorithms, to choose which questions to cache.
  11. Search Service: Let's discuss this later. For now, I am assuming that a search service already exists.
  12. Notification Service (real-time).
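
As an illustration of the Cache Populator Service, here is a minimal in-memory sketch of the "fill to 200, refill after 100 views" idea. Everything in it is an assumption made for the example: the `TimelineCache` class, the `Recommender` signature, and the `lowest_unseen_ids` placeholder, which stands in for whatever machine learning or recommendation algorithm is actually used. A real deployment would use a shared cache rather than Python dictionaries.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Set

CACHE_SIZE = 200          # questions cached per user (from the text)
REFILL_AFTER_VIEWS = 100  # refill once this many cached questions were viewed

# Hypothetical recommender: (user_id, how_many, ids_to_exclude) -> question IDs.
Recommender = Callable[[int, int, Set[int]], List[int]]


class TimelineCache:
    """In-memory stand-in for the per-user question cache."""

    def __init__(self, recommend: Recommender) -> None:
        self._recommend = recommend
        self._queues: Dict[int, Deque[int]] = {}
        self._seen: Dict[int, Set[int]] = {}
        self._views_since_fill: Dict[int, int] = {}

    def populate(self, user_id: int) -> None:
        """Top the user's queue up to CACHE_SIZE (cron run or on demand)."""
        queue = self._queues.setdefault(user_id, deque())
        seen = self._seen.setdefault(user_id, set())
        missing = CACHE_SIZE - len(queue)
        if missing > 0:
            exclude = seen | set(queue)  # skip viewed and already-cached IDs
            queue.extend(self._recommend(user_id, missing, exclude))
        self._views_since_fill[user_id] = 0

    def next_question(self, user_id: int) -> Optional[int]:
        """Return the next cached question and refill when the threshold is hit."""
        if user_id not in self._queues:
            self.populate(user_id)
        queue = self._queues[user_id]
        if not queue:
            return None
        question_id = queue.popleft()
        self._seen[user_id].add(question_id)
        self._views_since_fill[user_id] += 1
        if self._views_since_fill[user_id] >= REFILL_AFTER_VIEWS:
            self.populate(user_id)
        return question_id

    def cached_count(self, user_id: int) -> int:
        """Number of questions currently cached for the user."""
        return len(self._queues.get(user_id, ()))


def lowest_unseen_ids(user_id: int, count: int, exclude: Set[int]) -> List[int]:
    """Placeholder recommender: returns the lowest question IDs not excluded."""
    result: List[int] = []
    candidate = 1
    while len(result) < count:
        if candidate not in exclude:
            result.append(candidate)
        candidate += 1
    return result
```

With this sketch, `TimelineCache(lowest_unseen_ids)` fills the cache for a user on the first call to `next_question`, and tops it back up to 200 after every 100 views.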

High-Level Architecture:

[Figure: HLD StackOverflow]
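
The request path in this architecture (API Gateway, then a load balancer, then one API server of the target service) can be sketched as below. This is only an illustration under assumptions: the path prefixes, the server names, and the round-robin strategy are all hypothetical, since the design does not fix a particular routing table or balancing algorithm.

```python
from itertools import cycle
from typing import Dict, Iterator, List, Tuple


class LoadBalancer:
    """Round-robin balancer over the API servers of one service (assumed strategy)."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("a load balancer needs at least one server")
        self._servers: Iterator[str] = cycle(servers)

    def pick(self) -> str:
        """Return the next server in rotation."""
        return next(self._servers)


class ApiGateway:
    """Path-based gateway: chooses a load balancer from the request path prefix."""

    def __init__(self, routes: Dict[str, LoadBalancer]) -> None:
        # Check longer prefixes first so the most specific route wins.
        self._routes: List[Tuple[str, LoadBalancer]] = sorted(
            routes.items(), key=lambda item: len(item[0]), reverse=True
        )

    def route(self, path: str) -> str:
        """Return the server that should handle the given request path."""
        for prefix, balancer in self._routes:
            if path == prefix or path.startswith(prefix + "/"):
                return balancer.pick()
        raise LookupError(f"no service registered for {path}")


# Hypothetical prefixes and server names, for illustration only.
gateway = ApiGateway({
    "/questions": LoadBalancer(["question-api-1", "question-api-2"]),
    "/answers": LoadBalancer(["answer-api-1"]),
    "/votes": LoadBalancer(["vote-api-1"]),
})
```

Calling `gateway.route("/questions/123")` repeatedly alternates between the two question servers, while a path with no registered prefix raises `LookupError`.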

Thank you. Please do share your suggestions.
