(not a guide for this question; only for how this question is different from all others)
Main differences
There is a lot of overlap with Twitter design:
- Timeline design
- Social graph
- Search
But Instagram/Tiktok is image/video based, so:
- Bandwidth can be an issue in some localities, so on upload, producing videos at different bitrates is important. How about consistent loudness between videos?
- Access pattern (a lot in the first few days, then fade away) becomes important for cost saving. Need a “cached timelines” storage on Redis/Memcached, and a “archived timelines” on Cassandra/HBase.
- “The algorithm” is very important, so analytics feeding back to timelines MUST be mentioned.
“The algorithm”
- Users need to be profiled thoroughly to best recommend them posts that will boost engagement & retention.
- Their social graph & activity must be analyzed through collaborative filtering & other data science algorithms, and output some profiling artifact (e.g. tags) to be kept in some User Service’s cache.
- If elaborating on “The algorithm”: need
User Service
,User cache
,Trends service
,Profiling Service
, etc. - Then, use the feedback
Post-process service
in the diagram to explain how posts are sent to users timelines.
What does Instagram use?
- Data stored in Postgres cluster. IDs generated with Postgres clusters as well (Sharding IDs at Instagram).
- Django/Python stack for site.
- From here: “All of our Redis deployments run in master-slave, with the slave set to save to disk about every minute.”