https://www.robinlinacre.com/fellegi_sunter_accuracy/
Why Probabilistic Linkage is More Accurate than Fuzzy Matching For Data Deduplication
Robin Linacre
2023-10-24
“How effectively do different approaches to record linkage exploit the information in the data to make
What I Read: SQL order
https://lukianovihor.medium.com/sql-order-of-query-execution-8c7cd926400
SQL — order of query execution
Ihor Lukianov
Sep 24, 2023
“To maximize your query’s speed on any SQL engine, it’s essential to have an understanding of the SQL execution order.”
What I Read: Database Disassembly
https://materializedview.io/p/databases-are-falling-apart
Databases Are Falling Apart: Database Disassembly and Its Implications
Chris Riccomini
Jan 29, 2024
“Why are engineers taking databases apart and putting them back together, again?”
What I Read: Unify Batch and ML Systems
https://www.kdnuggets.com/2023/09/hopsworks-unify-batch-ml-systems-feature-training-inference-pipelines
Unify Batch and ML Systems with Feature/Training/Inference Pipelines
By Jim Dowling, Co-Founder & CEO, Hopsworks
September 27, 2023
“This article introduces a unified architectural pattern for building both Batch and Real-Time
What I Read: Distributed Training, Finetuning
https://sumanthrh.com/post/distributed-and-efficient-finetuning/
Everything about Distributed Training and Efficient Finetuning
Sumanth R Hegde
Last updated on Oct 13, 2023
“practical guidelines and gotchas with multi-GPU and multi-node training”