data engineering – Page 3 – Andrew Fairless, Ph.D.

What I Read: How fast process CSV file

By Andrew Fairless on May 8, 2024 | March 7, 2024

https://datapythonista.me/blog/how-fast-can-we-process-a-csv-file
How fast can we process a CSV file
Marc Garcia
Thu 22 February 2024
“…we’ll see in this blog post how to process a CSV file as fast as possible.”

What I Read: Scaling ChatGPT, Engineering Challenges

By Andrew Fairless on April 18, 2024 | March 1, 2024

https://newsletter.pragmaticengineer.com/p/scaling-chatgpt
Scaling ChatGPT: Five Real-World Engineering Challenges
Gergely Orosz
Feb 20, 2024
“Just one year after its launch, ChatGPT had more than 100M weekly users. In order to meet this explosive demand,…”

What I Read: Probabilistic Linkage, Data Deduplication

By Andrew Fairless on April 15, 2024 | March 1, 2024

https://www.robinlinacre.com/fellegi_sunter_accuracy/
Why Probabilistic Linkage is More Accurate than Fuzzy Matching For Data Deduplication
Robin Linacre
2023-10-24
“How effectively do different approaches to record linkage exploit the information in the data to make…”

What I Read: Deploy Model

By Andrew Fairless on April 9, 2024 | March 1, 2024

https://outerbounds.com/blog/the-many-ways-to-deploy-a-model/
The Many Ways to Deploy a Model
Hamel Husain, The Outerbounds Team
February 6, 2024
“Over the past years, we have been helping companies deploy a wildly diverse set of ML…”

What I Read: SQL order

By Andrew Fairless on March 28, 2024 | March 23, 2024

https://lukianovihor.medium.com/sql-order-of-query-execution-8c7cd926400
SQL — order of query execution
Ihor Lukianov
Sep 24, 2023
“To maximize your query’s speed on any SQL engine, it’s essential to have an understanding of the SQL execution order.”

What I Read: Database Disassembly

By Andrew Fairless on March 27, 2024 | February 5, 2024

https://materializedview.io/p/databases-are-falling-apart
Databases Are Falling Apart: Database Disassembly and Its Implications
Chris Riccomini
Jan 29, 2024
“Why are engineers taking databases apart and putting them back together, again?”

What I Read: misleading GPU, CPU benchmarks

By Andrew Fairless on March 21, 2024 | February 5, 2024

https://pythonspeed.com/articles/gpu-vs-cpu/
Beware of misleading GPU vs CPU benchmarks
Itamar Turner-Trauring
Last updated 17 Jan 2024, originally created 17 Jan 2024
“Unfortunately, while those speed-ups are impressive, they are also misleading. GPU-based libraries…”

What I Read: Navigating Data Tensions

By Andrew Fairless on February 28, 2024 | January 25, 2024

https://dataanalysis.substack.com/p/special-edition-analytics-as-applied
Special Edition: Navigating Data Tensions and the Future of Analytics | Lauren Balik
Olga Berezovsky and Lauren Balik
Jan 8, 2024
“…it is now more expensive to acquire new customers, and…”

What I Read: Unify Batch and ML Systems

By Andrew Fairless on January 18, 2024 | November 7, 2023

https://www.kdnuggets.com/2023/09/hopsworks-unify-batch-ml-systems-feature-training-inference-pipelines
Unify Batch and ML Systems with Feature/Training/Inference Pipelines
By Jim Dowling, Co-Founder & CEO, Hopsworks
September 27, 2023
“This article introduces a unified architectural pattern for building both Batch and Real-Time…”

Tag: data engineering