Hands-on Tutorials

Use Pipelines to streamline your data science project right now!

Photo by Myriam Jessier on Unsplash

Most data science projects (I'm tempted to say all of them) require a certain level of data cleaning and preprocessing to make the most of machine learning models. Some common preprocessing steps or transformations are:

a. Imputing missing values

b. Removing outliers

c. Normalising or standardising numerical features

d. Encoding categorical features

Scikit-learn has a bunch of classes that support this kind of transformation, such as StandardScaler in the preprocessing module and SimpleImputer in the impute module.

A typical, simplified data science workflow would look like this:

  1. Clean/preprocess/transform the data
  2. Train a machine learning model
  3. Evaluate and optimise the…
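As a rough illustration of the workflow above, here is a minimal sketch that chains imputation, standardisation and a model into a single scikit-learn Pipeline. The toy data, the step names and the choice of LogisticRegression are assumptions made for this example, not taken from the article.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up toy data: two numerical features with a few missing values.
X = np.array([
    [1.0, 20.0],
    [2.0, np.nan],
    [np.nan, 24.0],
    [8.0, 60.0],
    [9.0, np.nan],
    [10.0, 66.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Steps 1 and 2 of the workflow in one object: preprocess, then train.
pipe = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # standardise features
    ("model", LogisticRegression()),               # hypothetical model choice
])

pipe.fit(X, y)

# Step 3, very loosely: scoring on the training data is only a smoke test,
# not a proper evaluation.
predictions = pipe.predict(X)
accuracy = pipe.score(X, y)
print(predictions, accuracy)
```

Because the preprocessing lives inside the pipeline, calling `pipe.predict` on new data applies the same fitted imputer and scaler automatically.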

See how we combat the pandemic from a statistical perspective

Photo by Edwin Hooper on Unsplash

To test or not to test, this is a statistical question.

In this global battle with the pandemic, Taiwan has done a marvelous job of keeping its citizens safe and healthy, with only 850 confirmed cases and 7 deaths in total to date (17/01/2021). For a small island only a strait away from China, where the virus originated, that number is nothing short of incredible. Apart from recognising the pandemic much sooner than the rest of the world, Taiwan has never imposed compulsory COVID-19 testing on international arrivals. In fact, not many countries impose such a policy.

Why don't we test all arrivals? Wouldn't universal testing let us identify patients before they enter the country and therefore prevent local transmission and community spread? Why do I still have to self-quarantine if I test negative? You may ask. …

An easy trick with Python's built-in database, SQLite, to make your data manipulation more flexible and effortless.

Photo by William Iven on Unsplash

Pandas is a powerful Python package for wrangling your data. However, have you ever encountered a task that made you think 'if only I could use a SQL query here!'? I personally find it particularly annoying to join multiple tables and extract only the columns I want in pandas. Say you'd like to join 5 tables. You can absolutely do this with a single query in SQL. But in pandas, you have to merge four times: a+b, (a+b)+c, ((a+b)+c)+d, and so on. What's worse, every time you merge, pandas keeps all the columns, even though you probably only need one or two columns from the other table. …
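To make the contrast concrete, here is a minimal sketch using only the standard-library sqlite3 module: three small tables are joined in one query, and only the wanted columns are selected. The table names, column names and rows are all hypothetical, and since the preview is cut off, this is not necessarily the exact trick the article goes on to show.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables and rows, made up for this example.
cur.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, price REAL);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, product_id INTEGER);
INSERT INTO customers VALUES (1, 'Ann', 'London'), (2, 'Bob', 'Taipei');
INSERT INTO products VALUES (10, 'Keyboard', 30.0), (11, 'Mouse', 12.5);
INSERT INTO orders VALUES (100, 1, 10), (101, 2, 11), (102, 1, 11);
""")

# One query joins all three tables and keeps only the columns we need.
query = """
SELECT o.order_id, c.name, p.product_name
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
JOIN products AS p ON p.product_id = o.product_id
ORDER BY o.order_id
"""
rows = cur.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

If the data starts out in pandas, `DataFrame.to_sql` and `pandas.read_sql_query` can move tables into and out of such a connection.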

Photo by Markus Winkler on Unsplash

This article provides a step-by-step tutorial on connecting to Azure SQL Server using Python on Linux.

After creating an Azure SQL Database/Server, you can find the server name on the overview page.

Hands-on Tutorials

How to use Resample in Pandas to enhance your time series data analysis

Photo by Jiyeon Park on Unsplash

When it comes to time series analysis, resampling is a critical technique that lets you flexibly define the resolution of your data. You can either increase the frequency, for example by converting 5-minute data into 1-minute data (upsampling, which increases the number of data points), or go the other way around (downsampling, which decreases the number of data points).
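Both directions can be sketched in a few lines. The series below is made up for illustration, and linear interpolation and the mean are just two of several reasonable ways to fill or aggregate the values.

```python
import pandas as pd

# Made-up 5-minute series with four observations.
index = pd.date_range("2021-01-01 00:00", periods=4, freq="5min")
series = pd.Series([0.0, 5.0, 10.0, 15.0], index=index)

# Upsample to 1-minute resolution: new timestamps start out as NaN,
# then get filled by linear interpolation.
upsampled = series.resample("1min").asfreq().interpolate(method="linear")

# Downsample to 10-minute resolution: each bin is summarised by its mean.
downsampled = series.resample("10min").mean()

print(len(series), len(upsampled), len(downsampled))
```

Upsampling always needs a rule for the newly created points (interpolation, forward fill, and so on), while downsampling always needs an aggregation (mean, sum, last, and so on).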

To quote the documentation, resample is a “Convenient method for frequency conversion and resampling of time series.”

In practice, there are two main reasons to use resample.

  1. To join tables with different resolutions. …


Jupyter Notebook, previously known as IPython Notebook, is one of the most popular IDEs for data science projects. You can put all your code, visualisations, notes, images and comments together in one place to enhance readability and communication. The following are some tricks I found pretty useful, and wish I'd known earlier, after working on a number of data science and analysis projects.

1. Notebook width adjustment

When you open a notebook, it doesn't use the full width by default. It only takes up around 50% of the screen. …


This is a book about the birth of a billion-dollar company: Netflix.

As someone who has always been fascinated by start-up and entrepreneur stories, I found that this book naturally caught my eye at first sight. Netflix is a great company not only because it has nearly 200 million subscribers and is worth almost 200 billion (figures as of 2020 Q1), but also because of how it has transformed the entertainment industry. It has redefined the way we consume entertainment, introduced the concept of 'binge watching', and become a popular euphemism for getting laid (in the author's own words). …


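After six months of hard fighting, I was lucky enough to land two job offers. The two offers came about a week apart: on the very afternoon of the day I accepted the first one, I received the second. Being chronically indecisive, I spent roughly a week agonising over which company to choose.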


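My situation was this: company A was a start-up team building a CRM system for hotels, and company B was an infrastructure consultancy. The former had only about five people, the latter around a hundred. Why I made the decision I did is something I may cover in another post when I get the chance. The upshot is that I had already accepted company A's offer but later decided to go to company B.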

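Once you have decided what you really want, I recommend adjusting your mindset first. For a company, the worst disaster is hiring someone who is not a good fit and has to leave before finishing probation; anything better than that is acceptable. So rather than joining and leaving shortly afterwards, not going in the first place actually does the company a big favour. With this mindset, I was less awkward, and felt less guilty and apologetic, when talking to company A.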

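Next, explain clearly to company A why you are backing out and which of company B's conditions suit you better. Remember, we do not want to burn bridges with company A either, so never frame it as a comparison of the two companies ("company B is better than you in this and that"). Instead, say "I feel the other company suits me better". The meaning is much the same, but it sounds far more comfortable. If the salaries differ a lot, though, I think you can simply say so; it is a hard fact, and everyone knows salary is usually the first consideration. If the company you turn down really wants you, it will raise its offer to win you over. In my case the two companies offered the same salary, so I did not have that bargaining chip.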


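In the end I did get company A to understand my position and let me go to company B. We parted peacefully, and we can still be friends after the break-up.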




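There is really no trick to preparing for interviews: it is practice and more practice. Just like guessing exam questions beforehand, start by writing down the questions you think the interviewer will ask; about 30 to 40 questions should cover roughly 80% of what comes up. Practising does not mean glancing at a question and thinking about what you might say. It means actually planning, structuring and writing down what you are going to say, then saying it over and over until you can deliver the answer with complete composure. Back then, almost every day after work I would run through a few mock questions with my flatmate, then review whether the answer was good and what could be added or corrected, polishing each answer until it was as good as it could be. For a while, the first thing we said whenever we saw each other was "Hey! Tell me about yourself." XD

Generally, once your resume passes screening, the company will not invite you straight to an interview. HR will first conduct a phone interview, and the purpose of the phone interview is really …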





  1. Educational Background
  2. Working Experience
  3. Other

Resume Summary & Skill Set

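At first I did not realise how important this part was; only after adding it did I discover how powerful it is! It is something like a summary of you as a whole person, but it cannot be too long, roughly 100 to 150 words. Sometimes HR will decide whether to invite you to an interview after reading this part alone, so weigh every word and use a few short sentences to describe your personality and professional skills clearly. This is also the best place to put Power Words, and there are examples of Summaries and Power Words for all kinds of positions online that you can refer to.

The Skill Set part is mostly needed by technical people. List in detail the technologies and programming languages you know and your level in each. Certifications are even better; include them to add credibility.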

Educational Background

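There is not much to dwell on here. What people tend to agonise over is whether to list their grades. In my experience, very few companies, mostly very old-school ones, ask to know your grades. If your GPA is above 4 and you worry that nobody will know how brilliant you are unless you write it down, go ahead; mine did not even reach 3 at university, so I would not dare mention it. As far as I recall, only the Big Four accounting firms and the long-established banks tend to ask; for everyone else, nobody cares. What you should put under your school and department instead are the projects, assignments and theses you completed during your studies. If a paper was published, you can attach a link or state the journal and issue. Choose extracurricular activities carefully: if you were not the club president or chair, or the activity has nothing whatsoever to do with the job, I suggest leaving it out. Larger business competitions such as ATCC and YEF (does it still exist?) can be included where appropriate. Imagine you are hiring a data engineer today: you probably would not care much whether the candidate used to be president of the guitar club or head organiser of a student camp, would you? Unless you have a truly unusual experience, which can go under Other.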

Working Experience

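This is without question the most important part of the whole resume. As with the Resume Summary, every word must count, and the content must be relevant to the job you are applying for. For example, my previous roles were all in Business Development, so I wrote very little, or even nothing, about the BD work itself and instead focused on the analysis work I had found for myself outside of BD. Besides the choice of Power Words, the other very important point is numbers. Companies love numbers, because a number quantifies your output at a glance. Say "I can write Python" and HR will ignore you. Say "I have five years of experience writing Python" and HR may look up slightly to hear what you have to say. Say "I automated the XX process with Python, improving staff efficiency by 20%" or "I used GA to analyse and optimise website traffic and user flows, improving exposure and conversion rate by 15%" and HR is immediately interested. There are two important things about adding numbers, though: 1) the numbers must be plausible, and 2) when asked in the interview, you must be able to explain in detail where they came from.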



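Cover Letter

A Cover Letter is the body of the email you write when sending your resume to HR, or the "Is there anything you would like to say to the company?" section on a website. Companies in Taiwan may not often ask for one, but in European countries it is taken very seriously. The content of a cover letter needs to be more precise. It is a bit like running straight to the company you are applying to, getting past the security guard, pushing through door after door, and finally reaching the boss to say you want to apply, only for the boss to say, "Fine, I happen to be on my way to the toilet right now; tell me why I should hire you on the way there." If you have not written your Cover Letter beforehand, you are in trouble: your mind goes blank.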


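  1. What you can bring to the company

Get straight to the point in the first paragraph: say that you saw through such-and-such channel that the company is hiring for position XX, that after some research you are very interested and are therefore applying, and that you believe your background is a very good fit. Two or three sentences will do.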




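Finally, check and check again. Typos and grammatical mistakes are small but fatal wounds, so never let them through. Before you apply, also be sure to study the company's job posting in detail. The Requirements are where you should concentrate; ideally your resume and cover letter cover every point in the Requirements, so that the company reads your resume and thinks, "This is exactly the person I am looking for!" Tailoring your resume and cover letter to the Job Description will greatly increase your chances of success.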


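If you intend to look for a job abroad, LinkedIn is absolutely the number one tool; both of the offers I received came from headhunters who found me on LinkedIn. LinkedIn is like your personal brand page, or a resume of yours circulating on the internet, so the content of the whole page and the image it projects must be spot on. Go ahead and pay for Premium as well, at about a thousand a month (there is a one-month free trial). There are plenty of LinkedIn playbooks online that are better than anything I could write, so I will not go into detail here.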



James Ho

Data Enthusiast in London
