Machine Learning with ClickHouse
Nikolai Kochetov, ClickHouse developer

Experimental dataset
NYC Taxi and Uber Trips
Where to download: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
How to import data into ClickHouse: https://clickhouse.yandex/docs/en/getting_started/example_datasets/nyc_taxi/
What you can also read: https:/ analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/

External Tools
Use the tools you are already used to
A small sample of data is enough to start
All you need is to get it out of ClickHouse

A couple of lines of Python + Pandas

    import requests
    import io
    import pandas as pd

    # ClickHouse HTTP interface on the local machine (default port 8123)
    url = 'http://127.0.0.1:8123?query='
    # TSVWithNames returns tab-separated rows with a header line of column names
    query = 'select * from trips limit 1000 format TSVWithNames'
    resp = requests.get(url, data=query)

    # Parse the tab-separated response text into a DataFrame
    string_io = io.StringIO(resp.text)
    table = pd.read_csv(string_io, sep='\t')

Table (part)

How to sample data
You already know it!
    LIMIT N
    WHERE condition
    SAMPLE x OFFSET y

How to sample data
LIMIT N
SELECT min(pickup_date), max(pickup_date) FROM ( SEL