

CMPT 459 Spring 2021 Data Mining

[Total Marks: 100]

The aim of this assignment is to implement DBSCAN, a density-based clustering algorithm, in Python and to test it on a household power consumption dataset. DBSCAN pseudocode is provided on page 268.
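For orientation, the standard DBSCAN procedure can be sketched in Python as below. This is only a minimal sketch and is not necessarily identical to the referenced pseudocode. The class name `DBSCAN`, the parameter names `eps` and `min_pts`, and the constants `NOISE` and `UNVISITED` are illustrative assumptions. It uses brute-force neighbour search, which is quadratic in the number of objects, and it keeps the usual single-label form, so the requirement in Task b that an object density-reachable from two clusters be assigned to both is not covered.

```python
import numpy as np

NOISE = -1       # label for noise objects
UNVISITED = -2   # internal marker, never returned


class DBSCAN:
    """Minimal single-label DBSCAN sketch with brute-force neighbour search."""

    def __init__(self, eps, min_pts):
        self.eps = eps          # neighbourhood radius
        self.min_pts = min_pts  # minimum neighbourhood size for a core point

    def _region_query(self, X, i):
        # Indices of all points within eps of point i (point i included).
        dists = np.linalg.norm(X - X[i], axis=1)
        return np.flatnonzero(dists <= self.eps)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        labels = np.full(len(X), UNVISITED)
        cluster_id = 0
        for i in range(len(X)):
            if labels[i] != UNVISITED:
                continue
            neighbours = self._region_query(X, i)
            if len(neighbours) < self.min_pts:
                # Not a core point; may still become a border point later.
                labels[i] = NOISE
                continue
            # Point i is a core point: start a new cluster and expand it.
            labels[i] = cluster_id
            seeds = list(neighbours)
            while seeds:
                j = seeds.pop()
                if labels[j] == NOISE:
                    labels[j] = cluster_id  # border point, not expanded
                if labels[j] != UNVISITED:
                    continue
                labels[j] = cluster_id
                j_neighbours = self._region_query(X, j)
                if len(j_neighbours) >= self.min_pts:
                    seeds.extend(j_neighbours)  # j is core: keep growing
            cluster_id += 1
        return labels.tolist()
```

On a toy input with two tight groups and one isolated point, `DBSCAN(eps=0.3, min_pts=3).fit(...)` labels the groups 0 and 1 and the isolated point -1. For the full dataset, the brute-force query would likely need to be replaced by a spatial index.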
Household Dataset

This dataset contains 525600 measurements of electric power consumption in one house located in Sceaux (7 km from Paris, France) in 2007. Attributes are as follows:
● date: date in format dd/mm/yyyy
● time: time in format hh:mm:ss
● global_active_power: household global minute-averaged active power (in kilowatt)
● global_reactive_power: household global minute-averaged reactive power (in kilowatt)
● voltage: minute-averaged voltage (in volt)
● global_intensity: household global minute-averaged current intensity (in ampere)
● sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).
● sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.
● sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.

Tasks

a. Preprocessing [15 marks]
The Household dataset contains missing values that you need to handle. Moreover, you should normalize the data. You can also apply any other preprocessing that you think is helpful. Please note that you should explain all the preprocessing you have done in your report file to receive full marks.

b. Implementation of DBSCAN [45 marks]
Implement the DBSCAN algorithm. If an object is density-reachable from two clusters, then it should be assigned to both clusters [10 marks]. The implementation takes the dataset as input (along with other necessary attributes) and returns a file with an additional attribute “cluster label” [10 marks]. Cluster labels should start from 0, and noise objects should be labelled as -1 [5 marks]. Your implementation has to include a method named “fit”, which takes the input data and returns a list of cluster labels. You will not get the marks for this part if your implementation does not have a method with exactly the same name and the same functionality [20 marks].

c. Test your DBSCAN implementation on the household dataset. [40 marks]
Use the heuristic approach taught in class or some other approach to determine reasonable values for the parameters epsilon and MinPts. Explain how you chose the parameters and include your k-distance diagram in your report file [25 marks]. Provide the statistics of your resulting clustering: how many clusters, how many objects per cluster. Discuss how good the resulting clustering is [15 marks].

Hints:
● Running the DBSCAN algorithm on this dataset takes approximately 18 minutes on an Intel Core i5 processor.
● You may want to use a smaller subset of this data for debugging purposes, and then run your implementation on the whole dataset only after you have made sure it works correctly. Note that this is only for debugging and your report has to be based on the whole data.
● You may want to remove the Date and Time columns. Or you may want to be more creative! It’s your choice.

[IMPORTANT] You should submit two files:
● A report file: [studentID].pdf
● A Python file: [studentID].py

Libraries: You can use libraries including math, numpy, scipy, random, etc. You MUST provide YOUR OWN code for the DBSCAN algorithm and for all the tasks specified in this assignment. These MUST be implemented from scratch, i.e. not using scikit-learn or other libraries. You will be marked on the correctness of your implementation.
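The preprocessing in Task a and the k-distance diagram in Task c might be approached along the following lines, using numpy only. This is a hedged sketch, not a prescribed solution. The function names are hypothetical, mean imputation and min-max scaling are just one possible choice among several, and the code assumes the numeric columns have already been loaded into an array with missing values represented as NaN.

```python
import numpy as np


def fill_missing_with_mean(X):
    """Replace NaN entries with the mean of their column (one possible choice)."""
    X = np.array(X, dtype=float)          # copy, so the caller's data is untouched
    col_means = np.nanmean(X, axis=0)     # column means ignoring NaN
    rows, cols = np.nonzero(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def min_max_normalize(X):
    """Rescale every column to [0, 1]; constant columns are mapped to 0."""
    X = np.asarray(X, dtype=float)
    mins = X.min(axis=0)
    spans = X.max(axis=0) - mins
    spans[spans == 0] = 1.0               # avoid division by zero
    return (X - mins) / spans


def k_distances(X, k):
    """Distance from each point to its k-th nearest neighbour, sorted descending.

    Brute force, so quadratic in the number of points; requires k < len(X).
    """
    X = np.asarray(X, dtype=float)
    result = np.empty(len(X))
    for i in range(len(X)):
        dists = np.linalg.norm(X - X[i], axis=1)
        # In sorted order, index 0 is the point itself (distance 0),
        # so index k holds the distance to the k-th nearest neighbour.
        result[i] = np.partition(dists, k)[k]
    return np.sort(result)[::-1]
```

Plotting the array returned by `k_distances` against its index gives a k-distance diagram. One common heuristic reads epsilon off the point where the curve bends sharply and ties k to the chosen MinPts. Because the loop is quadratic, computing it on a random sample may be more practical than on all 525600 rows.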