Machine Learning with Python 10(2): Machine Learning Project (Performance Improvement Tips, Answers)

디지털랫드 2024. 3. 28. 00:54

7. Performance Improvement Tips


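Have you tried to score lower than the baseline score on the project? (The metric is MSE, so lower is better.)
We have prepared tip code for improving performance, so download the file below and either type it into the LMS directly or use it in Colab, Jupyter Notebook, or a similar environment.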
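
One of the tips in the notebook below is to train on `np.log1p(price)` instead of the raw price. A minimal sketch of why the inverse step matters, assuming (the notebook does not state this) that the final `y_test` is in raw price units: predictions made on the log scale need `expm1` applied before MSE is computed. The `log_preds` values are a hypothetical stand-in for a model's output, and the standard-library `math` module is used in place of NumPy so the snippet runs without extra packages.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error, the metric this project is scored with."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# First three training prices, as printed in the notebook output.
prices = [70, 240, 150]

# Forward transform applied to the target before training.
log_prices = [math.log1p(p) for p in prices]

# Inverse transform: expm1 undoes log1p, so the round trip recovers the prices.
round_trip = [math.expm1(y) for y in log_prices]

# Hypothetical stand-in for a model's predictions on the log scale
# (a real model would be fit on log_prices and predict in the same units).
log_preds = [y + 0.1 for y in log_prices]

# Convert predictions back to price units before scoring.
preds = [math.expm1(y) for y in log_preds]

print("log1p(prices):", log_prices)
print("round trip   :", round_trip)
print("MSE in price units:", mse(prices, preds))
```

In the notebook itself the equivalent step would be wrapping the chosen regressor's predictions in `np.expm1(...)` before submitting; which regressor to use is left open here.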

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ySTMVUAR458d"
   },
   "source": [
    "# Machine Learning Project"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KWE9ZloU48sf"
   },
   "source": [
    "## Airbnb (New York City)\n",
    "- NYC Airbnb listings, USA (2019)\n",
    "- Data source: https://www.kaggle.com/datasets/dgomonov/new-york-city-airbnb-open-data (License CC0: Public Domain)\n",
    "- Project goal: predict the price (price)\n",
    "- Provided data (3 files): train.csv, test.csv, y_test (for final scoring)\n",
    "- Evaluation metric: MSE\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "swTGNLoBFaS6"
   },
   "source": [
    "# Performance Improvement Tips"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "executionInfo": {
     "elapsed": 274,
     "status": "ok",
     "timestamp": 1654407659997,
     "user": {
      "displayName": "Tae Heon Kim",
      "userId": "07653788752262629837"
     },
     "user_tz": -540
    },
    "id": "UHaAsvYa9jAX"
   },
   "outputs": [],
   "source": [
    "# Libraries\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "executionInfo": {
     "elapsed": 418,
     "status": "ok",
     "timestamp": 1654407660942,
     "user": {
      "displayName": "Tae Heon Kim",
      "userId": "07653788752262629837"
     },
     "user_tz": -540
    },
    "id": "b8ar8Ohk_h4Z"
   },
   "outputs": [],
   "source": [
    "# Load the data\n",
    "train = pd.read_csv('train.csv')\n",
    "test = pd.read_csv('test.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1BPuoeckATA3"
   },
   "source": [
    "## EDA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 21,
     "status": "ok",
     "timestamp": 1654407660942,
     "user": {
      "displayName": "Tae Heon Kim",
      "userId": "07653788752262629837"
     },
     "user_tz": -540
    },
    "id": "3URb2ddyAHMc",
    "outputId": "10dc3207-23ca-4479-fced-6044e844d28e"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((39116, 16), (9779, 15))"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Data shape (rows, columns)\n",
    "train.shape, test.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 531
    },
    "executionInfo": {
     "elapsed": 20,
     "status": "ok",
     "timestamp": 1654407660943,
     "user": {
      "displayName": "Tae Heon Kim",
      "userId": "07653788752262629837"
     },
     "user_tz": -540
    },
    "id": "BwkRFT7oART_",
    "outputId": "4781f6ff-d9b7-476c-aad4-a771ccaccae9"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<tr><th></th><th>id</th><th>name</th><th>host_id</th><th>host_name</th><th>neighbourhood_group</th><th>neighbourhood</th><th>latitude</th><th>longitude</th><th>room_type</th><th>price</th><th>minimum_nights</th><th>number_of_reviews</th><th>last_review</th><th>reviews_per_month</th><th>calculated_host_listings_count</th><th>availability_365</th></tr>\n",
       "<tr><th>0</th><td>14963583</td><td>Room in South Harlem near Central Park</td><td>94219511</td><td>Gilles</td><td>Manhattan</td><td>Harlem</td><td>40.80167</td><td>-73.95781</td><td>Private room</td><td>70</td><td>3</td><td>3</td><td>2019-01-01</td><td>0.09</td><td>2</td><td>0</td></tr>\n",
       "<tr><th>1</th><td>9458704</td><td>Large 1BR Apartment, near Times Sq (2nd Floor)</td><td>49015331</td><td>Iradj</td><td>Manhattan</td><td>Hell's Kitchen</td><td>40.76037</td><td>-73.99016</td><td>Entire home/apt</td><td>240</td><td>2</td><td>64</td><td>2019-06-30</td><td>1.68</td><td>2</td><td>262</td></tr>\n",
       "</table>\n", "
" ], "text/plain": [ " id name host_id \\\n", "0 14963583 Room in South Harlem near Central Park 94219511 \n", "1 9458704 Large 1BR Apartment, near Times Sq (2nd Floor) 49015331 \n", "\n", " host_name neighbourhood_group neighbourhood latitude longitude \\\n", "0 Gilles Manhattan Harlem 40.80167 -73.95781 \n", "1 Iradj Manhattan Hell's Kitchen 40.76037 -73.99016 \n", "\n", " room_type price minimum_nights number_of_reviews last_review \\\n", "0 Private room 70 3 3 2019-01-01 \n", "1 Entire home/apt 240 2 64 2019-06-30 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \n", "0 0.09 2 0 \n", "1 1.68 2 262 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
<table>\n",
"<tr><th></th><th>id</th><th>name</th><th>host_id</th><th>host_name</th><th>neighbourhood_group</th><th>neighbourhood</th><th>latitude</th><th>longitude</th><th>room_type</th><th>minimum_nights</th><th>number_of_reviews</th><th>last_review</th><th>reviews_per_month</th><th>calculated_host_listings_count</th><th>availability_365</th></tr>\n",
"<tr><th>0</th><td>30913224</td><td>Cozy and Sunny Room Williamsburg, Luxury Building</td><td>33771081</td><td>Rémy</td><td>Brooklyn</td><td>Williamsburg</td><td>40.70959</td><td>-73.94652</td><td>Private room</td><td>3</td><td>2</td><td>2019-05-08</td><td>0.31</td><td>1</td><td>0</td></tr>\n",
"<tr><th>1</th><td>971247</td><td>Sunny Artist Live/Work Apartment</td><td>5308961</td><td>Larry</td><td>Manhattan</td><td>Upper West Side</td><td>40.79368</td><td>-73.96487</td><td>Entire home/apt</td><td>3</td><td>159</td><td>2019-07-03</td><td>2.09</td><td>1</td><td>244</td></tr>\n",
"</table>\n", "
" ], "text/plain": [ " id name host_id \\\n", "0 30913224 Cozy and Sunny Room Williamsburg, Luxury Building 33771081 \n", "1 971247 Sunny Artist Live/Work Apartment 5308961 \n", "\n", " host_name neighbourhood_group neighbourhood latitude longitude \\\n", "0 Rémy Brooklyn Williamsburg 40.70959 -73.94652 \n", "1 Larry Manhattan Upper West Side 40.79368 -73.96487 \n", "\n", " room_type minimum_nights number_of_reviews last_review \\\n", "0 Private room 3 2 2019-05-08 \n", "1 Entire home/apt 3 159 2019-07-03 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \n", "0 0.31 1 0 \n", "1 2.09 1 244 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Data sample\n", "display(train.head(2))\n", "display(test.head(2))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 283 }, "executionInfo": { "elapsed": 432, "status": "ok", "timestamp": 1654407661366, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "g4bdXDwhApJX", "outputId": "d16ead8b-d6ca-487f-bb9c-592bc144242a" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"[base64 PNG data omitted: histogram of train['price'], bins=100]\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Target distribution (histogram)\n", "train['price'].hist(bins=100)" ] }, { "cell_type": "markdown", "metadata": { "id": "0DlUNAusB8Qr" }, "source": [ "## Data Preprocessing" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 11, "status": "ok", "timestamp": 1654407661368, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "4e-J3bK4QoXU", "outputId": "82d2df35-b8fc-4898-885e-48f3c9891467" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 70\n", "1 240\n", "2 150\n", "Name: price, dtype: int64\n", "0 4.262680\n", "1 5.484797\n", "2 5.017280\n", "Name: price, dtype: float64\n", "0 70.0\n", "1 240.0\n", "2 150.0\n", "Name: price, dtype: float64\n" ] } ], "source": [ "# log1p compresses the long right tail of price; expm1 is its inverse\n", "import numpy as np\n", "print(train['price'][:3])\n", "print(np.log1p(train['price'])[:3])\n", "print(np.expm1(np.log1p(train['price'])[:3]))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 283 }, "executionInfo": { "elapsed": 363, "status": "ok", "timestamp": 1654407661722, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "GIli-YOSRZlI", "outputId": "1b30cedb-4225-4d16-c808-149d35e6289f" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"[base64 PNG data omitted: histogram of log1p(train['price']), bins=100]\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "train['price'] = np.log1p(train['price'])\n", "train['price'].hist(bins=100)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661723, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "-aYOjQUGDVwV" }, "outputs": [], "source": [ "# Drop the column with missing values (last_review)\n", "train = train.drop('last_review', axis=1)\n", "test = test.drop('last_review', axis=1)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661724, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "2eZXEkTNDpjJ" }, "outputs": [], "source": [ "# Fill missing values\n", "train['reviews_per_month'] = train['reviews_per_month'].fillna(0)\n", "test['reviews_per_month'] = test['reviews_per_month'].fillna(0)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 39, "status": "ok", "timestamp": 1654407661726, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "ZmhF7Wu2EHjD", "outputId": "a24dcb9b-0b18-41d0-98cc-39b80402ee7f" }, "outputs": [ { "data": { "text/plain": [ "id 0\n", "name 12\n", "host_id 0\n", "host_name 17\n", "neighbourhood_group 0\n", "neighbourhood 0\n", "latitude 0\n", "longitude 0\n", "room_type 0\n", "price 0\n", "minimum_nights 0\n", "number_of_reviews 0\n", "reviews_per_month 0\n", "calculated_host_listings_count 0\n", "availability_365 0\n", "dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check for missing values\n", "train.isnull().sum()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "executionInfo": { "elapsed": 36, "status": "ok", 
"timestamp": 1654407661727, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "lUiKMyjxEmPW" }, "outputs": [], "source": [ "# Copy the price column into the target\n", "target = train['price']\n", "train = train.drop('price', axis=1)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661728, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "xNP-SnrcB_HK" }, "outputs": [], "source": [ "# Select numeric features\n", "# Split the data into numeric and categorical columns\n", "n_train = train.select_dtypes(exclude='object').copy()\n", "c_train = train.select_dtypes(include='object').copy()\n", "n_test = test.select_dtypes(exclude='object').copy()\n", "c_test = test.select_dtypes(include='object').copy()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 364 }, "executionInfo": { "elapsed": 38, "status": "ok", "timestamp": 1654407661729, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "7kweu2v2VMvD", "outputId": "0982eaf3-970c-4fec-f63c-8de57a59fb4c" }, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<tr><th></th><th>id</th><th>host_id</th><th>latitude</th><th>longitude</th><th>minimum_nights</th><th>number_of_reviews</th><th>reviews_per_month</th><th>calculated_host_listings_count</th><th>availability_365</th></tr>\n",
"<tr><th>count</th><td>3.911600e+04</td><td>3.911600e+04</td><td>39116.000000</td><td>39116.000000</td><td>39116.000000</td><td>39116.000000</td><td>39116.000000</td><td>39116.000000</td><td>39116.000000</td></tr>\n",
"<tr><th>mean</th><td>1.898464e+07</td><td>6.774143e+07</td><td>40.728848</td><td>-73.952125</td><td>6.990720</td><td>23.272855</td><td>1.091963</td><td>7.090756</td><td>112.980826</td></tr>\n",
"<tr><th>std</th><td>1.099302e+07</td><td>7.881383e+07</td><td>0.054499</td><td>0.046354</td><td>20.310323</td><td>44.589170</td><td>1.600772</td><td>32.661136</td><td>131.674306</td></tr>\n",
"<tr><th>min</th><td>2.539000e+03</td><td>2.438000e+03</td><td>40.499790</td><td>-74.244420</td><td>1.000000</td><td>0.000000</td><td>0.000000</td><td>1.000000</td><td>0.000000</td></tr>\n",
"<tr><th>25%</th><td>9.412608e+06</td><td>7.834978e+06</td><td>40.690038</td><td>-73.983190</td><td>1.000000</td><td>1.000000</td><td>0.040000</td><td>1.000000</td><td>0.000000</td></tr>\n",
"<tr><th>50%</th><td>1.963650e+07</td><td>3.070949e+07</td><td>40.723000</td><td>-73.955740</td><td>2.000000</td><td>5.000000</td><td>0.370000</td><td>1.000000</td><td>45.000000</td></tr>\n",
"<tr><th>75%</th><td>2.913445e+07</td><td>1.074344e+08</td><td>40.762943</td><td>-73.936338</td><td>5.000000</td><td>23.000000</td><td>1.590000</td><td>2.000000</td><td>228.000000</td></tr>\n",
"<tr><th>max</th><td>3.648561e+07</td><td>2.743213e+08</td><td>40.912340</td><td>-73.712990</td><td>1250.000000</td><td>629.000000</td><td>58.500000</td><td>327.000000</td><td>365.000000</td></tr>\n",
"</table>\n", "
" ], "text/plain": [ " id host_id latitude longitude minimum_nights \\\n", "count 3.911600e+04 3.911600e+04 39116.000000 39116.000000 39116.000000 \n", "mean 1.898464e+07 6.774143e+07 40.728848 -73.952125 6.990720 \n", "std 1.099302e+07 7.881383e+07 0.054499 0.046354 20.310323 \n", "min 2.539000e+03 2.438000e+03 40.499790 -74.244420 1.000000 \n", "25% 9.412608e+06 7.834978e+06 40.690038 -73.983190 1.000000 \n", "50% 1.963650e+07 3.070949e+07 40.723000 -73.955740 2.000000 \n", "75% 2.913445e+07 1.074344e+08 40.762943 -73.936338 5.000000 \n", "max 3.648561e+07 2.743213e+08 40.912340 -73.712990 1250.000000 \n", "\n", " number_of_reviews reviews_per_month calculated_host_listings_count \\\n", "count 39116.000000 39116.000000 39116.000000 \n", "mean 23.272855 1.091963 7.090756 \n", "std 44.589170 1.600772 32.661136 \n", "min 0.000000 0.000000 1.000000 \n", "25% 1.000000 0.040000 1.000000 \n", "50% 5.000000 0.370000 1.000000 \n", "75% 23.000000 1.590000 2.000000 \n", "max 629.000000 58.500000 327.000000 \n", "\n", " availability_365 \n", "count 39116.000000 \n", "mean 112.980826 \n", "std 131.674306 \n", "min 0.000000 \n", "25% 0.000000 \n", "50% 45.000000 \n", "75% 228.000000 \n", "max 365.000000 " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n_train.describe()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 208 }, "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661730, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "znZJ-DLKVEa8", "outputId": "cc18e292-492f-4f3f-8dc6-b7ec25798b72" }, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<tr><th></th><th>id</th><th>host_id</th><th>latitude</th><th>longitude</th><th>minimum_nights</th><th>number_of_reviews</th><th>reviews_per_month</th><th>calculated_host_listings_count</th><th>availability_365</th></tr>\n",
"<tr><th>0</th><td>14963583</td><td>94219511</td><td>40.80167</td><td>-73.95781</td><td>3</td><td>3</td><td>0.09</td><td>2</td><td>0</td></tr>\n",
"</table>\n", "
" ], "text/plain": [ " id host_id latitude longitude minimum_nights number_of_reviews \\\n", "0 14963583 94219511 40.80167 -73.95781 3 3 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \n", "0 0.09 2 0 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
<table>\n",
"<tr><th></th><th>id</th><th>host_id</th><th>latitude</th><th>longitude</th><th>minimum_nights</th><th>number_of_reviews</th><th>reviews_per_month</th><th>calculated_host_listings_count</th><th>availability_365</th></tr>\n",
"<tr><th>0</th><td>14963583</td><td>0.343458</td><td>0.731742</td><td>0.539318</td><td>0.001601</td><td>0.004769</td><td>0.001538</td><td>0.003067</td><td>0.0</td></tr>\n",
"</table>\n", "
" ], "text/plain": [ " id host_id latitude longitude minimum_nights number_of_reviews \\\n", "0 14963583 0.343458 0.731742 0.539318 0.001601 0.004769 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \n", "0 0.001538 0.003067 0.0 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Numeric variables: min-max scaling (fit on train, then applied to test)\n", "from sklearn.preprocessing import MinMaxScaler\n", "scaler = MinMaxScaler()\n", "cols = [\n", " 'host_id',\n", " 'latitude',\n", " 'longitude',\n", " 'minimum_nights',\n", " 'number_of_reviews', \n", " 'reviews_per_month',\n", " 'calculated_host_listings_count',\n", " 'availability_365'\n", " ]\n", "\n", "display(n_train.head(1))\n", "n_train[cols] = scaler.fit_transform(n_train[cols])\n", "n_test[cols] = scaler.transform(n_test[cols])\n", "display(n_train.head(1))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661731, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "teSF1NETVEdj" }, "outputs": [], "source": [ "n_train = n_train.drop('id', axis=1)\n", "n_test = n_test.drop('id', axis=1)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 125 }, "executionInfo": { "elapsed": 36, "status": "ok", "timestamp": 1654407661731, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "7ZVqobuHYENc", "outputId": "87a2a2b4-059a-457a-fae7-4103841850f3" }, "outputs": [ { "data": { "text/html": [ "
  name host_name neighbourhood_group neighbourhood room_type
0 Room in South Harlem near Central Park Gilles Manhattan Harlem Private room
" ], "text/plain": [ " name host_name neighbourhood_group \\\n", "0 Room in South Harlem near Central Park Gilles Manhattan \n", "\n", " neighbourhood room_type \n", "0 Harlem Private room " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c_train.head(1)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 175 }, "executionInfo": { "elapsed": 514, "status": "ok", "timestamp": 1654407662210, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "3maaNowPUTkr", "outputId": "617cb500-9e80-40a8-ffb7-b40d41f76ce5" }, "outputs": [ { "data": { "text/html": [ "
  name host_name neighbourhood_group neighbourhood room_type
count 39104 39099 39116 39116 39116
unique 38420 9977 5 221 3
top Home away from home Michael Manhattan Williamsburg Entire home/apt
freq 15 338 17331 3099 20299
" ], "text/plain": [ " name host_name neighbourhood_group neighbourhood \\\n", "count 39104 39099 39116 39116 \n", "unique 38420 9977 5 221 \n", "top Home away from home Michael Manhattan Williamsburg \n", "freq 15 338 17331 3099 \n", "\n", " room_type \n", "count 39116 \n", "unique 3 \n", "top Entire home/apt \n", "freq 20299 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c_train.describe()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 188 }, "executionInfo": { "elapsed": 9, "status": "ok", "timestamp": 1654407662210, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "m3HHLxc-VEfm", "outputId": "a5dae84a-9bc9-4ccc-bed1-89f967e52b75" }, "outputs": [ { "data": { "text/html": [ "
  name host_name neighbourhood_group neighbourhood room_type
0 Room in South Harlem near Central Park Gilles Manhattan Harlem Private room
" ], "text/plain": [ " name host_name neighbourhood_group \\\n", "0 Room in South Harlem near Central Park Gilles Manhattan \n", "\n", " neighbourhood room_type \n", "0 Harlem Private room " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
  name host_name neighbourhood_group neighbourhood room_type
0 Room in South Harlem near Central Park Gilles 2 94 1
" ], "text/plain": [ " name host_name neighbourhood_group \\\n", "0 Room in South Harlem near Central Park Gilles 2 \n", "\n", " neighbourhood room_type \n", "0 94 1 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# 범주형 변수\n", "from sklearn.preprocessing import LabelEncoder\n", "le = LabelEncoder()\n", "cols = [\n", " 'neighbourhood_group',\n", " 'neighbourhood',\n", " 'room_type'\n", " ]\n", "\n", "display(c_train.head(1))\n", "for col in cols:\n", " c_train[col] = le.fit_transform(c_train[col])\n", " c_test[col] = le.transform(c_test[col])\n", "\n", "display(c_train.head(1))" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "executionInfo": { "elapsed": 9, "status": "ok", "timestamp": 1654407662211, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "rO_pPlqXW7mT" }, "outputs": [], "source": [ "del_cols =['name','host_name']\n", "c_train = c_train.drop(del_cols, axis=1)\n", "c_test = c_test.drop(del_cols, axis=1)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 288 }, "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1654407678323, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "pjU9by2YVEh9", "outputId": "bfa46c6a-951d-4f6e-b36b-5d1716d3ef87" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(39116, 11) (9779, 11)\n" ] }, { "data": { "text/html": [ "
  host_id latitude longitude minimum_nights number_of_reviews reviews_per_month calculated_host_listings_count availability_365 neighbourhood_group neighbourhood room_type
0 0.343458 0.731742 0.539318 0.001601 0.004769 0.001538 0.003067 0.000000 2 94 1
1 0.178671 0.631633 0.478445 0.000801 0.101749 0.028718 0.003067 0.717808 2 95 0
2 0.001595 0.558041 0.449354 0.047238 0.001590 0.003419 0.000000 0.000000 2 209 0
3 0.013033 0.464162 0.579361 0.002402 0.379968 0.049402 0.003067 0.002740 1 13 0
4 0.045468 0.458611 0.543571 0.021617 0.000000 0.000000 0.000000 0.000000 1 13 1
" ], "text/plain": [ " host_id latitude longitude minimum_nights number_of_reviews \\\n", "0 0.343458 0.731742 0.539318 0.001601 0.004769 \n", "1 0.178671 0.631633 0.478445 0.000801 0.101749 \n", "2 0.001595 0.558041 0.449354 0.047238 0.001590 \n", "3 0.013033 0.464162 0.579361 0.002402 0.379968 \n", "4 0.045468 0.458611 0.543571 0.021617 0.000000 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \\\n", "0 0.001538 0.003067 0.000000 \n", "1 0.028718 0.003067 0.717808 \n", "2 0.003419 0.000000 0.000000 \n", "3 0.049402 0.003067 0.002740 \n", "4 0.000000 0.000000 0.000000 \n", "\n", " neighbourhood_group neighbourhood room_type \n", "0 2 94 1 \n", "1 2 95 0 \n", "2 2 209 0 \n", "3 1 13 0 \n", "4 1 13 1 " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 분리한 데이터 다시 합침\n", "train = pd.concat([n_train, c_train], axis=1)\n", "test = pd.concat([n_test, c_test], axis=1)\n", "print(train.shape, test.shape)\n", "train.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "CJ_v5LsbBJDz" }, "source": [ "## 검증 데이터 분리" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 257, "status": "ok", "timestamp": 1654407696859, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "IHaYuF40BLJY", "outputId": "5888e9f3-2878-4c63-bf29-6bea194a5bf0" }, "outputs": [ { "data": { "text/plain": [ "((31292, 11), (7824, 11), (31292,), (7824,))" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 검증 데이터 분리\n", "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_val, y_train, y_val = train_test_split(train, target, test_size=0.2, random_state=2022)\n", "X_train.shape, X_val.shape, y_train.shape, y_val.shape" ] }, { "cell_type": "markdown", "metadata": { "id": "0vuCvtU_B4Nb" }, "source": [ "## 머신러닝" ] }, { 
"cell_type": "code", "execution_count": 32, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 45082, "status": "ok", "timestamp": 1654408150815, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "rJ8ECneaG3dc", "outputId": "a12c41e2-4a17-4610-c4fc-30ce62d556b8" }, "outputs": [ { "data": { "text/plain": [ "25325.322081149014" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 랜덤포레스트\n", "from sklearn.ensemble import RandomForestRegressor\n", "\n", "model = RandomForestRegressor(random_state=2022, n_estimators=200)\n", "model.fit(X_train, y_train)\n", "pred = model.predict(X_val)\n", "\n", "pred = np.expm1(pred)\n", "mean_squared_error(y_val, pred)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 34774, "status": "ok", "timestamp": 1654408323848, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "yqEzdSnSG3gV", "outputId": "c8ca8722-e919-4571-8388-620bef5a8f0e" }, "outputs": [ { "data": { "text/plain": [ "25027.49844781765" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Xgboost\n", "from xgboost import XGBRegressor\n", "model = XGBRegressor(max_depth=10,\n", " learning_rate=0.02,\n", " n_estimators=500,\n", " random_state=2022)\n", "\n", "model.fit(X_train, y_train)\n", "pred = model.predict(X_val)\n", "\n", "pred = np.expm1(pred)\n", "mean_squared_error(y_val, pred)" ] }, { "cell_type": "markdown", "metadata": { "id": "OVVOOGAkIb8R" }, "source": [ "## 채점" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 1105, "status": "ok", "timestamp": 1654408331733, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, 
"user_tz": -540 }, "id": "62aQrZeyG3nb", "outputId": "f73e32d4-4d2e-409d-fa87-a14f3f02a4ad" }, "outputs": [ { "data": { "text/plain": [ "42778.2854814971" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# test 데이터 예측 및 평가\n", "y_test = pd.read_csv('y_test.csv')\n", "\n", "# Xgboost\n", "pred = model.predict(test)\n", "\n", "pred = np.expm1(pred)\n", "mean_squared_error(y_test, pred)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1654407722125, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "cgJYGvYAPwnL" }, "outputs": [], "source": [] } ], "metadata": { "colab": { "authorship_tag": "ABX9TyMwWhKBT8hxADeG/1F1qBkW", "name": "머신러닝_기초_노드10_프로젝트(성능향상).ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 1 }