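7. Performance Improvement Tips
Have you tried pushing your score below the baseline score? (The metric is MSE, so lower is better.)
We have prepared tip code to help improve performance. Download the file provided below and either enter it directly into the LMS or use it in Colab, Jupyter Notebook, or a similar environment.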
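The central tip in the notebook that follows is to train on `log1p(price)` and map values back with `expm1`. A minimal sketch of that round trip, using only the standard library and the three sample prices the notebook prints (70, 240, 150). The `mse` helper is a hypothetical name introduced here for illustration; NumPy's `np.log1p` and `np.expm1`, which the notebook uses, apply the same functions elementwise.

```python
import math

# The first three prices printed by the notebook.
prices = [70.0, 240.0, 150.0]

# log1p(x) = log(1 + x): compresses the long right tail and is defined at x = 0.
log_prices = [math.log1p(p) for p in prices]

# expm1(y) = exp(y) - 1: the exact inverse, used to return to the price scale.
restored = [math.expm1(y) for y in log_prices]


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences (illustrative helper)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


print([round(y, 6) for y in log_prices])
print(mse(prices, restored))  # essentially zero: the round trip is lossless
```

If the final score is computed on raw prices, as the `expm1` round trip in the notebook suggests, predictions from a model trained on the log target would need the `expm1` step before MSE is calculated.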
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "ySTMVUAR458d"
},
"source": [
"# 머신러닝 프로젝트"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KWE9ZloU48sf"
},
"source": [
"## Airbnb (New York City)\n",
"- 미국 NYC Airbnb 목록(2019)\n",
"- 데이터 출처:https://www.kaggle.com/datasets/dgomonov/new-york-city-airbnb-open-data (License CC0: Public Domain)\n",
"- 프로젝트 목적: 가격 예측(price)\n",
"- 제공 데이터(3개): train.csv, test.csv, y_test(최종 채점용)\n",
"- 평가 방식: MSE\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "swTGNLoBFaS6"
},
"source": [
"# 성능향상 Tip"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"executionInfo": {
"elapsed": 274,
"status": "ok",
"timestamp": 1654407659997,
"user": {
"displayName": "Tae Heon Kim",
"userId": "07653788752262629837"
},
"user_tz": -540
},
"id": "UHaAsvYa9jAX"
},
"outputs": [],
"source": [
"# 라이브러리 \n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"executionInfo": {
"elapsed": 418,
"status": "ok",
"timestamp": 1654407660942,
"user": {
"displayName": "Tae Heon Kim",
"userId": "07653788752262629837"
},
"user_tz": -540
},
"id": "b8ar8Ohk_h4Z"
},
"outputs": [],
"source": [
"# 데이터 불러오기\n",
"train = pd.read_csv('train.csv')\n",
"test = pd.read_csv('test.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1BPuoeckATA3"
},
"source": [
"## EDA"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 21,
"status": "ok",
"timestamp": 1654407660942,
"user": {
"displayName": "Tae Heon Kim",
"userId": "07653788752262629837"
},
"user_tz": -540
},
"id": "3URb2ddyAHMc",
"outputId": "10dc3207-23ca-4479-fced-6044e844d28e"
},
"outputs": [
{
"data": {
"text/plain": [
"((39116, 16), (9779, 15))"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 데이터 크기\n",
"train.shape, test.shape"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 531
},
"executionInfo": {
"elapsed": 20,
"status": "ok",
"timestamp": 1654407660943,
"user": {
"displayName": "Tae Heon Kim",
"userId": "07653788752262629837"
},
"user_tz": -540
},
"id": "BwkRFT7oART_",
"outputId": "4781f6ff-d9b7-476c-aad4-a771ccaccae9"
},
"outputs": [
{
"data": {
"text/html": [
"
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|
id |
name |
host_id |
host_name |
neighbourhood_group |
neighbourhood |
latitude |
longitude |
room_type |
price |
minimum_nights |
number_of_reviews |
last_review |
reviews_per_month |
calculated_host_listings_count |
availability_365 |
0 |
14963583 |
Room in South Harlem near Central Park |
94219511 |
Gilles |
Manhattan |
Harlem |
40.80167 |
-73.95781 |
Private room |
70 |
3 |
3 |
2019-01-01 |
0.09 |
2 |
0 |
1 |
9458704 |
Large 1BR Apartment, near Times Sq (2nd Floor) |
49015331 |
Iradj |
Manhattan |
Hell's Kitchen |
40.76037 |
-73.99016 |
Entire home/apt |
240 |
2 |
64 |
2019-06-30 |
1.68 |
2 |
262 |
\n", "
" ], "text/plain": [ " id name host_id \\\n", "0 14963583 Room in South Harlem near Central Park 94219511 \n", "1 9458704 Large 1BR Apartment, near Times Sq (2nd Floor) 49015331 \n", "\n", " host_name neighbourhood_group neighbourhood latitude longitude \\\n", "0 Gilles Manhattan Harlem 40.80167 -73.95781 \n", "1 Iradj Manhattan Hell's Kitchen 40.76037 -73.99016 \n", "\n", " room_type price minimum_nights number_of_reviews last_review \\\n", "0 Private room 70 3 3 2019-01-01 \n", "1 Entire home/apt 240 2 64 2019-06-30 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \n", "0 0.09 2 0 \n", "1 1.68 2 262 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ], "text/plain": [ " id name host_id \\\n", "0 30913224 Cozy and Sunny Room Williamsburg, Luxury Building 33771081 \n", "1 971247 Sunny Artist Live/Work Apartment 5308961 \n", "\n", " host_name neighbourhood_group neighbourhood latitude longitude \\\n", "0 Rémy Brooklyn Williamsburg 40.70959 -73.94652 \n", "1 Larry Manhattan Upper West Side 40.79368 -73.96487 \n", "\n", " room_type minimum_nights number_of_reviews last_review \\\n", "0 Private room 3 2 2019-05-08 \n", "1 Entire home/apt 3 159 2019-07-03 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \n", "0 0.31 1 0 \n", "1 2.09 1 244 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# 데이터 샘플\n", "display(train.head(2))\n", "display(test.head(2))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 283 }, "executionInfo": { "elapsed": 432, "status": "ok", "timestamp": 1654407661366, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "g4bdXDwhApJX", "outputId": "d16ead8b-d6ca-487f-bb9c-592bc144242a" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYQAAAD4CAYAAADsKpHdAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAWVUlEQVR4nO3df7DddZ3f8eerycJSViWIvRMTtonT6AxKi3IHcNx1bpcVAu6IdhwbhlmiUqNVZtaWmW2oncFqmcGtrC3U4sY1FXeyICtqMohlI/XWdqYgYaUkINlcMJZkAlmBlV7docZ994/zuXi8uTe5nHN/5N7zfMx853y/7++P832fb8gr3x/nkKpCkqS/s9A7IEk6MRgIkiTAQJAkNQaCJAkwECRJzfKF3oFenXHGGbVmzZqe1v3JT37CqaeeOrs7dIKz58Fgz0tfv/0++OCDP6qqV001b9EGwpo1a9i1a1dP646OjjIyMjK7O3SCs+fBYM9LX7/9JvnhdPO8ZCRJAgwESVJjIEiSAANBktQYCJIkYAaBkGRrksNJ9nTVvpzkoTbsT/JQq69J8jdd8z7Xtc65SXYnGUtyU5K0+ulJdibZ115XzEGfkqTjmMkZwheB9d2FqvqnVXVOVZ0D3Al8tWv24xPzqupDXfVbgA8A69owsc3NwL1VtQ64t01LkubZcQOhqr4DPDvVvPav/PcAtx1rG0lWAi+vqvuq83vbXwLe2WZfBtzaxm/tqkuS5lG/9xB+E3i6qvZ11dYm+V6S/57kN1ttFXCga5kDrQYwVFWH2vhTwFCf+yRJ6kG/31S+nF8+OzgE/HpVPZPkXODrSV4/041VVSWZ9v/Yk2QTsAlgaGiI0dHRnnb68LM/5uZt2wE4e9UretrGYjM+Pt7z57VY2fNgGLSe57LfngMhyXLgnwDnTtSq6gXghTb+YJLHgdcCB4HVXauvbjWAp5OsrKpD7dLS4enes6q2AFsAhoeHq9evb9+8bTs37u60vv+K3rax2Aza1/vBngfFoPU8l/32c8not4HHqurFS0FJXpVkWRt/DZ2bx0+0S0LPJ7mg3Xe4EtjeVtsBbGzjG7vqkqR5NJPHTm8D/hfwuiQHklzVZm3g6JvJbwUebo+hfgX4UFVN3JD+MPDHwBjwOPDNVr8BeFuSfXRC5obe25Ek9eq4l4yq6vJp6u+donYnncdQp1p+F/CGKerPABcebz8kSXPLbypLkgADQZLUGAiSJMBAkCQ1BoIkCTAQJEmNgSBJAgwESVJjIEiSAANBktQYCJIkwECQJDUGgiQJMBAkSY2BIEkCDARJUmMgSJIAA0GS1BgIkiTAQJAkNQaCJAmYQSAk2ZrkcJI9XbWPJzmY5KE2XNo179okY0n2Jrm4q76+1caSbO6qr01yf6t/OclJs9mgJGlmZnKG8EVg/RT1z1TVOW24GyDJWcAG4PVtnf+cZFmSZcBngUuAs4DL27IAn2rb+gfAc8BV/TQkSerNcQOhqr4DPDvD7V0G3F5VL1TVD4Ax4Lw2jFXVE1X1/4DbgcuSBPgt4Ctt/VuBd760FiRJs2F5H+teneRKYBdwTVU9B6wC7uta5kCrATw5qX4+8Ergr6vqyBTLHyXJJmATwNDQEKOjoz3t+NApcM3ZnbfsdRuLzfj4+MD0OsGeB8Og9TyX/fYaCLcAnwSqvd4IvH+2dmo6VbUF2AIwPDxcIyMjPW3n5m3buXF3p/X9V/S2jcVmdHSUXj+vxcqeB8Og9TyX/fYUCFX19MR4ks8Dd7XJg8CZXYuubjWmqT8DnJZkeTtL6F5ekjSPenrsNMnKrsl3ARNPIO0ANiQ5OclaYB3wXeABYF17ougkOjeed1RVAd8G3t3W3whs72WfJEn9Oe4ZQpLbgBHgjCQHgOuAkSTn0LlktB/4IEBVPZLkDuBR4Ajwkar6edvO1cA9wDJga1U90t7iXwG3J/l3wPeAL8xWc5KkmTtuIFTV5VOUp/1Lu6quB66fon43cPc
U9SfoPIUkSVpAflNZkgQYCJKkxkCQJAEGgiSpMRAkSYCBIElqDARJEmAgSJIaA0GSBPT389dLwprN33hxfP8Nb1/APZGkheUZgiQJMBAkSY2BIEkCDARJUmMgSJIAA0GS1BgIkiTAQJAkNQaCJAkwECRJjYEgSQJmEAhJtiY5nGRPV+3fJ3ksycNJvpbktFZfk+RvkjzUhs91rXNukt1JxpLclCStfnqSnUn2tdcVc9CnJOk4ZnKG8EVg/aTaTuANVfUPgb8Eru2a93hVndOGD3XVbwE+AKxrw8Q2NwP3VtU64N42LUmaZ8cNhKr6DvDspNqfV9WRNnkfsPpY20iyEnh5Vd1XVQV8CXhnm30ZcGsbv7WrLkmaR7Px89fvB77cNb02yfeA54F/U1X/A1gFHOha5kCrAQxV1aE2/hQwNN0bJdkEbAIYGhpidHS0px0eOgWuOfvIUfVet7cYjI+PL+n+pmLPg2HQep7LfvsKhCQfA44A21rpEPDrVfVMknOBryd5/Uy3V1WVpI4xfwuwBWB4eLhGRkZ62u+bt23nxt1Ht77/it62txiMjo7S6+e1WNnzYBi0nuey354DIcl7gd8BLmyXgaiqF4AX2viDSR4HXgsc5JcvK61uNYCnk6ysqkPt0tLhXvdJktS7nh47TbIe+H3gHVX10676q5Isa+OvoXPz+Il2Sej5JBe0p4uuBLa31XYAG9v4xq66JGkeHfcMIcltwAhwRpIDwHV0nio6GdjZnh69rz1R9FbgE0l+Bvwt8KGqmrgh/WE6TyydAnyzDQA3AHckuQr4IfCeWelMkvSSHDcQquryKcpfmGbZO4E7p5m3C3jDFPVngAuPtx+SpLnlN5UlSYCBIElqDARJEmAgSJIaA0GSBBgIkqTGQJAkAQaCJKkxECRJgIEgSWoMBEkSYCBIkhoDQZIEGAiSpMZAkCQBBoIkqTEQJEmAgSBJagwESRJgIEiSmhkFQpKtSQ4n2dNVOz3JziT72uuKVk+Sm5KMJXk4yZu61tnYlt+XZGNX/dwku9s6NyXJbDYpSTq+mZ4hfBFYP6m2Gbi3qtYB97ZpgEuAdW3YBNwCnQABrgPOB84DrpsIkbbMB7rWm/xekqQ5NqNAqKrvAM9OKl8G3NrGbwXe2VX/UnXcB5yWZCVwMbCzqp6tqueAncD6Nu/lVXVfVRXwpa5tSZLmyfI+1h2qqkNt/ClgqI2vAp7sWu5Aqx2rfmCK+lGSbKJz1sHQ0BCjo6O97fgpcM3ZR46q97q9xWB8fHxJ9zcVex4Mg9bzXPbbTyC8qKoqSc3Gto7zPluALQDDw8M1MjLS03Zu3radG3cf3fr+K3rb3mIwOjpKr5/XYmXPg2HQep7Lfvt5yujpdrmH9nq41Q8CZ3Ytt7rVjlVfPUVdkjSP+gmEHcDEk0Ibge1d9Svb00YXAD9ul5buAS5KsqLdTL4IuKfNez7JBe3poiu7tiVJmiczumSU5DZgBDgjyQE6TwvdANyR5Crgh8B72uJ3A5cCY8BPgfcBVNWzST4JPNCW+0RVTdyo/jCdJ5lOAb7ZBknSPJpRIFTV5dPMunCKZQv4yDTb2QpsnaK+C3jDTPZFkjQ3/KayJAkwECRJjYEgSQIMBElSYyBIkgADQZLUGAiSJMBAkCQ1BoIkCTAQJEmNgSBJAgwESVJjIEiSAANBktQYCJIkwECQJDUGgiQJMBAkSY2BIEkCDARJUmMgSJKAPgIhyeuSPNQ1PJ/ko0k+nuRgV/3SrnWuTTKWZG+Si7vq61ttLMnmfpuSJL10y3tdsar2AucAJFkGHAS+BrwP+ExVfbp7+SRnARuA1wOvBr6V5LVt9meBtwEHgAeS7KiqR3vdN0nSS9dzIExyIfB4Vf0wyXTLXAbcXlUvAD9IMgac1+aNVdUTAElub8saCJI0j2YrEDYAt3VNX53kSmAXcE1VPQesAu7rWuZAqwE8Oal+/lRvkmQTsAlgaGiI0dHRnnZ26BS45uwjR9V
73d5iMD4+vqT7m4o9D4ZB63ku++07EJKcBLwDuLaVbgE+CVR7vRF4f7/vA1BVW4AtAMPDwzUyMtLTdm7etp0bdx/d+v4retveYjA6Okqvn9diZc+DYdB6nst+Z+MM4RLgL6rqaYCJV4AknwfuapMHgTO71lvdahyjLkmaJ7Px2OnldF0uSrKya967gD1tfAewIcnJSdYC64DvAg8A65KsbWcbG9qykqR51NcZQpJT6Twd9MGu8h8kOYfOJaP9E/Oq6pEkd9C5WXwE+EhV/bxt52rgHmAZsLWqHulnvyRJL11fgVBVPwFeOan2u8dY/nrg+inqdwN397MvkqT++E1lSRJgIEiSGgNBkgQYCJKkxkCQJAEGgiSpMRAkSYCBIElqDARJEmAgSJIaA0GSBBgIkqTGQJAkAQaCJKkxECRJgIEgSWoMBEkSYCBIkhoDQZIEGAiSpKbvQEiyP8nuJA8l2dVqpyfZmWRfe13R6klyU5KxJA8neVPXdja25fcl2djvfkmSXprZOkP4x1V1TlUNt+nNwL1VtQ64t00DXAKsa8Mm4BboBAhwHXA+cB5w3USISJLmx1xdMroMuLWN3wq8s6v+peq4DzgtyUrgYmBnVT1bVc8BO4H1c7RvkqQpzEYgFPDnSR5MsqnVhqrqUBt/Chhq46uAJ7vWPdBq09UlSfNk+Sxs4zeq6mCSvwfsTPJY98yqqiQ1C+9DC5xNAENDQ4yOjva0naFT4JqzjxxV73V7i8H4+PiS7m8q9jwYBq3nuey370CoqoPt9XCSr9G5B/B0kpVVdahdEjrcFj8InNm1+upWOwiMTKqPTvFeW4AtAMPDwzUyMjJ5kRm5edt2btx9dOv7r+hte4vB6OgovX5ei5U9D4ZB63ku++3rklGSU5O8bGIcuAjYA+wAJp4U2ghsb+M7gCvb00YXAD9ul5buAS5KsqLdTL6o1SRJ86TfM4Qh4GtJJrb1p1X1X5M8ANyR5Crgh8B72vJ3A5cCY8BPgfcBVNWzST4JPNCW+0RVPdvnvr1kazZ/48Xx/Te8fb7fXpIWVF+BUFVPAP9oivozwIVT1Av4yDTb2gps7Wd/JEm985vKkiTAQJAkNQaCJAkwECRJjYEgSQIMBElSYyBIkgADQZLUGAiSJMBAkCQ1BoIkCTAQJEmNgSBJAgwESVJjIEiSAANBktQYCJIkwECQJDUGgiQJMBAkSY2BIEkC+giEJGcm+XaSR5M8kuT3Wv3jSQ4meagNl3atc22SsSR7k1zcVV/famNJNvfXkiSpF8v7WPcIcE1V/UWSlwEPJtnZ5n2mqj7dvXCSs4ANwOuBVwPfSvLaNvuzwNuAA8ADSXZU1aN97Jsk6SXqORCq6hBwqI3/3yTfB1YdY5XLgNur6gXgB0nGgPPavLGqegIgye1tWQNBkuZRP2cIL0qyBngjcD/wFuDqJFcCu+icRTxHJyzu61rtAL8IkCcn1c+f5n02AZsAhoaGGB0d7Wl/h06Ba84+csxlet32iWp8fHzJ9XQ89jwYBq3nuey370BI8mvAncBHq+r5JLcAnwSqvd4IvL/f9wGoqi3AFoDh4eEaGRnpaTs3b9vOjbuP3fr+K3rb9olqdHSUXj+vxcqeB8Og9TyX/fYVCEl+hU4YbKuqrwJU1dNd8z8P3NUmDwJndq2+utU4Rl2SNE/6ecoowBeA71fVH3bVV3Yt9i5gTxvfAWxIcnKStcA64LvAA8C6JGuTnETnxvOOXvdLktSbfs4Q3gL8LrA7yUOt9q+By5OcQ+eS0X7ggwBV9UiSO+jcLD4CfKSqfg6Q5GrgHmAZsLWqHuljvyRJPejnKaP/CWSKWXcfY53rgeunqN99rPUkSXPPbypLkgADQZLUGAiSJMBAkCQ1BoIkCZiln65YitZs/saL4/tvePsC7okkzQ/PECRJgIEgSWoMBEkSYCBIkhoDQZIEGAiSpMZAkCQBBoIkqTEQJEmAgSBJavzpihnwZywkDQLPECRJgIE
gSWoMBEkS4D2El8z7CZKWqhPmDCHJ+iR7k4wl2bzQ+yNJg+aEOENIsgz4LPA24ADwQJIdVfXowu7ZsXWfLXTzzEHSYnRCBAJwHjBWVU8AJLkduAw4oQNhOtMFxWQGh6QTyYkSCKuAJ7umDwDnT14oySZgU5scT7K3x/c7A/hRj+vOmnxqXt/uhOh5ntnzYBi0nvvt9+9PN+NECYQZqaotwJZ+t5NkV1UNz8IuLRr2PBjseemby35PlJvKB4Ezu6ZXt5okaZ6cKIHwALAuydokJwEbgB0LvE+SNFBOiEtGVXUkydXAPcAyYGtVPTKHb9n3ZadFyJ4Hgz0vfXPWb6pqrrYtSVpETpRLRpKkBWYgSJKAAQyEpfITGUnOTPLtJI8meSTJ77X66Ul2JtnXXle0epLc1Pp+OMmbura1sS2/L8nGheppppIsS/K9JHe16bVJ7m+9fbk9mECSk9v0WJu/pmsb17b63iQXL1ArM5LktCRfSfJYku8nefNSP85J/kX7c70nyW1JfnWpHeckW5McTrKnqzZrxzXJuUl2t3VuSpLj7lRVDcxA54b148BrgJOA/w2ctdD71WMvK4E3tfGXAX8JnAX8AbC51TcDn2rjlwLfBAJcANzf6qcDT7TXFW18xUL3d5ze/yXwp8BdbfoOYEMb/xzwz9v4h4HPtfENwJfb+Fnt2J8MrG1/JpYtdF/H6PdW4J+18ZOA05bycabzRdUfAKd0Hd/3LrXjDLwVeBOwp6s2a8cV+G5bNm3dS467Twv9oczzAXgzcE/X9LXAtQu9X7PU23Y6vwW1F1jZaiuBvW38j4DLu5bf2+ZfDvxRV/2XljvRBjrfUbkX+C3grvaH/UfA8snHmM5Ta29u48vbcpl83LuXO9EG4BXtL8dMqi/Z48wvfrng9Hbc7gIuXorHGVgzKRBm5bi2eY911X9puemGQbtkNNVPZKxaoH2ZNe0U+Y3A/cBQVR1qs54Chtr4dL0vts/kPwC/D/xtm34l8NdVdaRNd+//i721+T9uyy+mntcCfwX8l3aZ7I+TnMoSPs5VdRD4NPB/gEN0jtuDLO3jPGG2juuqNj65fkyDFghLTpJfA+4EPlpVz3fPq84/DZbMc8VJfgc4XFUPLvS+zKPldC4r3FJVbwR+QudSwouW4HFeQefHLdcCrwZOBdYv6E4tgIU4roMWCEvqJzKS/AqdMNhWVV9t5aeTrGzzVwKHW3263hfTZ/IW4B1J9gO307ls9B+B05JMfMmye/9f7K3NfwXwDIur5wPAgaq6v01/hU5ALOXj/NvAD6rqr6rqZ8BX6Rz7pXycJ8zWcT3YxifXj2nQAmHJ/ERGe2LgC8D3q+oPu2btACaeNNhI597CRP3K9rTCBcCP26npPcBFSVa0f5ld1GonnKq6tqpWV9UaOsfuv1XVFcC3gXe3xSb3PPFZvLstX62+oT2dshZYR+cG3Amnqp4Cnkzyula6kM7Pwi/Z40znUtEFSf5u+3M+0fOSPc5dZuW4tnnPJ7mgfYZXdm1regt9U2UBbuJcSueJnMeBjy30/vTRx2/QOZ18GHioDZfSuXZ6L7AP+BZwels+dP4nRI8Du4Hhrm29Hxhrw/sWurcZ9j/CL54yeg2d/9DHgD8DTm71X23TY23+a7rW/1j7LPYyg6cvFrjXc4Bd7Vh/nc7TJEv6OAP/FngM2AP8CZ0nhZbUcQZuo3OP5Gd0zgSvms3jCgy3z+9x4D8x6cGEqQZ/ukKSBAzeJSNJ0jQMBEkSYCBIkhoDQZIEGAiSpMZAkCQBBoIkqfn/rab4s2/vk8AAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# target(hist)\n", "train['price'].hist(bins=100)" ] }, { "cell_type": "markdown", "metadata": { "id": "0DlUNAusB8Qr" }, "source": [ "## 데이터 전처리" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 11, "status": "ok", "timestamp": 1654407661368, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "4e-J3bK4QoXU", "outputId": "82d2df35-b8fc-4898-885e-48f3c9891467" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 70\n", "1 240\n", "2 150\n", "Name: price, dtype: int64\n", "0 4.262680\n", "1 5.484797\n", "2 5.017280\n", "Name: price, dtype: float64\n", "0 70.0\n", "1 240.0\n", "2 150.0\n", "Name: price, dtype: float64\n" ] } ], "source": [ "import numpy as np\n", "print(train['price'][:3])\n", "print(np.log1p(train['price'])[:3])\n", "print(np.expm1(np.log1p(train['price'])[:3]))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 283 }, "executionInfo": { "elapsed": 363, "status": "ok", "timestamp": 1654407661722, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "GIli-YOSRZlI", "outputId": "1b30cedb-4225-4d16-c808-149d35e6289f" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAX0AAAD4CAYAAAAAczaOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAASLElEQVR4nO3df4xlZX3H8fenoFWZhsXQTuiy6fLH1gYhUpgArU0zWyou0BSbNAZCEfyR9Q9otSWpq4nB+CPZP9S2Rku6la0YrROiGDewLd1unRj/QGGVuPzQsMHVMt1CLbi4aGqx3/5xz+J1ndmZuTN77+x93q9kcs99znPOfc6Tez/3uec+90yqCklSG35h1A2QJA2PoS9JDTH0Jakhhr4kNcTQl6SGnDrqBhzPmWeeWRs3bhx4++eee47TTjtt9Rp0krIfeuyHHvuhZ5z7Yd++fd+rql+eb92aDv2NGzfywAMPDLz97Ows09PTq9egk5T90GM/9NgPPePcD0m+s9A6T+9IUkMMfUlqiKEvSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JD1vQvcqVR2bjtnheWD26/aoQtkVaXI31JaoihL0kN8fSOtMo8NaS1zJG+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDDH1JaoihL0kNWTT0k2xI8sUkjyR5OMnbuvL3JJlL8mD3d2XfNu9MciDJt5K8tq98S1d2IMm2E3NIkqSFLOXHWc8Dt1TV15L8ErAvyZ5u3V9V1Qf7Kyc5F7gGeCXwq8C/Jvn1bvXHgNcATwD3J9lVVY+sxoFIkha3aOhX1SHgULf8gySPAuuPs8nVwExV/Q/w7SQHgIu7dQeq6nGAJDNdXUNfkoYkVbX0yslG4EvAecBfADcCzwIP0Ps08EySjwL3VdWnum1uB/6p28WWqnpLV349cElV3XzMY2wFtgJMTk5eNDMzM/DBHTlyhImJiYG3Hxf2Q89y+mH/3OEXls9ff/qyHmcl2w6Dz4eece6HzZs376uqqfnWLfnaO0kmgM8Bb6+qZ5PcBrwPqO72Q8CbVtrYqtoB7ACYmpqq6enpgfc1OzvLSrYfF/ZDz3L64cb+6+dct7RtVmPbYfD50NNqPywp9JO8iF7gf7qq7gKoqif71v89cHd3dw7Y0Lf52V0ZxymXJA3BUmbvBLgdeLSqPtxXflZftT8CHuqWdwHXJPnFJOcAm4CvAvcDm5Kck+TF9L7s3bU6hyFJWoqljPRfDVwP7E/yYFf2LuDaJBfQO71zEHgrQFU9nOROel/QPg/cVFU/AUhyM3AvcAqws6oeXrUjkSQtaimzd74MZJ5Vu4+zzQeAD8xTvvt420njzOvsay3wF7mS1BBDX5IaYuhLUkMMfUlqiKEvSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDDH1JaoihL0kNMfQlqSGGviQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWrIov8YXdL8/EfnOhk50pekhjjSlxbhiF7jxJG+JDXEkb60hvipQieaI31JaoihL0kNMfQlqSGGviQ1xNCXpIYsGvpJNiT5YpJHkjyc5G1d+cuT7EnyWHd7RleeJB9JciDJN5Jc2LevG7r6jyW54cQdliRpPksZ6T8P3FJV5wKXAjclORfYBuytqk3A3u4+wBXApu5vK3Ab9N4kgFuBS4CLgVuPvlFIkoZj0dCvqkNV9bVu+QfAo8B64Grgjq7aHcDruuWrgU9Wz33AuiRnAa8F9lTV01X1DLAH2LKaByNJOr5U1dIrJxuBLwHnAd+tqnVdeYBnqmpdkruB7VX15W7dXuAdwDTwkqp6f1f+buBHVfXBYx5jK71PCExOTl40MzMz8MEdOXKEiYmJgbcfF/ZDz3L6Yf/c4XnLz19/+rx1VlK+0OMuVGelfD70jHM/bN68eV9VTc23bsm/yE0yAXwOeHtVPdvL+Z6qqiRLf/c4jqraAewAmJqaqunp6YH3NTs7y0q
2Hxf2Q89y+uHGvl/G9jt43fS8dVZSvtDjLlRnpXw+9LTaD0uavZPkRfQC/9NVdVdX/GR32obu9qmufA7Y0Lf52V3ZQuWSpCFZyuydALcDj1bVh/tW7QKOzsC5AfhCX/kbulk8lwKHq+oQcC9weZIzui9wL+/KJElDspTTO68Grgf2J3mwK3sXsB24M8mbge8Ar+/W7QauBA4APwTeCFBVTyd5H3B/V++9VfX0ahyEJGlpFg397gvZLLD6snnqF3DTAvvaCexcTgMlSavHX+RKUkMMfUlqiKEvSQ0x9CWpIYa+JDXE0JekhviP0TW2/Cfj0s9zpC9JDTH0Jakhhr4kNcTQl6SGGPqS1BBDX5Ia4pRNNcepnGqZI31JaoihL0kNMfQlqSGGviQ1xNCXpIY4e0casf7ZRNKJ5khfkhpi6EtSQwx9SWqIoS9JDTH0Jakhzt6RlsGZNjrZOdKXpIYY+pLUEENfkhqyaOgn2ZnkqSQP9ZW9J8lckge7vyv71r0zyYEk30ry2r7yLV3ZgSTbVv9QJEmLWcpI/xPAlnnK/6qqLuj+dgMkORe4Bnhlt83fJjklySnAx4ArgHOBa7u6kqQhWnT2TlV9KcnGJe7vamCmqv4H+HaSA8DF3boDVfU4QJKZru4jy2+yJGlQqarFK/VC/+6qOq+7/x7gRuBZ4AHglqp6JslHgfuq6lNdvduBf+p2s6Wq3tKVXw9cUlU3z/NYW4GtAJOTkxfNzMwMfHBHjhxhYmJi4O3HRav9sH/u8AvL568//YV+OLZ8vvrLtdB+llu+0D5XU6vPh2ONcz9s3rx5X1VNzbdu0Hn6twHvA6q7/RDwpgH39TOqagewA2Bqaqqmp6cH3tfs7Cwr2X5ctNoPN/b/L9zrpl/oh2PL56u/bPuf67vz05fVQvtfyuP211lNrT4fjtVqPwwU+lX15NHlJH8P3N3dnQM29FU9uyvjOOWSpCEZaMpmkrP67v4RcHRmzy7gmiS/mOQcYBPwVeB+YFOSc5K8mN6XvbsGb7YkaRCLjvSTfAaYBs5M8gRwKzCd5AJ6p3cOAm8FqKqHk9xJ7wva54Gbquon3X5uBu4FTgF2VtXDq30w0rjqv/zDwe1XjbAlOtktZfbOtfMU336c+h8APjBP+W5g97JaJ0laVf4iV5IaYuhLUkO8tLKa5qWS1RpH+pLUEENfkhpi6EtSQwx9SWqIoS9JDTH0Jakhhr4kNcR5+tII+PsAjYojfUlqiKEvSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDDH1Jaoi/yJVOIH95q7XG0JdOYv1vKge3XzXCluhk4ekdSWqIoS9JDTH0Jakhhr6asHHbPeyfO+wXq2qeoS9JDTH0Jakhhr4kNcTQl6SGLBr6SXYmeSrJQ31lL0+yJ8lj3e0ZXXmSfCTJgSTfSHJh3zY3dPUfS3LDiTkcSdLxLGWk/wlgyzFl24C9VbUJ2NvdB7gC2NT9bQVug96bBHArcAlwMXDr0TcKSdLwLBr6VfUl4Oljiq8G7uiW7wBe11f+yeq5D1iX5CzgtcCeqnq6qp4B9vDzbySSpBMsVbV4pWQjcHdVndfd/35VreuWAzxTVeuS3A1sr6ovd+v2Au8ApoGXVNX7u/J3Az+qqg/O81hb6X1KYHJy8qKZmZmBD+7IkSNMTEwMvP24GOd+2D93+IXl89efvuA6gMmXwpM/GkqzVkX/8Sx0nMc7/oWM8/NhOca5HzZv3ryvqqbmW7fiC65VVSVZ/J1j6fvbAewAmJqaqunp6YH3NTs7y0q2Hxfj3A839l9w7LrpBdcB3HL+83xo/0l0jcH9z/Xd+Wm7+4/zeMe/kHF+PixHq/0w6CvgySRnVdWh7vTNU135HLChr97ZXdkcvdF+f/nsgI8tzctf20qLG3TK5i7g6AycG4Av9JW/oZvFcylwuKoOAfcClyc5o/sC9/KuTJI0RIuO9JN8ht4o/cwkT9CbhbMduDPJm4HvAK/
vqu8GrgQOAD8E3ghQVU8neR9wf1fvvVV17JfDkqQTbNHQr6prF1h12Tx1C7hpgf3sBHYuq3WSpFXlL3IlqSGGviQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1BBDX5IaYuhLUkMMfUlqiKEvSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDDH1JaoihL0kNMfQlqSGGviQ1xNCXpIYY+pLUEENfkhqyotBPcjDJ/iQPJnmgK3t5kj1JHutuz+jKk+QjSQ4k+UaSC1fjACRJS7caI/3NVXVBVU1197cBe6tqE7C3uw9wBbCp+9sK3LYKjy1JWoZTT8A+rwamu+U7gFngHV35J6uqgPuSrEtyVlUdOgFt0JjZuO2eF5YPbr9qhC2RTm7pZfCAGyffBp4BCvi7qtqR5PtVta5bH+CZqlqX5G5ge1V9uVu3F3hHVT1wzD630vskwOTk5EUzMzMDt+/IkSNMTEwMvP24GId+2D93+IXl89efPm/5YiZfCk/+aFWbNRILHX9/+fGMw/NhNYxzP2zevHlf39mXn7HSkf7vVNVckl8B9iT5Zv/Kqqoky3pXqaodwA6Aqampmp6eHrhxs7OzrGT7cTEO/XBj/0j/uul5yxdzy/nP86H9J+LD7XAtdPz95cf7ZDQOz4fV0Go/rOicflXNdbdPAZ8HLgaeTHIWQHf7VFd9DtjQt/nZXZkkaUgGDv0kpyX5paPLwOXAQ8Au4Iau2g3AF7rlXcAbulk8lwKHPZ8vScO1ks+6k8Dne6ftORX4x6r65yT3A3cmeTPwHeD1Xf3dwJXAAeCHwBtX8NiSpAEMHPpV9TjwqnnK/xu4bJ7yAm4a9PEkSSvnL3IlqSGGviQ1xNCXpIYY+lJj9s8dZuO2e35mLr/aYehLUkMMfUlqiKEvSQ05+S9EouZ4LloanKEvnWR809NKeHpHkhriSF9rliPa5bG/tBSGviTA/07WCk/vSFJDDH1Jaoind6Qxd+y5/lvOH1FDtCY40pekhjjSlxrmjJ/2ONKXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDXHKptYUpxCuPQtdk8dr9ZycHOlLUkMc6WvkHN1Lw+NIX5Ia4khf0s9Z6NPXUso9v7+2OdKXpIY40pe0qpbyHY2fBkZn6KGfZAvwN8ApwMeravuw26DR8BSANHpDDf0kpwAfA14DPAHcn2RXVT0yzHZo9Jyx07alDABWq45+1rBH+hcDB6rqcYAkM8DVwAkJ/f1zh7mxe1L4hFie5b6YfPFpUEsZAKykzkI/KPvEltNWbV+r9ZwfxusoVXVCdjzvgyV/DGypqrd0968HLqmqm/vqbAW2dndfAXxrBQ95JvC9FWw/LuyHHvuhx37oGed++LWq+uX5Vqy5L3KragewYzX2leSBqppajX2dzOyHHvuhx37oabUfhj1lcw7Y0Hf/7K5MkjQEww79+4FNSc5J8mLgGmDXkNsgSc0a6umdqno+yc3AvfSmbO6sqodP4EOuymmiMWA/9NgPPfZDT5P9MNQvciVJo+VlGCSpIYa+JDVkLEM/yZYk30pyIMm2UbdnFJJsSPLFJI8keTjJ20bdplFKckqSrye5e9RtGaUk65J8Nsk3kzya5LdG3aZRSPLn3evioSSfSfKSUbdpWMYu9Psu9XAFcC5wbZJzR9uqkXgeuKWqzgUuBW5qtB+Oehvw6KgbsQb8DfDPVfUbwKtosE+SrAf+DJiqqvPoTSq5ZrStGp6xC336LvVQVT8Gjl7qoSlVdaiqvtYt/4Dei3v9aFs1GknOBq4CPj7qtoxSktOB3wVuB6iqH1fV90faqNE5FXhpklOBlwH/MeL2DM04hv564N/77j9Bo2F3VJKNwG8CXxlxU0blr4G/BP5vxO0YtXOA/wL+oTvV9fEkC1+AZkxV1RzwQeC7wCHgcFX9y2hbNTzjGPrqk2QC+Bzw9qp6dtTtGbYkfwA
8VVX7Rt2WNeBU4ELgtqr6TeA5oLnvvJKcQe/T/znArwKnJfmT0bZqeMYx9L3UQyfJi+gF/qer6q5Rt2dEXg38YZKD9E71/V6ST422SSPzBPBEVR39xPdZem8Crfl94NtV9V9V9b/AXcBvj7hNQzOOoe+lHoAkoXfu9tGq+vCo2zMqVfXOqjq7qjbSey78W1U1M6rrV1X/Cfx7kld0RZdxgi5rvsZ9F7g0ycu618llNPSF9pq7yuZKjeBSD2vVq4Hrgf1JHuzK3lVVu0fXJK0Bfwp8uhsQPQ68ccTtGbqq+kqSzwJfozfL7es0dEkGL8MgSQ0Zx9M7kqQFGPqS1BBDX5IaYuhLUkMMfUlqiKEvSQ0x9CWpIf8PiIyqMOpw0igAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "train['price'] = np.log1p(train['price'])\n", "train['price'].hist(bins=100)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661723, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "-aYOjQUGDVwV" }, "outputs": [], "source": [ "# 결측치 컬럼 삭제 (last_review)\n", "train = train.drop('last_review', axis=1)\n", "test = test.drop('last_review', axis=1)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661724, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "2eZXEkTNDpjJ" }, "outputs": [], "source": [ "# 결측치 채우기\n", "train['reviews_per_month'] = train['reviews_per_month'].fillna(0)\n", "test['reviews_per_month'] = test['reviews_per_month'].fillna(0)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 39, "status": "ok", "timestamp": 1654407661726, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "ZmhF7Wu2EHjD", "outputId": "a24dcb9b-0b18-41d0-98cc-39b80402ee7f" }, "outputs": [ { "data": { "text/plain": [ "id 0\n", "name 12\n", "host_id 0\n", "host_name 17\n", "neighbourhood_group 0\n", "neighbourhood 0\n", "latitude 0\n", "longitude 0\n", "room_type 0\n", "price 0\n", "minimum_nights 0\n", "number_of_reviews 0\n", "reviews_per_month 0\n", "calculated_host_listings_count 0\n", "availability_365 0\n", "dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 결측치 확인\n", "train.isnull().sum()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "executionInfo": { "elapsed": 36, "status": "ok", 
"timestamp": 1654407661727, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "lUiKMyjxEmPW" }, "outputs": [], "source": [ "# 가격 값 복사\n", "target = train['price']\n", "train = train.drop('price', axis=1)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661728, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "xNP-SnrcB_HK" }, "outputs": [], "source": [ "# 수치형 피처 선택\n", "# 수치형 데이터와 범주형 데이터 분리 \n", "n_train = train.select_dtypes(exclude='object').copy()\n", "c_train = train.select_dtypes(include='object').copy()\n", "n_test = test.select_dtypes(exclude='object').copy()\n", "c_test = test.select_dtypes(include='object').copy()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 364 }, "executionInfo": { "elapsed": 38, "status": "ok", "timestamp": 1654407661729, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "7kweu2v2VMvD", "outputId": "0982eaf3-970c-4fec-f63c-8de57a59fb4c" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id host_id latitude longitude minimum_nights \\\n", "count 3.911600e+04 3.911600e+04 39116.000000 39116.000000 39116.000000 \n", "mean 1.898464e+07 6.774143e+07 40.728848 -73.952125 6.990720 \n", "std 1.099302e+07 7.881383e+07 0.054499 0.046354 20.310323 \n", "min 2.539000e+03 2.438000e+03 40.499790 -74.244420 1.000000 \n", "25% 9.412608e+06 7.834978e+06 40.690038 -73.983190 1.000000 \n", "50% 1.963650e+07 3.070949e+07 40.723000 -73.955740 2.000000 \n", "75% 2.913445e+07 1.074344e+08 40.762943 -73.936338 5.000000 \n", "max 3.648561e+07 2.743213e+08 40.912340 -73.712990 1250.000000 \n", "\n", " number_of_reviews reviews_per_month calculated_host_listings_count \\\n", "count 39116.000000 39116.000000 39116.000000 \n", "mean 23.272855 1.091963 7.090756 \n", "std 44.589170 1.600772 32.661136 \n", "min 0.000000 0.000000 1.000000 \n", "25% 1.000000 0.040000 1.000000 \n", "50% 5.000000 0.370000 1.000000 \n", "75% 23.000000 1.590000 2.000000 \n", "max 629.000000 58.500000 327.000000 \n", "\n", " availability_365 \n", "count 39116.000000 \n", "mean 112.980826 \n", "std 131.674306 \n", "min 0.000000 \n", "25% 0.000000 \n", "50% 45.000000 \n", "75% 228.000000 \n", "max 365.000000 " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n_train.describe()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 208 }, "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661730, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "znZJ-DLKVEa8", "outputId": "cc18e292-492f-4f3f-8dc6-b7ec25798b72" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id host_id latitude longitude minimum_nights number_of_reviews \\\n", "0 14963583 94219511 40.80167 -73.95781 3 3 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \n", "0 0.09 2 0 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
" ], "text/plain": [ " id host_id latitude longitude minimum_nights number_of_reviews \\\n", "0 14963583 0.343458 0.731742 0.539318 0.001601 0.004769 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \n", "0 0.001538 0.003067 0.0 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# 수치형 변수\n", "from sklearn.preprocessing import MinMaxScaler\n", "scaler = MinMaxScaler()\n", "cols = [\n", " 'host_id',\n", " 'latitude',\n", " 'longitude',\n", " 'minimum_nights',\n", " 'number_of_reviews', \n", " 'reviews_per_month',\n", " 'calculated_host_listings_count',\n", " 'availability_365'\n", " ]\n", "\n", "display(n_train.head(1))\n", "n_train[cols] = scaler.fit_transform(n_train[cols])\n", "n_test[cols] = scaler.transform(n_test[cols])\n", "display(n_train.head(1))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661731, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "teSF1NETVEdj" }, "outputs": [], "source": [ "n_train = n_train.drop('id', axis=1)\n", "n_test = n_test.drop('id', axis=1)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 125 }, "executionInfo": { "elapsed": 36, "status": "ok", "timestamp": 1654407661731, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "7ZVqobuHYENc", "outputId": "87a2a2b4-059a-457a-fae7-4103841850f3" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | name | host_name | neighbourhood_group | neighbourhood | room_type |
| 0 | Room in South Harlem near Central Park | Gilles | Manhattan | Harlem | Private room |
\n", "
" ], "text/plain": [ " name host_name neighbourhood_group \\\n", "0 Room in South Harlem near Central Park Gilles Manhattan \n", "\n", " neighbourhood room_type \n", "0 Harlem Private room " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c_train.head(1)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 175 }, "executionInfo": { "elapsed": 514, "status": "ok", "timestamp": 1654407662210, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "3maaNowPUTkr", "outputId": "617cb500-9e80-40a8-ffb7-b40d41f76ce5" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | name | host_name | neighbourhood_group | neighbourhood | room_type |
| count | 39104 | 39099 | 39116 | 39116 | 39116 |
| unique | 38420 | 9977 | 5 | 221 | 3 |
| top | Home away from home | Michael | Manhattan | Williamsburg | Entire home/apt |
| freq | 15 | 338 | 17331 | 3099 | 20299 |
\n", "
" ], "text/plain": [ " name host_name neighbourhood_group neighbourhood \\\n", "count 39104 39099 39116 39116 \n", "unique 38420 9977 5 221 \n", "top Home away from home Michael Manhattan Williamsburg \n", "freq 15 338 17331 3099 \n", "\n", " room_type \n", "count 39116 \n", "unique 3 \n", "top Entire home/apt \n", "freq 20299 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c_train.describe()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 188 }, "executionInfo": { "elapsed": 9, "status": "ok", "timestamp": 1654407662210, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "m3HHLxc-VEfm", "outputId": "a5dae84a-9bc9-4ccc-bed1-89f967e52b75" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | name | host_name | neighbourhood_group | neighbourhood | room_type |
| 0 | Room in South Harlem near Central Park | Gilles | Manhattan | Harlem | Private room |
\n", "
" ], "text/plain": [ " name host_name neighbourhood_group \\\n", "0 Room in South Harlem near Central Park Gilles Manhattan \n", "\n", " neighbourhood room_type \n", "0 Harlem Private room " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | name | host_name | neighbourhood_group | neighbourhood | room_type |
| 0 | Room in South Harlem near Central Park | Gilles | 2 | 94 | 1 |
\n", "
" ], "text/plain": [ " name host_name neighbourhood_group \\\n", "0 Room in South Harlem near Central Park Gilles 2 \n", "\n", " neighbourhood room_type \n", "0 94 1 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# 범주형 변수\n", "from sklearn.preprocessing import LabelEncoder\n", "le = LabelEncoder()\n", "cols = [\n", " 'neighbourhood_group',\n", " 'neighbourhood',\n", " 'room_type'\n", " ]\n", "\n", "display(c_train.head(1))\n", "for col in cols:\n", " c_train[col] = le.fit_transform(c_train[col])\n", " c_test[col] = le.transform(c_test[col])\n", "\n", "display(c_train.head(1))" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "executionInfo": { "elapsed": 9, "status": "ok", "timestamp": 1654407662211, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "rO_pPlqXW7mT" }, "outputs": [], "source": [ "del_cols =['name','host_name']\n", "c_train = c_train.drop(del_cols, axis=1)\n", "c_test = c_test.drop(del_cols, axis=1)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 288 }, "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1654407678323, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "pjU9by2YVEh9", "outputId": "bfa46c6a-951d-4f6e-b36b-5d1716d3ef87" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(39116, 11) (9779, 11)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | host_id | latitude | longitude | minimum_nights | number_of_reviews | reviews_per_month | calculated_host_listings_count | availability_365 | neighbourhood_group | neighbourhood | room_type |
| 0 | 0.343458 | 0.731742 | 0.539318 | 0.001601 | 0.004769 | 0.001538 | 0.003067 | 0.000000 | 2 | 94 | 1 |
| 1 | 0.178671 | 0.631633 | 0.478445 | 0.000801 | 0.101749 | 0.028718 | 0.003067 | 0.717808 | 2 | 95 | 0 |
| 2 | 0.001595 | 0.558041 | 0.449354 | 0.047238 | 0.001590 | 0.003419 | 0.000000 | 0.000000 | 2 | 209 | 0 |
| 3 | 0.013033 | 0.464162 | 0.579361 | 0.002402 | 0.379968 | 0.049402 | 0.003067 | 0.002740 | 1 | 13 | 0 |
| 4 | 0.045468 | 0.458611 | 0.543571 | 0.021617 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1 | 13 | 1 |
\n", "
" ], "text/plain": [ " host_id latitude longitude minimum_nights number_of_reviews \\\n", "0 0.343458 0.731742 0.539318 0.001601 0.004769 \n", "1 0.178671 0.631633 0.478445 0.000801 0.101749 \n", "2 0.001595 0.558041 0.449354 0.047238 0.001590 \n", "3 0.013033 0.464162 0.579361 0.002402 0.379968 \n", "4 0.045468 0.458611 0.543571 0.021617 0.000000 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \\\n", "0 0.001538 0.003067 0.000000 \n", "1 0.028718 0.003067 0.717808 \n", "2 0.003419 0.000000 0.000000 \n", "3 0.049402 0.003067 0.002740 \n", "4 0.000000 0.000000 0.000000 \n", "\n", " neighbourhood_group neighbourhood room_type \n", "0 2 94 1 \n", "1 2 95 0 \n", "2 2 209 0 \n", "3 1 13 0 \n", "4 1 13 1 " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 분리한 데이터 다시 합침\n", "train = pd.concat([n_train, c_train], axis=1)\n", "test = pd.concat([n_test, c_test], axis=1)\n", "print(train.shape, test.shape)\n", "train.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "CJ_v5LsbBJDz" }, "source": [ "## 검증 데이터 분리" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 257, "status": "ok", "timestamp": 1654407696859, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "IHaYuF40BLJY", "outputId": "5888e9f3-2878-4c63-bf29-6bea194a5bf0" }, "outputs": [ { "data": { "text/plain": [ "((31292, 11), (7824, 11), (31292,), (7824,))" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 검증 데이터 분리\n", "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_val, y_train, y_val = train_test_split(train, target, test_size=0.2, random_state=2022)\n", "X_train.shape, X_val.shape, y_train.shape, y_val.shape" ] }, { "cell_type": "markdown", "metadata": { "id": "0vuCvtU_B4Nb" }, "source": [ "## 머신러닝" ] }, { 
"cell_type": "code", "execution_count": 32, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 45082, "status": "ok", "timestamp": 1654408150815, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "rJ8ECneaG3dc", "outputId": "a12c41e2-4a17-4610-c4fc-30ce62d556b8" }, "outputs": [ { "data": { "text/plain": [ "25325.322081149014" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 랜덤포레스트\n", "from sklearn.ensemble import RandomForestRegressor\n", "\n", "model = RandomForestRegressor(random_state=2022, n_estimators=200)\n", "model.fit(X_train, y_train)\n", "pred = model.predict(X_val)\n", "\n", "pred = np.expm1(pred)\n", "mean_squared_error(y_val, pred)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 34774, "status": "ok", "timestamp": 1654408323848, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "yqEzdSnSG3gV", "outputId": "c8ca8722-e919-4571-8388-620bef5a8f0e" }, "outputs": [ { "data": { "text/plain": [ "25027.49844781765" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Xgboost\n", "from xgboost import XGBRegressor\n", "model = XGBRegressor(max_depth=10,\n", " learning_rate=0.02,\n", " n_estimators=500,\n", " random_state=2022)\n", "\n", "model.fit(X_train, y_train)\n", "pred = model.predict(X_val)\n", "\n", "pred = np.expm1(pred)\n", "mean_squared_error(y_val, pred)" ] }, { "cell_type": "markdown", "metadata": { "id": "OVVOOGAkIb8R" }, "source": [ "## 채점" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 1105, "status": "ok", "timestamp": 1654408331733, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, 
"user_tz": -540 }, "id": "62aQrZeyG3nb", "outputId": "f73e32d4-4d2e-409d-fa87-a14f3f02a4ad" }, "outputs": [ { "data": { "text/plain": [ "42778.2854814971" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# test 데이터 예측 및 평가\n", "y_test = pd.read_csv('y_test.csv')\n", "\n", "# Xgboost\n", "pred = model.predict(test)\n", "\n", "pred = np.expm1(pred)\n", "mean_squared_error(y_test, pred)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1654407722125, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "cgJYGvYAPwnL" }, "outputs": [], "source": [] } ], "metadata": { "colab": { "authorship_tag": "ABX9TyMwWhKBT8hxADeG/1F1qBkW", "name": "머신러닝_기초_노드10_프로젝트(성능향상).ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 1 }