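7. Performance Improvement Tips
Have you tried the project with the goal of scoring lower than the baseline score? (The metric is MSE, so lower is better.)
We have prepared tip code for improving performance. Download the file below and either enter it directly into the LMS or use it in Colab, Jupyter Notebook, or a similar environment.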
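Because the project is scored with MSE (mean squared error), "beating the baseline" means getting a smaller number. The sketch below illustrates this with made-up prices; the arrays and the constant "baseline" prediction are assumptions for illustration, not values from train.csv.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared differences (the standard MSE definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical prices, for illustration only.
y_true = np.array([70.0, 240.0, 150.0])

# A naive baseline that always predicts the same value.
baseline_pred = np.array([150.0, 150.0, 150.0])

# Predictions that track the true prices more closely.
better_pred = np.array([80.0, 220.0, 160.0])

baseline_mse = mse(y_true, baseline_pred)
better_mse = mse(y_true, better_pred)
print(baseline_mse, better_mse)
```

Squaring the errors means a few large misses can dominate the score, which is one reason expensive outlier listings tend to matter a lot in a price-prediction task scored this way.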
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "ySTMVUAR458d"
},
"source": [
"# 머신러닝 프로젝트"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KWE9ZloU48sf"
},
"source": [
"## Airbnb (New York City)\n",
"- 미국 NYC Airbnb 목록(2019)\n",
"- 데이터 출처:https://www.kaggle.com/datasets/dgomonov/new-york-city-airbnb-open-data (License CC0: Public Domain)\n",
"- 프로젝트 목적: 가격 예측(price)\n",
"- 제공 데이터(3개): train.csv, test.csv, y_test(최종 채점용)\n",
"- 평가 방식: MSE\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "swTGNLoBFaS6"
},
"source": [
"# 성능향상 Tip"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"executionInfo": {
"elapsed": 274,
"status": "ok",
"timestamp": 1654407659997,
"user": {
"displayName": "Tae Heon Kim",
"userId": "07653788752262629837"
},
"user_tz": -540
},
"id": "UHaAsvYa9jAX"
},
"outputs": [],
"source": [
"# 라이브러리 \n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"executionInfo": {
"elapsed": 418,
"status": "ok",
"timestamp": 1654407660942,
"user": {
"displayName": "Tae Heon Kim",
"userId": "07653788752262629837"
},
"user_tz": -540
},
"id": "b8ar8Ohk_h4Z"
},
"outputs": [],
"source": [
"# 데이터 불러오기\n",
"train = pd.read_csv('train.csv')\n",
"test = pd.read_csv('test.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1BPuoeckATA3"
},
"source": [
"## EDA"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 21,
"status": "ok",
"timestamp": 1654407660942,
"user": {
"displayName": "Tae Heon Kim",
"userId": "07653788752262629837"
},
"user_tz": -540
},
"id": "3URb2ddyAHMc",
"outputId": "10dc3207-23ca-4479-fced-6044e844d28e"
},
"outputs": [
{
"data": {
"text/plain": [
"((39116, 16), (9779, 15))"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 데이터 크기\n",
"train.shape, test.shape"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 531
},
"executionInfo": {
"elapsed": 20,
"status": "ok",
"timestamp": 1654407660943,
"user": {
"displayName": "Tae Heon Kim",
"userId": "07653788752262629837"
},
"user_tz": -540
},
"id": "BwkRFT7oART_",
"outputId": "4781f6ff-d9b7-476c-aad4-a771ccaccae9"
},
"outputs": [
{
"data": {
"text/html": [
"
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|
id |
name |
host_id |
host_name |
neighbourhood_group |
neighbourhood |
latitude |
longitude |
room_type |
price |
minimum_nights |
number_of_reviews |
last_review |
reviews_per_month |
calculated_host_listings_count |
availability_365 |
0 |
14963583 |
Room in South Harlem near Central Park |
94219511 |
Gilles |
Manhattan |
Harlem |
40.80167 |
-73.95781 |
Private room |
70 |
3 |
3 |
2019-01-01 |
0.09 |
2 |
0 |
1 |
9458704 |
Large 1BR Apartment, near Times Sq (2nd Floor) |
49015331 |
Iradj |
Manhattan |
Hell's Kitchen |
40.76037 |
-73.99016 |
Entire home/apt |
240 |
2 |
64 |
2019-06-30 |
1.68 |
2 |
262 |
\n", "
" ], "text/plain": [ " id name host_id \\\n", "0 14963583 Room in South Harlem near Central Park 94219511 \n", "1 9458704 Large 1BR Apartment, near Times Sq (2nd Floor) 49015331 \n", "\n", " host_name neighbourhood_group neighbourhood latitude longitude \\\n", "0 Gilles Manhattan Harlem 40.80167 -73.95781 \n", "1 Iradj Manhattan Hell's Kitchen 40.76037 -73.99016 \n", "\n", " room_type price minimum_nights number_of_reviews last_review \\\n", "0 Private room 70 3 3 2019-01-01 \n", "1 Entire home/apt 240 2 64 2019-06-30 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \n", "0 0.09 2 0 \n", "1 1.68 2 262 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|
id |
name |
host_id |
host_name |
neighbourhood_group |
neighbourhood |
latitude |
longitude |
room_type |
minimum_nights |
number_of_reviews |
last_review |
reviews_per_month |
calculated_host_listings_count |
availability_365 |
0 |
30913224 |
Cozy and Sunny Room Williamsburg, Luxury Building |
33771081 |
Rémy |
Brooklyn |
Williamsburg |
40.70959 |
-73.94652 |
Private room |
3 |
2 |
2019-05-08 |
0.31 |
1 |
0 |
1 |
971247 |
Sunny Artist Live/Work Apartment |
5308961 |
Larry |
Manhattan |
Upper West Side |
40.79368 |
-73.96487 |
Entire home/apt |
3 |
159 |
2019-07-03 |
2.09 |
1 |
244 |
\n", "
" ], "text/plain": [ " id name host_id \\\n", "0 30913224 Cozy and Sunny Room Williamsburg, Luxury Building 33771081 \n", "1 971247 Sunny Artist Live/Work Apartment 5308961 \n", "\n", " host_name neighbourhood_group neighbourhood latitude longitude \\\n", "0 Rémy Brooklyn Williamsburg 40.70959 -73.94652 \n", "1 Larry Manhattan Upper West Side 40.79368 -73.96487 \n", "\n", " room_type minimum_nights number_of_reviews last_review \\\n", "0 Private room 3 2 2019-05-08 \n", "1 Entire home/apt 3 159 2019-07-03 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \n", "0 0.31 1 0 \n", "1 2.09 1 244 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# 데이터 샘플\n", "display(train.head(2))\n", "display(test.head(2))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 283 }, "executionInfo": { "elapsed": 432, "status": "ok", "timestamp": 1654407661366, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "g4bdXDwhApJX", "outputId": "d16ead8b-d6ca-487f-bb9c-592bc144242a" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYQAAAD4CAYAAADsKpHdAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAWVUlEQVR4nO3df7DddZ3f8eerycJSViWIvRMTtonT6AxKi3IHcNx1bpcVAu6IdhwbhlmiUqNVZtaWmW2oncFqmcGtrC3U4sY1FXeyICtqMohlI/XWdqYgYaUkINlcMJZkAlmBlV7docZ994/zuXi8uTe5nHN/5N7zfMx853y/7++P832fb8gr3x/nkKpCkqS/s9A7IEk6MRgIkiTAQJAkNQaCJAkwECRJzfKF3oFenXHGGbVmzZqe1v3JT37CqaeeOrs7dIKz58Fgz0tfv/0++OCDP6qqV001b9EGwpo1a9i1a1dP646OjjIyMjK7O3SCs+fBYM9LX7/9JvnhdPO8ZCRJAgwESVJjIEiSAANBktQYCJIkYAaBkGRrksNJ9nTVvpzkoTbsT/JQq69J8jdd8z7Xtc65SXYnGUtyU5K0+ulJdibZ115XzEGfkqTjmMkZwheB9d2FqvqnVXVOVZ0D3Al8tWv24xPzqupDXfVbgA8A69owsc3NwL1VtQ64t01LkubZcQOhqr4DPDvVvPav/PcAtx1rG0lWAi+vqvuq83vbXwLe2WZfBtzaxm/tqkuS5lG/9xB+E3i6qvZ11dYm+V6S/57kN1ttFXCga5kDrQYwVFWH2vhTwFCf+yRJ6kG/31S+nF8+OzgE/HpVPZPkXODrSV4/041VVSWZ9v/Yk2QTsAlgaGiI0dHRnnb68LM/5uZt2wE4e9UretrGYjM+Pt7z57VY2fNgGLSe57LfngMhyXLgnwDnTtSq6gXghTb+YJLHgdcCB4HVXauvbjWAp5OsrKpD7dLS4enes6q2AFsAhoeHq9evb9+8bTs37u60vv+K3rax2Aza1/vBngfFoPU8l/32c8not4HHqurFS0FJXpVkWRt/DZ2bx0+0S0LPJ7mg3Xe4EtjeVtsBbGzjG7vqkqR5NJPHTm8D/hfwuiQHklzVZm3g6JvJbwUebo+hfgX4UFVN3JD+MPDHwBjwOPDNVr8BeFuSfXRC5obe25Ek9eq4l4yq6vJp6u+donYnncdQp1p+F/CGKerPABcebz8kSXPLbypLkgADQZLUGAiSJMBAkCQ1BoIkCTAQJEmNgSBJAgwESVJjIEiSAANBktQYCJIkwECQJDUGgiQJMBAkSY2BIEkCDARJUmMgSJIAA0GS1BgIkiTAQJAkNQaCJAmYQSAk2ZrkcJI9XbWPJzmY5KE2XNo179okY0n2Jrm4q76+1caSbO6qr01yf6t/OclJs9mgJGlmZnKG8EVg/RT1z1TVOW24GyDJWcAG4PVtnf+cZFmSZcBngUuAs4DL27IAn2rb+gfAc8BV/TQkSerNcQOhqr4DPDvD7V0G3F5VL1TVD4Ax4Lw2jFXVE1X1/4DbgcuSBPgt4Ctt/VuBd760FiRJs2F5H+teneRKYBdwTVU9B6wC7uta5kCrATw5qX4+8Ergr6vqyBTLHyXJJmATwNDQEKOjoz3t+NApcM3ZnbfsdRuLzfj4+MD0OsGeB8Og9TyX/fYaCLcAnwSqvd4IvH+2dmo6VbUF2AIwPDxcIyMjPW3n5m3buXF3p/X9V/S2jcVmdHSUXj+vxcqeB8Og9TyX/fYUCFX19MR4ks8Dd7XJg8CZXYuubjWmqT8DnJZkeTtL6F5ekjSPenrsNMnKrsl3ARNPIO0ANiQ5OclaYB3wXeABYF17ougkOjeed1RVAd8G3t3W3whs72WfJEn9Oe4ZQpLbgBHgjCQHgOuAkSTn0LlktB/4IEBVPZLkDuBR4Ajwkar6edvO1cA9wDJga1U90t7iXwG3J/l3wPeAL8xWc5KkmTtuIFTV5VOUp/1Lu6quB66fon43cPc
U9SfoPIUkSVpAflNZkgQYCJKkxkCQJAEGgiSpMRAkSYCBIElqDARJEmAgSJIaA0GSBPT389dLwprN33hxfP8Nb1/APZGkheUZgiQJMBAkSY2BIEkCDARJUmMgSJIAA0GS1BgIkiTAQJAkNQaCJAkwECRJjYEgSQJmEAhJtiY5nGRPV+3fJ3ksycNJvpbktFZfk+RvkjzUhs91rXNukt1JxpLclCStfnqSnUn2tdcVc9CnJOk4ZnKG8EVg/aTaTuANVfUPgb8Eru2a93hVndOGD3XVbwE+AKxrw8Q2NwP3VtU64N42LUmaZ8cNhKr6DvDspNqfV9WRNnkfsPpY20iyEnh5Vd1XVQV8CXhnm30ZcGsbv7WrLkmaR7Px89fvB77cNb02yfeA54F/U1X/A1gFHOha5kCrAQxV1aE2/hQwNN0bJdkEbAIYGhpidHS0px0eOgWuOfvIUfVet7cYjI+PL+n+pmLPg2HQep7LfvsKhCQfA44A21rpEPDrVfVMknOBryd5/Uy3V1WVpI4xfwuwBWB4eLhGRkZ62u+bt23nxt1Ht77/it62txiMjo7S6+e1WNnzYBi0nuey354DIcl7gd8BLmyXgaiqF4AX2viDSR4HXgsc5JcvK61uNYCnk6ysqkPt0tLhXvdJktS7nh47TbIe+H3gHVX10676q5Isa+OvoXPz+Il2Sej5JBe0p4uuBLa31XYAG9v4xq66JGkeHfcMIcltwAhwRpIDwHV0nio6GdjZnh69rz1R9FbgE0l+Bvwt8KGqmrgh/WE6TyydAnyzDQA3AHckuQr4IfCeWelMkvSSHDcQquryKcpfmGbZO4E7p5m3C3jDFPVngAuPtx+SpLnlN5UlSYCBIElqDARJEmAgSJIaA0GSBBgIkqTGQJAkAQaCJKkxECRJgIEgSWoMBEkSYCBIkhoDQZIEGAiSpMZAkCQBBoIkqTEQJEmAgSBJagwESRJgIEiSmhkFQpKtSQ4n2dNVOz3JziT72uuKVk+Sm5KMJXk4yZu61tnYlt+XZGNX/dwku9s6NyXJbDYpSTq+mZ4hfBFYP6m2Gbi3qtYB97ZpgEuAdW3YBNwCnQABrgPOB84DrpsIkbbMB7rWm/xekqQ5NqNAqKrvAM9OKl8G3NrGbwXe2VX/UnXcB5yWZCVwMbCzqp6tqueAncD6Nu/lVXVfVRXwpa5tSZLmyfI+1h2qqkNt/ClgqI2vAp7sWu5Aqx2rfmCK+lGSbKJz1sHQ0BCjo6O97fgpcM3ZR46q97q9xWB8fHxJ9zcVex4Mg9bzXPbbTyC8qKoqSc3Gto7zPluALQDDw8M1MjLS03Zu3radG3cf3fr+K3rb3mIwOjpKr5/XYmXPg2HQep7Lfvt5yujpdrmH9nq41Q8CZ3Ytt7rVjlVfPUVdkjSP+gmEHcDEk0Ibge1d9Svb00YXAD9ul5buAS5KsqLdTL4IuKfNez7JBe3poiu7tiVJmiczumSU5DZgBDgjyQE6TwvdANyR5Crgh8B72uJ3A5cCY8BPgfcBVNWzST4JPNCW+0RVTdyo/jCdJ5lOAb7ZBknSPJpRIFTV5dPMunCKZQv4yDTb2QpsnaK+C3jDTPZFkjQ3/KayJAkwECRJjYEgSQIMBElSYyBIkgADQZLUGAiSJMBAkCQ1BoIkCTAQJEmNgSBJAgwESVJjIEiSAANBktQYCJIkwECQJDUGgiQJMBAkSY2BIEkCDARJUmMgSJKAPgIhyeuSPNQ1PJ/ko0k+nuRgV/3SrnWuTTKWZG+Si7vq61ttLMnmfpuSJL10y3tdsar2AucAJFkGHAS+BrwP+ExVfbp7+SRnARuA1wOvBr6V5LVt9meBtwEHgAeS7KiqR3vdN0nSS9dzIExyIfB4Vf0wyXTLXAbcXlUvAD9IMgac1+aNVdUTAElub8saCJI0j2YrEDYAt3VNX53kSmAXcE1VPQesAu7rWuZAqwE8Oal+/lRvkmQTsAlgaGiI0dHRnnZ26BS45uwjR9V
73d5iMD4+vqT7m4o9D4ZB63ku++07EJKcBLwDuLaVbgE+CVR7vRF4f7/vA1BVW4AtAMPDwzUyMtLTdm7etp0bdx/d+v4retveYjA6Okqvn9diZc+DYdB6nst+Z+MM4RLgL6rqaYCJV4AknwfuapMHgTO71lvdahyjLkmaJ7Px2OnldF0uSrKya967gD1tfAewIcnJSdYC64DvAg8A65KsbWcbG9qykqR51NcZQpJT6Twd9MGu8h8kOYfOJaP9E/Oq6pEkd9C5WXwE+EhV/bxt52rgHmAZsLWqHulnvyRJL11fgVBVPwFeOan2u8dY/nrg+inqdwN397MvkqT++E1lSRJgIEiSGgNBkgQYCJKkxkCQJAEGgiSpMRAkSYCBIElqDARJEmAgSJIaA0GSBBgIkqTGQJAkAQaCJKkxECRJgIEgSWoMBEkSYCBIkhoDQZIEGAiSpKbvQEiyP8nuJA8l2dVqpyfZmWRfe13R6klyU5KxJA8neVPXdja25fcl2djvfkmSXprZOkP4x1V1TlUNt+nNwL1VtQ64t00DXAKsa8Mm4BboBAhwHXA+cB5w3USISJLmx1xdMroMuLWN3wq8s6v+peq4DzgtyUrgYmBnVT1bVc8BO4H1c7RvkqQpzEYgFPDnSR5MsqnVhqrqUBt/Chhq46uAJ7vWPdBq09UlSfNk+Sxs4zeq6mCSvwfsTPJY98yqqiQ1C+9DC5xNAENDQ4yOjva0naFT4JqzjxxV73V7i8H4+PiS7m8q9jwYBq3nuey370CoqoPt9XCSr9G5B/B0kpVVdahdEjrcFj8InNm1+upWOwiMTKqPTvFeW4AtAMPDwzUyMjJ5kRm5edt2btx9dOv7r+hte4vB6OgovX5ei5U9D4ZB63ku++3rklGSU5O8bGIcuAjYA+wAJp4U2ghsb+M7gCvb00YXAD9ul5buAS5KsqLdTL6o1SRJ86TfM4Qh4GtJJrb1p1X1X5M8ANyR5Crgh8B72vJ3A5cCY8BPgfcBVNWzST4JPNCW+0RVPdvnvr1kazZ/48Xx/Te8fb7fXpIWVF+BUFVPAP9oivozwIVT1Av4yDTb2gps7Wd/JEm985vKkiTAQJAkNQaCJAkwECRJjYEgSQIMBElSYyBIkgADQZLUGAiSJMBAkCQ1BoIkCTAQJEmNgSBJAgwESVJjIEiSAANBktQYCJIkwECQJDUGgiQJMBAkSY2BIEkC+giEJGcm+XaSR5M8kuT3Wv3jSQ4meagNl3atc22SsSR7k1zcVV/famNJNvfXkiSpF8v7WPcIcE1V/UWSlwEPJtnZ5n2mqj7dvXCSs4ANwOuBVwPfSvLaNvuzwNuAA8ADSXZU1aN97Jsk6SXqORCq6hBwqI3/3yTfB1YdY5XLgNur6gXgB0nGgPPavLGqegIgye1tWQNBkuZRP2cIL0qyBngjcD/wFuDqJFcCu+icRTxHJyzu61rtAL8IkCcn1c+f5n02AZsAhoaGGB0d7Wl/h06Ba84+csxlet32iWp8fHzJ9XQ89jwYBq3nuey370BI8mvAncBHq+r5JLcAnwSqvd4IvL/f9wGoqi3AFoDh4eEaGRnpaTs3b9vOjbuP3fr+K3rb9olqdHSUXj+vxcqeB8Og9TyX/fYVCEl+hU4YbKuqrwJU1dNd8z8P3NUmDwJndq2+utU4Rl2SNE/6ecoowBeA71fVH3bVV3Yt9i5gTxvfAWxIcnKStcA64LvAA8C6JGuTnETnxvOOXvdLktSbfs4Q3gL8LrA7yUOt9q+By5OcQ+eS0X7ggwBV9UiSO+jcLD4CfKSqfg6Q5GrgHmAZsLWqHuljvyRJPejnKaP/CWSKWXcfY53rgeunqN99rPUkSXPPbypLkgADQZLUGAiSJMBAkCQ1BoIkCZiln65YitZs/saL4/tvePsC7okkzQ/PECRJgIEgSWoMBEkSYCBIkhoDQZIEGAiSpMZAkCQBBoIkqTEQJEmAgSBJavzpihnwZywkDQLPECRJgIE
gSWoMBEkS4D2El8z7CZKWqhPmDCHJ+iR7k4wl2bzQ+yNJg+aEOENIsgz4LPA24ADwQJIdVfXowu7ZsXWfLXTzzEHSYnRCBAJwHjBWVU8AJLkduAw4oQNhOtMFxWQGh6QTyYkSCKuAJ7umDwDnT14oySZgU5scT7K3x/c7A/hRj+vOmnxqXt/uhOh5ntnzYBi0nvvt9+9PN+NECYQZqaotwJZ+t5NkV1UNz8IuLRr2PBjseemby35PlJvKB4Ezu6ZXt5okaZ6cKIHwALAuydokJwEbgB0LvE+SNFBOiEtGVXUkydXAPcAyYGtVPTKHb9n3ZadFyJ4Hgz0vfXPWb6pqrrYtSVpETpRLRpKkBWYgSJKAAQyEpfITGUnOTPLtJI8meSTJ77X66Ul2JtnXXle0epLc1Pp+OMmbura1sS2/L8nGheppppIsS/K9JHe16bVJ7m+9fbk9mECSk9v0WJu/pmsb17b63iQXL1ArM5LktCRfSfJYku8nefNSP85J/kX7c70nyW1JfnWpHeckW5McTrKnqzZrxzXJuUl2t3VuSpLj7lRVDcxA54b148BrgJOA/w2ctdD71WMvK4E3tfGXAX8JnAX8AbC51TcDn2rjlwLfBAJcANzf6qcDT7TXFW18xUL3d5ze/yXwp8BdbfoOYEMb/xzwz9v4h4HPtfENwJfb+Fnt2J8MrG1/JpYtdF/H6PdW4J+18ZOA05bycabzRdUfAKd0Hd/3LrXjDLwVeBOwp6s2a8cV+G5bNm3dS467Twv9oczzAXgzcE/X9LXAtQu9X7PU23Y6vwW1F1jZaiuBvW38j4DLu5bf2+ZfDvxRV/2XljvRBjrfUbkX+C3grvaH/UfA8snHmM5Ta29u48vbcpl83LuXO9EG4BXtL8dMqi/Z48wvfrng9Hbc7gIuXorHGVgzKRBm5bi2eY911X9puemGQbtkNNVPZKxaoH2ZNe0U+Y3A/cBQVR1qs54Chtr4dL0vts/kPwC/D/xtm34l8NdVdaRNd+//i721+T9uyy+mntcCfwX8l3aZ7I+TnMoSPs5VdRD4NPB/gEN0jtuDLO3jPGG2juuqNj65fkyDFghLTpJfA+4EPlpVz3fPq84/DZbMc8VJfgc4XFUPLvS+zKPldC4r3FJVbwR+QudSwouW4HFeQefHLdcCrwZOBdYv6E4tgIU4roMWCEvqJzKS/AqdMNhWVV9t5aeTrGzzVwKHW3263hfTZ/IW4B1J9gO307ls9B+B05JMfMmye/9f7K3NfwXwDIur5wPAgaq6v01/hU5ALOXj/NvAD6rqr6rqZ8BX6Rz7pXycJ8zWcT3YxifXj2nQAmHJ/ERGe2LgC8D3q+oPu2btACaeNNhI597CRP3K9rTCBcCP26npPcBFSVa0f5ld1GonnKq6tqpWV9UaOsfuv1XVFcC3gXe3xSb3PPFZvLstX62+oT2dshZYR+cG3Amnqp4Cnkzyula6kM7Pwi/Z40znUtEFSf5u+3M+0fOSPc5dZuW4tnnPJ7mgfYZXdm1regt9U2UBbuJcSueJnMeBjy30/vTRx2/QOZ18GHioDZfSuXZ6L7AP+BZwels+dP4nRI8Du4Hhrm29Hxhrw/sWurcZ9j/CL54yeg2d/9DHgD8DTm71X23TY23+a7rW/1j7LPYyg6cvFrjXc4Bd7Vh/nc7TJEv6OAP/FngM2AP8CZ0nhZbUcQZuo3OP5Gd0zgSvms3jCgy3z+9x4D8x6cGEqQZ/ukKSBAzeJSNJ0jQMBEkSYCBIkhoDQZIEGAiSpMZAkCQBBoIkqfn/rab4s2/vk8AAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "train['price'] = np.log1p(train['price'])\n", "train['price'].hist(bins=100)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661723, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "-aYOjQUGDVwV" }, "outputs": [], "source": [ "# 결측치 컬럼 삭제 (last_review)\n", "train = train.drop('last_review', axis=1)\n", "test = test.drop('last_review', axis=1)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661724, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "2eZXEkTNDpjJ" }, "outputs": [], "source": [ "# 결측치 채우기\n", "train['reviews_per_month'] = train['reviews_per_month'].fillna(0)\n", "test['reviews_per_month'] = test['reviews_per_month'].fillna(0)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 39, "status": "ok", "timestamp": 1654407661726, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "ZmhF7Wu2EHjD", "outputId": "a24dcb9b-0b18-41d0-98cc-39b80402ee7f" }, "outputs": [ { "data": { "text/plain": [ "id 0\n", "name 12\n", "host_id 0\n", "host_name 17\n", "neighbourhood_group 0\n", "neighbourhood 0\n", "latitude 0\n", "longitude 0\n", "room_type 0\n", "price 0\n", "minimum_nights 0\n", "number_of_reviews 0\n", "reviews_per_month 0\n", "calculated_host_listings_count 0\n", "availability_365 0\n", "dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 결측치 확인\n", "train.isnull().sum()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "executionInfo": { "elapsed": 36, "status": "ok", 
"timestamp": 1654407661727, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "lUiKMyjxEmPW" }, "outputs": [], "source": [ "# 가격 값 복사\n", "target = train['price']\n", "train = train.drop('price', axis=1)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661728, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "xNP-SnrcB_HK" }, "outputs": [], "source": [ "# 수치형 피처 선택\n", "# 수치형 데이터와 범주형 데이터 분리 \n", "n_train = train.select_dtypes(exclude='object').copy()\n", "c_train = train.select_dtypes(include='object').copy()\n", "n_test = test.select_dtypes(exclude='object').copy()\n", "c_test = test.select_dtypes(include='object').copy()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 364 }, "executionInfo": { "elapsed": 38, "status": "ok", "timestamp": 1654407661729, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "7kweu2v2VMvD", "outputId": "0982eaf3-970c-4fec-f63c-8de57a59fb4c" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|
id |
host_id |
latitude |
longitude |
minimum_nights |
number_of_reviews |
reviews_per_month |
calculated_host_listings_count |
availability_365 |
count |
3.911600e+04 |
3.911600e+04 |
39116.000000 |
39116.000000 |
39116.000000 |
39116.000000 |
39116.000000 |
39116.000000 |
39116.000000 |
mean |
1.898464e+07 |
6.774143e+07 |
40.728848 |
-73.952125 |
6.990720 |
23.272855 |
1.091963 |
7.090756 |
112.980826 |
std |
1.099302e+07 |
7.881383e+07 |
0.054499 |
0.046354 |
20.310323 |
44.589170 |
1.600772 |
32.661136 |
131.674306 |
min |
2.539000e+03 |
2.438000e+03 |
40.499790 |
-74.244420 |
1.000000 |
0.000000 |
0.000000 |
1.000000 |
0.000000 |
25% |
9.412608e+06 |
7.834978e+06 |
40.690038 |
-73.983190 |
1.000000 |
1.000000 |
0.040000 |
1.000000 |
0.000000 |
50% |
1.963650e+07 |
3.070949e+07 |
40.723000 |
-73.955740 |
2.000000 |
5.000000 |
0.370000 |
1.000000 |
45.000000 |
75% |
2.913445e+07 |
1.074344e+08 |
40.762943 |
-73.936338 |
5.000000 |
23.000000 |
1.590000 |
2.000000 |
228.000000 |
max |
3.648561e+07 |
2.743213e+08 |
40.912340 |
-73.712990 |
1250.000000 |
629.000000 |
58.500000 |
327.000000 |
365.000000 |
\n", "
" ], "text/plain": [ " id host_id latitude longitude minimum_nights \\\n", "count 3.911600e+04 3.911600e+04 39116.000000 39116.000000 39116.000000 \n", "mean 1.898464e+07 6.774143e+07 40.728848 -73.952125 6.990720 \n", "std 1.099302e+07 7.881383e+07 0.054499 0.046354 20.310323 \n", "min 2.539000e+03 2.438000e+03 40.499790 -74.244420 1.000000 \n", "25% 9.412608e+06 7.834978e+06 40.690038 -73.983190 1.000000 \n", "50% 1.963650e+07 3.070949e+07 40.723000 -73.955740 2.000000 \n", "75% 2.913445e+07 1.074344e+08 40.762943 -73.936338 5.000000 \n", "max 3.648561e+07 2.743213e+08 40.912340 -73.712990 1250.000000 \n", "\n", " number_of_reviews reviews_per_month calculated_host_listings_count \\\n", "count 39116.000000 39116.000000 39116.000000 \n", "mean 23.272855 1.091963 7.090756 \n", "std 44.589170 1.600772 32.661136 \n", "min 0.000000 0.000000 1.000000 \n", "25% 1.000000 0.040000 1.000000 \n", "50% 5.000000 0.370000 1.000000 \n", "75% 23.000000 1.590000 2.000000 \n", "max 629.000000 58.500000 327.000000 \n", "\n", " availability_365 \n", "count 39116.000000 \n", "mean 112.980826 \n", "std 131.674306 \n", "min 0.000000 \n", "25% 0.000000 \n", "50% 45.000000 \n", "75% 228.000000 \n", "max 365.000000 " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n_train.describe()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 208 }, "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661730, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "znZJ-DLKVEa8", "outputId": "cc18e292-492f-4f3f-8dc6-b7ec25798b72" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|
id |
host_id |
latitude |
longitude |
minimum_nights |
number_of_reviews |
reviews_per_month |
calculated_host_listings_count |
availability_365 |
0 |
14963583 |
94219511 |
40.80167 |
-73.95781 |
3 |
3 |
0.09 |
2 |
0 |
\n", "
" ], "text/plain": [ " id host_id latitude longitude minimum_nights number_of_reviews \\\n", "0 14963583 94219511 40.80167 -73.95781 3 3 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \n", "0 0.09 2 0 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|
id |
host_id |
latitude |
longitude |
minimum_nights |
number_of_reviews |
reviews_per_month |
calculated_host_listings_count |
availability_365 |
0 |
14963583 |
0.343458 |
0.731742 |
0.539318 |
0.001601 |
0.004769 |
0.001538 |
0.003067 |
0.0 |
\n", "
" ], "text/plain": [ " id host_id latitude longitude minimum_nights number_of_reviews \\\n", "0 14963583 0.343458 0.731742 0.539318 0.001601 0.004769 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \n", "0 0.001538 0.003067 0.0 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# 수치형 변수\n", "from sklearn.preprocessing import MinMaxScaler\n", "scaler = MinMaxScaler()\n", "cols = [\n", " 'host_id',\n", " 'latitude',\n", " 'longitude',\n", " 'minimum_nights',\n", " 'number_of_reviews', \n", " 'reviews_per_month',\n", " 'calculated_host_listings_count',\n", " 'availability_365'\n", " ]\n", "\n", "display(n_train.head(1))\n", "n_train[cols] = scaler.fit_transform(n_train[cols])\n", "n_test[cols] = scaler.transform(n_test[cols])\n", "display(n_train.head(1))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "executionInfo": { "elapsed": 37, "status": "ok", "timestamp": 1654407661731, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "teSF1NETVEdj" }, "outputs": [], "source": [ "n_train = n_train.drop('id', axis=1)\n", "n_test = n_test.drop('id', axis=1)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 125 }, "executionInfo": { "elapsed": 36, "status": "ok", "timestamp": 1654407661731, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "7ZVqobuHYENc", "outputId": "87a2a2b4-059a-457a-fae7-4103841850f3" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|
name |
host_name |
neighbourhood_group |
neighbourhood |
room_type |
0 |
Room in South Harlem near Central Park |
Gilles |
Manhattan |
Harlem |
Private room |
\n", "
" ], "text/plain": [ " name host_name neighbourhood_group \\\n", "0 Room in South Harlem near Central Park Gilles Manhattan \n", "\n", " neighbourhood room_type \n", "0 Harlem Private room " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c_train.head(1)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 175 }, "executionInfo": { "elapsed": 514, "status": "ok", "timestamp": 1654407662210, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "3maaNowPUTkr", "outputId": "617cb500-9e80-40a8-ffb7-b40d41f76ce5" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|
name |
host_name |
neighbourhood_group |
neighbourhood |
room_type |
count |
39104 |
39099 |
39116 |
39116 |
39116 |
unique |
38420 |
9977 |
5 |
221 |
3 |
top |
Home away from home |
Michael |
Manhattan |
Williamsburg |
Entire home/apt |
freq |
15 |
338 |
17331 |
3099 |
20299 |
\n", "
" ], "text/plain": [ " name host_name neighbourhood_group neighbourhood \\\n", "count 39104 39099 39116 39116 \n", "unique 38420 9977 5 221 \n", "top Home away from home Michael Manhattan Williamsburg \n", "freq 15 338 17331 3099 \n", "\n", " room_type \n", "count 39116 \n", "unique 3 \n", "top Entire home/apt \n", "freq 20299 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c_train.describe()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 188 }, "executionInfo": { "elapsed": 9, "status": "ok", "timestamp": 1654407662210, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "m3HHLxc-VEfm", "outputId": "a5dae84a-9bc9-4ccc-bed1-89f967e52b75" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|
name |
host_name |
neighbourhood_group |
neighbourhood |
room_type |
0 |
Room in South Harlem near Central Park |
Gilles |
Manhattan |
Harlem |
Private room |
\n", "
" ], "text/plain": [ " name host_name neighbourhood_group \\\n", "0 Room in South Harlem near Central Park Gilles Manhattan \n", "\n", " neighbourhood room_type \n", "0 Harlem Private room " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|
name |
host_name |
neighbourhood_group |
neighbourhood |
room_type |
0 |
Room in South Harlem near Central Park |
Gilles |
2 |
94 |
1 |
\n", "
" ], "text/plain": [ " name host_name neighbourhood_group \\\n", "0 Room in South Harlem near Central Park Gilles 2 \n", "\n", " neighbourhood room_type \n", "0 94 1 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# 범주형 변수\n", "from sklearn.preprocessing import LabelEncoder\n", "le = LabelEncoder()\n", "cols = [\n", " 'neighbourhood_group',\n", " 'neighbourhood',\n", " 'room_type'\n", " ]\n", "\n", "display(c_train.head(1))\n", "for col in cols:\n", " c_train[col] = le.fit_transform(c_train[col])\n", " c_test[col] = le.transform(c_test[col])\n", "\n", "display(c_train.head(1))" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "executionInfo": { "elapsed": 9, "status": "ok", "timestamp": 1654407662211, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "rO_pPlqXW7mT" }, "outputs": [], "source": [ "del_cols =['name','host_name']\n", "c_train = c_train.drop(del_cols, axis=1)\n", "c_test = c_test.drop(del_cols, axis=1)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 288 }, "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1654407678323, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "pjU9by2YVEh9", "outputId": "bfa46c6a-951d-4f6e-b36b-5d1716d3ef87" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(39116, 11) (9779, 11)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|
host_id |
latitude |
longitude |
minimum_nights |
number_of_reviews |
reviews_per_month |
calculated_host_listings_count |
availability_365 |
neighbourhood_group |
neighbourhood |
room_type |
0 |
0.343458 |
0.731742 |
0.539318 |
0.001601 |
0.004769 |
0.001538 |
0.003067 |
0.000000 |
2 |
94 |
1 |
1 |
0.178671 |
0.631633 |
0.478445 |
0.000801 |
0.101749 |
0.028718 |
0.003067 |
0.717808 |
2 |
95 |
0 |
2 |
0.001595 |
0.558041 |
0.449354 |
0.047238 |
0.001590 |
0.003419 |
0.000000 |
0.000000 |
2 |
209 |
0 |
3 |
0.013033 |
0.464162 |
0.579361 |
0.002402 |
0.379968 |
0.049402 |
0.003067 |
0.002740 |
1 |
13 |
0 |
4 |
0.045468 |
0.458611 |
0.543571 |
0.021617 |
0.000000 |
0.000000 |
0.000000 |
0.000000 |
1 |
13 |
1 |
\n", "
" ], "text/plain": [ " host_id latitude longitude minimum_nights number_of_reviews \\\n", "0 0.343458 0.731742 0.539318 0.001601 0.004769 \n", "1 0.178671 0.631633 0.478445 0.000801 0.101749 \n", "2 0.001595 0.558041 0.449354 0.047238 0.001590 \n", "3 0.013033 0.464162 0.579361 0.002402 0.379968 \n", "4 0.045468 0.458611 0.543571 0.021617 0.000000 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \\\n", "0 0.001538 0.003067 0.000000 \n", "1 0.028718 0.003067 0.717808 \n", "2 0.003419 0.000000 0.000000 \n", "3 0.049402 0.003067 0.002740 \n", "4 0.000000 0.000000 0.000000 \n", "\n", " neighbourhood_group neighbourhood room_type \n", "0 2 94 1 \n", "1 2 95 0 \n", "2 2 209 0 \n", "3 1 13 0 \n", "4 1 13 1 " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 분리한 데이터 다시 합침\n", "train = pd.concat([n_train, c_train], axis=1)\n", "test = pd.concat([n_test, c_test], axis=1)\n", "print(train.shape, test.shape)\n", "train.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "CJ_v5LsbBJDz" }, "source": [ "## 검증 데이터 분리" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 257, "status": "ok", "timestamp": 1654407696859, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "IHaYuF40BLJY", "outputId": "5888e9f3-2878-4c63-bf29-6bea194a5bf0" }, "outputs": [ { "data": { "text/plain": [ "((31292, 11), (7824, 11), (31292,), (7824,))" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 검증 데이터 분리\n", "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_val, y_train, y_val = train_test_split(train, target, test_size=0.2, random_state=2022)\n", "X_train.shape, X_val.shape, y_train.shape, y_val.shape" ] }, { "cell_type": "markdown", "metadata": { "id": "0vuCvtU_B4Nb" }, "source": [ "## 머신러닝" ] }, { 
"cell_type": "code", "execution_count": 32, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 45082, "status": "ok", "timestamp": 1654408150815, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "rJ8ECneaG3dc", "outputId": "a12c41e2-4a17-4610-c4fc-30ce62d556b8" }, "outputs": [ { "data": { "text/plain": [ "25325.322081149014" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 랜덤포레스트\n", "from sklearn.ensemble import RandomForestRegressor\n", "\n", "model = RandomForestRegressor(random_state=2022, n_estimators=200)\n", "model.fit(X_train, y_train)\n", "pred = model.predict(X_val)\n", "\n", "pred = np.expm1(pred)\n", "mean_squared_error(y_val, pred)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 34774, "status": "ok", "timestamp": 1654408323848, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "yqEzdSnSG3gV", "outputId": "c8ca8722-e919-4571-8388-620bef5a8f0e" }, "outputs": [ { "data": { "text/plain": [ "25027.49844781765" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Xgboost\n", "from xgboost import XGBRegressor\n", "model = XGBRegressor(max_depth=10,\n", " learning_rate=0.02,\n", " n_estimators=500,\n", " random_state=2022)\n", "\n", "model.fit(X_train, y_train)\n", "pred = model.predict(X_val)\n", "\n", "pred = np.expm1(pred)\n", "mean_squared_error(y_val, pred)" ] }, { "cell_type": "markdown", "metadata": { "id": "OVVOOGAkIb8R" }, "source": [ "## 채점" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 1105, "status": "ok", "timestamp": 1654408331733, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, 
"user_tz": -540 }, "id": "62aQrZeyG3nb", "outputId": "f73e32d4-4d2e-409d-fa87-a14f3f02a4ad" }, "outputs": [ { "data": { "text/plain": [ "42778.2854814971" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# test 데이터 예측 및 평가\n", "y_test = pd.read_csv('y_test.csv')\n", "\n", "# Xgboost\n", "pred = model.predict(test)\n", "\n", "pred = np.expm1(pred)\n", "mean_squared_error(y_test, pred)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1654407722125, "user": { "displayName": "Tae Heon Kim", "userId": "07653788752262629837" }, "user_tz": -540 }, "id": "cgJYGvYAPwnL" }, "outputs": [], "source": [] } ], "metadata": { "colab": { "authorship_tag": "ABX9TyMwWhKBT8hxADeG/1F1qBkW", "name": "머신러닝_기초_노드10_프로젝트(성능향상).ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 1 }